INFO: task reiserfs/0:1322 blocked for more than 120 seconds

* INFO: task reiserfs/0:1322 blocked for more than 120 seconds
@ 2008-08-17  4:36 Greg Donald
  2008-08-20  6:52 ` Andrew Morton
  0 siblings, 1 reply; 5+ messages in thread
From: Greg Donald @ 2008-08-17  4:36 UTC (permalink / raw)
  To: linux-kernel

I got this while rsync'ing an NFS share onto a local disk:

[42374.151062] INFO: task reiserfs/0:1322 blocked for more than 120 seconds.
[42374.186295] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[42374.229433] reiserfs/0    D c1f36180     0  1322      2
[42374.265246]        f5dbdedc 00000046 c1f36180 c1f36180 f5e932c0 1c823428 00002669 f5e932c0
[42374.273706]        f5e93514 c1f36180 00000000 f5dbc000 f62cc780 f5e932c0 00000002 00000001
[42374.313709]        00000000 00000000 f5e932c0 c013cc01 00000246 f5dbded4 c013cbce e31e12ec
[42374.356837] Call Trace:
[42374.417842]  [<c013cc01>] ? trace_hardirqs_on+0xb/0xd
[42374.451201]  [<c013cbce>] ? trace_hardirqs_on_caller+0xe9/0x111
[42374.489735]  [<c02e876b>] mutex_lock_nested+0x14b/0x22b
[42374.525760]  [<c01c9727>] ? flush_commit_list+0x119/0x505
[42374.560839]  [<c01c9727>] flush_commit_list+0x119/0x505
[42374.594183]  [<c01cca8e>] flush_async_commits+0x41/0x4b
[42374.629770]  [<c012ec1a>] run_workqueue+0xc3/0x18e
[42374.662893]  [<c012ebfe>] ? run_workqueue+0xa7/0x18e
[42374.697814]  [<c01cca4d>] ? flush_async_commits+0x0/0x4b
[42374.732504]  [<c012f609>] ? worker_thread+0x0/0x8a
[42374.765765]  [<c012f688>] worker_thread+0x7f/0x8a
[42374.797749]  [<c0131d61>] ? autoremove_wake_function+0x0/0x38
[42374.833713]  [<c0131c93>] kthread+0x40/0x69
[42374.865772]  [<c0131c53>] ? kthread+0x0/0x69
[42374.897774]  [<c010392f>] kernel_thread_helper+0x7/0x10
[42374.929777]  =======================
[42374.957001] 3 locks held by reiserfs/0/1322:
[42374.990140]  #0:  (reiserfs){--..}, at: [<c012ebe1>] run_workqueue+0x8a/0x18e
[42375.025754]  #1:  (&(&journal->j_work)->work){--..}, at: [<c012ebfe>] run_workqueue+0xa7/0x18e
[42375.062963]  #2:  (&jl->j_commit_mutex){--..}, at: [<c01c9727>] flush_commit_list+0x119/0x505


I deleted a few GBs of data and ran it again but was unable to
reproduce it.  This was on 2.6.27-rc3.

I don't see any corruption.  Fluke?


-- 
Greg Donald


* Re: INFO: task reiserfs/0:1322 blocked for more than 120 seconds
  2008-08-17  4:36 INFO: task reiserfs/0:1322 blocked for more than 120 seconds Greg Donald
@ 2008-08-20  6:52 ` Andrew Morton
  2008-08-20  9:19   ` Ingo Molnar
  2008-08-20  9:59   ` Andi Kleen
  0 siblings, 2 replies; 5+ messages in thread
From: Andrew Morton @ 2008-08-20  6:52 UTC (permalink / raw)
  To: Greg Donald; +Cc: linux-kernel, Ingo Molnar, Arjan van de Ven

On Sat, 16 Aug 2008 23:36:03 -0500 "Greg Donald" <gdonald@gmail.com> wrote:

> I got this while rsync'ng an NFS share onto a local disk:
> 
> [42374.151062] INFO: task reiserfs/0:1322 blocked for more than 120 seconds.
> [42374.186295] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [42374.229433] reiserfs/0    D c1f36180     0  1322      2
> [42374.265246]        f5dbdedc 00000046 c1f36180 c1f36180 f5e932c0
> 1c823428 00002669 f5e932c0
> [42374.273706]        f5e93514 c1f36180 00000000 f5dbc000 f62cc780
> f5e932c0 00000002 00000001
> [42374.313709]        00000000 00000000 f5e932c0 c013cc01 00000246
> f5dbded4 c013cbce e31e12ec
> [42374.356837] Call Trace:
> [42374.417842]  [<c013cc01>] ? trace_hardirqs_on+0xb/0xd
> [42374.451201]  [<c013cbce>] ? trace_hardirqs_on_caller+0xe9/0x111
> [42374.489735]  [<c02e876b>] mutex_lock_nested+0x14b/0x22b
> [42374.525760]  [<c01c9727>] ? flush_commit_list+0x119/0x505
> [42374.560839]  [<c01c9727>] flush_commit_list+0x119/0x505
> [42374.594183]  [<c01cca8e>] flush_async_commits+0x41/0x4b
> [42374.629770]  [<c012ec1a>] run_workqueue+0xc3/0x18e
> [42374.662893]  [<c012ebfe>] ? run_workqueue+0xa7/0x18e
> [42374.697814]  [<c01cca4d>] ? flush_async_commits+0x0/0x4b
> [42374.732504]  [<c012f609>] ? worker_thread+0x0/0x8a
> [42374.765765]  [<c012f688>] worker_thread+0x7f/0x8a
> [42374.797749]  [<c0131d61>] ? autoremove_wake_function+0x0/0x38
> [42374.833713]  [<c0131c93>] kthread+0x40/0x69
> [42374.865772]  [<c0131c53>] ? kthread+0x0/0x69
> [42374.897774]  [<c010392f>] kernel_thread_helper+0x7/0x10
> [42374.929777]  =======================
> [42374.957001] 3 locks held by reiserfs/0/1322:
> [42374.990140]  #0:  (reiserfs){--..}, at: [<c012ebe1>] run_workqueue+0x8a/0x18e
> [42375.025754]  #1:  (&(&journal->j_work)->work){--..}, at:
> [<c012ebfe>] run_workqueue+0xa7/0x18e
> [42375.062963]  #2:  (&jl->j_commit_mutex){--..}, at: [<c01c9727>]
> flush_commit_list+0x119/0x505
> 
> 
> I deleted a few GBs of data and ran it again but was unable to
> reproduce it.  This was on 2.6.27-rc3.
> 
> I don't see any corruption.  Fluke?
> 

Seems that about 100% of the reports we get of this warning triggering
are sys_sync, transaction commit, etc.

Does kerneloops.org disagree with me?

If not, I vote we kill it.


* Re: INFO: task reiserfs/0:1322 blocked for more than 120 seconds
  2008-08-20  6:52 ` Andrew Morton
@ 2008-08-20  9:19   ` Ingo Molnar
  2008-08-20 10:00     ` Andi Kleen
  2008-08-20  9:59   ` Andi Kleen
  1 sibling, 1 reply; 5+ messages in thread
From: Ingo Molnar @ 2008-08-20  9:19 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Greg Donald, linux-kernel, Arjan van de Ven


* Andrew Morton <akpm@linux-foundation.org> wrote:

> On Sat, 16 Aug 2008 23:36:03 -0500 "Greg Donald" <gdonald@gmail.com> wrote:
> 
> > I got this while rsync'ng an NFS share onto a local disk:
> > 
> > [42374.151062] INFO: task reiserfs/0:1322 blocked for more than 120 seconds.
> > [42374.186295] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> > disables this message.
> > [42374.229433] reiserfs/0    D c1f36180     0  1322      2
> > [42374.265246]        f5dbdedc 00000046 c1f36180 c1f36180 f5e932c0
> > 1c823428 00002669 f5e932c0
> > [42374.273706]        f5e93514 c1f36180 00000000 f5dbc000 f62cc780
> > f5e932c0 00000002 00000001
> > [42374.313709]        00000000 00000000 f5e932c0 c013cc01 00000246
> > f5dbded4 c013cbce e31e12ec
> > [42374.356837] Call Trace:
> > [42374.417842]  [<c013cc01>] ? trace_hardirqs_on+0xb/0xd
> > [42374.451201]  [<c013cbce>] ? trace_hardirqs_on_caller+0xe9/0x111
> > [42374.489735]  [<c02e876b>] mutex_lock_nested+0x14b/0x22b
> > [42374.525760]  [<c01c9727>] ? flush_commit_list+0x119/0x505
> > [42374.560839]  [<c01c9727>] flush_commit_list+0x119/0x505
> > [42374.594183]  [<c01cca8e>] flush_async_commits+0x41/0x4b
> > [42374.629770]  [<c012ec1a>] run_workqueue+0xc3/0x18e
> > [42374.662893]  [<c012ebfe>] ? run_workqueue+0xa7/0x18e
> > [42374.697814]  [<c01cca4d>] ? flush_async_commits+0x0/0x4b
> > [42374.732504]  [<c012f609>] ? worker_thread+0x0/0x8a
> > [42374.765765]  [<c012f688>] worker_thread+0x7f/0x8a
> > [42374.797749]  [<c0131d61>] ? autoremove_wake_function+0x0/0x38
> > [42374.833713]  [<c0131c93>] kthread+0x40/0x69
> > [42374.865772]  [<c0131c53>] ? kthread+0x0/0x69
> > [42374.897774]  [<c010392f>] kernel_thread_helper+0x7/0x10
> > [42374.929777]  =======================
> > [42374.957001] 3 locks held by reiserfs/0/1322:
> > [42374.990140]  #0:  (reiserfs){--..}, at: [<c012ebe1>] run_workqueue+0x8a/0x18e
> > [42375.025754]  #1:  (&(&journal->j_work)->work){--..}, at:
> > [<c012ebfe>] run_workqueue+0xa7/0x18e
> > [42375.062963]  #2:  (&jl->j_commit_mutex){--..}, at: [<c01c9727>]
> > flush_commit_list+0x119/0x505
> > 
> > 
> > I deleted a few GBs of data and ran it again but was unable to
> > reproduce it.  This was on 2.6.27-rc3.
> > 
> > I don't see any corruption.  Fluke?
> > 
> 
> Seems that about 100% of the reports we get of this warning triggering 
> are sys_sync, transaction commit, etc.
> 
> Does kerneloops.org disagree with me?
> 
> If not, I vote we kill it.

OK. How about quadrupling the timeout, as per the patch below?

Is an uninterruptible wait of more than 8 minutes a reasonable limit?

I had this warning trigger a couple of times during development, 
alerting me to hung tasks.

	Ingo

------------------>
From 3fb4198766c38aa03492cc3996475076073c22ea Mon Sep 17 00:00:00 2001
From: Ingo Molnar <mingo@elte.hu>
Date: Wed, 20 Aug 2008 11:17:40 +0200
Subject: [PATCH] softlockup: increase hung tasks check from 2 minutes to 8 minutes

Andrew says:

> Seems that about 100% of the reports we get of this warning triggering
> are sys_sync, transaction commit, etc.

So increase the timeout. If the warning still triggers for people, we can kill it.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 kernel/softlockup.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/softlockup.c b/kernel/softlockup.c
index b75b492..17a0580 100644
--- a/kernel/softlockup.c
+++ b/kernel/softlockup.c
@@ -164,7 +164,7 @@ unsigned long __read_mostly sysctl_hung_task_check_count = 1024;
 /*
  * Zero means infinite timeout - no checking done:
  */
-unsigned long __read_mostly sysctl_hung_task_timeout_secs = 120;
+unsigned long __read_mostly sysctl_hung_task_timeout_secs = 480;
 
 unsigned long __read_mostly sysctl_hung_task_warnings = 10;
 


* Re: INFO: task reiserfs/0:1322 blocked for more than 120 seconds
  2008-08-20  6:52 ` Andrew Morton
  2008-08-20  9:19   ` Ingo Molnar
@ 2008-08-20  9:59   ` Andi Kleen
  1 sibling, 0 replies; 5+ messages in thread
From: Andi Kleen @ 2008-08-20  9:59 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Greg Donald, linux-kernel, Ingo Molnar, Arjan van de Ven

Andrew Morton <akpm@linux-foundation.org> writes:
>
> Seems that about 100% of the reports we get of this warning triggering
> are sys_sync, transaction commit, etc.

And NFS -- I just had the kernel log on one of my nfsroot test systems
flooded with them recently when the ethernet cable was disconnected
for some time and NFS blocked.  It scared me at first, but after
analysis the warnings didn't seem very useful.  I imagine they would
scare normal users far more.

-Andi


* Re: INFO: task reiserfs/0:1322 blocked for more than 120 seconds
  2008-08-20  9:19   ` Ingo Molnar
@ 2008-08-20 10:00     ` Andi Kleen
  0 siblings, 0 replies; 5+ messages in thread
From: Andi Kleen @ 2008-08-20 10:00 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Andrew Morton, Greg Donald, linux-kernel, Arjan van de Ven

Ingo Molnar <mingo@elte.hu> writes:
>
> OK. How about quadrupling the timeout, as per the patch below?
>
> Is an uninterruptible wait of more than 8 minutes a reasonable limit?

There should at least be a way to disable them for NFS and other
network file systems. Network issues are not that uncommon, and
flooding the log with backtraces every time they happen while a
network fs is mounted is not very useful.

-Andi


