* task imap:2958 blocked for more than 120 seconds
@ 2010-01-10 20:05 Johannes Hirte
2010-01-10 20:19 ` Chris Mason
` (2 more replies)
0 siblings, 3 replies; 9+ messages in thread
From: Johannes Hirte @ 2010-01-10 20:05 UTC (permalink / raw)
To: Chris Mason; +Cc: linux-btrfs, linux-kernel
I've observed this hanging task now several times. Not sure when this started,
but 2.6.32 is affected too, IIRC. I don't have a test pattern for this. Dovecot
imap triggers this from time to time. I've enabled CONFIG_DETECT_HUNG_TASK now
and got this two tasks which hang:
INFO: task imap:2958 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
imap D 0000000000000000 0 2958 2653 0x00000000
ffff88008caf5a28 0000000000000046 0000000000000000 ffffffff810544cf
ffff88008caf5998 0000000000000001 ffff88008caf5fd8 ffff88008caf9530
000000000000de78 00000000001d2700 00000000001d2700 ffff88008caf9530
Call Trace:
[<ffffffff810544cf>] ? trace_hardirqs_off+0xd/0xf
[<ffffffff8105613b>] ? trace_hardirqs_on_caller+0x10c/0x130
[<ffffffff81071262>] ? sync_page+0x0/0x48
[<ffffffff8138b46a>] io_schedule+0x38/0x4d
[<ffffffff810712a6>] sync_page+0x44/0x48
[<ffffffff8138b99a>] __wait_on_bit_lock+0x41/0x8a
[<ffffffff8107124e>] __lock_page+0x61/0x68
[<ffffffff810475a3>] ? wake_bit_function+0x0/0x2e
[<ffffffff810718c6>] filemap_fault+0xea/0x345
[<ffffffff81084022>] __do_fault+0x50/0x3d3
[<ffffffff81086077>] handle_mm_fault+0x32f/0x65d
[<ffffffff8102039d>] ? do_page_fault+0xf4/0x26f
[<ffffffff811cacb4>] ? __down_read_trylock+0x46/0x4e
[<ffffffff8104b0f3>] ? down_read_trylock+0x3f/0x49
[<ffffffff8102039d>] ? do_page_fault+0xf4/0x26f
[<ffffffff81020500>] do_page_fault+0x257/0x26f
[<ffffffff8138e2ff>] page_fault+0x1f/0x30
[<ffffffff81083a13>] ? might_fault+0x57/0xa7
[<ffffffff811746e2>] ? btrfs_copy_from_user+0x4f/0x113
[<ffffffff81174771>] ? btrfs_copy_from_user+0xde/0x113
[<ffffffff81174cd0>] btrfs_file_write+0x439/0x6fe
[<ffffffff810a5c34>] vfs_write+0xad/0x14e
[<ffffffff8105613b>] ? trace_hardirqs_on_caller+0x10c/0x130
[<ffffffff810a5d2a>] sys_pwrite64+0x55/0x74
[<ffffffff81001eeb>] system_call_fastpath+0x16/0x1b
2 locks held by imap/2958:
#0: (&sb->s_type->i_mutex_key#4){+.+.+.}, at: [<ffffffff81174a00>]
btrfs_file_write+0x169/0x6fe
#1: (&mm->mmap_sem){++++++}, at: [<ffffffff8102039d>] do_page_fault+0xf4/0x26f
INFO: task flush-btrfs-2:2783 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
flush-btrfs-2 D 0000000000000000 0 2783 2 0x00000000
ffff88010cdcf9d0 0000000000000046 0000000000000000 ffffffff810544cf
ffff88010cdcf940 0000000000000000 ffff88010cdcffd8 ffff88010cd18290
000000000000de78 00000000001d2700 00000000001d2700 ffff88010cd18290
Call Trace:
[<ffffffff810544cf>] ? trace_hardirqs_off+0xd/0xf
[<ffffffff8105613b>] ? trace_hardirqs_on_caller+0x10c/0x130
[<ffffffff81071262>] ? sync_page+0x0/0x48
[<ffffffff8138b46a>] io_schedule+0x38/0x4d
[<ffffffff810712a6>] sync_page+0x44/0x48
[<ffffffff8138b99a>] __wait_on_bit_lock+0x41/0x8a
[<ffffffff81070ce1>] ? find_get_pages_tag+0x0/0x130
[<ffffffff8107124e>] __lock_page+0x61/0x68
[<ffffffff810475a3>] ? wake_bit_function+0x0/0x2e
[<ffffffff811871f6>] T.858+0xf1/0x2cd
[<ffffffff8104c4f9>] ? sched_clock_cpu+0xc6/0xd4
[<ffffffff8104c3cd>] ? sched_clock_local+0x1c/0x82
[<ffffffff8104c4f9>] ? sched_clock_cpu+0xc6/0xd4
[<ffffffff810544cf>] ? trace_hardirqs_off+0xd/0xf
[<ffffffff811874d3>] extent_writepages+0x3f/0x54
[<ffffffff8116dbd5>] ? btrfs_get_extent+0x0/0x7ee
[<ffffffff8116da7e>] btrfs_writepages+0x22/0x24
[<ffffffff81077d12>] do_writepages+0x1f/0x28
[<ffffffff810c0233>] writeback_single_inode+0xf1/0x2f0
[<ffffffff810c0f57>] writeback_inodes_wb+0x3a9/0x4b2
[<ffffffff810c118b>] wb_writeback+0x12b/0x1af
[<ffffffff810c1436>] wb_do_writeback+0x17f/0x195
[<ffffffff810c1342>] ? wb_do_writeback+0x8b/0x195
[<ffffffff810c1477>] bdi_writeback_task+0x2b/0x84
[<ffffffff81082a45>] ? bdi_start_fn+0x0/0xcf
[<ffffffff81082ab6>] bdi_start_fn+0x71/0xcf
[<ffffffff81082a45>] ? bdi_start_fn+0x0/0xcf
[<ffffffff81047197>] kthread+0x7a/0x82
[<ffffffff81002c94>] kernel_thread_helper+0x4/0x10
[<ffffffff8138e0fc>] ? restore_args+0x0/0x30
[<ffffffff8104711d>] ? kthread+0x0/0x82
[<ffffffff81002c90>] ? kernel_thread_helper+0x0/0x10
1 lock held by flush-btrfs-2/2783:
#0: (&type->s_umount_key#20){++++..}, at: [<ffffffff810c0e82>]
writeback_inodes_wb+0x2d4/0x4b2
regards,
Johannes
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: task imap:2958 blocked for more than 120 seconds
2010-01-10 20:05 task imap:2958 blocked for more than 120 seconds Johannes Hirte
@ 2010-01-10 20:19 ` Chris Mason
2010-01-13 22:37 ` Johannes Hirte
2010-01-11 7:34 ` Adrian von Bidder
2010-03-07 15:52 ` Johannes Hirte
2 siblings, 1 reply; 9+ messages in thread
From: Chris Mason @ 2010-01-10 20:19 UTC (permalink / raw)
To: Johannes Hirte; +Cc: linux-btrfs, linux-kernel
On Sun, Jan 10, 2010 at 09:05:46PM +0100, Johannes Hirte wrote:
> I've observed this hanging task now several times. Not sure when this started,
> but 2.6.32 is affected too, IIRC. I don't have a test pattern for this. Dovecot
> imap triggers this from time to time. I've enabled CONFIG_DETECT_HUNG_TASK now
> and got this two tasks which hang:
You're stuck on a read, could you please do a sysrq-w when this happens?
Also, do you eventually recover or are you stuck forever?
-chris
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: task imap:2958 blocked for more than 120 seconds
2010-01-10 20:05 task imap:2958 blocked for more than 120 seconds Johannes Hirte
2010-01-10 20:19 ` Chris Mason
@ 2010-01-11 7:34 ` Adrian von Bidder
2010-01-11 8:45 ` Adrian von Bidder
2010-03-07 15:52 ` Johannes Hirte
2 siblings, 1 reply; 9+ messages in thread
From: Adrian von Bidder @ 2010-01-11 7:34 UTC (permalink / raw)
To: Johannes Hirte; +Cc: Chris Mason, linux-btrfs, linux-kernel
[-- Attachment #1: Type: Text/Plain, Size: 808 bytes --]
On Sunday 10 January 2010 21.05:46 Johannes Hirte wrote:
> I've observed this hanging task now several times. Not sure when this
> started, but 2.6.32 is affected too, IIRC. I don't have a test pattern
> for this. Dovecot imap triggers this from time to time. I've enabled
> CONFIG_DETECT_HUNG_TASK now and got this two tasks which hang:
I've had vey similar hangs occasionally; don't remember if they were the
exact same code path or not.
"btrfs-vol -b" on an 2T btrfs fs (raid 1 mode over 4 disks) on an arm CPU
has triggered it several times, so it seems a reliable way to reproduce
this.
Unfortunately log rotation has already killed the messages. Do you want me
to re-run? 2.6.32 kernel, btrfs-progs from git about 4 weeks ago.
cheers
-- vbi
--
Place stamp here.
[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 389 bytes --]
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: task imap:2958 blocked for more than 120 seconds
2010-01-11 7:34 ` Adrian von Bidder
@ 2010-01-11 8:45 ` Adrian von Bidder
2010-01-14 1:08 ` Chris Mason
0 siblings, 1 reply; 9+ messages in thread
From: Adrian von Bidder @ 2010-01-11 8:45 UTC (permalink / raw)
To: Johannes Hirte; +Cc: Chris Mason, linux-btrfs, linux-kernel
[-- Attachment #1: Type: Text/Plain, Size: 6655 bytes --]
On Monday 11 January 2010 08.34:36 Adrian von Bidder wrote:
> "btrfs-vol -b" on an 2T btrfs fs (raid 1 mode over 4 disks) on an arm
> CPU has triggered it several times, so it seems a reliable way to
> reproduce this.
>
Found it (Debian kernel 2.6.32 on ARM):
[78260.386272] INFO: task btrfs-vol:10979 blocked for more than 120 seconds.
[78260.386306] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[78260.386331] btrfs-vol D c02b080c 0 10979 1 0x00000001
[78260.386373] [<c02b080c>] (schedule+0x424/0x488) from [<c02b0c9c>] (schedule_timeout+0x1c/0x244)
[78260.386408] [<c02b0c9c>] (schedule_timeout+0x1c/0x244) from [<c02b0b10>] (wait_for_common+0xdc/0x178)
[78260.386611] [<c02b0b10>] (wait_for_common+0xdc/0x178) from [<bf29b880>] (merge_reloc_roots+0x15c/0x1a4 [btrfs])
[78260.386940] [<bf29b880>] (merge_reloc_roots+0x15c/0x1a4 [btrfs]) from [<bf2a3fd8>] (relocate_block_group+0x548/0x5c8 [btrfs])
[78260.387258] [<bf2a3fd8>] (relocate_block_group+0x548/0x5c8 [btrfs]) from [<bf2a4434>] (btrfs_relocate_block_group+0x17c/0x3a4 [btrfs])
[78260.387564] [<bf2a4434>] (btrfs_relocate_block_group+0x17c/0x3a4 [btrfs]) from [<bf2868e0>] (btrfs_relocate_chunk+0x70/0x7c0 [btrfs])
[78260.387856] [<bf2868e0>] (btrfs_relocate_chunk+0x70/0x7c0 [btrfs]) from [<bf2879f4>] (btrfs_balance+0x370/0x424 [btrfs])
[78260.388148] [<bf2879f4>] (btrfs_balance+0x370/0x424 [btrfs]) from [<bf28d3a8>] (btrfs_ioctl+0x754/0x968 [btrfs])
[78260.388319] [<bf28d3a8>] (btrfs_ioctl+0x754/0x968 [btrfs]) from [<c00d8788>] (vfs_ioctl+0x2c/0x70)
[78260.388357] [<c00d8788>] (vfs_ioctl+0x2c/0x70) from [<c00d8e8c>] (do_vfs_ioctl+0x4f4/0x55c)
[78260.388390] [<c00d8e8c>] (do_vfs_ioctl+0x4f4/0x55c) from [<c00d8f44>] (sys_ioctl+0x50/0x74)
[78260.388423] [<c00d8f44>] (sys_ioctl+0x50/0x74) from [<c0026ea0>] (ret_fast_syscall+0x0/0x28)
[78380.381159] INFO: task btrfs-vol:10979 blocked for more than 120 seconds.
[78380.381194] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[78380.381219] btrfs-vol D c02b080c 0 10979 1 0x00000001
[78380.381262] [<c02b080c>] (schedule+0x424/0x488) from [<c02b0c9c>] (schedule_timeout+0x1c/0x244)
[78380.381297] [<c02b0c9c>] (schedule_timeout+0x1c/0x244) from [<c02b0b10>] (wait_for_common+0xdc/0x178)
[78380.381501] [<c02b0b10>] (wait_for_common+0xdc/0x178) from [<bf29b880>] (merge_reloc_roots+0x15c/0x1a4 [btrfs])
[78380.381830] [<bf29b880>] (merge_reloc_roots+0x15c/0x1a4 [btrfs]) from [<bf2a3fd8>] (relocate_block_group+0x548/0x5c8 [btrfs])
[78380.382232] [<bf2a3fd8>] (relocate_block_group+0x548/0x5c8 [btrfs]) from [<bf2a4434>] (btrfs_relocate_block_group+0x17c/0x3a4 [btrfs])
[78380.382545] [<bf2a4434>] (btrfs_relocate_block_group+0x17c/0x3a4 [btrfs]) from [<bf2868e0>] (btrfs_relocate_chunk+0x70/0x7c0 [btrfs])
[78380.382839] [<bf2868e0>] (btrfs_relocate_chunk+0x70/0x7c0 [btrfs]) from [<bf2879f4>] (btrfs_balance+0x370/0x424 [btrfs])
[78380.383131] [<bf2879f4>] (btrfs_balance+0x370/0x424 [btrfs]) from [<bf28d3a8>] (btrfs_ioctl+0x754/0x968 [btrfs])
[78380.383302] [<bf28d3a8>] (btrfs_ioctl+0x754/0x968 [btrfs]) from [<c00d8788>] (vfs_ioctl+0x2c/0x70)
[78380.383341] [<c00d8788>] (vfs_ioctl+0x2c/0x70) from [<c00d8e8c>] (do_vfs_ioctl+0x4f4/0x55c)
[78380.383374] [<c00d8e8c>] (do_vfs_ioctl+0x4f4/0x55c) from [<c00d8f44>] (sys_ioctl+0x50/0x74)
[78380.383408] [<c00d8f44>] (sys_ioctl+0x50/0x74) from [<c0026ea0>] (ret_fast_syscall+0x0/0x28)
umount right after some big fs action (not sure, it was either lots of
file deletions, a big rsync of some tree, or right after the btrfs-vol
stuff) manages to trigger a btrfs related hang, too:
[97460.345446] INFO: task umount:12765 blocked for more than 120 seconds.
[97460.345481] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[97460.345505] umount D c02b080c 0 12765 12681 0x00000000
[97460.345554] [<c02b080c>] (schedule+0x424/0x488) from [<c00e719c>] (bdi_sched_wait+0xc/0x18)
[97460.345592] [<c00e719c>] (bdi_sched_wait+0xc/0x18) from [<c02b0f68>] (__wait_on_bit+0x5c/0xa8)
[97460.345625] [<c02b0f68>] (__wait_on_bit+0x5c/0xa8) from [<c02b1060>] (out_of_line_wait_on_bit+0xac/0xc4)
[97460.345661] [<c02b1060>] (out_of_line_wait_on_bit+0xac/0xc4) from [<c00e7210>] (sync_inodes_sb+0x68/0x100)
[97460.345699] [<c00e7210>] (sync_inodes_sb+0x68/0x100) from [<c00eb340>] (__sync_filesystem+0x64/0x94)
[97460.345737] [<c00eb340>] (__sync_filesystem+0x64/0x94) from [<c00cdc74>] (generic_shutdown_super+0x28/0x110)
[97460.345776] [<c00cdc74>] (generic_shutdown_super+0x28/0x110) from [<c00cdda8>] (kill_anon_super+0x14/0x3c)
[97460.345813] [<c00cdda8>] (kill_anon_super+0x14/0x3c) from [<c00ce46c>] (deactivate_super+0x6c/0x90)
[97460.345849] [<c00ce46c>] (deactivate_super+0x6c/0x90) from [<c00e2310>] (sys_umount+0x2bc/0x2e8)
[97460.345883] [<c00e2310>] (sys_umount+0x2bc/0x2e8) from [<c0026ea0>] (ret_fast_syscall+0x0/0x28)
[97580.340641] INFO: task umount:12765 blocked for more than 120 seconds.
[97580.340674] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[97580.340699] umount D c02b080c 0 12765 12681 0x00000000
[97580.340749] [<c02b080c>] (schedule+0x424/0x488) from [<c00e719c>] (bdi_sched_wait+0xc/0x18)
[97580.340787] [<c00e719c>] (bdi_sched_wait+0xc/0x18) from [<c02b0f68>] (__wait_on_bit+0x5c/0xa8)
[97580.340821] [<c02b0f68>] (__wait_on_bit+0x5c/0xa8) from [<c02b1060>] (out_of_line_wait_on_bit+0xac/0xc4)
[97580.340857] [<c02b1060>] (out_of_line_wait_on_bit+0xac/0xc4) from [<c00e7210>] (sync_inodes_sb+0x68/0x100)
[97580.340894] [<c00e7210>] (sync_inodes_sb+0x68/0x100) from [<c00eb340>] (__sync_filesystem+0x64/0x94)
[97580.340932] [<c00eb340>] (__sync_filesystem+0x64/0x94) from [<c00cdc74>] (generic_shutdown_super+0x28/0x110)
[97580.340970] [<c00cdc74>] (generic_shutdown_super+0x28/0x110) from [<c00cdda8>] (kill_anon_super+0x14/0x3c)
[97580.341008] [<c00cdda8>] (kill_anon_super+0x14/0x3c) from [<c00ce46c>] (deactivate_super+0x6c/0x90)
[97580.341044] [<c00ce46c>] (deactivate_super+0x6c/0x90) from [<c00e2310>] (sys_umount+0x2bc/0x2e8)
[97580.341079] [<c00e2310>] (sys_umount+0x2bc/0x2e8) from [<c0026ea0>] (ret_fast_syscall+0x0/0x28)
I've never had the system or even the affected processes die on me, the
end result was always ok. Just took ages. (Ok, btrfs-vol -b taking ages
on a big fs is ok. umount taking 10min is a bit over the top, especially
since the machine only has 1G ram, so there can't be that many dirty caches
in any case...
cheers
-- vbi
--
featured product: PostgreSQL - http://postgresql.org
[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 389 bytes --]
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: task imap:2958 blocked for more than 120 seconds
2010-01-10 20:19 ` Chris Mason
@ 2010-01-13 22:37 ` Johannes Hirte
2010-01-14 1:07 ` Chris Mason
0 siblings, 1 reply; 9+ messages in thread
From: Johannes Hirte @ 2010-01-13 22:37 UTC (permalink / raw)
To: Chris Mason; +Cc: linux-btrfs, linux-kernel
Am Sonntag 10 Januar 2010 21:19:26 schrieb Chris Mason:
> On Sun, Jan 10, 2010 at 09:05:46PM +0100, Johannes Hirte wrote:
> > I've observed this hanging task now several times. Not sure when this
> > started, but 2.6.32 is affected too, IIRC. I don't have a test pattern
> > for this. Dovecot imap triggers this from time to time. I've enabled
> > CONFIG_DETECT_HUNG_TASK now
>
> > and got this two tasks which hang:
> You're stuck on a read, could you please do a sysrq-w when this happens?
Will do so when it happens again.
> Also, do you eventually recover or are you stuck forever?
I didn't wait too long when it happened, so I'm not sure. The longest time
I've waited was 20-30min, until reboot without recover. So either it's stuck
forever or it takes really long to recover.
And one question I have: How do you identify a read in this call trace?
regards,
Johannes
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: task imap:2958 blocked for more than 120 seconds
2010-01-13 22:37 ` Johannes Hirte
@ 2010-01-14 1:07 ` Chris Mason
0 siblings, 0 replies; 9+ messages in thread
From: Chris Mason @ 2010-01-14 1:07 UTC (permalink / raw)
To: Johannes Hirte; +Cc: linux-btrfs, linux-kernel
On Wed, Jan 13, 2010 at 11:37:33PM +0100, Johannes Hirte wrote:
> Am Sonntag 10 Januar 2010 21:19:26 schrieb Chris Mason:
> > On Sun, Jan 10, 2010 at 09:05:46PM +0100, Johannes Hirte wrote:
> > > I've observed this hanging task now several times. Not sure when this
> > > started, but 2.6.32 is affected too, IIRC. I don't have a test pattern
> > > for this. Dovecot imap triggers this from time to time. I've enabled
> > > CONFIG_DETECT_HUNG_TASK now
> >
> > > and got this two tasks which hang:
> > You're stuck on a read, could you please do a sysrq-w when this happens?
>
> Will do so when it happens again.
No need, I reread the trace, you're stuck in copy_from_user and probably
hitting a deadlock in the btrfs page faulting path. This is pretty
rare, but I'll put it into the queue to fix up.
>
> > Also, do you eventually recover or are you stuck forever?
>
> I didn't wait too long when it happened, so I'm not sure. The longest time
> I've waited was 20-30min, until reboot without recover. So either it's stuck
> forever or it takes really long to recover.
>
> And one question I have: How do you identify a read in this call trace?
Lock page followed by sync page is usually for reading.
-chris
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: task imap:2958 blocked for more than 120 seconds
2010-01-11 8:45 ` Adrian von Bidder
@ 2010-01-14 1:08 ` Chris Mason
2010-01-14 7:03 ` Adrian von Bidder
0 siblings, 1 reply; 9+ messages in thread
From: Chris Mason @ 2010-01-14 1:08 UTC (permalink / raw)
To: Adrian von Bidder; +Cc: Johannes Hirte, linux-btrfs, linux-kernel
On Mon, Jan 11, 2010 at 09:45:31AM +0100, Adrian von Bidder wrote:
> On Monday 11 January 2010 08.34:36 Adrian von Bidder wrote:
> > "btrfs-vol -b" on an 2T btrfs fs (raid 1 mode over 4 disks) on an arm
> > CPU has triggered it several times, so it seems a reliable way to
> > reproduce this.
> >
>
> Found it (Debian kernel 2.6.32 on ARM):
>
> [78260.386272] INFO: task btrfs-vol:10979 blocked for more than 120 seconds.
> [78260.386306] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [78260.386331] btrfs-vol D c02b080c 0 10979 1 0x00000001
> [78260.386373] [<c02b080c>] (schedule+0x424/0x488) from [<c02b0c9c>] (schedule_timeout+0x1c/0x244)
> [78260.386408] [<c02b0c9c>] (schedule_timeout+0x1c/0x244) from [<c02b0b10>] (wait_for_common+0xdc/0x178)
> [78260.386611] [<c02b0b10>] (wait_for_common+0xdc/0x178) from [<bf29b880>] (merge_reloc_roots+0x15c/0x1a4 [btrfs])
> [78260.386940] [<bf29b880>] (merge_reloc_roots+0x15c/0x1a4 [btrfs]) from [<bf2a3fd8>] (relocate_block_group+0x548/0x5c8 [btrfs])
Blocking here isn't a huge surprise, relocation can involve some long
held locks. Does this recover?
-chris
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: task imap:2958 blocked for more than 120 seconds
2010-01-14 1:08 ` Chris Mason
@ 2010-01-14 7:03 ` Adrian von Bidder
0 siblings, 0 replies; 9+ messages in thread
From: Adrian von Bidder @ 2010-01-14 7:03 UTC (permalink / raw)
To: Chris Mason; +Cc: Johannes Hirte, linux-btrfs, linux-kernel
[-- Attachment #1: Type: Text/Plain, Size: 1708 bytes --]
Heyho!
On Thursday 14 January 2010 02.08:29 Chris Mason wrote:
> On Mon, Jan 11, 2010 at 09:45:31AM +0100, Adrian von Bidder wrote:
> > On Monday 11 January 2010 08.34:36 Adrian von Bidder wrote:
> > > "btrfs-vol -b" on an 2T btrfs fs (raid 1 mode over 4 disks) on an arm
> > > CPU has triggered it several times, so it seems a reliable way to
> > > reproduce this.
> >
> > Found it (Debian kernel 2.6.32 on ARM):
> >
> > [78260.386272] INFO: task btrfs-vol:10979 blocked for more than 120
> > seconds. [78260.386306] "echo 0 >
> > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > [78260.386331] btrfs-vol D c02b080c 0 10979 1 0x00000001
> > [78260.386373] [<c02b080c>] (schedule+0x424/0x488) from [<c02b0c9c>]
> > (schedule_timeout+0x1c/0x244) [78260.386408] [<c02b0c9c>]
> > (schedule_timeout+0x1c/0x244) from [<c02b0b10>]
> > (wait_for_common+0xdc/0x178) [78260.386611] [<c02b0b10>]
> > (wait_for_common+0xdc/0x178) from [<bf29b880>]
> > (merge_reloc_roots+0x15c/0x1a4 [btrfs]) [78260.386940] [<bf29b880>]
> > (merge_reloc_roots+0x15c/0x1a4 [btrfs]) from [<bf2a3fd8>]
> > (relocate_block_group+0x548/0x5c8 [btrfs])
>
> Blocking here isn't a huge surprise, relocation can involve some long
> held locks. Does this recover?
Yes, always so far.
cheers
-- vbi
--
Tomorrow we have important delivery and this afternoon out Japanese
colleagues decided that 2000 pages describing our software solution is
not enough. They need at least 500 extra pages. Fortunately it could be
any existing documentation, so we randomly choose POSIX standard Part 1
(535 pages) and the Eclipse basic tutorial (592 pages).
-- Michal Čihař
[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 389 bytes --]
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: task imap:2958 blocked for more than 120 seconds
2010-01-10 20:05 task imap:2958 blocked for more than 120 seconds Johannes Hirte
2010-01-10 20:19 ` Chris Mason
2010-01-11 7:34 ` Adrian von Bidder
@ 2010-03-07 15:52 ` Johannes Hirte
2 siblings, 0 replies; 9+ messages in thread
From: Johannes Hirte @ 2010-03-07 15:52 UTC (permalink / raw)
To: Chris Mason; +Cc: linux-btrfs, linux-kernel
Am Sonntag 10 Januar 2010 21:05:46 schrieb Johannes Hirte:
> I've observed this hanging task now several times. Not sure when this
> started, but 2.6.32 is affected too, IIRC. I don't have a test pattern for
> this. Dovecot imap triggers this from time to time. I've enabled
> CONFIG_DETECT_HUNG_TASK now and got this two tasks which hang:
>
> INFO: task imap:2958 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> imap D 0000000000000000 0 2958 2653 0x00000000
> ffff88008caf5a28 0000000000000046 0000000000000000 ffffffff810544cf
> ffff88008caf5998 0000000000000001 ffff88008caf5fd8 ffff88008caf9530
> 000000000000de78 00000000001d2700 00000000001d2700 ffff88008caf9530
> Call Trace:
> [<ffffffff810544cf>] ? trace_hardirqs_off+0xd/0xf
> [<ffffffff8105613b>] ? trace_hardirqs_on_caller+0x10c/0x130
> [<ffffffff81071262>] ? sync_page+0x0/0x48
> [<ffffffff8138b46a>] io_schedule+0x38/0x4d
> [<ffffffff810712a6>] sync_page+0x44/0x48
> [<ffffffff8138b99a>] __wait_on_bit_lock+0x41/0x8a
> [<ffffffff8107124e>] __lock_page+0x61/0x68
> [<ffffffff810475a3>] ? wake_bit_function+0x0/0x2e
> [<ffffffff810718c6>] filemap_fault+0xea/0x345
> [<ffffffff81084022>] __do_fault+0x50/0x3d3
> [<ffffffff81086077>] handle_mm_fault+0x32f/0x65d
> [<ffffffff8102039d>] ? do_page_fault+0xf4/0x26f
> [<ffffffff811cacb4>] ? __down_read_trylock+0x46/0x4e
> [<ffffffff8104b0f3>] ? down_read_trylock+0x3f/0x49
> [<ffffffff8102039d>] ? do_page_fault+0xf4/0x26f
> [<ffffffff81020500>] do_page_fault+0x257/0x26f
> [<ffffffff8138e2ff>] page_fault+0x1f/0x30
> [<ffffffff81083a13>] ? might_fault+0x57/0xa7
> [<ffffffff811746e2>] ? btrfs_copy_from_user+0x4f/0x113
> [<ffffffff81174771>] ? btrfs_copy_from_user+0xde/0x113
> [<ffffffff81174cd0>] btrfs_file_write+0x439/0x6fe
> [<ffffffff810a5c34>] vfs_write+0xad/0x14e
> [<ffffffff8105613b>] ? trace_hardirqs_on_caller+0x10c/0x130
> [<ffffffff810a5d2a>] sys_pwrite64+0x55/0x74
> [<ffffffff81001eeb>] system_call_fastpath+0x16/0x1b
> 2 locks held by imap/2958:
> #0: (&sb->s_type->i_mutex_key#4){+.+.+.}, at: [<ffffffff81174a00>]
> btrfs_file_write+0x169/0x6fe
> #1: (&mm->mmap_sem){++++++}, at: [<ffffffff8102039d>]
> do_page_fault+0xf4/0x26f INFO: task flush-btrfs-2:2783 blocked for more
> than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> flush-btrfs-2 D 0000000000000000 0 2783 2 0x00000000
> ffff88010cdcf9d0 0000000000000046 0000000000000000 ffffffff810544cf
> ffff88010cdcf940 0000000000000000 ffff88010cdcffd8 ffff88010cd18290
> 000000000000de78 00000000001d2700 00000000001d2700 ffff88010cd18290
> Call Trace:
> [<ffffffff810544cf>] ? trace_hardirqs_off+0xd/0xf
> [<ffffffff8105613b>] ? trace_hardirqs_on_caller+0x10c/0x130
> [<ffffffff81071262>] ? sync_page+0x0/0x48
> [<ffffffff8138b46a>] io_schedule+0x38/0x4d
> [<ffffffff810712a6>] sync_page+0x44/0x48
> [<ffffffff8138b99a>] __wait_on_bit_lock+0x41/0x8a
> [<ffffffff81070ce1>] ? find_get_pages_tag+0x0/0x130
> [<ffffffff8107124e>] __lock_page+0x61/0x68
> [<ffffffff810475a3>] ? wake_bit_function+0x0/0x2e
> [<ffffffff811871f6>] T.858+0xf1/0x2cd
> [<ffffffff8104c4f9>] ? sched_clock_cpu+0xc6/0xd4
> [<ffffffff8104c3cd>] ? sched_clock_local+0x1c/0x82
> [<ffffffff8104c4f9>] ? sched_clock_cpu+0xc6/0xd4
> [<ffffffff810544cf>] ? trace_hardirqs_off+0xd/0xf
> [<ffffffff811874d3>] extent_writepages+0x3f/0x54
> [<ffffffff8116dbd5>] ? btrfs_get_extent+0x0/0x7ee
> [<ffffffff8116da7e>] btrfs_writepages+0x22/0x24
> [<ffffffff81077d12>] do_writepages+0x1f/0x28
> [<ffffffff810c0233>] writeback_single_inode+0xf1/0x2f0
> [<ffffffff810c0f57>] writeback_inodes_wb+0x3a9/0x4b2
> [<ffffffff810c118b>] wb_writeback+0x12b/0x1af
> [<ffffffff810c1436>] wb_do_writeback+0x17f/0x195
> [<ffffffff810c1342>] ? wb_do_writeback+0x8b/0x195
> [<ffffffff810c1477>] bdi_writeback_task+0x2b/0x84
> [<ffffffff81082a45>] ? bdi_start_fn+0x0/0xcf
> [<ffffffff81082ab6>] bdi_start_fn+0x71/0xcf
> [<ffffffff81082a45>] ? bdi_start_fn+0x0/0xcf
> [<ffffffff81047197>] kthread+0x7a/0x82
> [<ffffffff81002c94>] kernel_thread_helper+0x4/0x10
> [<ffffffff8138e0fc>] ? restore_args+0x0/0x30
> [<ffffffff8104711d>] ? kthread+0x0/0x82
> [<ffffffff81002c90>] ? kernel_thread_helper+0x0/0x10
> 1 lock held by flush-btrfs-2/2783:
> #0: (&type->s_umount_key#20){++++..}, at: [<ffffffff810c0e82>]
> writeback_inodes_wb+0x2d4/0x4b2
>
> regards,
> Johannes
It happend again, today I found in the logs:
INFO: task imap:2590 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
imap D ffff88002828ddf8 0 2590 2472 0x00000000
ffff88011fa53000 0000000000000086 0000000000000000 ffff8800282923a0
ffff88011b7cc480 0000000000000000 ffff880028294c58 ffff880117a5e000
00000000fffdcf81 ffff880117a5e280 0000000000012340 0000000000012340
Call Trace:
[<ffffffff81061de2>] ? sync_page+0x0/0x46
[<ffffffff8134acaa>] ? io_schedule+0x35/0x48
[<ffffffff81061e23>] ? sync_page+0x41/0x46
[<ffffffff8134b185>] ? __wait_on_bit_lock+0x3c/0x85
[<ffffffff81061dcf>] ? __lock_page+0x5d/0x63
[<ffffffff81042877>] ? wake_bit_function+0x0/0x2e
[<ffffffff810623da>] ? filemap_fault+0xcc/0x305
[<ffffffff810733e6>] ? __do_fault+0x52/0x3f0
[<ffffffff810753fa>] ? handle_mm_fault+0x346/0x6b6
[<ffffffff8101ecff>] ? do_page_fault+0x264/0x280
[<ffffffff8134c91f>] ? page_fault+0x1f/0x30
[<ffffffff81157b12>] ? btrfs_copy_from_user+0x50/0x109
[<ffffffff81157b94>] ? btrfs_copy_from_user+0xd2/0x109
[<ffffffff811580f5>] ? btrfs_file_write+0x448/0x6db
[<ffffffff81093aff>] ? vfs_write+0xa8/0x14c
[<ffffffff81093bf6>] ? sys_pwrite64+0x53/0x71
[<ffffffff81001eab>] ? system_call_fastpath+0x16/0x1b
and btrfsck said:
datengrab ~ # btrfsck /dev/mapper/sdb
root 5 inode 842742 errors 400
found 186575216640 bytes used err is 1
total csum bytes: 181076596
total tree bytes: 1147568128
total fs tree bytes: 884981760
btree space waste bytes: 289677575
file data blocks allocated: 191211343872
referenced 185262080000
Btrfs v0.19-4-gab8fb4c
I was able to "fix" this by copying the affected directory to another location
on the same disc, delete the old directory and rename the copy to the old one.
After this the deadlock and the FS corruption was gone.
regards,
Johannes
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2010-03-07 15:52 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-01-10 20:05 task imap:2958 blocked for more than 120 seconds Johannes Hirte
2010-01-10 20:19 ` Chris Mason
2010-01-13 22:37 ` Johannes Hirte
2010-01-14 1:07 ` Chris Mason
2010-01-11 7:34 ` Adrian von Bidder
2010-01-11 8:45 ` Adrian von Bidder
2010-01-14 1:08 ` Chris Mason
2010-01-14 7:03 ` Adrian von Bidder
2010-03-07 15:52 ` Johannes Hirte
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox