On Monday 11 January 2010 08.34:36 Adrian von Bidder wrote:
> "btrfs-vol -b" on a 2T btrfs fs (raid 1 mode over 4 disks) on an arm
> CPU has triggered it several times, so it seems a reliable way to
> reproduce this.
>

Found it (Debian kernel 2.6.32 on ARM):

[78260.386272] INFO: task btrfs-vol:10979 blocked for more than 120 seconds.
[78260.386306] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[78260.386331] btrfs-vol D c02b080c 0 10979 1 0x00000001
[78260.386373] [] (schedule+0x424/0x488) from [] (schedule_timeout+0x1c/0x244)
[78260.386408] [] (schedule_timeout+0x1c/0x244) from [] (wait_for_common+0xdc/0x178)
[78260.386611] [] (wait_for_common+0xdc/0x178) from [] (merge_reloc_roots+0x15c/0x1a4 [btrfs])
[78260.386940] [] (merge_reloc_roots+0x15c/0x1a4 [btrfs]) from [] (relocate_block_group+0x548/0x5c8 [btrfs])
[78260.387258] [] (relocate_block_group+0x548/0x5c8 [btrfs]) from [] (btrfs_relocate_block_group+0x17c/0x3a4 [btrfs])
[78260.387564] [] (btrfs_relocate_block_group+0x17c/0x3a4 [btrfs]) from [] (btrfs_relocate_chunk+0x70/0x7c0 [btrfs])
[78260.387856] [] (btrfs_relocate_chunk+0x70/0x7c0 [btrfs]) from [] (btrfs_balance+0x370/0x424 [btrfs])
[78260.388148] [] (btrfs_balance+0x370/0x424 [btrfs]) from [] (btrfs_ioctl+0x754/0x968 [btrfs])
[78260.388319] [] (btrfs_ioctl+0x754/0x968 [btrfs]) from [] (vfs_ioctl+0x2c/0x70)
[78260.388357] [] (vfs_ioctl+0x2c/0x70) from [] (do_vfs_ioctl+0x4f4/0x55c)
[78260.388390] [] (do_vfs_ioctl+0x4f4/0x55c) from [] (sys_ioctl+0x50/0x74)
[78260.388423] [] (sys_ioctl+0x50/0x74) from [] (ret_fast_syscall+0x0/0x28)

[78380.381159] INFO: task btrfs-vol:10979 blocked for more than 120 seconds.
[78380.381194] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[78380.381219] btrfs-vol D c02b080c 0 10979 1 0x00000001
[78380.381262] [] (schedule+0x424/0x488) from [] (schedule_timeout+0x1c/0x244)
[78380.381297] [] (schedule_timeout+0x1c/0x244) from [] (wait_for_common+0xdc/0x178)
[78380.381501] [] (wait_for_common+0xdc/0x178) from [] (merge_reloc_roots+0x15c/0x1a4 [btrfs])
[78380.381830] [] (merge_reloc_roots+0x15c/0x1a4 [btrfs]) from [] (relocate_block_group+0x548/0x5c8 [btrfs])
[78380.382232] [] (relocate_block_group+0x548/0x5c8 [btrfs]) from [] (btrfs_relocate_block_group+0x17c/0x3a4 [btrfs])
[78380.382545] [] (btrfs_relocate_block_group+0x17c/0x3a4 [btrfs]) from [] (btrfs_relocate_chunk+0x70/0x7c0 [btrfs])
[78380.382839] [] (btrfs_relocate_chunk+0x70/0x7c0 [btrfs]) from [] (btrfs_balance+0x370/0x424 [btrfs])
[78380.383131] [] (btrfs_balance+0x370/0x424 [btrfs]) from [] (btrfs_ioctl+0x754/0x968 [btrfs])
[78380.383302] [] (btrfs_ioctl+0x754/0x968 [btrfs]) from [] (vfs_ioctl+0x2c/0x70)
[78380.383341] [] (vfs_ioctl+0x2c/0x70) from [] (do_vfs_ioctl+0x4f4/0x55c)
[78380.383374] [] (do_vfs_ioctl+0x4f4/0x55c) from [] (sys_ioctl+0x50/0x74)
[78380.383408] [] (sys_ioctl+0x50/0x74) from [] (ret_fast_syscall+0x0/0x28)

umount right after some big fs action (not sure, it was either lots of file
deletions, a big rsync of some tree, or right after the btrfs-vol stuff)
manages to trigger a btrfs-related hang, too:

[97460.345446] INFO: task umount:12765 blocked for more than 120 seconds.
[97460.345481] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[97460.345505] umount D c02b080c 0 12765 12681 0x00000000
[97460.345554] [] (schedule+0x424/0x488) from [] (bdi_sched_wait+0xc/0x18)
[97460.345592] [] (bdi_sched_wait+0xc/0x18) from [] (__wait_on_bit+0x5c/0xa8)
[97460.345625] [] (__wait_on_bit+0x5c/0xa8) from [] (out_of_line_wait_on_bit+0xac/0xc4)
[97460.345661] [] (out_of_line_wait_on_bit+0xac/0xc4) from [] (sync_inodes_sb+0x68/0x100)
[97460.345699] [] (sync_inodes_sb+0x68/0x100) from [] (__sync_filesystem+0x64/0x94)
[97460.345737] [] (__sync_filesystem+0x64/0x94) from [] (generic_shutdown_super+0x28/0x110)
[97460.345776] [] (generic_shutdown_super+0x28/0x110) from [] (kill_anon_super+0x14/0x3c)
[97460.345813] [] (kill_anon_super+0x14/0x3c) from [] (deactivate_super+0x6c/0x90)
[97460.345849] [] (deactivate_super+0x6c/0x90) from [] (sys_umount+0x2bc/0x2e8)
[97460.345883] [] (sys_umount+0x2bc/0x2e8) from [] (ret_fast_syscall+0x0/0x28)

[97580.340641] INFO: task umount:12765 blocked for more than 120 seconds.
[97580.340674] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[97580.340699] umount D c02b080c 0 12765 12681 0x00000000
[97580.340749] [] (schedule+0x424/0x488) from [] (bdi_sched_wait+0xc/0x18)
[97580.340787] [] (bdi_sched_wait+0xc/0x18) from [] (__wait_on_bit+0x5c/0xa8)
[97580.340821] [] (__wait_on_bit+0x5c/0xa8) from [] (out_of_line_wait_on_bit+0xac/0xc4)
[97580.340857] [] (out_of_line_wait_on_bit+0xac/0xc4) from [] (sync_inodes_sb+0x68/0x100)
[97580.340894] [] (sync_inodes_sb+0x68/0x100) from [] (__sync_filesystem+0x64/0x94)
[97580.340932] [] (__sync_filesystem+0x64/0x94) from [] (generic_shutdown_super+0x28/0x110)
[97580.340970] [] (generic_shutdown_super+0x28/0x110) from [] (kill_anon_super+0x14/0x3c)
[97580.341008] [] (kill_anon_super+0x14/0x3c) from [] (deactivate_super+0x6c/0x90)
[97580.341044] [] (deactivate_super+0x6c/0x90) from [] (sys_umount+0x2bc/0x2e8)
[97580.341079] [] (sys_umount+0x2bc/0x2e8) from [] (ret_fast_syscall+0x0/0x28)

I've never had the system or even the affected processes die on me; the end
result was always ok. It just took ages. (Ok, btrfs-vol -b taking ages on a
big fs is ok. umount taking 10min is a bit over the top, especially since the
machine only has 1G ram, so there can't be that many dirty caches in any
case...)

cheers
-- vbi

-- 
featured product: PostgreSQL - http://postgresql.org
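
One way to check the "not that many dirty caches" guess would be to look at the Dirty and Writeback counters in /proc/meminfo right before running umount. A minimal sketch in Python, assuming the usual "Name:   value kB" layout of that file; the function names dirty_kb and read_dirty_kb are made up for illustration and are not part of any btrfs tool:

```python
def dirty_kb(meminfo_text):
    """Return the Dirty and Writeback values (in kB) found in meminfo text."""
    wanted = {"Dirty", "Writeback"}
    values = {}
    for line in meminfo_text.splitlines():
        # Lines are assumed to look like "Dirty:               123 kB".
        key, sep, rest = line.partition(":")
        if sep and key in wanted:
            values[key] = int(rest.split()[0])
    return values


def read_dirty_kb(path="/proc/meminfo"):
    """Read the live counters from the running kernel (Linux only)."""
    with open(path) as f:
        return dirty_kb(f.read())
```

Calling read_dirty_kb() just before the umount and seeing only a few MB there would suggest the ten minutes are not spent flushing ordinary dirty pages; it would not by itself say where the time goes.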