public inbox for linux-xfs@vger.kernel.org
* xfs task blocked for more than 120 seconds
@ 2012-01-30  0:20 Sami Liedes
  2012-01-30  1:05 ` Dave Chinner
  0 siblings, 1 reply; 6+ messages in thread
From: Sami Liedes @ 2012-01-30  0:20 UTC (permalink / raw)
  To: xfs

Hi!

Following the LWN coverage of the LCA talk on XFS, I decided to give
XFS another try, starting with my backup partition.

My setup is as follows:

* Both the ext4 partition (/media/backup) and the xfs partition
  (/media/xfstest) are on the same 2 TB disk, a Western Digital Caviar
  Green SATA3 drive (SMART reports the device model as WDC
  WD20EARX-00PASB0). The spindle speed of the model is reported to be
  "closer to 5400 than 7200 rpm".

* The ext4 partition is roughly 1 TB in size and the xfs partition
  roughly 800 GB

* I'm using LVM; the logical volumes are /dev/rootvg/backup and
  /dev/rootvg/xfstest

* These LVs are both LUKS (dm-crypt) partitions, encrypted with
  cbc-essiv:sha256

* The computer is a Core i7 2600 3.4 GHz with 4 cores and HT
  (therefore shows as 8 cores) with 8 GiB main memory. AES-NI
  instructions are supported and disk crypto generally (with ext4)
  works at transparent speeds.

* I'm running Debian unstable amd64 with vanilla (non-Debian) 3.2.2
  kernel

* I believe both the disk and the main memory are in good health; in
  particular, with ext4 there haven't been any problems

* The backup data is generated by BackupPC. It generally consists of a
  large pool of files named by SHA-1 hashes (~105 GB of files), plus
  lots of trees of hard links to these in a structure not unlike
  typical home directories and / and /usr trees. I *think* the hard
  links may be a red herring; I suspect the problem has something to
  do with the XFS/dm-crypt interaction, but I'm not sure.

So, to copy the contents of the backup partition -- BackupPC data --
to the XFS drive, I did the following (the tar line is from shell
history, the lines before it are from my memory and might be slightly
wrong):

# cryptsetup luksFormat /dev/rootvg/xfstest
# cryptsetup luksOpen /dev/rootvg/xfstest rootvg-xfstest_crypt
# mkfs.xfs /dev/mapper/rootvg-xfstest_crypt
# mount /dev/mapper/rootvg-xfstest_crypt /media/xfstest
# cd /media/backup
# tar cf - backuppc/ | (cd ../xfstest && tar xf -)

During this operation, I got the following two warning messages in
dmesg:

------------------------------------------------------------
[    0.000000] Linux version 3.2.2 (sliedes@lh) (gcc version 4.6.2 (Debian 4.6.2-12) ) #1 SMP Thu Jan 26 03:51:41 EET 2012
[...]
[19419.407855] XFS (dm-7): Mounting Filesystem
[19419.726935] XFS (dm-7): Ending clean mount
[...]
[35380.253189] INFO: task tar:3312 blocked for more than 120 seconds.
[35380.253192] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[35380.253194] tar             D ffff880220adc678     0  3312   3311 0x00000000
[35380.253198]  ffff880107a1d778 0000000000000082 0000000000000000 ffff8800c552be30
[35380.253202]  ffff880220adc2f0 ffff880107a1dfd8 ffff880107a1dfd8 ffff880107a1dfd8
[35380.253206]  ffff880223d01650 ffff880220adc2f0 0000000000000000 ffffffff00000000
[35380.253210] Call Trace:
[35380.253217]  [<ffffffff813ed17a>] schedule+0x3a/0x50
[35380.253220]  [<ffffffff813ed6a5>] schedule_timeout+0x1c5/0x300
[35380.253237]  [<ffffffffa0a85ea5>] ? _xfs_buf_find+0xb5/0x240 [xfs]
[35380.253240]  [<ffffffff813ee257>] __down_common+0x95/0xe5
[35380.253249]  [<ffffffffa0a85ebe>] ? _xfs_buf_find+0xce/0x240 [xfs]
[35380.253252]  [<ffffffff813ee306>] __down+0x18/0x1a
[35380.253255]  [<ffffffff8107214c>] down+0x3c/0x50
[35380.253263]  [<ffffffffa0a85d24>] xfs_buf_lock+0x44/0x110 [xfs]
[35380.253272]  [<ffffffffa0a85ebe>] _xfs_buf_find+0xce/0x240 [xfs]
[35380.253280]  [<ffffffffa0a8629f>] xfs_buf_get+0x2f/0x1d0 [xfs]
[35380.253289]  [<ffffffffa0a86b38>] xfs_buf_read+0x28/0x120 [xfs]
[35380.253303]  [<ffffffffa0ae0f87>] xfs_trans_read_buf+0x187/0x470 [xfs]
[35380.253316]  [<ffffffffa0ae1345>] ? xfs_trans_brelse+0xd5/0x160 [xfs]
[35380.253329]  [<ffffffffa0ac9732>] xfs_imap_to_bp.isra.9+0x42/0x120 [xfs]
[35380.253341]  [<ffffffffa0acd441>] xfs_iread+0x81/0x1c0 [xfs]
[35380.253352]  [<ffffffffa0a9bfc7>] ? kmem_zone_alloc+0x67/0xe0 [xfs]
[35380.253362]  [<ffffffffa0a8d043>] xfs_iget+0x2a3/0x820 [xfs]
[35380.253371]  [<ffffffffa0a8cd1c>] ? xfs_iunlock+0xbc/0x130 [xfs]
[35380.253382]  [<ffffffffa0a99f96>] xfs_lookup+0xb6/0x130 [xfs]
[35380.253392]  [<ffffffffa0a90c89>] xfs_vn_lookup+0x49/0x90 [xfs]
[35380.253395]  [<ffffffff811362b4>] ? d_alloc+0x24/0x70
[35380.253399]  [<ffffffff81129d90>] d_alloc_and_lookup+0x40/0x80
[35380.253401]  [<ffffffff81136670>] ? d_lookup+0x30/0x50
[35380.253403]  [<ffffffff8112c0ad>] do_lookup+0x2bd/0x3c0
[35380.253406]  [<ffffffff8112d1d4>] path_lookupat+0x114/0x740
[35380.253408]  [<ffffffff8112d82c>] do_path_lookup+0x2c/0xc0
[35380.253410]  [<ffffffff8112ebc4>] user_path_at_empty+0x54/0xa0
[35380.253414]  [<ffffffff8113c63f>] ? mntput_no_expire+0x1f/0x100
[35380.253416]  [<ffffffff8112ec1c>] user_path_at+0xc/0x10
[35380.253418]  [<ffffffff8112f249>] sys_linkat+0x59/0x150
[35380.253421]  [<ffffffff8111fc35>] ? sys_read+0x45/0x90
[35380.253425]  [<ffffffff813f6812>] system_call_fastpath+0x16/0x1b
[36339.636777] INFO: task tar:3312 blocked for more than 120 seconds.
[36339.636779] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[36339.636782] tar             D ffff880220adc678     0  3312   3311 0x00000000
[36339.636786]  ffff880107a1d6f8 0000000000000082 ffff880107a1d6d8 ffffffff8100177a
[36339.636790]  ffff880220adc2f0 ffff880107a1dfd8 ffff880107a1dfd8 ffff880107a1dfd8
[36339.636793]  ffff88021e222ca0 ffff880220adc2f0 ffff88021d8a7380 ffff88022040fa80
[36339.636797] Call Trace:
[36339.636803]  [<ffffffff8100177a>] ? __switch_to+0x1ea/0x310
[36339.636808]  [<ffffffff813ed17a>] schedule+0x3a/0x50
[36339.636811]  [<ffffffff813ed6a5>] schedule_timeout+0x1c5/0x300
[36339.636813]  [<ffffffff813ed17a>] ? schedule+0x3a/0x50
[36339.636816]  [<ffffffff813ee257>] __down_common+0x95/0xe5
[36339.636832]  [<ffffffffa0a85ebe>] ? _xfs_buf_find+0xce/0x240 [xfs]
[36339.636835]  [<ffffffff813ee306>] __down+0x18/0x1a
[36339.636838]  [<ffffffff8107214c>] down+0x3c/0x50
[36339.636847]  [<ffffffffa0a85d24>] xfs_buf_lock+0x44/0x110 [xfs]
[36339.636856]  [<ffffffffa0a85ebe>] _xfs_buf_find+0xce/0x240 [xfs]
[36339.636865]  [<ffffffffa0a8629f>] xfs_buf_get+0x2f/0x1d0 [xfs]
[36339.636874]  [<ffffffffa0a86b38>] xfs_buf_read+0x28/0x120 [xfs]
[36339.636889]  [<ffffffffa0ae0f87>] xfs_trans_read_buf+0x187/0x470 [xfs]
[36339.636901]  [<ffffffffa0abaf9e>] xfs_da_do_buf+0x24e/0x620 [xfs]
[36339.636904]  [<ffffffff8107218d>] ? up+0x2d/0x50
[36339.636916]  [<ffffffffa0abb3cc>] xfs_da_read_buf+0x1c/0x20 [xfs]
[36339.636918]  [<ffffffff8106ccd9>] ? wake_up_bit+0x29/0x40
[36339.636931]  [<ffffffffa0ac158b>] xfs_dir2_leaf_lookup_int+0x5b/0x2d0 [xfs]
[36339.636942]  [<ffffffffa0aa8d5e>] ? xfs_bmap_last_extent.constprop.23+0x6e/0xb0 [xfs]
[36339.636953]  [<ffffffffa0ac1cc4>] xfs_dir2_leaf_lookup+0x34/0x110 [xfs]
[36339.636965]  [<ffffffffa0abd7c1>] ? xfs_dir2_isleaf+0x21/0x60 [xfs]
[36339.636976]  [<ffffffffa0abdd3a>] xfs_dir_lookup+0x14a/0x160 [xfs]
[36339.636987]  [<ffffffffa0a99f5f>] xfs_lookup+0x7f/0x130 [xfs]
[36339.636997]  [<ffffffffa0a90c89>] xfs_vn_lookup+0x49/0x90 [xfs]
[36339.637000]  [<ffffffff811362b4>] ? d_alloc+0x24/0x70
[36339.637003]  [<ffffffff81129d90>] d_alloc_and_lookup+0x40/0x80
[36339.637006]  [<ffffffff81136670>] ? d_lookup+0x30/0x50
[36339.637008]  [<ffffffff8112a7cf>] __lookup_hash.part.28+0xbf/0xe0
[36339.637012]  [<ffffffff811a7847>] ? security_inode_permission+0x17/0x20
[36339.637014]  [<ffffffff8112ab88>] lookup_hash+0x48/0x60
[36339.637016]  [<ffffffff8112da03>] kern_path_create+0x83/0x120
[36339.637019]  [<ffffffff811deabc>] ? strncpy_from_user+0x2c/0x40
[36339.637022]  [<ffffffff8112a310>] ? getname_flags+0x140/0x260
[36339.637024]  [<ffffffff8112dae5>] user_path_create+0x45/0x70
[36339.637027]  [<ffffffff8112f279>] sys_linkat+0x89/0x150
[36339.637029]  [<ffffffff8111fc35>] ? sys_read+0x45/0x90
[36339.637033]  [<ffffffff813f6812>] system_call_fastpath+0x16/0x1b
------------------------------------------------------------
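As an aside, reports like the ones above can be skimmed out of a saved
kernel log; a small self-contained sketch (the sample lines here just
mirror the dmesg excerpt above; in practice you would point grep at a
real saved log):

```shell
# Pull just the hung-task report headers out of a kernel log.
# The sample file mirrors the dmesg excerpt above; use a real saved
# log (e.g. the output of dmesg) in practice.
cat > sample.log <<'EOF'
[35380.253189] INFO: task tar:3312 blocked for more than 120 seconds.
[35380.253192] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[35380.253210] Call Trace:
EOF
grep 'blocked for more than' sample.log
```

The 120 second threshold itself lives in
/proc/sys/kernel/hung_task_timeout_secs and is tunable.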

Afterwards when BackupPC was doing a backup, I got these additional
warnings:

------------------------------------------------------------
[91863.941820] INFO: task kworker/0:5:19584 blocked for more than 120 seconds.
[91863.941823] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[91863.941825] kworker/0:5     D ffff8802205699d8     0 19584      2 0x00000000
[91863.941829]  ffff88005fe9bb70 0000000000000046 ffff88005fe9bb00 0000000000000086
[91863.941833]  ffff880220569650 ffff88005fe9bfd8 ffff88005fe9bfd8 ffff88005fe9bfd8
[91863.941837]  ffff88021e4c8000 ffff880220569650 ffff88005fe9bb80 ffffffff81044ecd
[91863.941840] Call Trace:
[91863.941847]  [<ffffffff81044ecd>] ? try_to_wake_up+0x22d/0x290
[91863.941852]  [<ffffffff813ed17a>] schedule+0x3a/0x50
[91863.941855]  [<ffffffff813ed6a5>] schedule_timeout+0x1c5/0x300
[91863.941858]  [<ffffffff81115124>] ? kmem_cache_free+0x104/0x110
[91863.941863]  [<ffffffff810d7172>] ? mempool_free_slab+0x12/0x20
[91863.941866]  [<ffffffff813ee257>] __down_common+0x95/0xe5
[91863.941886]  [<ffffffffa0ad4e07>] ? xfs_getsb+0x37/0x60 [xfs]
[91863.941889]  [<ffffffff813ee306>] __down+0x18/0x1a
[91863.941892]  [<ffffffff8107214c>] down+0x3c/0x50
[91863.941902]  [<ffffffffa0a85d24>] xfs_buf_lock+0x44/0x110 [xfs]
[91863.941915]  [<ffffffffa0ad4e07>] xfs_getsb+0x37/0x60 [xfs]
[91863.941929]  [<ffffffffa0ae0d03>] xfs_trans_getsb+0x43/0x140 [xfs]
[91863.941942]  [<ffffffffa0ad4b36>] xfs_mod_sb+0x46/0xc0 [xfs]
[91863.941952]  [<ffffffffa0a96320>] ? xfs_sync_inode_attr+0xf0/0xf0 [xfs]
[91863.941962]  [<ffffffffa0a8c29f>] xfs_fs_log_dummy+0x4f/0x90 [xfs]
[91863.941975]  [<ffffffffa0adad03>] ? xfs_log_need_covered+0x83/0xb0 [xfs]
[91863.941985]  [<ffffffffa0a963a4>] xfs_sync_worker+0x84/0x90 [xfs]
[91863.941988]  [<ffffffff81066ec8>] process_one_work+0x118/0x470
[91863.941991]  [<ffffffff81067dee>] worker_thread+0x15e/0x350
[91863.941994]  [<ffffffff81067c90>] ? manage_workers.isra.29+0x220/0x220
[91863.941996]  [<ffffffff8106c7a7>] kthread+0x87/0x90
[91863.941999]  [<ffffffff813f8974>] kernel_thread_helper+0x4/0x10
[91863.942002]  [<ffffffff8106c720>] ? kthread_worker_fn+0x190/0x190
[91863.942004]  [<ffffffff813f8970>] ? gs_change+0x13/0x13
[92103.787573] INFO: task cp:18226 blocked for more than 120 seconds.
[92103.787576] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[92103.787578] cp              D ffff8802078e19d8     0 18226  28931 0x00000000
[92103.787582]  ffff880100f116f8 0000000000000082 ffff880100f116d8 ffffffff8100177a
[92103.787586]  ffff8802078e1650 ffff880100f11fd8 ffff880100f11fd8 ffff880100f11fd8
[92103.787590]  ffff880103439650 ffff8802078e1650 ffff880220c06c80 ffff88021f968000
[92103.787593] Call Trace:
[92103.787599]  [<ffffffff8100177a>] ? __switch_to+0x1ea/0x310
[92103.787604]  [<ffffffff813ed17a>] schedule+0x3a/0x50
[92103.787607]  [<ffffffff813ed6a5>] schedule_timeout+0x1c5/0x300
[92103.787623]  [<ffffffffa0a85c38>] ? xfs_buf_trylock+0x28/0xd0 [xfs]
[92103.787626]  [<ffffffff813ed17a>] ? schedule+0x3a/0x50
[92103.787629]  [<ffffffff813ee257>] __down_common+0x95/0xe5
[92103.787637]  [<ffffffffa0a85ebe>] ? _xfs_buf_find+0xce/0x240 [xfs]
[92103.787640]  [<ffffffff813ee306>] __down+0x18/0x1a
[92103.787643]  [<ffffffff8107214c>] down+0x3c/0x50
[92103.787652]  [<ffffffffa0a85d24>] xfs_buf_lock+0x44/0x110 [xfs]
[92103.787661]  [<ffffffffa0a85ebe>] _xfs_buf_find+0xce/0x240 [xfs]
[92103.787670]  [<ffffffffa0a8629f>] xfs_buf_get+0x2f/0x1d0 [xfs]
[92103.787678]  [<ffffffffa0a86b38>] xfs_buf_read+0x28/0x120 [xfs]
[92103.787693]  [<ffffffffa0ae0f87>] xfs_trans_read_buf+0x187/0x470 [xfs]
[92103.787706]  [<ffffffffa0abaf9e>] xfs_da_do_buf+0x24e/0x620 [xfs]
[92103.787709]  [<ffffffff8107218d>] ? up+0x2d/0x50
[92103.787720]  [<ffffffffa0abb3cc>] xfs_da_read_buf+0x1c/0x20 [xfs]
[92103.787723]  [<ffffffff8106ccd9>] ? wake_up_bit+0x29/0x40
[92103.787735]  [<ffffffffa0ac158b>] xfs_dir2_leaf_lookup_int+0x5b/0x2d0 [xfs]
[92103.787746]  [<ffffffffa0aa8d5e>] ? xfs_bmap_last_extent.constprop.23+0x6e/0xb0 [xfs]
[92103.787758]  [<ffffffffa0ac1cc4>] xfs_dir2_leaf_lookup+0x34/0x110 [xfs]
[92103.787770]  [<ffffffffa0abd7c1>] ? xfs_dir2_isleaf+0x21/0x60 [xfs]
[92103.787781]  [<ffffffffa0abdd3a>] xfs_dir_lookup+0x14a/0x160 [xfs]
[92103.787791]  [<ffffffffa0a99f5f>] xfs_lookup+0x7f/0x130 [xfs]
[92103.787801]  [<ffffffffa0a90c89>] xfs_vn_lookup+0x49/0x90 [xfs]
[92103.787804]  [<ffffffff811362b4>] ? d_alloc+0x24/0x70
[92103.787808]  [<ffffffff81129d90>] d_alloc_and_lookup+0x40/0x80
[92103.787810]  [<ffffffff81136670>] ? d_lookup+0x30/0x50
[92103.787813]  [<ffffffff8112a7cf>] __lookup_hash.part.28+0xbf/0xe0
[92103.787817]  [<ffffffff811a7847>] ? security_inode_permission+0x17/0x20
[92103.787819]  [<ffffffff8112ab88>] lookup_hash+0x48/0x60
[92103.787822]  [<ffffffff8112da03>] kern_path_create+0x83/0x120
[92103.787825]  [<ffffffff811deabc>] ? strncpy_from_user+0x2c/0x40
[92103.787828]  [<ffffffff8112a310>] ? getname_flags+0x140/0x260
[92103.787830]  [<ffffffff8112dae5>] user_path_create+0x45/0x70
[92103.787832]  [<ffffffff8112f279>] sys_linkat+0x89/0x150
[92103.787836]  [<ffffffff813f6812>] system_call_fastpath+0x16/0x1b
[92103.787842] INFO: task kworker/0:1:19995 blocked for more than 120 seconds.
[92103.787844] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[92103.787845] kworker/0:1     D ffff88021a6719d8     0 19995      2 0x00000000
[92103.787848]  ffff8800bfbd1b70 0000000000000046 ffff8800bfbd1b10 ffffffff811d2106
[92103.787852]  ffff88021a671650 ffff8800bfbd1fd8 ffff8800bfbd1fd8 ffff8800bfbd1fd8
[92103.787855]  ffff880226185940 ffff88021a671650 ffff8800bfbd1d90 ffff8800bfbd1e2c
[92103.787858] Call Trace:
[92103.787863]  [<ffffffff811d2106>] ? cpumask_next_and+0x36/0x50
[92103.787866]  [<ffffffff813ed17a>] schedule+0x3a/0x50
[92103.787868]  [<ffffffff813ed6a5>] schedule_timeout+0x1c5/0x300
[92103.787872]  [<ffffffffa09fd200>] ? cpufreq_stat_notifier_trans+0xa0/0xc0 [cpufreq_stats]
[92103.787875]  [<ffffffff813f289d>] ? notifier_call_chain+0x4d/0x70
[92103.787879]  [<ffffffff813ee257>] __down_common+0x95/0xe5
[92103.787892]  [<ffffffffa0ad4e07>] ? xfs_getsb+0x37/0x60 [xfs]
[92103.787895]  [<ffffffff813ee306>] __down+0x18/0x1a
[92103.787897]  [<ffffffff8107214c>] down+0x3c/0x50
[92103.787906]  [<ffffffffa0a85d24>] xfs_buf_lock+0x44/0x110 [xfs]
[92103.787918]  [<ffffffffa0ad4e07>] xfs_getsb+0x37/0x60 [xfs]
[92103.787931]  [<ffffffffa0ae0d03>] xfs_trans_getsb+0x43/0x140 [xfs]
[92103.787944]  [<ffffffffa0ad4b36>] xfs_mod_sb+0x46/0xc0 [xfs]
[92103.787954]  [<ffffffffa0a96320>] ? xfs_sync_inode_attr+0xf0/0xf0 [xfs]
[92103.787964]  [<ffffffffa0a8c29f>] xfs_fs_log_dummy+0x4f/0x90 [xfs]
[92103.787977]  [<ffffffffa0adad03>] ? xfs_log_need_covered+0x83/0xb0 [xfs]
[92103.787987]  [<ffffffffa0a963a4>] xfs_sync_worker+0x84/0x90 [xfs]
[92103.787991]  [<ffffffff81066ec8>] process_one_work+0x118/0x470
[92103.787994]  [<ffffffff81067dee>] worker_thread+0x15e/0x350
[92103.787996]  [<ffffffff81067c90>] ? manage_workers.isra.29+0x220/0x220
[92103.787999]  [<ffffffff8106c7a7>] kthread+0x87/0x90
[92103.788002]  [<ffffffff813f8974>] kernel_thread_helper+0x4/0x10
[92103.788004]  [<ffffffff8106c720>] ? kthread_worker_fn+0x190/0x190
[92103.788007]  [<ffffffff813f8970>] ? gs_change+0x13/0x13
------------------------------------------------------------

	Sami

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: xfs task blocked for more than 120 seconds
  2012-01-30  0:20 xfs task blocked for more than 120 seconds Sami Liedes
@ 2012-01-30  1:05 ` Dave Chinner
  2012-01-30 22:35   ` Sami Liedes
       [not found]   ` <2504_1327964557_4F27218D_2504_92_2_20120130223527.GH10174@sli.dy.fi>
  0 siblings, 2 replies; 6+ messages in thread
From: Dave Chinner @ 2012-01-30  1:05 UTC (permalink / raw)
  To: Sami Liedes; +Cc: xfs

On Mon, Jan 30, 2012 at 02:20:26AM +0200, Sami Liedes wrote:
> Hi!
> 
> Following the LWN coverage of the LCA talk on XFS, I decided to give
> XFS another try, starting with my backup partition.
> 
> My setup is as follows:
> 
> * Both the ext4 partition (/media/backup) and the xfs partition
>   (/media/xfstest) are on the same 2 TB disk, a Western Digital
>   Caviar Green SATA3 drive (SMART reports the device model as WDC
>   WD20EARX-00PASB0). The spindle speed of the model is reported to be
>   "closer to 5400 than 7200 rpm".

Ok, so your workload is seeking repeatedly between the inner and outer
portions of the disk, so it will be seek bound and quite slow.

> * The ext4 partition is roughly 1 TB in size and the xfs partition
>   roughly 800 GB
> 
> * I'm using LVM; the logical volumes are /dev/rootvg/backup and
>   /dev/rootvg/xfstest
> 
> * These LVs are both LUKS (dm-crypt) partitions, encrypted with
>   cbc-essiv:sha256
> 
> * The computer is a Core i7 2600 3.4 GHz with 4 cores and HT
>   (therefore shows as 8 cores) with 8 GiB main memory. AES-NI
>   instructions are supported and disk crypto generally (with ext4)
>   works at transparent speeds.

That's not to say that ext4 doesn't have long IO hold-offs - it just
doesn't trigger the hang-check code.

> 
> * I'm running Debian unstable amd64 with vanilla (non-Debian) 3.2.2
>   kernel
> 
> * I believe both the disk and the main memory are in good health; in
>   particular, with ext4 there haven't been any problems
> 
> * The backup data is generated by BackupPC. It generally consists of a
>   large pool of files named by SHA-1 hashes (~105 GB of files), plus
>   lots of trees of hard links to these in a structure not unlike
>   typical home directories and / and /usr trees. I *think* the hard
>   links may be a red herring; I suspect the problem has something to
>   do with the XFS/dm-crypt interaction, but I'm not sure.

It could very well be a dm-crypt/xfs interaction, though I don't
have enough information yet.

> So, to copy the contents of the backup partition -- BackupPC data --
> to the XFS drive, I did the following (the tar line is from shell
> history, the lines before it are from my memory and might be slightly
> wrong):
> 
> # cryptsetup luksFormat /dev/rootvg/xfstest
> # cryptsetup luksOpen /dev/rootvg/xfstest rootvg-xfstest_crypt
> # mkfs.xfs /dev/mapper/rootvg-xfstest_crypt
> # mount /dev/mapper/rootvg-xfstest_crypt /media/xfstest
> # cd /media/backup
> # tar cf - backuppc/ | (cd ../xfstest && tar xf -)
> 
> During this operation, I got the following two warning messages in
> dmesg:
> 
> ------------------------------------------------------------
> [    0.000000] Linux version 3.2.2 (sliedes@lh) (gcc version 4.6.2 (Debian 4.6.2-12) ) #1 SMP Thu Jan 26 03:51:41 EET 2012
> [...]
> [19419.407855] XFS (dm-7): Mounting Filesystem
> [19419.726935] XFS (dm-7): Ending clean mount
> [...]
> [35380.253189] INFO: task tar:3312 blocked for more than 120 seconds.
> [35380.253192] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [35380.253194] tar             D ffff880220adc678     0  3312   3311 0x00000000
> [35380.253198]  ffff880107a1d778 0000000000000082 0000000000000000 ffff8800c552be30
> [35380.253202]  ffff880220adc2f0 ffff880107a1dfd8 ffff880107a1dfd8 ffff880107a1dfd8
> [35380.253206]  ffff880223d01650 ffff880220adc2f0 0000000000000000 ffffffff00000000
> [35380.253210] Call Trace:
> [35380.253217]  [<ffffffff813ed17a>] schedule+0x3a/0x50
> [35380.253220]  [<ffffffff813ed6a5>] schedule_timeout+0x1c5/0x300
> [35380.253237]  [<ffffffffa0a85ea5>] ? _xfs_buf_find+0xb5/0x240 [xfs]
> [35380.253240]  [<ffffffff813ee257>] __down_common+0x95/0xe5
> [35380.253249]  [<ffffffffa0a85ebe>] ? _xfs_buf_find+0xce/0x240 [xfs]
> [35380.253252]  [<ffffffff813ee306>] __down+0x18/0x1a
> [35380.253255]  [<ffffffff8107214c>] down+0x3c/0x50
> [35380.253263]  [<ffffffffa0a85d24>] xfs_buf_lock+0x44/0x110 [xfs]
> [35380.253272]  [<ffffffffa0a85ebe>] _xfs_buf_find+0xce/0x240 [xfs]
> [35380.253280]  [<ffffffffa0a8629f>] xfs_buf_get+0x2f/0x1d0 [xfs]
> [35380.253289]  [<ffffffffa0a86b38>] xfs_buf_read+0x28/0x120 [xfs]
> [35380.253303]  [<ffffffffa0ae0f87>] xfs_trans_read_buf+0x187/0x470 [xfs]
> [35380.253316]  [<ffffffffa0ae1345>] ? xfs_trans_brelse+0xd5/0x160 [xfs]
> [35380.253329]  [<ffffffffa0ac9732>] xfs_imap_to_bp.isra.9+0x42/0x120 [xfs]
> [35380.253341]  [<ffffffffa0acd441>] xfs_iread+0x81/0x1c0 [xfs]

Waiting for IO completion reading an inode buffer.

> [36339.636779] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [36339.636782] tar             D ffff880220adc678     0  3312   3311 0x00000000
.....
> [36339.636838]  [<ffffffff8107214c>] down+0x3c/0x50
> [36339.636847]  [<ffffffffa0a85d24>] xfs_buf_lock+0x44/0x110 [xfs]
> [36339.636856]  [<ffffffffa0a85ebe>] _xfs_buf_find+0xce/0x240 [xfs]
> [36339.636865]  [<ffffffffa0a8629f>] xfs_buf_get+0x2f/0x1d0 [xfs]
> [36339.636874]  [<ffffffffa0a86b38>] xfs_buf_read+0x28/0x120 [xfs]
> [36339.636889]  [<ffffffffa0ae0f87>] xfs_trans_read_buf+0x187/0x470 [xfs]
> [36339.636901]  [<ffffffffa0abaf9e>] xfs_da_do_buf+0x24e/0x620 [xfs]
> [36339.636904]  [<ffffffff8107218d>] ? up+0x2d/0x50
> [36339.636916]  [<ffffffffa0abb3cc>] xfs_da_read_buf+0x1c/0x20 [xfs]

Waiting for IO completion reading a directory buffer.

This typically happens when the buffer is already under IO (read or
write).

> ------------------------------------------------------------
> 
> Afterwards when BackupPC was doing a backup, I got these additional
> warnings:
> 
> ------------------------------------------------------------
> [91863.941820] INFO: task kworker/0:5:19584 blocked for more than 120 seconds.
> [91863.941823] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [91863.941825] kworker/0:5     D ffff8802205699d8     0 19584      2 0x00000000
> [91863.941829]  ffff88005fe9bb70 0000000000000046 ffff88005fe9bb00 0000000000000086
> [91863.941833]  ffff880220569650 ffff88005fe9bfd8 ffff88005fe9bfd8 ffff88005fe9bfd8
> [91863.941837]  ffff88021e4c8000 ffff880220569650 ffff88005fe9bb80 ffffffff81044ecd
> [91863.941840] Call Trace:
> [91863.941847]  [<ffffffff81044ecd>] ? try_to_wake_up+0x22d/0x290
> [91863.941852]  [<ffffffff813ed17a>] schedule+0x3a/0x50
> [91863.941855]  [<ffffffff813ed6a5>] schedule_timeout+0x1c5/0x300
> [91863.941858]  [<ffffffff81115124>] ? kmem_cache_free+0x104/0x110
> [91863.941863]  [<ffffffff810d7172>] ? mempool_free_slab+0x12/0x20
> [91863.941866]  [<ffffffff813ee257>] __down_common+0x95/0xe5
> [91863.941886]  [<ffffffffa0ad4e07>] ? xfs_getsb+0x37/0x60 [xfs]
> [91863.941889]  [<ffffffff813ee306>] __down+0x18/0x1a
> [91863.941892]  [<ffffffff8107214c>] down+0x3c/0x50
> [91863.941902]  [<ffffffffa0a85d24>] xfs_buf_lock+0x44/0x110 [xfs]
> [91863.941915]  [<ffffffffa0ad4e07>] xfs_getsb+0x37/0x60 [xfs]
> [91863.941929]  [<ffffffffa0ae0d03>] xfs_trans_getsb+0x43/0x140 [xfs]
> [91863.941942]  [<ffffffffa0ad4b36>] xfs_mod_sb+0x46/0xc0 [xfs]
> [91863.941952]  [<ffffffffa0a96320>] ? xfs_sync_inode_attr+0xf0/0xf0 [xfs]
> [91863.941962]  [<ffffffffa0a8c29f>] xfs_fs_log_dummy+0x4f/0x90 [xfs]
> [91863.941975]  [<ffffffffa0adad03>] ? xfs_log_need_covered+0x83/0xb0 [xfs]
> [91863.941985]  [<ffffffffa0a963a4>] xfs_sync_worker+0x84/0x90 [xfs]

Ok, that implies that the problem is dm-crypt. xfs_sync_worker()
calls xfs_fs_log_dummy() once every 30s. The first time it dirties
the superblock, logs it, then pushes it out for I/O. 30s later we've
come back and the superblock is still under IO. That directly
implies that dm-crypt hasn't finished the IO yet.

> [92103.787573] INFO: task cp:18226 blocked for more than 120 seconds.
> [92103.787576] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [92103.787578] cp              D ffff8802078e19d8     0 18226  28931 0x00000000
....
> [92103.787640]  [<ffffffff813ee306>] __down+0x18/0x1a
> [92103.787643]  [<ffffffff8107214c>] down+0x3c/0x50
> [92103.787652]  [<ffffffffa0a85d24>] xfs_buf_lock+0x44/0x110 [xfs]
> [92103.787661]  [<ffffffffa0a85ebe>] _xfs_buf_find+0xce/0x240 [xfs]
> [92103.787670]  [<ffffffffa0a8629f>] xfs_buf_get+0x2f/0x1d0 [xfs]
> [92103.787678]  [<ffffffffa0a86b38>] xfs_buf_read+0x28/0x120 [xfs]
> [92103.787693]  [<ffffffffa0ae0f87>] xfs_trans_read_buf+0x187/0x470 [xfs]
> [92103.787706]  [<ffffffffa0abaf9e>] xfs_da_do_buf+0x24e/0x620 [xfs]

And that's another directory read waiting on IO completion.

It is definitely a possibility that dm-crypt is not keeping up with
the IO that XFS is sending it and the way XFS blocks waiting for it
to complete triggers the hang-check code. However, it is possible
that XFS is stalling due to long IO completion latencies. Do the
workloads actually complete, or does the system hang? Also, does the
IO to the disk appear to stop for long periods, or is the disk 100%
busy the whole time? If the disk goes idle, can you get a dump of
the stalled processes via "echo w > /proc/sysrq-trigger" and post
that?
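A minimal capture sketch for that (the sysrq write needs root and
CONFIG_MAGIC_SYSRQ; the output file name is arbitrary, and the ps line
is a lighter-weight check that needs no privileges):

```shell
# Ask the kernel to dump all blocked (D-state) tasks to the log (root only):
#   echo w > /proc/sysrq-trigger
#   dmesg > blocked-tasks.log
# A lighter check from userspace: list tasks currently in
# uninterruptible sleep (STAT starting with "D").
ps -eo pid,stat,comm | awk 'NR > 1 && $2 ~ /^D/'
```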


Cheers,

Dave.

-- 
Dave Chinner
david@fromorbit.com


* Re: xfs task blocked for more than 120 seconds
  2012-01-30  1:05 ` Dave Chinner
@ 2012-01-30 22:35   ` Sami Liedes
       [not found]   ` <2504_1327964557_4F27218D_2504_92_2_20120130223527.GH10174@sli.dy.fi>
  1 sibling, 0 replies; 6+ messages in thread
From: Sami Liedes @ 2012-01-30 22:35 UTC (permalink / raw)
  To: xfs

On Mon, Jan 30, 2012 at 12:05:30PM +1100, Dave Chinner wrote:
> > * The computer is a Core i7 2600 3.4 GHz with 4 cores and HT
> >   (therefore shows as 8 cores) with 8 GiB main memory. AES-NI
> >   instructions are supported and disk crypto generally (with ext4)
> >   works at transparent speeds.
> 
> That's not to say that ext4 doesn't have long IO hold-offs - it just
> doesn't trigger the hang-check code.

Hmm, maybe. Still, a syscall blocking for 120 seconds sounds quite
long to me. With ext3 on dm-crypt I remember seeing those warnings
every now and then.

> It is definitely a possibility that dm-crypt is not keeping up with
> the IO that XFS is sending it and the way XFS blocks waiting for it
> to complete triggers the hang-check code. However, it is possible
> that XFS is stalling due to long IO completion latencies. Do the
> workloads actually complete, or does the system hang? Also, does the
> IO to the disk appear to stop for long periods, or is the disk 100%
> busy the whole time? If the disk goes idle, can you get a dump of
> the stalled processes via "echo w > /proc/sysrq-trigger" and post
> that?

The workloads do eventually complete. I tried the tar extraction
again, this time reading the tar from a different disk, and saw no
such warnings (and the time taken, 96 minutes, seems reasonable).

The blocked syscalls during the BackupPC backup seem weirder to me. I
don't think the ext4 partition was even mounted at that point, and if
it was, there certainly was no activity; i.e. the XFS partition was
the only partition on that disk that saw any I/O. I'll see if I can
find a way to reproduce that and to determine whether the disk goes
idle.
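One low-overhead way to watch for idleness is to sample the per-device
counters in /proc/diskstats; a sketch, assuming the filesystem sits on
dm-7 as in the dmesg output above (the device name and the interval
are assumptions):

```shell
# Field 10 of a /proc/diskstats line is the cumulative sectors-written
# counter for that device; if it does not change between two samples,
# the device did no writes in the interval.
awk '$3 == "dm-7" {print $10}' /proc/diskstats
sleep 5
awk '$3 == "dm-7" {print $10}' /proc/diskstats
```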

	Sami


* Re: xfs task blocked for more than 120 seconds
@ 2012-01-31 10:54 Brian Candler
  0 siblings, 0 replies; 6+ messages in thread
From: Brian Candler @ 2012-01-31 10:54 UTC (permalink / raw)
  To: xfs

Aside: I saw what seems to be the same error twice yesterday in a
setup without dm-crypt, but running bonnie++ on a software RAID6 array
(12 drives).

The tasks affected were "kworker" and "xfsaild"; full logs below. The
OS is stock Ubuntu 11.10 x86_64 server.

Write performance to the RAID6 is admittedly pretty terrible, so maybe it
was just some backlog building up.

Regards,

Brian.

...
Jan 30 10:36:40 storage2 kernel: [  149.840035] XFS (md127): Mounting Filesystem
Jan 30 10:36:40 storage2 kernel: [  150.083318] XFS (md127): Ending clean mount
Jan 30 12:00:11 storage2 kernel: [ 5159.737134] INFO: task kworker/0:3:1644 blocked for more than 120 seconds.
Jan 30 12:00:11 storage2 kernel: [ 5159.737507] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 30 12:00:11 storage2 kernel: [ 5159.746598] kworker/0:3     D ffffffff81805120     0  1644      2 0x00000000
Jan 30 12:00:11 storage2 kernel: [ 5159.746603]  ffff8802078abb00 0000000000000046 0000000000000004 0000000000000000
Jan 30 12:00:11 storage2 kernel: [ 5159.746608]  ffff8802078abfd8 ffff8802078abfd8 ffff8802078abfd8 0000000000012a40
Jan 30 12:00:11 storage2 kernel: [ 5159.746613]  ffffffff81c0b020 ffff880212bc4560 ffff8802078abaf0 ffff8802129c2998
Jan 30 12:00:11 storage2 kernel: [ 5159.746618] Call Trace:
Jan 30 12:00:11 storage2 kernel: [ 5159.746625]  [<ffffffff8160492f>] schedule+0x3f/0x60
Jan 30 12:00:11 storage2 kernel: [ 5159.746642]  [<ffffffffa029f60c>] xlog_wait+0x6c/0x90 [xfs]
Jan 30 12:00:11 storage2 kernel: [ 5159.746646]  [<ffffffff810573a0>] ? try_to_wake_up+0x200/0x200
Jan 30 12:00:11 storage2 kernel: [ 5159.746658]  [<ffffffffa029f93f>] xlog_cil_push+0x30f/0x380 [xfs]
Jan 30 12:00:11 storage2 kernel: [ 5159.746661]  [<ffffffff81086b92>] ? up+0x32/0x50
Jan 30 12:00:11 storage2 kernel: [ 5159.746672]  [<ffffffffa02a015e>] xlog_cil_force_lsn+0x10e/0x120 [xfs]
Jan 30 12:00:11 storage2 kernel: [ 5159.746676]  [<ffffffff811532df>] ? kmem_cache_free+0x2f/0x120
Jan 30 12:00:11 storage2 kernel: [ 5159.746687]  [<ffffffffa029e797>] _xfs_log_force_lsn+0x57/0x2e0 [xfs]
Jan 30 12:00:11 storage2 kernel: [ 5159.746690]  [<ffffffff811532df>] ? kmem_cache_free+0x2f/0x120
Jan 30 12:00:11 storage2 kernel: [ 5159.746702]  [<ffffffffa02abf8d>] _xfs_trans_commit+0x29d/0x2b0 [xfs]
Jan 30 12:00:11 storage2 kernel: [ 5159.746715]  [<ffffffffa02bfec0>] ? xfs_sync_inode_attr+0xf0/0xf0 [xfs]
Jan 30 12:00:11 storage2 kernel: [ 5159.746726]  [<ffffffffa028f0c4>] xfs_fs_log_dummy+0x64/0x90 [xfs]
Jan 30 12:00:11 storage2 kernel: [ 5159.746738]  [<ffffffffa02bff44>] xfs_sync_worker+0x84/0x90 [xfs]
Jan 30 12:00:11 storage2 kernel: [ 5159.746742]  [<ffffffff8107b9ca>] process_one_work+0x11a/0x480
Jan 30 12:00:11 storage2 kernel: [ 5159.746745]  [<ffffffff8107c775>] worker_thread+0x165/0x370
Jan 30 12:00:11 storage2 kernel: [ 5159.746748]  [<ffffffff8107c610>] ? manage_workers.isra.30+0x130/0x130
Jan 30 12:00:11 storage2 kernel: [ 5159.746752]  [<ffffffff81080bbc>] kthread+0x8c/0xa0
Jan 30 12:00:11 storage2 kernel: [ 5159.746755]  [<ffffffff8160fca4>] kernel_thread_helper+0x4/0x10
Jan 30 12:00:11 storage2 kernel: [ 5159.746759]  [<ffffffff81080b30>] ? flush_kthread_worker+0xa0/0xa0
Jan 30 12:00:11 storage2 kernel: [ 5159.746761]  [<ffffffff8160fca0>] ? gs_change+0x13/0x13
Jan 30 12:56:11 storage2 kernel: [ 8519.003667] INFO: task xfsaild/md127:1559 blocked for more than 120 seconds.
Jan 30 12:56:11 storage2 kernel: [ 8519.008584] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 30 12:56:11 storage2 kernel: [ 8519.018149] xfsaild/md127   D ffffffff81805120     0  1559      2 0x00000000
Jan 30 12:56:11 storage2 kernel: [ 8519.018153]  ffff88020789dcb0 0000000000000046 0000000000000000 0000000000000000
Jan 30 12:56:11 storage2 kernel: [ 8519.018155]  ffff88020789dfd8 ffff88020789dfd8 ffff88020789dfd8 0000000000012a40
Jan 30 12:56:11 storage2 kernel: [ 8519.018158]  ffff880212d28000 ffff8802117dc560 ffff88020789dca0 000000000000007e
Jan 30 12:56:11 storage2 kernel: [ 8519.018160] Call Trace:
Jan 30 12:56:11 storage2 kernel: [ 8519.018167]  [<ffffffff8160492f>] schedule+0x3f/0x60
Jan 30 12:56:11 storage2 kernel: [ 8519.018183]  [<ffffffffa02a0147>] xlog_cil_force_lsn+0xf7/0x120 [xfs]
Jan 30 12:56:11 storage2 kernel: [ 8519.018185]  [<ffffffff810573a0>] ? try_to_wake_up+0x200/0x200
Jan 30 12:56:11 storage2 kernel: [ 8519.018196]  [<ffffffffa029e4c8>] _xfs_log_force+0x68/0x2a0 [xfs]
Jan 30 12:56:11 storage2 kernel: [ 8519.018208]  [<ffffffffa02b5fdb>] ? xfs_buf_unlock+0x3b/0xc0 [xfs]
Jan 30 12:56:11 storage2 kernel: [ 8519.018218]  [<ffffffffa029e718>] xfs_log_force+0x18/0x40 [xfs]
Jan 30 12:56:11 storage2 kernel: [ 8519.018229]  [<ffffffffa02ac804>] xfsaild_push+0x394/0x3a0 [xfs]
Jan 30 12:56:11 storage2 kernel: [ 8519.018239]  [<ffffffffa02ac872>] xfsaild+0x62/0xc0 [xfs]
Jan 30 12:56:11 storage2 kernel: [ 8519.018249]  [<ffffffffa02ac810>] ? xfsaild_push+0x3a0/0x3a0 [xfs]
Jan 30 12:56:11 storage2 kernel: [ 8519.018252]  [<ffffffff81080bbc>] kthread+0x8c/0xa0
Jan 30 12:56:11 storage2 kernel: [ 8519.018254]  [<ffffffff8160fca4>] kernel_thread_helper+0x4/0x10
Jan 30 12:56:11 storage2 kernel: [ 8519.018256]  [<ffffffff81080b30>] ? flush_kthread_worker+0xa0/0xa0
Jan 30 12:56:11 storage2 kernel: [ 8519.018258]  [<ffffffff8160fca0>] ? gs_change+0x13/0x13
Jan 30 12:56:11 storage2 kernel: [ 8519.018260] INFO: task kworker/0:0:1796 blocked for more than 120 seconds.
Jan 30 12:56:11 storage2 kernel: [ 8519.023397] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 30 12:56:11 storage2 kernel: [ 8519.033551] kworker/0:0     D ffffffff81805120     0  1796      2 0x00000000
Jan 30 12:56:11 storage2 kernel: [ 8519.033553]  ffff88010c9ebb00 0000000000000046 0000000000000004 0000000000000000
Jan 30 12:56:11 storage2 kernel: [ 8519.033555]  ffff88010c9ebfd8 ffff88010c9ebfd8 ffff88010c9ebfd8 0000000000012a40
Jan 30 12:56:11 storage2 kernel: [ 8519.033557]  ffffffff81c0b020 ffff880212cedc80 ffff88010c9ebaf0 ffff8802129c2998
Jan 30 12:56:11 storage2 kernel: [ 8519.033560] Call Trace:
Jan 30 12:56:11 storage2 kernel: [ 8519.033562]  [<ffffffff8160492f>] schedule+0x3f/0x60
Jan 30 12:56:11 storage2 kernel: [ 8519.033572]  [<ffffffffa029f60c>] xlog_wait+0x6c/0x90 [xfs]
Jan 30 12:56:11 storage2 kernel: [ 8519.033574]  [<ffffffff810573a0>] ? try_to_wake_up+0x200/0x200
Jan 30 12:56:11 storage2 kernel: [ 8519.033583]  [<ffffffffa029f93f>] xlog_cil_push+0x30f/0x380 [xfs]
Jan 30 12:56:11 storage2 kernel: [ 8519.033586]  [<ffffffff81086b92>] ? up+0x32/0x50
Jan 30 12:56:11 storage2 kernel: [ 8519.033595]  [<ffffffffa02a015e>] xlog_cil_force_lsn+0x10e/0x120 [xfs]
Jan 30 12:56:11 storage2 kernel: [ 8519.033598]  [<ffffffff811532df>] ? kmem_cache_free+0x2f/0x120
Jan 30 12:56:11 storage2 kernel: [ 8519.033607]  [<ffffffffa029e797>] _xfs_log_force_lsn+0x57/0x2e0 [xfs]
Jan 30 12:56:11 storage2 kernel: [ 8519.033609]  [<ffffffff811532df>] ? kmem_cache_free+0x2f/0x120
Jan 30 12:56:11 storage2 kernel: [ 8519.033619]  [<ffffffffa02abf8d>] _xfs_trans_commit+0x29d/0x2b0 [xfs]
Jan 30 12:56:11 storage2 kernel: [ 8519.033630]  [<ffffffffa02bfec0>] ? xfs_sync_inode_attr+0xf0/0xf0 [xfs]
Jan 30 12:56:11 storage2 kernel: [ 8519.033640]  [<ffffffffa028f0c4>] xfs_fs_log_dummy+0x64/0x90 [xfs]
Jan 30 12:56:11 storage2 kernel: [ 8519.033651]  [<ffffffffa02bff44>] xfs_sync_worker+0x84/0x90 [xfs]
Jan 30 12:56:11 storage2 kernel: [ 8519.033654]  [<ffffffff8107b9ca>] process_one_work+0x11a/0x480
Jan 30 12:56:11 storage2 kernel: [ 8519.033656]  [<ffffffff8107c775>] worker_thread+0x165/0x370
Jan 30 12:56:11 storage2 kernel: [ 8519.033657]  [<ffffffff8107c610>] ? manage_workers.isra.30+0x130/0x130
Jan 30 12:56:11 storage2 kernel: [ 8519.033660]  [<ffffffff81080bbc>] kthread+0x8c/0xa0
Jan 30 12:56:11 storage2 kernel: [ 8519.033661]  [<ffffffff8160fca4>] kernel_thread_helper+0x4/0x10
Jan 30 12:56:11 storage2 kernel: [ 8519.033664]  [<ffffffff81080b30>] ? flush_kthread_worker+0xa0/0xa0
Jan 30 12:56:11 storage2 kernel: [ 8519.033665]  [<ffffffff8160fca0>] ? gs_change+0x13/0x13

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

* Re: xfs task blocked for more than 120 seconds
       [not found]   ` <2504_1327964557_4F27218D_2504_92_2_20120130223527.GH10174@sli.dy.fi>
@ 2012-01-31 23:30     ` Sami Liedes
  2012-02-02  0:26       ` Dave Chinner
  0 siblings, 1 reply; 6+ messages in thread
From: Sami Liedes @ 2012-01-31 23:30 UTC (permalink / raw)
  To: xfs

On Tue, Jan 31, 2012 at 12:35:28AM +0200, Sami Liedes wrote:
> The workloads do eventually complete. I tried the tar extraction again
> but this time extracting the tar from a different disk and saw no such
> warnings (and the time taken seems reasonable at 96 minutes).

Extracting a 105 GB tar that resides on the same (target) XFS
partition on an otherwise idle disk does trigger warnings, but only
for kworker, not the tar process. The time taken to extract was a
reasonable-sounding 140 minutes, so maybe these are just harmless
warnings. Perhaps something should/could be done to silence them in
that case?

	Sami

------------------------------------------------------------
[438920.782686] INFO: task kworker/0:1:7593 blocked for more than 120 seconds.
[438920.782689] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[438920.782691] kworker/0:1     D ffff8802228519d8     0  7593      2 0x00000000
[438920.782695]  ffff88003627fb70 0000000000000046 ffff88003627fb10 ffffffff811d2106
[438920.782699]  ffff880222851650 ffff88003627ffd8 ffff88003627ffd8 ffff88003627ffd8
[438920.782702]  ffff880220adc2f0 ffff880222851650 ffff88003627fd90 ffff88003627fe2c
[438920.782706] Call Trace:
[438920.782713]  [<ffffffff811d2106>] ? cpumask_next_and+0x36/0x50
[438920.782718]  [<ffffffff813ed17a>] schedule+0x3a/0x50
[438920.782721]  [<ffffffff813ed6a5>] schedule_timeout+0x1c5/0x300
[438920.782724]  [<ffffffff81035a34>] ? __enqueue_entity+0x74/0x80
[438920.782727]  [<ffffffff8103c785>] ? enqueue_entity+0xe5/0x4d0
[438920.782730]  [<ffffffff813ee257>] __down_common+0x95/0xe5
[438920.782752]  [<ffffffffa0ad4e07>] ? xfs_getsb+0x37/0x60 [xfs]
[438920.782755]  [<ffffffff813ee306>] __down+0x18/0x1a
[438920.782759]  [<ffffffff8107214c>] down+0x3c/0x50
[438920.782768]  [<ffffffffa0a85d24>] xfs_buf_lock+0x44/0x110 [xfs]
[438920.782781]  [<ffffffffa0ad4e07>] xfs_getsb+0x37/0x60 [xfs]
[438920.782795]  [<ffffffffa0ae0d03>] xfs_trans_getsb+0x43/0x140 [xfs]
[438920.782809]  [<ffffffffa0ad4b36>] xfs_mod_sb+0x46/0xc0 [xfs]
[438920.782820]  [<ffffffffa0a96320>] ? xfs_sync_inode_attr+0xf0/0xf0 [xfs]
[438920.782829]  [<ffffffffa0a8c29f>] xfs_fs_log_dummy+0x4f/0x90 [xfs]
[438920.782842]  [<ffffffffa0adad03>] ? xfs_log_need_covered+0x83/0xb0 [xfs]
[438920.782853]  [<ffffffffa0a963a4>] xfs_sync_worker+0x84/0x90 [xfs]
[438920.782856]  [<ffffffff81066ec8>] process_one_work+0x118/0x470
[438920.782859]  [<ffffffff81067dee>] worker_thread+0x15e/0x350
[438920.782862]  [<ffffffff81067c90>] ? manage_workers.isra.29+0x220/0x220
[438920.782864]  [<ffffffff8106c7a7>] kthread+0x87/0x90
[438920.782867]  [<ffffffff813f8974>] kernel_thread_helper+0x4/0x10
[438920.782870]  [<ffffffff8106c720>] ? kthread_worker_fn+0x190/0x190
[438920.782872]  [<ffffffff813f8970>] ? gs_change+0x13/0x13
[439640.359882] INFO: task kworker/0:3:7681 blocked for more than 120 seconds.
[439640.359885] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[439640.359887] kworker/0:3     D ffff8801086499d8     0  7681      2 0x00000000
[439640.359891]  ffff88013591db70 0000000000000046 ffff88013591db00 0000000000000086
[439640.359895]  ffff880108649650 ffff88013591dfd8 ffff88013591dfd8 ffff88013591dfd8
[439640.359898]  ffff880222335940 ffff880108649650 ffff88013591db80 ffffffff81044ecd
[439640.359902] Call Trace:
[439640.359909]  [<ffffffff81044ecd>] ? try_to_wake_up+0x22d/0x290
[439640.359914]  [<ffffffff813ed17a>] schedule+0x3a/0x50
[439640.359916]  [<ffffffff813ed6a5>] schedule_timeout+0x1c5/0x300
[439640.359920]  [<ffffffff81115124>] ? kmem_cache_free+0x104/0x110
[439640.359924]  [<ffffffff810d7172>] ? mempool_free_slab+0x12/0x20
[439640.359927]  [<ffffffff813ee257>] __down_common+0x95/0xe5
[439640.359947]  [<ffffffffa0ad4e07>] ? xfs_getsb+0x37/0x60 [xfs]
[439640.359950]  [<ffffffff813ee306>] __down+0x18/0x1a
[439640.359954]  [<ffffffff8107214c>] down+0x3c/0x50
[439640.359962]  [<ffffffffa0a85d24>] xfs_buf_lock+0x44/0x110 [xfs]
[439640.359975]  [<ffffffffa0ad4e07>] xfs_getsb+0x37/0x60 [xfs]
[439640.359987]  [<ffffffffa0ae0d03>] xfs_trans_getsb+0x43/0x140 [xfs]
[439640.360000]  [<ffffffffa0ad4b36>] xfs_mod_sb+0x46/0xc0 [xfs]
[439640.360009]  [<ffffffffa0a96320>] ? xfs_sync_inode_attr+0xf0/0xf0 [xfs]
[439640.360018]  [<ffffffffa0a8c29f>] xfs_fs_log_dummy+0x4f/0x90 [xfs]
[439640.360030]  [<ffffffffa0adad03>] ? xfs_log_need_covered+0x83/0xb0 [xfs]
[439640.360039]  [<ffffffffa0a963a4>] xfs_sync_worker+0x84/0x90 [xfs]
[439640.360042]  [<ffffffff81066ec8>] process_one_work+0x118/0x470
[439640.360045]  [<ffffffff81067dee>] worker_thread+0x15e/0x350
[439640.360048]  [<ffffffff81067c90>] ? manage_workers.isra.29+0x220/0x220
[439640.360050]  [<ffffffff8106c7a7>] kthread+0x87/0x90
[439640.360053]  [<ffffffff813f8974>] kernel_thread_helper+0x4/0x10
[439640.360055]  [<ffffffff8106c720>] ? kthread_worker_fn+0x190/0x190
[439640.360057]  [<ffffffff813f8970>] ? gs_change+0x13/0x13
------------------------------------------------------------

* Re: xfs task blocked for more than 120 seconds
  2012-01-31 23:30     ` Sami Liedes
@ 2012-02-02  0:26       ` Dave Chinner
  0 siblings, 0 replies; 6+ messages in thread
From: Dave Chinner @ 2012-02-02  0:26 UTC (permalink / raw)
  To: xfs

On Wed, Feb 01, 2012 at 01:30:34AM +0200, Sami Liedes wrote:
> On Tue, Jan 31, 2012 at 12:35:28AM +0200, Sami Liedes wrote:
> > The workloads do eventually complete. I tried the tar extraction again
> > but this time extracting the tar from a different disk and saw no such
> > warnings (and the time taken seems reasonable at 96 minutes).
> 
> Extracting a 105 GB tar that resides on the same (target) XFS
> partition on an otherwise idle disk does trigger warnings, but only
> for kworker, not the tar process. The time taken to extract was a
> reasonable-sounding 140 minutes, so maybe these are just harmless
> warnings.

It indicates that superblock IO is taking more than 120s to
complete. That suggests the IO subsystem is either not throttling
the IO queue (and so queuing up minutes' worth of IO) or not
processing the IO queue in a fair/timely manner. IO should never
take more than a couple of seconds to complete unless there is
something wrong, so the warnings are quite valid at the default
timeout of 120s....
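One place to start when looking at queue throttling is the block layer's per-device request limit. This is a sketch, not a diagnosis: `sda` is a placeholder device name, and dm-crypt stacks its own queueing on top of the underlying device, which this knob does not cover.

```shell
# Show how many requests the block layer will queue for a device before
# it starts throttling submitters. DEV is a placeholder; adjust for the
# actual disk backing the LVM/dm-crypt stack.
DEV=sda
if [ -r "/sys/block/$DEV/queue/nr_requests" ]; then
    echo "nr_requests for $DEV: $(cat /sys/block/$DEV/queue/nr_requests)"
else
    echo "no request-queue info for $DEV on this system"
fi
```

Lowering nr_requests shortens the worst-case queue, but with dm-crypt in the path the delays may occur above this layer, so it only rules one layer in or out.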

I'd suggest reporting the problem to the dm-crypt folks and see if
they have any ideas on why IO completion is taking so long.

> Perhaps something should/could be done to silence them in
> that case?

You can turn the hangcheck timer off or adjust its timeout yourself
via these proc files:

$ ls /proc/sys/kernel/ |grep hung
hung_task_check_count
hung_task_panic
hung_task_timeout_secs
hung_task_warnings
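For reference, a sketch of reading and adjusting those knobs (the 600-second value is purely illustrative; the writes require root, and `sysctl -w kernel.hung_task_timeout_secs=600` is equivalent to the echo form):

```shell
# Read the current hung-task settings; fall back gracefully on kernels
# that lack a given knob. Defaults differ per kernel configuration.
for f in hung_task_timeout_secs hung_task_warnings; do
    printf '%s = %s\n' "$f" "$(cat /proc/sys/kernel/$f 2>/dev/null || echo 'unavailable')"
done

# Raising the timeout (illustrative value; needs root):
#   echo 600 > /proc/sys/kernel/hung_task_timeout_secs
# Disabling the check entirely, as the warning text itself suggests:
#   echo 0 > /proc/sys/kernel/hung_task_timeout_secs
```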

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

end of thread, other threads:[~2012-02-02  0:26 UTC | newest]

Thread overview: 6+ messages
2012-01-30  0:20 xfs task blocked for more than 120 seconds Sami Liedes
2012-01-30  1:05 ` Dave Chinner
2012-01-30 22:35   ` Sami Liedes
     [not found]   ` <2504_1327964557_4F27218D_2504_92_2_20120130223527.GH10174@sli.dy.fi>
2012-01-31 23:30     ` Sami Liedes
2012-02-02  0:26       ` Dave Chinner
  -- strict thread matches above, loose matches on Subject: below --
2012-01-31 10:54 Brian Candler
