* xfs warnings on kernel 3.2.15
@ 2012-08-09 11:01 Michael Monnerie
2012-08-09 22:15 ` Dave Chinner
0 siblings, 1 reply; 5+ messages in thread
From: Michael Monnerie @ 2012-08-09 11:01 UTC (permalink / raw)
To: xfs
[-- Attachment #1.1: Type: text/plain, Size: 15554 bytes --]
I've noticed one of our postgresql servers having a high load, so I looked into it.
Kernel = 3.2.15, disk space is free:
# df
/dev/sda2 6281216 457888 5823328 8% /disks/work
/dev/sdc1 52402180 10135176 42267004 20% /disks/pg-db
And I got these warnings since Jul 12 until today:
Jul 12 00:31:41 db24 kernel: XFS (sda2): xlog_space_left: head behind tail
Jul 12 00:31:41 db24 kernel: tail_cycle = 4, tail_bytes = 248320
Jul 12 00:31:41 db24 kernel: GH cycle = 4, GH bytes = 248312
[snip]
Aug 8 03:45:48 db24 kernel: XFS (sdc1): xlog_space_left: head behind tail
Aug 8 03:45:48 db24 kernel: tail_cycle = 607, tail_bytes = 21702144
Aug 8 03:45:48 db24 kernel: GH cycle = 607, GH bytes = 21701856
And this morning it changed into warnings:
Aug 9 10:34:57 db24 kernel: INFO: task vmtoolsd:1426 blocked for more than 120 seconds.
Aug 9 10:34:57 db24 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 9 10:34:57 db24 kernel: ffff880059e0fc78 0000000000000082 ffff880059af3ba0 0000000000000001
Aug 9 10:34:57 db24 kernel: ffff880059e0e010 ffff880058ea5cc0 0000000000010940 ffff880059e0ffd8
Aug 9 10:34:57 db24 kernel: ffff880059e0ffd8 0000000000010940 ffff880056264f80 ffff880058ea5cc0
Aug 9 10:34:57 db24 kernel: Call Trace:
Aug 9 10:34:57 db24 kernel: [<ffffffff8136e926>] schedule+0x55/0x57
Aug 9 10:34:57 db24 kernel: [<ffffffff8136fb1e>] rwsem_down_failed_common+0xe4/0x116
Aug 9 10:34:57 db24 kernel: [<ffffffff8136fb63>] rwsem_down_write_failed+0x13/0x15
Aug 9 10:34:57 db24 kernel: [<ffffffff811c4503>] call_rwsem_down_write_failed+0x13/0x20
Aug 9 10:34:57 db24 kernel: [<ffffffff8136f35c>] ? down_write+0x20/0x22
Aug 9 10:34:57 db24 kernel: [<ffffffff810d1e44>] thaw_super+0x1c/0xae
Aug 9 10:34:57 db24 kernel: [<ffffffff810f6b5e>] thaw_bdev+0x45/0x74
Aug 9 10:34:57 db24 kernel: [<ffffffffa002019c>] VmSyncThawDevices+0x56/0x9e [vmsync]
Aug 9 10:34:57 db24 kernel: [<ffffffffa0020542>] VmSyncUnlockedIoctl+0x1b4/0x208 [vmsync]
Aug 9 10:34:57 db24 kernel: [<ffffffff81115dd1>] proc_reg_unlocked_ioctl+0x82/0xa1
Aug 9 10:34:57 db24 kernel: [<ffffffff8102c05d>] ? pick_next_task_fair+0xfc/0x10c
Aug 9 10:34:57 db24 kernel: [<ffffffffa002038e>] ? VmSyncAddPath+0xff/0xff [vmsync]
Aug 9 10:34:57 db24 kernel: [<ffffffff810dcf9f>] do_vfs_ioctl+0x3f6/0x47b
Aug 9 10:34:57 db24 kernel: [<ffffffff810dd066>] sys_ioctl+0x42/0x65
Aug 9 10:34:57 db24 kernel: [<ffffffff813705d2>] system_call_fastpath+0x16/0x1b
Aug 9 10:34:57 db24 kernel: INFO: task postmaster:3112 blocked for more than 120 seconds.
Aug 9 10:34:57 db24 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 9 10:34:57 db24 kernel: ffff8800563f5d18 0000000000000082 0000000000000000 0000000000000000
Aug 9 10:34:57 db24 kernel: ffff8800563f4010 ffff8800590d93e0 0000000000010940 ffff8800563f5fd8
Aug 9 10:34:57 db24 kernel: ffff8800563f5fd8 0000000000010940 ffff88005ab9c240 ffff8800590d93e0
Aug 9 10:34:57 db24 kernel: Call Trace:
Aug 9 10:34:57 db24 kernel: [<ffffffff8136e926>] schedule+0x55/0x57
Aug 9 10:34:57 db24 kernel: [<ffffffff8114ed8b>] xfs_file_aio_write+0x118/0x224
Aug 9 10:34:57 db24 kernel: [<ffffffff8104c59a>] ? wake_up_bit+0x25/0x25
Aug 9 10:34:57 db24 kernel: [<ffffffff810cf526>] do_sync_write+0xc9/0x106
Aug 9 10:34:57 db24 kernel: [<ffffffff8100875a>] ? read_tsc+0x9/0x1b
Aug 9 10:34:57 db24 kernel: [<ffffffff810528b0>] ? timekeeping_get_ns+0x15/0x34
Aug 9 10:34:57 db24 kernel: [<ffffffff81053370>] ? ktime_get_ts+0x80/0x89
Aug 9 10:34:57 db24 kernel: [<ffffffff810dda04>] ? poll_select_copy_remaining+0xc6/0xea
Aug 9 10:34:57 db24 kernel: [<ffffffff810cfe57>] vfs_write+0xa9/0x105
Aug 9 10:34:57 db24 kernel: [<ffffffff810cff69>] sys_write+0x45/0x69
Aug 9 10:34:57 db24 kernel: [<ffffffff813705d2>] system_call_fastpath+0x16/0x1b
Aug 9 10:34:57 db24 kernel: INFO: task postmaster:3115 blocked for more than 120 seconds.
Aug 9 10:34:57 db24 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 9 10:34:57 db24 kernel: ffff8800563edb38 0000000000000082 0000000000000000 ffff880000000000
Aug 9 10:34:57 db24 kernel: ffff8800563ec010 ffff8800590dcf80 0000000000010940 ffff8800563edfd8
Aug 9 10:34:57 db24 kernel: ffff8800563edfd8 0000000000010940 ffff88005ab9c240 ffff8800590dcf80
Aug 9 10:34:57 db24 kernel: Call Trace:
Aug 9 10:34:57 db24 kernel: [<ffffffff8136e926>] schedule+0x55/0x57
Aug 9 10:34:57 db24 kernel: [<ffffffff8118a64a>] xfs_trans_alloc+0x79/0xa7
Aug 9 10:34:57 db24 kernel: [<ffffffff8104c59a>] ? wake_up_bit+0x25/0x25
Aug 9 10:34:57 db24 kernel: [<ffffffff8115b99a>] xfs_create+0x18c/0x4c7
Aug 9 10:34:57 db24 kernel: [<ffffffff81150b56>] ? xfs_ilock+0x59/0x5e
Aug 9 10:34:57 db24 kernel: [<ffffffff810df065>] ? spin_lock+0x9/0xb
Aug 9 10:34:57 db24 kernel: [<ffffffff810df877>] ? _d_rehash+0x1f/0x22
Aug 9 10:34:57 db24 kernel: [<ffffffff811541d1>] xfs_vn_mknod+0xca/0x15e
Aug 9 10:34:57 db24 kernel: [<ffffffff81154280>] xfs_vn_create+0xb/0xd
Aug 9 10:34:57 db24 kernel: [<ffffffff810da7ad>] vfs_create+0x4e/0x6e
Aug 9 10:34:57 db24 kernel: [<ffffffff810db68c>] do_last+0x201/0x502
Aug 9 10:34:57 db24 kernel: [<ffffffff810dba61>] path_openat+0xd4/0x31a
Aug 9 10:34:57 db24 kernel: [<ffffffff810c13f4>] ? kmem_cache_free+0x15/0x6e
Aug 9 10:34:57 db24 kernel: [<ffffffff810dbd96>] do_filp_open+0x38/0x86
Aug 9 10:34:57 db24 kernel: [<ffffffff810e507a>] ? alloc_fd+0x116/0x129
Aug 9 10:34:57 db24 kernel: [<ffffffff810ce147>] do_sys_open+0x6e/0x107
Aug 9 10:34:57 db24 kernel: [<ffffffff810ce209>] sys_open+0x1b/0x1d
Aug 9 10:34:57 db24 kernel: [<ffffffff813705d2>] system_call_fastpath+0x16/0x1b
Aug 9 10:34:57 db24 kernel: INFO: task flush-259:26214:19297 blocked for more than 120 seconds.
Aug 9 10:34:57 db24 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 9 10:34:57 db24 kernel: ffff880052039b10 0000000000000046 ffff880052039aa0 ffff880047536078
Aug 9 10:34:57 db24 kernel: ffff880052038010 ffff880056263500 0000000000010940 ffff880052039fd8
Aug 9 10:34:57 db24 kernel: ffff880052039fd8 0000000000010940 ffff880059620000 ffff880056263500
Aug 9 10:34:57 db24 kernel: Call Trace:
Aug 9 10:34:57 db24 kernel: [<ffffffff8104c605>] ? spin_unlock_irqrestore+0x9/0xb
Aug 9 10:34:57 db24 kernel: [<ffffffff8136e926>] schedule+0x55/0x57
Aug 9 10:34:57 db24 kernel: [<ffffffff8118a64a>] xfs_trans_alloc+0x79/0xa7
Aug 9 10:34:57 db24 kernel: [<ffffffff8104c59a>] ? wake_up_bit+0x25/0x25
Aug 9 10:34:57 db24 kernel: [<ffffffff8109180a>] ? generic_writepages+0x4b/0x57
Aug 9 10:34:57 db24 kernel: [<ffffffff8115906b>] xfs_log_dirty_inode+0x2b/0x9f
Aug 9 10:34:57 db24 kernel: [<ffffffff8115716d>] xfs_fs_write_inode+0x47/0x107
Aug 9 10:34:57 db24 kernel: [<ffffffff810ec01b>] writeback_single_inode+0x1b8/0x2ef
Aug 9 10:34:57 db24 kernel: [<ffffffff810ec705>] writeback_sb_inodes+0x168/0x201
Aug 9 10:34:57 db24 kernel: [<ffffffff810eccfe>] __writeback_inodes_wb+0x6d/0xab
Aug 9 10:34:57 db24 kernel: [<ffffffff810ecea6>] wb_writeback+0xf1/0x18b
Aug 9 10:34:57 db24 kernel: [<ffffffff8101f406>] ? default_spin_lock_flags+0x9/0xf
Aug 9 10:34:57 db24 kernel: [<ffffffff8136fbc4>] ? _raw_spin_lock_irqsave+0x22/0x2b
Aug 9 10:34:57 db24 kernel: [<ffffffff810ed07f>] wb_do_writeback+0x13f/0x1a0
Aug 9 10:34:57 db24 kernel: [<ffffffff8103fdbe>] ? del_timer+0x81/0x81
Aug 9 10:34:57 db24 kernel: [<ffffffff810ed170>] bdi_writeback_thread+0x90/0x164
Aug 9 10:34:57 db24 kernel: [<ffffffff810ed0e0>] ? wb_do_writeback+0x1a0/0x1a0
Aug 9 10:34:57 db24 kernel: [<ffffffff8104c17c>] kthread+0x7d/0x85
Aug 9 10:34:57 db24 kernel: [<ffffffff81371eb4>] kernel_thread_helper+0x4/0x10
Aug 9 10:34:57 db24 kernel: [<ffffffff8104c0ff>] ? kthread_worker_fn+0x153/0x153
Aug 9 10:34:57 db24 kernel: [<ffffffff81371eb0>] ? gs_change+0x13/0x13
Aug 9 10:34:57 db24 kernel: INFO: task postmaster:26077 blocked for more than 120 seconds.
Aug 9 10:34:57 db24 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 9 10:34:57 db24 kernel: ffff8800257e7d18 0000000000000086 ffffea00014d6680 0000000000000000
Aug 9 10:34:57 db24 kernel: ffff8800257e6010 ffff880056261a80 0000000000010940 ffff8800257e7fd8
Aug 9 10:34:57 db24 kernel: ffff8800257e7fd8 0000000000010940 ffffffff8160d020 ffff880056261a80
Aug 9 10:34:57 db24 kernel: Call Trace:
Aug 9 10:34:57 db24 kernel: [<ffffffff8136e926>] schedule+0x55/0x57
Aug 9 10:34:57 db24 kernel: [<ffffffff8114ed8b>] xfs_file_aio_write+0x118/0x224
Aug 9 10:34:57 db24 kernel: [<ffffffff8104c59a>] ? wake_up_bit+0x25/0x25
Aug 9 10:34:57 db24 kernel: [<ffffffff810cf526>] do_sync_write+0xc9/0x106
Aug 9 10:34:57 db24 kernel: [<ffffffff810cfe57>] vfs_write+0xa9/0x105
Aug 9 10:34:57 db24 kernel: [<ffffffff810cff69>] sys_write+0x45/0x69
Aug 9 10:34:57 db24 kernel: [<ffffffff813705d2>] system_call_fastpath+0x16/0x1b
Aug 9 10:36:57 db24 kernel: INFO: task vmtoolsd:1426 blocked for more than 120 seconds.
Aug 9 10:36:57 db24 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 9 10:36:57 db24 kernel: ffff880059e0fc78 0000000000000082 ffff880059af3ba0 0000000000000001
Aug 9 10:36:57 db24 kernel: ffff880059e0e010 ffff880058ea5cc0 0000000000010940 ffff880059e0ffd8
Aug 9 10:36:57 db24 kernel: ffff880059e0ffd8 0000000000010940 ffff880056264f80 ffff880058ea5cc0
Aug 9 10:36:57 db24 kernel: Call Trace:
Aug 9 10:36:57 db24 kernel: [<ffffffff8136e926>] schedule+0x55/0x57
Aug 9 10:36:57 db24 kernel: [<ffffffff8136fb1e>] rwsem_down_failed_common+0xe4/0x116
Aug 9 10:36:57 db24 kernel: [<ffffffff8136fb63>] rwsem_down_write_failed+0x13/0x15
Aug 9 10:36:57 db24 kernel: [<ffffffff811c4503>] call_rwsem_down_write_failed+0x13/0x20
Aug 9 10:36:57 db24 kernel: [<ffffffff8136f35c>] ? down_write+0x20/0x22
Aug 9 10:36:57 db24 kernel: [<ffffffff810d1e44>] thaw_super+0x1c/0xae
Aug 9 10:36:57 db24 kernel: [<ffffffff810f6b5e>] thaw_bdev+0x45/0x74
Aug 9 10:36:57 db24 kernel: [<ffffffffa002019c>] VmSyncThawDevices+0x56/0x9e [vmsync]
Aug 9 10:36:57 db24 kernel: [<ffffffffa0020542>] VmSyncUnlockedIoctl+0x1b4/0x208 [vmsync]
Aug 9 10:36:57 db24 kernel: [<ffffffff81115dd1>] proc_reg_unlocked_ioctl+0x82/0xa1
Aug 9 10:36:57 db24 kernel: [<ffffffff8102c05d>] ? pick_next_task_fair+0xfc/0x10c
Aug 9 10:36:57 db24 kernel: [<ffffffffa002038e>] ? VmSyncAddPath+0xff/0xff [vmsync]
Aug 9 10:36:57 db24 kernel: [<ffffffff810dcf9f>] do_vfs_ioctl+0x3f6/0x47b
Aug 9 10:36:57 db24 kernel: [<ffffffff810dd066>] sys_ioctl+0x42/0x65
Aug 9 10:36:57 db24 kernel: [<ffffffff813705d2>] system_call_fastpath+0x16/0x1b
Aug 9 10:36:57 db24 kernel: INFO: task postmaster:3112 blocked for more than 120 seconds.
Aug 9 10:36:57 db24 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 9 10:36:57 db24 kernel: ffff8800563f5d18 0000000000000082 0000000000000000 0000000000000000
Aug 9 10:36:57 db24 kernel: ffff8800563f4010 ffff8800590d93e0 0000000000010940 ffff8800563f5fd8
Aug 9 10:36:57 db24 kernel: ffff8800563f5fd8 0000000000010940 ffff88005ab9c240 ffff8800590d93e0
Aug 9 10:36:57 db24 kernel: Call Trace:
Aug 9 10:36:57 db24 kernel: [<ffffffff8136e926>] schedule+0x55/0x57
Aug 9 10:36:57 db24 kernel: [<ffffffff8114ed8b>] xfs_file_aio_write+0x118/0x224
Aug 9 10:36:57 db24 kernel: [<ffffffff8104c59a>] ? wake_up_bit+0x25/0x25
Aug 9 10:36:57 db24 kernel: [<ffffffff810cf526>] do_sync_write+0xc9/0x106
Aug 9 10:36:57 db24 kernel: [<ffffffff8100875a>] ? read_tsc+0x9/0x1b
Aug 9 10:36:57 db24 kernel: [<ffffffff810528b0>] ? timekeeping_get_ns+0x15/0x34
Aug 9 10:36:57 db24 kernel: [<ffffffff81053370>] ? ktime_get_ts+0x80/0x89
Aug 9 10:36:57 db24 kernel: [<ffffffff810dda04>] ? poll_select_copy_remaining+0xc6/0xea
Aug 9 10:36:57 db24 kernel: [<ffffffff810cfe57>] vfs_write+0xa9/0x105
Aug 9 10:36:57 db24 kernel: [<ffffffff810cff69>] sys_write+0x45/0x69
Aug 9 10:36:57 db24 kernel: [<ffffffff813705d2>] system_call_fastpath+0x16/0x1b
Aug 9 10:36:57 db24 kernel: INFO: task postmaster:3115 blocked for more than 120 seconds.
Aug 9 10:36:57 db24 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 9 10:36:57 db24 kernel: ffff8800563edb38 0000000000000082 0000000000000000 ffff880000000000
Aug 9 10:36:57 db24 kernel: ffff8800563ec010 ffff8800590dcf80 0000000000010940 ffff8800563edfd8
Aug 9 10:36:57 db24 kernel: ffff8800563edfd8 0000000000010940 ffff88005ab9c240 ffff8800590dcf80
Aug 9 10:36:57 db24 kernel: Call Trace:
Aug 9 10:36:57 db24 kernel: [<ffffffff8136e926>] schedule+0x55/0x57
Aug 9 10:36:57 db24 kernel: [<ffffffff8118a64a>] xfs_trans_alloc+0x79/0xa7
Aug 9 10:36:57 db24 kernel: [<ffffffff8104c59a>] ? wake_up_bit+0x25/0x25
Aug 9 10:36:57 db24 kernel: [<ffffffff8115b99a>] xfs_create+0x18c/0x4c7
Aug 9 10:36:57 db24 kernel: [<ffffffff81150b56>] ? xfs_ilock+0x59/0x5e
Aug 9 10:36:57 db24 kernel: [<ffffffff810df065>] ? spin_lock+0x9/0xb
Aug 9 10:36:57 db24 kernel: [<ffffffff810df877>] ? _d_rehash+0x1f/0x22
Aug 9 10:36:57 db24 kernel: [<ffffffff811541d1>] xfs_vn_mknod+0xca/0x15e
Aug 9 10:36:57 db24 kernel: [<ffffffff81154280>] xfs_vn_create+0xb/0xd
Aug 9 10:36:57 db24 kernel: [<ffffffff810da7ad>] vfs_create+0x4e/0x6e
Aug 9 10:36:57 db24 kernel: [<ffffffff810db68c>] do_last+0x201/0x502
Aug 9 10:36:57 db24 kernel: [<ffffffff810dba61>] path_openat+0xd4/0x31a
Aug 9 10:36:57 db24 kernel: [<ffffffff810c13f4>] ? kmem_cache_free+0x15/0x6e
Aug 9 10:36:57 db24 kernel: [<ffffffff810dbd96>] do_filp_open+0x38/0x86
Aug 9 10:36:57 db24 kernel: [<ffffffff810e507a>] ? alloc_fd+0x116/0x129
Aug 9 10:36:57 db24 kernel: [<ffffffff810ce147>] do_sys_open+0x6e/0x107
Aug 9 10:36:57 db24 kernel: [<ffffffff810ce209>] sys_open+0x1b/0x1d
Aug 9 10:36:57 db24 kernel: [<ffffffff813705d2>] system_call_fastpath+0x16/0x1b
Aug 9 10:36:57 db24 kernel: INFO: task postmaster:13762 blocked for more than 120 seconds.
Aug 9 10:36:57 db24 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 9 10:36:57 db24 kernel: ffff880029025d18 0000000000000082 ffff880000000000 0000000000000000
Aug 9 10:36:57 db24 kernel: ffff880029024010 ffff880056260d40 0000000000010940 ffff880029025fd8
Aug 9 10:36:57 db24 kernel: ffff880029025fd8 0000000000010940 ffffffff8160d020 ffff880056260d40
Aug 9 10:36:57 db24 kernel: Call Trace:
Aug 9 10:36:57 db24 kernel: [<ffffffff8136e926>] schedule+0x55/0x57
Aug 9 10:36:57 db24 kernel: [<ffffffff8114ed8b>] xfs_file_aio_write+0x118/0x224
Aug 9 10:36:57 db24 kernel: [<ffffffff8104c59a>] ? wake_up_bit+0x25/0x25
Aug 9 10:36:57 db24 kernel: [<ffffffff810cf526>] do_sync_write+0xc9/0x106
Aug 9 10:36:57 db24 kernel: [<ffffffff810cfe57>] vfs_write+0xa9/0x105
Aug 9 10:36:57 db24 kernel: [<ffffffff810cff69>] sys_write+0x45/0x69
Aug 9 10:36:57 db24 kernel: [<ffffffff813705d2>] system_call_fastpath+0x16/0x1b
I then tried
# echo 3 >/proc/sys/vm/drop_caches
but the system hung completely.
This is a VMware VM, and I have no idea how to send SYS-T or similar keys to it.
I had to reset it (reboot hung), and everything worked again. This VM is now on kernel 3.5
--
mit freundlichen Grüssen,
Michael Monnerie, Ing. BSc
it-management Internet Services: Protéger
http://proteger.at [gesprochen: Prot-e-schee]
Tel: +43 660 / 415 6531
[-- Attachment #1.2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 198 bytes --]
[-- Attachment #2: Type: text/plain, Size: 121 bytes --]
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: xfs warnings on kernel 3.2.15
2012-08-09 11:01 xfs warnings on kernel 3.2.15 Michael Monnerie
@ 2012-08-09 22:15 ` Dave Chinner
2012-08-11 10:50 ` Michael Monnerie
0 siblings, 1 reply; 5+ messages in thread
From: Dave Chinner @ 2012-08-09 22:15 UTC (permalink / raw)
To: Michael Monnerie; +Cc: xfs
On Thu, Aug 09, 2012 at 01:01:01PM +0200, Michael Monnerie wrote:
> I've noticed one of our postgresql servers having a high load, so I looked into it.
> Kernel = 3.2.15, disk space is free:
>
> # df
> /dev/sda2 6281216 457888 5823328 8% /disks/work
> /dev/sdc1 52402180 10135176 42267004 20% /disks/pg-db
>
> And I got these warnings since Jul 12 until today:
>
> Jul 12 00:31:41 db24 kernel: XFS (sda2): xlog_space_left: head behind tail
> Jul 12 00:31:41 db24 kernel: tail_cycle = 4, tail_bytes = 248320
> Jul 12 00:31:41 db24 kernel: GH cycle = 4, GH bytes = 248312
> [snip]
> Aug 8 03:45:48 db24 kernel: XFS (sdc1): xlog_space_left: head behind tail
> Aug 8 03:45:48 db24 kernel: tail_cycle = 607, tail_bytes = 21702144
> Aug 8 03:45:48 db24 kernel: GH cycle = 607, GH bytes = 21701856
>
> And this morning it changed into warnings:
>
> Aug 9 10:34:57 db24 kernel: INFO: task vmtoolsd:1426 blocked for more than 120 seconds.
> Aug 9 10:34:57 db24 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Aug 9 10:34:57 db24 kernel: ffff880059e0fc78 0000000000000082 ffff880059af3ba0 0000000000000001
> Aug 9 10:34:57 db24 kernel: ffff880059e0e010 ffff880058ea5cc0 0000000000010940 ffff880059e0ffd8
> Aug 9 10:34:57 db24 kernel: ffff880059e0ffd8 0000000000010940 ffff880056264f80 ffff880058ea5cc0
> Aug 9 10:34:57 db24 kernel: Call Trace:
> Aug 9 10:34:57 db24 kernel: [<ffffffff8136e926>] schedule+0x55/0x57
> Aug 9 10:34:57 db24 kernel: [<ffffffff8136fb1e>] rwsem_down_failed_common+0xe4/0x116
> Aug 9 10:34:57 db24 kernel: [<ffffffff8136fb63>] rwsem_down_write_failed+0x13/0x15
> Aug 9 10:34:57 db24 kernel: [<ffffffff811c4503>] call_rwsem_down_write_failed+0x13/0x20
> Aug 9 10:34:57 db24 kernel: [<ffffffff8136f35c>] ? down_write+0x20/0x22
> Aug 9 10:34:57 db24 kernel: [<ffffffff810d1e44>] thaw_super+0x1c/0xae
> Aug 9 10:34:57 db24 kernel: [<ffffffff810f6b5e>] thaw_bdev+0x45/0x74
> Aug 9 10:34:57 db24 kernel: [<ffffffffa002019c>] VmSyncThawDevices+0x56/0x9e [vmsync]
> Aug 9 10:34:57 db24 kernel: [<ffffffffa0020542>] VmSyncUnlockedIoctl+0x1b4/0x208 [vmsync]
> Aug 9 10:34:57 db24 kernel: [<ffffffff81115dd1>] proc_reg_unlocked_ioctl+0x82/0xa1
> Aug 9 10:34:57 db24 kernel: [<ffffffff8102c05d>] ? pick_next_task_fair+0xfc/0x10c
> Aug 9 10:34:57 db24 kernel: [<ffffffffa002038e>] ? VmSyncAddPath+0xff/0xff [vmsync]
> Aug 9 10:34:57 db24 kernel: [<ffffffff810dcf9f>] do_vfs_ioctl+0x3f6/0x47b
> Aug 9 10:34:57 db24 kernel: [<ffffffff810dd066>] sys_ioctl+0x42/0x65
> Aug 9 10:34:57 db24 kernel: [<ffffffff813705d2>] system_call_fastpath+0x16/0x1b
That's a filesystem thaw that is blocked that has com through, by
the looks of it, the VMWare kernel module. That will be completely
unrelated to the above log messages because.....
> Aug 9 10:34:57 db24 kernel: INFO: task flush-259:26214:19297 blocked for more than 120 seconds.
> Aug 9 10:34:57 db24 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Aug 9 10:34:57 db24 kernel: ffff880052039b10 0000000000000046 ffff880052039aa0 ffff880047536078
> Aug 9 10:34:57 db24 kernel: ffff880052038010 ffff880056263500 0000000000010940 ffff880052039fd8
> Aug 9 10:34:57 db24 kernel: ffff880052039fd8 0000000000010940 ffff880059620000 ffff880056263500
> Aug 9 10:34:57 db24 kernel: Call Trace:
> Aug 9 10:34:57 db24 kernel: [<ffffffff8104c605>] ? spin_unlock_irqrestore+0x9/0xb
> Aug 9 10:34:57 db24 kernel: [<ffffffff8136e926>] schedule+0x55/0x57
> Aug 9 10:34:57 db24 kernel: [<ffffffff8118a64a>] xfs_trans_alloc+0x79/0xa7
> Aug 9 10:34:57 db24 kernel: [<ffffffff8104c59a>] ? wake_up_bit+0x25/0x25
> Aug 9 10:34:57 db24 kernel: [<ffffffff8109180a>] ? generic_writepages+0x4b/0x57
> Aug 9 10:34:57 db24 kernel: [<ffffffff8115906b>] xfs_log_dirty_inode+0x2b/0x9f
> Aug 9 10:34:57 db24 kernel: [<ffffffff8115716d>] xfs_fs_write_inode+0x47/0x107
> Aug 9 10:34:57 db24 kernel: [<ffffffff810ec01b>] writeback_single_inode+0x1b8/0x2ef
> Aug 9 10:34:57 db24 kernel: [<ffffffff810ec705>] writeback_sb_inodes+0x168/0x201
> Aug 9 10:34:57 db24 kernel: [<ffffffff810eccfe>] __writeback_inodes_wb+0x6d/0xab
> Aug 9 10:34:57 db24 kernel: [<ffffffff810ecea6>] wb_writeback+0xf1/0x18b
> Aug 9 10:34:57 db24 kernel: [<ffffffff8101f406>] ? default_spin_lock_flags+0x9/0xf
> Aug 9 10:34:57 db24 kernel: [<ffffffff8136fbc4>] ? _raw_spin_lock_irqsave+0x22/0x2b
> Aug 9 10:34:57 db24 kernel: [<ffffffff810ed07f>] wb_do_writeback+0x13f/0x1a0
> Aug 9 10:34:57 db24 kernel: [<ffffffff8103fdbe>] ? del_timer+0x81/0x81
> Aug 9 10:34:57 db24 kernel: [<ffffffff810ed170>] bdi_writeback_thread+0x90/0x164
> Aug 9 10:34:57 db24 kernel: [<ffffffff810ed0e0>] ? wb_do_writeback+0x1a0/0x1a0
> Aug 9 10:34:57 db24 kernel: [<ffffffff8104c17c>] kthread+0x7d/0x85
> Aug 9 10:34:57 db24 kernel: [<ffffffff81371eb4>] kernel_thread_helper+0x4/0x10
> Aug 9 10:34:57 db24 kernel: [<ffffffff8104c0ff>] ? kthread_worker_fn+0x153/0x153
> Aug 9 10:34:57 db24 kernel: [<ffffffff81371eb0>] ? gs_change+0x13/0x13
There's been a race in the the freeze and the writeback thread is
trying to write dirty inodes (which there shouldn't be any) and is
blocked waiting for the thaw to occur. Only problem is, the flusher
thread holds the s-umount lock in read, and the thaw is waiting for
it to be released. ABBA deadlock...
> This is a VMware VM, and I have no idea how to send SYS-T or
> similar keys to it. I had to reset it (reboot hung), and
> everything worked again. This VM is now on kernel 3.5
The freeze/thaw deadlock is not fixed in 3.5 and i think it missed
3.6 as well so you're going to be waiting until 3.7 for this to be
fixed, I suspect....
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: xfs warnings on kernel 3.2.15
2012-08-09 22:15 ` Dave Chinner
@ 2012-08-11 10:50 ` Michael Monnerie
2012-08-11 22:57 ` Dave Chinner
0 siblings, 1 reply; 5+ messages in thread
From: Michael Monnerie @ 2012-08-11 10:50 UTC (permalink / raw)
To: xfs
[-- Attachment #1.1: Type: text/plain, Size: 757 bytes --]
Am Freitag, 10. August 2012, 08:15:18 schrieb Dave Chinner:
> The freeze/thaw deadlock is not fixed in 3.5 and i think it missed
> 3.6 as well so you're going to be waiting until 3.7 for this to be
> fixed, I suspect....
OK, at least you guys have a plan for a fix. I'm happy to get that fixed
soon. Thank you.
But when I think about it: Isn't that worth a backport fix? VMware isn't
a seldom used piece of software, and we use NetApp Storage with
snapshots, which is also used a lot worldwide. I can imagine others have
the same problem, or is it just me?
--
mit freundlichen Grüssen,
Michael Monnerie, Ing. BSc
it-management Internet Services: Protéger
http://proteger.at [gesprochen: Prot-e-schee]
Tel: +43 660 / 415 6531
[-- Attachment #1.2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 198 bytes --]
[-- Attachment #2: Type: text/plain, Size: 121 bytes --]
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: xfs warnings on kernel 3.2.15
2012-08-11 10:50 ` Michael Monnerie
@ 2012-08-11 22:57 ` Dave Chinner
2012-08-13 15:47 ` Michael Monnerie
0 siblings, 1 reply; 5+ messages in thread
From: Dave Chinner @ 2012-08-11 22:57 UTC (permalink / raw)
To: Michael Monnerie; +Cc: xfs
On Sat, Aug 11, 2012 at 12:50:45PM +0200, Michael Monnerie wrote:
> Am Freitag, 10. August 2012, 08:15:18 schrieb Dave Chinner:
> > The freeze/thaw deadlock is not fixed in 3.5 and i think it missed
> > 3.6 as well so you're going to be waiting until 3.7 for this to be
> > fixed, I suspect....
>
> OK, at least you guys have a plan for a fix. I'm happy to get that fixed
> soon. Thank you.
It got included late in the 3.6-rc1 cycle, so it will be in 3.6.
> But when I think about it: Isn't that worth a backport fix? VMware isn't
> a seldom used piece of software, and we use NetApp Storage with
> snapshots, which is also used a lot worldwide. I can imagine others have
> the same problem, or is it just me?
It's a problem that has been around for, well, since freeze was
first introduced many, many years ago. And the complexity of the fix
(20+ patch series, IIRC) makes it a difficult and time consuming fix
that requires a lot of testing - it touches the VFS and has changes
all through different filesystems, including XFS. I just can't see
anyone other than enterprise distro vendors spending that much
effort to back port such an intrusive fix....
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: xfs warnings on kernel 3.2.15
2012-08-11 22:57 ` Dave Chinner
@ 2012-08-13 15:47 ` Michael Monnerie
0 siblings, 0 replies; 5+ messages in thread
From: Michael Monnerie @ 2012-08-13 15:47 UTC (permalink / raw)
To: xfs
[-- Attachment #1.1: Type: text/plain, Size: 925 bytes --]
Am Sonntag, 12. August 2012, 08:57:42 schrieb Dave Chinner:
> It's a problem that has been around for, well, since freeze was
> first introduced many, many years ago. And the complexity of the fix
> (20+ patch series, IIRC) makes it a difficult and time consuming fix
> that requires a lot of testing - it touches the VFS and has changes
> all through different filesystems, including XFS. I just can't see
> anyone other than enterprise distro vendors spending that much
> effort to back port such an intrusive fix....
Je compris. As it's such a longstandig bug, I guess it won't happen too
often worldwide. Or at least doesn't get reportet. I've had it twice in
several months, and I'll run 3.6 ASAP. Thanks for investigating!
--
mit freundlichen Grüssen,
Michael Monnerie, Ing. BSc
it-management Internet Services: Protéger
http://proteger.at [gesprochen: Prot-e-schee]
Tel: +43 660 / 415 6531
[-- Attachment #1.2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 198 bytes --]
[-- Attachment #2: Type: text/plain, Size: 121 bytes --]
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2012-08-13 15:47 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-08-09 11:01 xfs warnings on kernel 3.2.15 Michael Monnerie
2012-08-09 22:15 ` Dave Chinner
2012-08-11 10:50 ` Michael Monnerie
2012-08-11 22:57 ` Dave Chinner
2012-08-13 15:47 ` Michael Monnerie
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox