public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed
* I/O Performance Tips
@ 2010-12-09  8:10 Sebastian Nickel - Hetzner Online AG
  2010-12-09  8:55 ` RW
  2010-12-09 10:30 ` Stefan Hajnoczi
  0 siblings, 2 replies; 5+ messages in thread
From: Sebastian Nickel - Hetzner Online AG @ 2010-12-09  8:10 UTC (permalink / raw)
  To: kvm

Hello,
we have got some issues with I/O in our kvm environment. We are using
kernel version 2.6.32 (Ubuntu 10.04 LTS) to virtualise our hosts and we
are using ksm, too. Recently we noticed that sometimes the guest systems
(mainly OpenSuse guest systems) suddenly have a read only filesystem.
After some inspection we found out that the guest system generates some
ata errors due to timeouts (mostly in "flush cache" situations). On the
physical host there are always the same kernel messages when this
happens:

"""
[1508127.195469] INFO: task kjournald:497 blocked for more than 120
seconds.
[1508127.212828] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[1508127.246841] kjournald     D 00000000ffffffff     0   497      2
0x00000000
[1508127.246848]  ffff88062128dba0 0000000000000046 0000000000015bc0
0000000000015bc0
[1508127.246855]  ffff880621089ab0 ffff88062128dfd8 0000000000015bc0
ffff8806210896f0
[1508127.246862]  0000000000015bc0 ffff88062128dfd8 0000000000015bc0
ffff880621089ab0
[1508127.246868] Call Trace:
[1508127.246880]  [<ffffffff8116e500>] ? sync_buffer+0x0/0x50
[1508127.246889]  [<ffffffff81557d87>] io_schedule+0x47/0x70
[1508127.246893]  [<ffffffff8116e545>] sync_buffer+0x45/0x50
[1508127.246897]  [<ffffffff8155825a>] __wait_on_bit_lock+0x5a/0xc0
[1508127.246901]  [<ffffffff8116e500>] ? sync_buffer+0x0/0x50
[1508127.246905]  [<ffffffff81558338>] out_of_line_wait_on_bit_lock
+0x78/0x90
[1508127.246911]  [<ffffffff810850d0>] ? wake_bit_function+0x0/0x40
[1508127.246915]  [<ffffffff8116e6c6>] __lock_buffer+0x36/0x40
[1508127.246920]  [<ffffffff81213d11>] journal_submit_data_buffers
+0x311/0x320
[1508127.246924]  [<ffffffff81213ff2>] journal_commit_transaction
+0x2d2/0xe40
[1508127.246931]  [<ffffffff810397a9>] ? default_spin_lock_flags
+0x9/0x10
[1508127.246935]  [<ffffffff81076c7c>] ? lock_timer_base+0x3c/0x70
[1508127.246939]  [<ffffffff81077719>] ? try_to_del_timer_sync+0x79/0xd0
[1508127.246943]  [<ffffffff81217f0d>] kjournald+0xed/0x250
[1508127.246947]  [<ffffffff81085090>] ? autoremove_wake_function
+0x0/0x40
[1508127.246951]  [<ffffffff81217e20>] ? kjournald+0x0/0x250
[1508127.246954]  [<ffffffff81084d16>] kthread+0x96/0xa0
[1508127.246959]  [<ffffffff810141ea>] child_rip+0xa/0x20
[1508127.246962]  [<ffffffff81084c80>] ? kthread+0x0/0xa0
[1508127.246966]  [<ffffffff810141e0>] ? child_rip+0x0/0x20
[1508127.246969] INFO: task flush-251:0:505 blocked for more than 120
seconds.
[1508127.264076] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[1508127.298343] flush-251:0   D ffffffff810f4370     0   505      2
0x00000000
[1508127.298349]  ffff880621fdba30 0000000000000046 0000000000015bc0
0000000000015bc0
[1508127.298354]  ffff88062108b1a0 ffff880621fdbfd8 0000000000015bc0
ffff88062108ade0
[1508127.298358]  0000000000015bc0 ffff880621fdbfd8 0000000000015bc0
ffff88062108b1a0
[1508127.298362] Call Trace:
[1508127.298370]  [<ffffffff810f4370>] ? sync_page+0x0/0x50
[1508127.298375]  [<ffffffff81557d87>] io_schedule+0x47/0x70
[1508127.298379]  [<ffffffff810f43ad>] sync_page+0x3d/0x50
[1508127.298383]  [<ffffffff8155825a>] __wait_on_bit_lock+0x5a/0xc0
[1508127.298391]  [<ffffffff810f4347>] __lock_page+0x67/0x70
[1508127.298395]  [<ffffffff810850d0>] ? wake_bit_function+0x0/0x40
[1508127.298402]  [<ffffffff810f4487>] ? unlock_page+0x27/0x30
[1508127.298410]  [<ffffffff810fd9dd>] write_cache_pages+0x3bd/0x4d0
[1508127.298417]  [<ffffffff810fc670>] ? __writepage+0x0/0x40
[1508127.298425]  [<ffffffff810fdb14>] generic_writepages+0x24/0x30
[1508127.298432]  [<ffffffff810fdb55>] do_writepages+0x35/0x40
[1508127.298439]  [<ffffffff811668a6>] writeback_single_inode+0xf6/0x3d0
[1508127.298449]  [<ffffffff812b81d6>] ? rb_erase+0xd6/0x160
[1508127.298455]  [<ffffffff8116750e>] writeback_inodes_wb+0x40e/0x5e0
[1508127.298462]  [<ffffffff811677ea>] wb_writeback+0x10a/0x1d0
[1508127.298469]  [<ffffffff81077719>] ? try_to_del_timer_sync+0x79/0xd0
[1508127.298477]  [<ffffffff8155803d>] ? schedule_timeout+0x19d/0x300
[1508127.298485]  [<ffffffff81167b1c>] wb_do_writeback+0x18c/0x1a0
[1508127.298493]  [<ffffffff81167b83>] bdi_writeback_task+0x53/0xe0
[1508127.298503]  [<ffffffff8110f546>] bdi_start_fn+0x86/0x100
[1508127.298510]  [<ffffffff8110f4c0>] ? bdi_start_fn+0x0/0x100
[1508127.298518]  [<ffffffff81084d16>] kthread+0x96/0xa0
[1508127.298526]  [<ffffffff810141ea>] child_rip+0xa/0x20
[1508127.298533]  [<ffffffff81084c80>] ? kthread+0x0/0xa0
[1508127.298541]  [<ffffffff810141e0>] ? child_rip+0x0/0x20
... some more messages like those before...
"""
Sometimes the message "INFO: task kvm blocked for more than 120 seconds"
appears, too. I thought the error happens in cache writeback situations
so I started to adjust "/proc/sys/vm/dirty_background_ratio" to 5 and
"/proc/sys/vm/dirty_ratio" to 40. I thought this will write continously
smaller parts of cached memory to HDD (more often, but smaller chunks).
This did not help. There are still "readonly" filesystems in the guest
systems. Does anybody has some tips to regulate I/O on linux systems or
to stop those "readonly" filesystems?

I tried the "cfq" scheduler and did some ionice (best efforts method,
all kvm processes in the same class). I thought of cgroups, but I could
not find any I/O related properties to set.

We are using logical volumes as virtual guest HDDs. The volume group is
on top of a mdraid device.

Thank you for your help...


Best Regards

Sebastian


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2010-12-09 14:51 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-12-09  8:10 I/O Performance Tips Sebastian Nickel - Hetzner Online AG
2010-12-09  8:55 ` RW
2010-12-09 10:30 ` Stefan Hajnoczi
2010-12-09 12:52   ` Sebastian Nickel - Hetzner Online AG
2010-12-09 14:51     ` Stefan Hajnoczi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox