From mboxrd@z Thu Jan  1 00:00:00 1970
From: RW <kvm@tauceti.net>
Subject: Re: I/O Performance Tips
Date: Thu, 09 Dec 2010 09:55:35 +0100
Message-ID: <4D009987.6050705@tauceti.net>
References: <1291882216.21240.124.camel@nick-desktop>
Reply-To: kvm@tauceti.net
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
To: kvm@vger.kernel.org
Return-path: <kvm-owner@vger.kernel.org>
Received: from tauceti.net ([62.245.250.166]:41061 "EHLO www.tauceti.net"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1754856Ab0LIJFW (ORCPT <rfc822;kvm@vger.kernel.org>);
	Thu, 9 Dec 2010 04:05:22 -0500
Received: from [127.0.0.1] (unknown [10.0.0.8])
	by www.tauceti.net (Postfix) with ESMTP id DE0D3859D40
	for <kvm@vger.kernel.org>; Thu,  9 Dec 2010 09:55:33 +0100 (CET)
In-Reply-To: <1291882216.21240.124.camel@nick-desktop>
Sender: kvm-owner@vger.kernel.org
List-ID: <kvm.vger.kernel.org>

We've don't use Ubuntu (we use Gentoo/KVM 0.12.5) but we've had a 
similar problem with kernels <2.6.32-r11 (this is Gentoo specific and 
means update -r11 and not release candidate 11) and with the first three 
releases of 2.6.34 especially when NFS was involved. We're currently 
running 2.6.32-r11 and 2.6.32-r20 kernels without this problems. And 
we're using deadline scheduler in host and guests. The VM images are 
stored on a LVM volume. Maybe a kernel update will help (if available of 
course...). We're also running all VMs with "cache=none".

You should also send the qemu-kvm options you're using and version to 
the list. Maybe someone else could help further.

- Robert


On 12/09/10 09:10, Sebastian Nickel - Hetzner Online AG wrote:
> Hello,
> we have got some issues with I/O in our kvm environment. We are using
> kernel version 2.6.32 (Ubuntu 10.04 LTS) to virtualise our hosts and we
> are using ksm, too. Recently we noticed that sometimes the guest systems
> (mainly OpenSuse guest systems) suddenly have a read only filesystem.
> After some inspection we found out that the guest system generates some
> ata errors due to timeouts (mostly in "flush cache" situations). On the
> physical host there are always the same kernel messages when this
> happens:
>
> """
> [1508127.195469] INFO: task kjournald:497 blocked for more than 120
> seconds.
> [1508127.212828] "echo 0>  /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [1508127.246841] kjournald     D 00000000ffffffff     0   497      2
> 0x00000000
> [1508127.246848]  ffff88062128dba0 0000000000000046 0000000000015bc0
> 0000000000015bc0
> [1508127.246855]  ffff880621089ab0 ffff88062128dfd8 0000000000015bc0
> ffff8806210896f0
> [1508127.246862]  0000000000015bc0 ffff88062128dfd8 0000000000015bc0
> ffff880621089ab0
> [1508127.246868] Call Trace:
> [1508127.246880]  [<ffffffff8116e500>] ? sync_buffer+0x0/0x50
> [1508127.246889]  [<ffffffff81557d87>] io_schedule+0x47/0x70
> [1508127.246893]  [<ffffffff8116e545>] sync_buffer+0x45/0x50
> [1508127.246897]  [<ffffffff8155825a>] __wait_on_bit_lock+0x5a/0xc0
> [1508127.246901]  [<ffffffff8116e500>] ? sync_buffer+0x0/0x50
> [1508127.246905]  [<ffffffff81558338>] out_of_line_wait_on_bit_lock
> +0x78/0x90
> [1508127.246911]  [<ffffffff810850d0>] ? wake_bit_function+0x0/0x40
> [1508127.246915]  [<ffffffff8116e6c6>] __lock_buffer+0x36/0x40
> [1508127.246920]  [<ffffffff81213d11>] journal_submit_data_buffers
> +0x311/0x320
> [1508127.246924]  [<ffffffff81213ff2>] journal_commit_transaction
> +0x2d2/0xe40
> [1508127.246931]  [<ffffffff810397a9>] ? default_spin_lock_flags
> +0x9/0x10
> [1508127.246935]  [<ffffffff81076c7c>] ? lock_timer_base+0x3c/0x70
> [1508127.246939]  [<ffffffff81077719>] ? try_to_del_timer_sync+0x79/0xd0
> [1508127.246943]  [<ffffffff81217f0d>] kjournald+0xed/0x250
> [1508127.246947]  [<ffffffff81085090>] ? autoremove_wake_function
> +0x0/0x40
> [1508127.246951]  [<ffffffff81217e20>] ? kjournald+0x0/0x250
> [1508127.246954]  [<ffffffff81084d16>] kthread+0x96/0xa0
> [1508127.246959]  [<ffffffff810141ea>] child_rip+0xa/0x20
> [1508127.246962]  [<ffffffff81084c80>] ? kthread+0x0/0xa0
> [1508127.246966]  [<ffffffff810141e0>] ? child_rip+0x0/0x20
> [1508127.246969] INFO: task flush-251:0:505 blocked for more than 120
> seconds.
> [1508127.264076] "echo 0>  /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [1508127.298343] flush-251:0   D ffffffff810f4370     0   505      2
> 0x00000000
> [1508127.298349]  ffff880621fdba30 0000000000000046 0000000000015bc0
> 0000000000015bc0
> [1508127.298354]  ffff88062108b1a0 ffff880621fdbfd8 0000000000015bc0
> ffff88062108ade0
> [1508127.298358]  0000000000015bc0 ffff880621fdbfd8 0000000000015bc0
> ffff88062108b1a0
> [1508127.298362] Call Trace:
> [1508127.298370]  [<ffffffff810f4370>] ? sync_page+0x0/0x50
> [1508127.298375]  [<ffffffff81557d87>] io_schedule+0x47/0x70
> [1508127.298379]  [<ffffffff810f43ad>] sync_page+0x3d/0x50
> [1508127.298383]  [<ffffffff8155825a>] __wait_on_bit_lock+0x5a/0xc0
> [1508127.298391]  [<ffffffff810f4347>] __lock_page+0x67/0x70
> [1508127.298395]  [<ffffffff810850d0>] ? wake_bit_function+0x0/0x40
> [1508127.298402]  [<ffffffff810f4487>] ? unlock_page+0x27/0x30
> [1508127.298410]  [<ffffffff810fd9dd>] write_cache_pages+0x3bd/0x4d0
> [1508127.298417]  [<ffffffff810fc670>] ? __writepage+0x0/0x40
> [1508127.298425]  [<ffffffff810fdb14>] generic_writepages+0x24/0x30
> [1508127.298432]  [<ffffffff810fdb55>] do_writepages+0x35/0x40
> [1508127.298439]  [<ffffffff811668a6>] writeback_single_inode+0xf6/0x3d0
> [1508127.298449]  [<ffffffff812b81d6>] ? rb_erase+0xd6/0x160
> [1508127.298455]  [<ffffffff8116750e>] writeback_inodes_wb+0x40e/0x5e0
> [1508127.298462]  [<ffffffff811677ea>] wb_writeback+0x10a/0x1d0
> [1508127.298469]  [<ffffffff81077719>] ? try_to_del_timer_sync+0x79/0xd0
> [1508127.298477]  [<ffffffff8155803d>] ? schedule_timeout+0x19d/0x300
> [1508127.298485]  [<ffffffff81167b1c>] wb_do_writeback+0x18c/0x1a0
> [1508127.298493]  [<ffffffff81167b83>] bdi_writeback_task+0x53/0xe0
> [1508127.298503]  [<ffffffff8110f546>] bdi_start_fn+0x86/0x100
> [1508127.298510]  [<ffffffff8110f4c0>] ? bdi_start_fn+0x0/0x100
> [1508127.298518]  [<ffffffff81084d16>] kthread+0x96/0xa0
> [1508127.298526]  [<ffffffff810141ea>] child_rip+0xa/0x20
> [1508127.298533]  [<ffffffff81084c80>] ? kthread+0x0/0xa0
> [1508127.298541]  [<ffffffff810141e0>] ? child_rip+0x0/0x20
> ... some more messages like those before...
> """
> Sometimes the message "INFO: task kvm blocked for more than 120 seconds"
> appears, too. I thought the error happens in cache writeback situations
> so I started to adjust "/proc/sys/vm/dirty_background_ratio" to 5 and
> "/proc/sys/vm/dirty_ratio" to 40. I thought this will write continously
> smaller parts of cached memory to HDD (more often, but smaller chunks).
> This did not help. There are still "readonly" filesystems in the guest
> systems. Does anybody has some tips to regulate I/O on linux systems or
> to stop those "readonly" filesystems?
>
> I tried the "cfq" scheduler and did some ionice (best efforts method,
> all kvm processes in the same class). I thought of cgroups, but I could
> not find any I/O related properties to set.
>
> We are using logical volumes as virtual guest HDDs. The volume group is
> on top of a mdraid device.
>
> Thank you for your help...
>
>
> Best Regards
>
> Sebastian
>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html