From mboxrd@z Thu Jan 1 00:00:00 1970
From: Michael Tokarev
Subject: Re: writes to a virtio block device hungs
Date: Tue, 23 Sep 2008 11:06:11 +0400
Message-ID: <48D89563.5050502@msgid.tls.msk.ru>
References: <48D76286.5090203@msgid.tls.msk.ru>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
To: kvm@vger.kernel.org
Return-path:
Received: from hobbit.corpit.ru ([81.13.33.150]:20703 "EHLO hobbit.corpit.ru"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1752336AbYIWHGO (ORCPT ); Tue, 23 Sep 2008 03:06:14 -0400
Received: from [192.168.1.200] (mjt.ppp.tls.msk.ru [192.168.1.200])
	by hobbit.corpit.ru (Postfix) with ESMTP id A90F12B61D
	for ; Tue, 23 Sep 2008 11:06:11 +0400 (MSD)
	(envelope-from mjt@tls.msk.ru)
In-Reply-To: <48D76286.5090203@msgid.tls.msk.ru>
Sender: kvm-owner@vger.kernel.org
List-ID:

[Replying to my own email...]

Michael Tokarev wrote:
> Hello! It's my first email to this list.. ;)
>
> After experimenting for some time with KVM on linux
> (both host and guests are linux machines), I placed
> one virtual machine into production use, and almost
> immediately come... issues. Here's how it looks like
> from the guest:
>
> Sep 21 10:35:52 hobbit kernel: INFO: task cleanup:20535 blocked for more than 120 seconds.
> Sep 21 10:35:52 hobbit kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Sep 21 10:35:52 hobbit kernel: cleanup D 00000000 0 20535 1570
> Sep 21 10:35:52 hobbit kernel: f73b39c0 00200086 00000000 00000000 c3a2ba48 00000000 f7022e00 00000000
> Sep 21 10:35:52 hobbit kernel: dbc48ed4 f789c000 c0399080 c0157e48 0000000e 00000000 d05e1b80 d05e1ce4
> Sep 21 10:35:52 hobbit kernel: 00000002 00200286 c01322f7 d05e1ce4 c0131ef0 dbc48ec8 00200286 c0132486
> Sep 21 10:35:52 hobbit kernel: Call Trace:
> Sep 21 10:35:52 hobbit kernel: [] find_get_pages_tag+0x38/0x80
> Sep 21 10:35:52 hobbit kernel: [] lock_timer_base+0x27/0x60
> Sep 21 10:35:52 hobbit kernel: [] process_timeout+0x0/0x10
> Sep 21 10:35:52 hobbit kernel: [] __mod_timer+0x86/0xa0
> Sep 21 10:35:52 hobbit kernel: [] schedule_timeout+0x58/0xb0
> Sep 21 10:35:52 hobbit kernel: [] process_timeout+0x0/0x10
> Sep 21 10:35:52 hobbit kernel: [] journal_stop+0xa4/0x1b0 [jbd]
> Sep 21 10:35:52 hobbit kernel: [] journal_start+0x88/0xc0 [jbd]
> Sep 21 10:35:52 hobbit kernel: [] ext3_write_inode+0x0/0x40 [ext3]
> Sep 21 10:35:52 hobbit kernel: [] ext3_write_inode+0x0/0x40 [ext3]
> Sep 21 10:35:52 hobbit kernel: [] __writeback_single_inode+0x282/0x390
> Sep 21 10:35:52 hobbit kernel: [] generic_writepages+0x20/0x30
> Sep 21 10:35:52 hobbit kernel: [] do_writepages+0x49/0x50
> Sep 21 10:35:52 hobbit kernel: [] __filemap_fdatawrite_range+0x71/0x90
> Sep 21 10:35:52 hobbit kernel: [] sync_inode+0x21/0x40
> Sep 21 10:35:52 hobbit kernel: [] ext3_sync_file+0x9e/0xc0 [ext3]
> Sep 21 10:35:52 hobbit kernel: [] do_fsync+0x6e/0xb0
> Sep 21 10:35:52 hobbit kernel: [] __do_fsync+0x27/0x50
> Sep 21 10:35:52 hobbit kernel: [] sysenter_past_esp+0x78/0xb1
> Sep 21 10:35:52 hobbit kernel: =======================
>
> It's almost always after fsync, but I guess it's due to the fact that
> cleanup (from Postfix) process is the one who does that most often.
>
> After first such message (after which corresponding process will sleep
> forever), no write to the corresponding device will succeed - all will
> stall the same way. It looks like kvm just "forgets" about each and
> every write, effectively turning the device into a black hole -- but
> only writes, reads are all ok.
>
> Obviously the system will not reboot in that state, only force-reboot
> (echo b > /proc/sysrq-trigger), or a "power-off" from the guest will
> help.
>
> The device in question is a virtio block device (vda), which is on top
> of a raid1 device on the host (/dev/md_d5, partitioned). The problem
> happens after some up-time, from several hours to 2 days, usually under
> heavy load.
>
> The system is Asus M3A-H/HDMI motherboard (AMD 780G/SB700 chipset),
> with AMD Phenom 9750 CPU and 8Gb ECC memory. Stock 2.6.26.5 kernel,
> with KVM optimizations (KVM_TIME etc) turned on in guest. kvm-72.
>
> I'm running it with IDE emulation right now, to see if it will change
> something or not.

With IDE (as opposed to virtio) the situation is exactly the same:
switching if=virtio to if=ide didn't change anything at all.

> The question is -- should I try with later kvm (kernel and userspace)
> first? The thing is that it's production machine so any downtime is
> not good...

I'm still waiting for an opportunity to install a new kernel with a
new kvm... and still hoping.

Thank you!

/mjt
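
A minimal sketch for spotting the symptom described above from inside a guest: it lists processes in uninterruptible sleep (state D, as shown for cleanup in the trace) by reading /proc/<pid>/stat. The helper names parse_stat and blocked_tasks are made up for illustration, and the sketch assumes a Linux system with /proc mounted.

```python
import os


def parse_stat(line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the last ')' rather than on whitespace.
    head, _, tail = line.rpartition(")")
    pid, _, comm = head.partition(" (")
    return int(pid), comm, tail.split()[0]


def blocked_tasks(proc="/proc"):
    """List (pid, comm) for every process currently in state D."""
    found = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # not a per-process directory
        try:
            with open(os.path.join(proc, entry, "stat"), errors="replace") as f:
                pid, comm, state = parse_stat(f.read())
        except OSError:
            continue  # process exited while we were scanning
        if state == "D":
            found.append((pid, comm))
    return found


if __name__ == "__main__":
    for pid, comm in blocked_tasks():
        print(pid, comm)
```

A single run is only a snapshot: a task can legitimately sit in state D for a moment while waiting on I/O. Only a task that shows up in every run over several minutes matches the hang reported here.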