From mboxrd@z Thu Jan 1 00:00:00 1970
From: Michael Tokarev
Subject: Re: writes to a virtio block device hungs
Date: Fri, 26 Sep 2008 12:28:46 +0400
Message-ID: <48DC9D3E.2020106@msgid.tls.msk.ru>
References: <48D76286.5090203@msgid.tls.msk.ru> <48D89563.5050502@msgid.tls.msk.ru> <20080925230251.GB22929@dmt.cnet>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
Cc: kvm@vger.kernel.org
To: Marcelo Tosatti
In-Reply-To: <20080925230251.GB22929@dmt.cnet>

Marcelo Tosatti wrote:
> On Tue, Sep 23, 2008 at 11:06:11AM +0400, Michael Tokarev wrote:
>>> (both host and guests are Linux machines), I placed
>>> one virtual machine into production use, and almost
>>> immediately came... issues. Here's how it looks
>>> from the guest:
>>>
>>> Sep 21 10:35:52 hobbit kernel: INFO: task cleanup:20535 blocked for more than 120 seconds.
>>> Sep 21 10:35:52 hobbit kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>> Sep 21 10:35:52 hobbit kernel: cleanup D 00000000 0 20535 1570
>>> Sep 21 10:35:52 hobbit kernel: f73b39c0 00200086 00000000 00000000 c3a2ba48 00000000 f7022e00 00000000
>>> Sep 21 10:35:52 hobbit kernel: dbc48ed4 f789c000 c0399080 c0157e48 0000000e 00000000 d05e1b80 d05e1ce4
>>> Sep 21 10:35:52 hobbit kernel: 00000002 00200286 c01322f7 d05e1ce4 c0131ef0 dbc48ec8 00200286 c0132486
>>> Sep 21 10:35:52 hobbit kernel: Call Trace:
>>> Sep 21 10:35:52 hobbit kernel: [] find_get_pages_tag+0x38/0x80
>>> Sep 21 10:35:52 hobbit kernel: [] lock_timer_base+0x27/0x60
>>> Sep 21 10:35:52 hobbit kernel: [] process_timeout+0x0/0x10
>>> Sep 21 10:35:52 hobbit kernel: [] __mod_timer+0x86/0xa0
>>> Sep 21 10:35:52 hobbit kernel: [] schedule_timeout+0x58/0xb0
>>> Sep 21 10:35:52 hobbit kernel: [] process_timeout+0x0/0x10
>>> Sep 21 10:35:52 hobbit kernel: [] journal_stop+0xa4/0x1b0 [jbd]
>>> Sep 21 10:35:52 hobbit kernel: [] journal_start+0x88/0xc0 [jbd]
>>> Sep 21 10:35:52 hobbit kernel: [] ext3_write_inode+0x0/0x40 [ext3]
>>> Sep 21 10:35:52 hobbit kernel: [] ext3_write_inode+0x0/0x40 [ext3]
>>> Sep 21 10:35:52 hobbit kernel: [] __writeback_single_inode+0x282/0x390
>>> Sep 21 10:35:52 hobbit kernel: [] generic_writepages+0x20/0x30
>>> Sep 21 10:35:52 hobbit kernel: [] do_writepages+0x49/0x50
>>> Sep 21 10:35:52 hobbit kernel: [] __filemap_fdatawrite_range+0x71/0x90
>>> Sep 21 10:35:52 hobbit kernel: [] sync_inode+0x21/0x40
>>> Sep 21 10:35:52 hobbit kernel: [] ext3_sync_file+0x9e/0xc0 [ext3]
>>> Sep 21 10:35:52 hobbit kernel: [] do_fsync+0x6e/0xb0
>>> Sep 21 10:35:52 hobbit kernel: [] __do_fsync+0x27/0x50
>>> Sep 21 10:35:52 hobbit kernel: [] sysenter_past_esp+0x78/0xb1
>>> Sep 21 10:35:52 hobbit kernel: =======================
>>>
>>> It's almost always after fsync, but I guess that's because the
>>> cleanup process (from Postfix) is the one that calls fsync most often.
>>>
>> I'm waiting for an opportunity to install a new kernel with a new kvm...
>> still hoping.
Meanwhile I installed kvm-75, which did NOT change anything: the system
still hangs. What really changed things was switching the guest to a single
processor (it had 2 before, on a 4-core Phenom host).

> Are you using ext3 in the host as the filesystem to back the guest
> image? If so, try writeback instead of ordered mode:

On the host there's an MD device (raid1) that holds the complete "raw" disk
image for the guest. It was in my email:

>> The device in question is a virtio block device (vda), which is on top
>> of a raid1 device on the host (/dev/md_d5, partitioned).

[...]

I'm trying to set up a test system to debug the case further, because
it's impossible to do that on a production machine.

/mjt
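Since the hangs almost always show up in fsync, one way to try to reproduce them on a test guest is a small fsync-latency probe like the sketch below. It is a hypothetical helper, not something from this thread: it writes small files, times each os.fsync() call, and reports calls slower than a threshold. The PROBE_DIR environment variable and the file naming are assumptions; point it at a directory on the virtio disk (vda) and compare runs with 2 vCPUs against 1 vCPU.

```python
import os
import tempfile
import time


def fsync_latencies(directory, count=100, size=4096):
    """Write `count` small files in `directory`, fsync each one, and
    return the wall-clock duration (seconds) of every fsync call."""
    payload = b"x" * size
    latencies = []
    for i in range(count):
        # Hypothetical file name; each probe file is removed afterwards.
        path = os.path.join(directory, f"probe-{i}.tmp")
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
        try:
            os.write(fd, payload)
            start = time.monotonic()
            os.fsync(fd)  # the call the guest's cleanup process blocked in
            latencies.append(time.monotonic() - start)
        finally:
            os.close(fd)
            os.unlink(path)
    return latencies


def slow_fsyncs(latencies, threshold=120.0):
    """Return (index, seconds) for each fsync slower than `threshold`.
    The 120 s default mirrors the "blocked for more than 120 seconds"
    message in the guest log."""
    return [(i, t) for i, t in enumerate(latencies) if t > threshold]


if __name__ == "__main__":
    # PROBE_DIR is an assumed knob; it falls back to the temp directory.
    target = os.environ.get("PROBE_DIR", tempfile.gettempdir())
    lat = fsync_latencies(target, count=200)
    print(f"max fsync: {max(lat):.3f}s")
    for i, t in slow_fsyncs(lat, threshold=1.0):
        print(f"fsync #{i} took {t:.1f}s")
```

A single-threaded probe may not trigger the SMP-related hang on its own; running several copies in parallel is a cheap way to add concurrency.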