From mboxrd@z Thu Jan 1 00:00:00 1970
From: Antoine Martin
Subject: Re: repeatable hang with loop mount and heavy IO in guest (now in host - not KVM then..)
Date: Sun, 23 May 2010 02:33:15 +0700
Message-ID: <4BF8317B.6060804@nagafix.co.uk>
References: <4B588E63.4060700@nagafix.co.uk> <4B595A7B.4010404@msgid.tls.msk.ru> <4B59EE4E.6070005@nagafix.co.uk> <4B59F958.8030709@nagafix.co.uk> <4B69CE50.5060901@nagafix.co.uk> <4B880731.5090500@nagafix.co.uk> <4BF654A1.6080700@nagafix.co.uk> <20100522181019.GA17092@psychosis.jim.sh>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: Michael Tokarev , kvm@vger.kernel.org
To: Jim Paris
Return-path:
Received: from mamba.nagafix.co.uk ([194.145.196.68]:39973 "EHLO mail.nagafix.co.uk" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1755756Ab0EVTd3 (ORCPT ); Sat, 22 May 2010 15:33:29 -0400
In-Reply-To: <20100522181019.GA17092@psychosis.jim.sh>
Sender: kvm-owner@vger.kernel.org
List-ID:

On 05/23/2010 01:10 AM, Jim Paris wrote:
> Antoine Martin wrote:
>
>> On 02/27/2010 12:38 AM, Antoine Martin wrote:
>>
>>>>> 1 0 0 98 0 1| 0 0 | 66B 354B| 0 0 | 30 11
>>>>> 1 1 0 98 0 0| 0 0 | 66B 354B| 0 0 | 29 11
>>>>>
>>>> > From that point onwards, nothing will happen.
>>>>
>>>>> The host has disk IO to spare... So what is it waiting for??
>>>>>
>>>> Moved to an AMD64 host. No effect.
>>>> Disabled swap before running the test. No effect.
>>>> Moved the guest to a fully up-to-date FC12 server
>>>> (2.6.31.6-145.fc12.x86_64), no effect.
>>>>
>>> I have narrowed it down to the guest's filesystem used for backing
>>> the disk image which is loop mounted: although it was not
>>> completely full (and had enough inodes), freeing some space on it
>>> prevents the system from misbehaving.
>>>
>>> FYI: the disk image was clean and was fscked before each test. kvm
>>> had been updated to 0.12.3
>>> The weird thing is that the same filesystem works fine (no system
>>> hang) if used directly from the host, it is only misbehaving via
>>> kvm...
>>>
>>> So I am not dismissing the possibility that kvm may be at least
>>> partly to blame, or that it is exposing a filesystem bug (race?)
>>> not normally encountered.
>>> (I have backed up the full 32GB virtual disk in case someone
>>> suggests further investigation)
>>>
>> Well, well. I've just hit the exact same bug on another *host* (not
>> a guest), running stock Fedora 12.
>> So this isn't a kvm bug after all. Definitely a loop+ext(4?) bug.
>> Looks like you need a pretty big loop mounted partition to trigger
>> it. (bigger than available ram?)
>>
>> This is what triggered it on a quad amd system with 8Gb of ram,
>> software raid-1 partition:
>> mount -o loop 2GB.dd source
>> dd if=/dev/zero of=8GB.dd bs=1048576 count=8192
>> mkfs.ext4 -f 8GB.dd
>> mount -o loop 8GB.dd dest
>> rsync -rplogtD source/* dest/
>> umount source
>> umount dest
>> ^ this is where it hangs, I then tried to issue a 'sync' from
>> another terminal, which also hung.
>> It took more than 10 minutes to settle itself, during that time one
>> CPU was stuck in wait state.
>>
> This sounds like:
> https://bugzilla.kernel.org/show_bug.cgi?id=15906
> https://bugzilla.redhat.com/show_bug.cgi?id=588930
>
Indeed it does.
Let's hope this makes it to -stable fast.

Antoine