From mboxrd@z Thu Jan  1 00:00:00 1970
From: Antoine Martin <antoine@nagafix.co.uk>
Subject: Re: repeatable hang with loop mount and heavy IO in guest (now in
 host - not KVM then..)
Date: Sun, 23 May 2010 02:33:15 +0700
Message-ID: <4BF8317B.6060804@nagafix.co.uk>
References: <4B588E63.4060700@nagafix.co.uk> <4B595A7B.4010404@msgid.tls.msk.ru> <4B59EE4E.6070005@nagafix.co.uk> <4B59F958.8030709@nagafix.co.uk> <4B69CE50.5060901@nagafix.co.uk> <4B880731.5090500@nagafix.co.uk> <4BF654A1.6080700@nagafix.co.uk> <20100522181019.GA17092@psychosis.jim.sh>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: Michael Tokarev <mjt@tls.msk.ru>, kvm@vger.kernel.org
To: Jim Paris <jim@jtan.com>
Return-path: <kvm-owner@vger.kernel.org>
Received: from mamba.nagafix.co.uk ([194.145.196.68]:39973 "EHLO
	mail.nagafix.co.uk" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org
	with ESMTP id S1755756Ab0EVTd3 (ORCPT <rfc822;kvm@vger.kernel.org>);
	Sat, 22 May 2010 15:33:29 -0400
In-Reply-To: <20100522181019.GA17092@psychosis.jim.sh>
Sender: kvm-owner@vger.kernel.org
List-ID: <kvm.vger.kernel.org>

On 05/23/2010 01:10 AM, Jim Paris wrote:
> Antoine Martin wrote:
>    
>> On 02/27/2010 12:38 AM, Antoine Martin wrote:
>>      
>>>>>   1   0   0  98   0   1|   0     0 |  66B  354B|   0     0 |  30    11
>>>>>   1   1   0  98   0   0|   0     0 |  66B  354B|   0     0 |  29    11
>>>>>            
>>>>> From that point onwards, nothing will happen.
>>>>          
>>>>> The host has disk IO to spare... So what is it waiting for??
>>>>>            
>>>> Moved to an AMD64 host. No effect.
>>>> Disabled swap before running the test. No effect.
>>>> Moved the guest to a fully up-to-date FC12 server
>>>> (2.6.31.6-145.fc12.x86_64), no effect.
>>>>          
>>> I have narrowed it down to the guest's filesystem used for backing
>>> the disk image which is loop mounted: although it was not
>>> completely full (and had enough inodes), freeing some space on it
>>> prevents the system from misbehaving.
>>>
>>> FYI: the disk image was clean and was fscked before each test. kvm
>>> had been updated to 0.12.3
>>> The weird thing is that the same filesystem works fine (no system
>>> hang) if used directly from the host, it is only misbehaving via
>>> kvm...
>>>
>>> So I am not dismissing the possibility that kvm may be at least
>>> partly to blame, or that it is exposing a filesystem bug (race?)
>>> not normally encountered.
>>> (I have backed up the full 32GB virtual disk in case someone
>>> suggests further investigation)
>>>        
>> Well, well. I've just hit the exact same bug on another *host* (not
>> a guest), running stock Fedora 12.
>> So this isn't a kvm bug after all. Definitely a loop+ext(4?) bug.
>> Looks like you need a pretty big loop mounted partition to trigger
>> it. (bigger than available ram?)
>>
>> This is what triggered it on a quad-core AMD system with 8GB of RAM,
>> on a software RAID-1 partition:
>> mount -o loop 2GB.dd source
>> dd if=/dev/zero of=8GB.dd bs=1048576 count=8192
>> mkfs.ext4 -F 8GB.dd
>> mount -o loop 8GB.dd dest
>> rsync -rplogtD source/* dest/
>> umount source
>> umount dest
>> ^ this is where it hangs, I then tried to issue a 'sync' from
>> another terminal, which also hung.
>> It took more than 10 minutes to settle itself, during that time one
>> CPU was stuck in wait state.
>>      
> This sounds like:
>    https://bugzilla.kernel.org/show_bug.cgi?id=15906
>    https://bugzilla.redhat.com/show_bug.cgi?id=588930
>    
Indeed it does.
Let's hope the fix makes it into -stable quickly.
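
For anyone who wants to retry the steps quoted above, here is a minimal
dry-run sketch of the sequence. It is not a tested reproducer: the
repro_steps helper, the RUN switch, the SRC_IMG/DST_IMG variables and the
mkdir line are assumptions added for illustration, and it passes -F to
mkfs.ext4 so that it will format a regular file without prompting. By
default it only prints the commands.

```shell
#!/bin/sh
# Dry-run sketch of the loop-mount reproduction steps from this thread.
# Hypothetical names (not from the original report): repro_steps, RUN,
# SRC_IMG, DST_IMG. The mkdir line is also an added assumption.

SRC_IMG=${SRC_IMG:-2GB.dd}   # existing image holding the data to copy
DST_IMG=${DST_IMG:-8GB.dd}   # new 8GB image created by dd below

# Print the command sequence, one command per line.
repro_steps() {
    cat <<EOF
mkdir -p source dest
mount -o loop $SRC_IMG source
dd if=/dev/zero of=$DST_IMG bs=1048576 count=8192
mkfs.ext4 -F $DST_IMG
mount -o loop $DST_IMG dest
rsync -rplogtD source/* dest/
umount source
umount dest
EOF
}

if [ "${RUN:-0}" = 1 ]; then
    # Needs root; stops at the first failing command and traces each one.
    repro_steps | sh -ex
else
    repro_steps
fi
```

Running it with RUN=1 as root executes the sequence; the hang reported
above was seen at the final umount.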

Antoine