From: Anthony Liguori
Date: Tue, 30 Nov 2010 14:23:26 -0600
Message-ID: <4CF55D3E.5010400@codemonkey.ws>
Subject: [Qemu-devel] Re: [PATCH 02/10] Add buffered_file_internal constant
To: Juan Quintela
Cc: qemu-devel@nongnu.org, "Michael S. Tsirkin"

On 11/30/2010 01:15 PM, Juan Quintela wrote:
> Anthony Liguori wrote:
>> On 11/30/2010 12:04 PM, Juan Quintela wrote:
>>> Anthony Liguori wrote:
>>>> On 11/30/2010 10:32 AM, Juan Quintela wrote:
>>>>> "Michael S. Tsirkin" wrote:
>>>>>> On Tue, Nov 30, 2010 at 04:40:41PM +0100, Juan Quintela wrote:
>>>>>>> Basically our bitmap handling code is "exponential" on memory size,
>>>>>> I didn't realize this.  What makes it exponential?
>>>>> Well, 1st of all, it is "exponential" as you measure it.
>>>>>
>>>>> stalls by default are:
>>>>>
>>>>> 1-2GB: milliseconds
>>>>> 2-4GB: 100-200ms
>>>>> 4-8GB: 1s
>>>>> 64GB: 59s
>>>>> 400GB: 24m (yes, minutes)
>>>>>
>>>>> That sounds really exponential.
>>>> How are you measuring stalls btw?
>>> At the end of ram_save_live().  This was the reason that I put the
>>> information there.
>>>
>>> For the 24min stall (I don't have that machine anymore) I had less
>>> "exact" measurements.  It was the amount that it "decided" to send in
>>> the last non-live part of memory migration.  With the stalls & zero page
>>> accounting, we just got to the point where we had basically infinite speed.
>> That's not quite guest visible.
> Humm, guest don't answer in 24mins
> monitor don't answer in 24mins
> ping don't answer in 24mins
>
> are you sure that this is not visible?  The bug report said that the guest
> had just died; it was me who waited to see that it took 24mins to end.

I'm extremely sceptical that any of your patches would address this
problem.  Even if you had to scan every page in a 400GB guest, it would
not take 24 minutes.  Something is not quite right here.  24 minutes
suggests that there's another problem that is yet to be identified.

Regards,

Anthony Liguori

>> It only is a "stall" if the guest is trying to access device emulation
>> and acquiring the qemu_mutex.  A more accurate measurement would be
>> something that measured guest availability.  For instance, a tight
>> loop of while (1) { usleep(100); gettimeofday(); } that then recorded
>> periods of unavailability > X.
> This is better, and this is what the qemu_mutex change should fix.
>
>> Of course, it's critically important that a working version of pvclock
>> be available in the guest for this to be accurate.
> If the problem is 24mins, we don't need such an "exact" version O:-)
>
> Later, Juan.
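
A minimal sketch of the guest-side availability probe described above, assuming
a plain gettimeofday()-based timer and an arbitrary 10ms reporting threshold;
neither detail is specified in the thread, and as noted above a working pvclock
in the guest is needed for the readings to be trustworthy.  For scale on the
400GB point: 400GB of 4KB pages is roughly 10^8 pages, so even at a microsecond
per page a full scan is on the order of 100 seconds, far short of 24 minutes.

/*
 * Illustrative sketch only (not from the original thread): sleep ~100us,
 * read the clock, and report any gap larger than a chosen threshold as a
 * period in which the vCPU was apparently not running.
 */
#include <stdio.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/time.h>

static uint64_t now_us(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (uint64_t)tv.tv_sec * 1000000ull + tv.tv_usec;
}

int main(void)
{
    const uint64_t threshold_us = 10000;   /* report gaps > 10ms (arbitrary) */
    uint64_t last = now_us();

    for (;;) {
        usleep(100);
        uint64_t now = now_us();
        if (now - last > threshold_us) {
            printf("stall: %llu us\n", (unsigned long long)(now - last));
        }
        last = now;
    }
    return 0;
}

Run inside the guest while a migration is in progress; any reported gap much
larger than the 100us sleep interval marks a window of guest unavailability.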