From: Anthony Liguori
Date: Tue, 30 Nov 2010 09:00:09 -0600
Message-ID: <4CF51179.9070306@codemonkey.ws>
In-Reply-To: <4CF5063B.7020504@redhat.com>
References: <9b23b9b4cee242591bdb356c838a9cfb9af033c1.1290552026.git.quintela@redhat.com>
 <4CF45D67.5010906@codemonkey.ws> <4CF4A478.8080209@redhat.com>
 <4CF5008F.2090306@codemonkey.ws> <4CF5063B.7020504@redhat.com>
Subject: [Qemu-devel] Re: [PATCH 09/10] Exit loop if we have been there too long
To: Paolo Bonzini
Cc: Juan Quintela, qemu-devel@nongnu.org, kvm-devel, Juan Quintela

On 11/30/2010 08:12 AM, Paolo Bonzini wrote:
> On 11/30/2010 02:47 PM, Anthony Liguori wrote:
>> On 11/30/2010 01:15 AM, Paolo Bonzini wrote:
>>> On 11/30/2010 03:11 AM, Anthony Liguori wrote:
>>>>
>>>> BufferedFile should hit the qemu_file_rate_limit check when the
>>>> socket buffer gets filled up.
>>>
>>> The problem is that the file rate limit is not hit because the work
>>> is done elsewhere. The rate limit can cap the bandwidth used and
>>> makes QEMU aware that socket operations may block (because that's
>>> what the buffered file freeze/unfreeze logic does); but it cannot
>>> be used to limit the _time_ spent in the migration code.
>>
>> Yes, it can, if you set the rate limit sufficiently low.
>
> You mean, just like you can drive a car without brakes by keeping the
> speed sufficiently low.
>
>> [..] accounting zero pages as full sized pages should "fix" the
>> problem.
>
> I know you used quotes, but that's a very, very generous definition
> of "fix". Both of these proposed "fixes" are nothing more than
> workarounds, and particularly ugly ones at that. The worst thing
> about them is that there is no guarantee of migration finishing in a
> reasonable time, or at all.
>
> If you account zero pages as full, you don't use the bandwidth
> allotted to you effectively; you use only 0.2% of it (8/4096). It
> then takes an exaggerated amount of time to start iterating on the
> pages that matter. If you set the bandwidth low, instead, you do not
> have the bandwidth you need in order to converge.
>
> Even from an aesthetic point of view, if there is such a thing, I
> don't understand why you advocate conflating network bandwidth and
> CPU usage into a single measurement. Nobody disagrees that
> everything you propose is nice to have, and that what Juan sent is a
> stopgap measure (though a very effective one). However, this doesn't
> negate that Juan's accounting patches make a lot of sense in the
> current design.

Juan's patch, IIUC, does the following: if you've been iterating in a
tight loop, return to the main loop for *one* iteration every 50ms.
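To make sure we're arguing about the same thing, here is a standalone
toy in plain C of that pattern (this is not the actual patch and not
QEMU code; the structure and the 50ms budget are just my reading of
the thread):

/* Illustration only: iterate over work units, but bail out of the
 * loop once ~50ms of wall-clock time have been spent, so the caller's
 * main loop gets to run again. */
#include <stdio.h>
#include <time.h>

static long elapsed_ms(const struct timespec *start)
{
    struct timespec now;
    clock_gettime(CLOCK_MONOTONIC, &now);
    return (now.tv_sec - start->tv_sec) * 1000L
           + (now.tv_nsec - start->tv_nsec) / 1000000L;
}

/* Returns how many "pages" were sent before hitting the time budget. */
static int send_some_pages(int pages_left)
{
    struct timespec start;
    int sent = 0;

    clock_gettime(CLOCK_MONOTONIC, &start);
    while (pages_left-- > 0) {
        /* ...send one page here... */
        sent++;
        if (elapsed_ms(&start) > 50) {
            break;  /* give the main loop one iteration, come back later */
        }
    }
    return sent;
}

int main(void)
{
    printf("sent %d pages in this slice\n", send_some_pages(1 << 20));
    return 0;
}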
But this means that during this 50ms period of time, a VCPU may be
blocked from running. If the guest isn't doing a lot of device I/O
*and* you're on a relatively low-speed link, that means you won't hold
qemu_mutex for more than 50ms at a time. But in the degenerate case,
where you have a high-speed link and a guest doing a lot of device
I/O, you'll see the guest VCPU blocked for 50ms, then running for a
very brief period of time, followed by another 50ms block. The guest's
execution will be extremely sporadic.

This isn't fixable with this approach. The only way to really fix it
is to say that over a given period of time, migration may only consume
XX amount of CPU time, which guarantees the VCPUs get qemu_mutex for
the rest of the time. This is exactly what rate limiting does. Yes, it
results in a longer migration time, but that's the trade-off we have
to make if we want deterministic VCPU execution until we can implement
threading properly.

If you want a simple example, do I/O with the rtl8139 adapter during
your migration test and run a tight loop in the guest calling
gettimeofday() (a rough sketch of such a loop is at the end of this
mail). Graph the results to see how much execution time the guest is
actually getting.

>> In the long term, we need a new dirty bit interface from kvm.ko that
>> uses a multi-level table. That should dramatically improve scan
>> performance. We also need to implement live migration in a separate
>> thread that doesn't carry qemu_mutex while it runs.
>
> This may be a good way to fix it, but it's also basically a rewrite.

The only correct short-term solution I can see is rate limiting,
unfortunately.

Regards,

Anthony Liguori

> Paolo
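P.S. The in-guest measurement I have in mind is nothing fancier than
the following (plain POSIX C; the 5ms threshold and the output format
are arbitrary choices of mine):

/* Spin on gettimeofday() inside the guest and report any gap larger
 * than 5ms, i.e. any stretch where this loop wasn't getting to run.
 * Pipe the output to a file and graph it. */
#include <stdio.h>
#include <sys/time.h>

int main(void)
{
    struct timeval prev, now;

    gettimeofday(&prev, NULL);
    for (;;) {
        long gap_us;

        gettimeofday(&now, NULL);
        gap_us = (now.tv_sec - prev.tv_sec) * 1000000L
                 + (now.tv_usec - prev.tv_usec);
        if (gap_us > 5000) {
            printf("stall: %ld us\n", gap_us);
        }
        prev = now;
    }
    return 0;
}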