From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from [140.186.70.92] (port=35883 helo=eggs.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1PNeg8-0002L9-Vs
	for qemu-devel@nongnu.org; Tue, 30 Nov 2010 23:52:54 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <quintela@redhat.com>) id 1PNQEe-00019L-L7
	for qemu-devel@nongnu.org; Tue, 30 Nov 2010 08:27:10 -0500
Received: from mx1.redhat.com ([209.132.183.28]:54535)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <quintela@redhat.com>) id 1PNQEe-0008RY-Cv
	for qemu-devel@nongnu.org; Tue, 30 Nov 2010 08:27:08 -0500
From: Juan Quintela <quintela@redhat.com>
In-Reply-To: <4CF45E3F.4040609@codemonkey.ws> (Anthony Liguori's message of
	"Mon, 29 Nov 2010 20:15:27 -0600")
References: <cover.1290552026.git.quintela@redhat.com>
	<9b23b9b4cee242591bdb356c838a9cfb9af033c1.1290552026.git.quintela@redhat.com>
	<20101124104010.GA23493@redhat.com> <m3r5ebdly8.fsf@trasno.mitica>
	<20101124111442.GF23493@redhat.com> <4CED2C34.4000503@redhat.com>
	<4CF45E3F.4040609@codemonkey.ws>
Date: Tue, 30 Nov 2010 14:26:43 +0100
Message-ID: <m3bp57vt64.fsf@trasno.mitica>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Subject: [Qemu-devel] Re: [PATCH 09/10] Exit loop if we have been there too
	long
List-Id: qemu-devel.nongnu.org
List-Unsubscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Anthony Liguori <anthony@codemonkey.ws>
Cc: Paolo Bonzini <pbonzini@redhat.com>, qemu-devel@nongnu.org, "Michael S. Tsirkin" <mst@redhat.com>

Anthony Liguori <anthony@codemonkey.ws> wrote:
> On 11/24/2010 09:16 AM, Paolo Bonzini wrote:
>> On 11/24/2010 12:14 PM, Michael S. Tsirkin wrote:
>>>> >  The buffered_file timer runs every 100ms, and we "try" to measure
>>>> >  channel bandwidth from there.  If we are not able to run the timer,
>>>> >  all the calculations are wrong, and then stalls happen.
>>>
>>> So the problem is the timer in the buffered file abstraction?
>>> Why don't we just flush out data if the buffer is full?
>>
>> It takes a lot to fill the buffer if you have many zero pages, and
>> if that happens the guest is starved by the main loop filling the
>> buffer.
>
> Sounds like the sort of thing you'd only see if you created a large
> guest that was mostly unallocated and then tried to migrate.
> That doesn't seem like a very real case to me though.

No.  This is just the "easy to reproduce" case.  You can hit it in normal
use; an idle guest with loads of memory is simply the worst possible
case, and trivial to reproduce.

> The best approach would be to drop qemu_mutex while processing this
> code instead of having an arbitrary back-off point.  The latter is
> deferring the problem to another day when it becomes the source of a
> future problem.

As I said in the other mail, you are offering me half a solution.  If I
implement the qemu_mutex change (which I will do), we would still have
this problem in the main loop.  The CPU being stuck for 10s would be
solved, but nothing else.
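
To make the back-off idea concrete, here is a minimal sketch of a loop
that gives up after a time budget so the main loop (and the 100ms
buffered_file timer) can run again.  It is not the code from the patch:
send_pages_bounded(), the sim_* helpers, MAX_WAIT_MS, the 50ms budget
and the "check the clock every 64 pages" interval are all assumptions
made up for illustration, and the clock is simulated so the behaviour is
deterministic.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed budget: half of the 100ms timer period mentioned above. */
#define MAX_WAIT_MS 50

/* Hypothetical callbacks, so the loop does not depend on a real clock. */
typedef int64_t (*clock_ms_fn)(void *opaque);
typedef void (*send_page_fn)(void *opaque, size_t page_index);

/*
 * Send pages [first, total) but return early once the budget is spent.
 * Returns the index of the next page to send, so the caller can resume
 * from there on the next main loop iteration.
 */
size_t send_pages_bounded(size_t first, size_t total,
                          send_page_fn send_page, clock_ms_fn now_ms,
                          void *opaque)
{
    int64_t start = now_ms(opaque);
    size_t i = first;

    while (i < total) {
        send_page(opaque, i);
        i++;
        /* Only look at the clock every 64 pages to keep overhead low. */
        if ((i & 63) == 0 && now_ms(opaque) - start > MAX_WAIT_MS) {
            break;
        }
    }
    return i;
}

/* Simulated clock and page sender, for illustration only. */
struct sim {
    int64_t now_ms;           /* current simulated time */
    int64_t cost_per_page_ms; /* simulated cost of handling one page */
    size_t pages_sent;        /* number of pages handled so far */
};

int64_t sim_now(void *opaque)
{
    return ((struct sim *)opaque)->now_ms;
}

void sim_send(void *opaque, size_t page_index)
{
    struct sim *s = opaque;

    (void)page_index;
    s->now_ms += s->cost_per_page_ms;
    s->pages_sent++;
}
```

The early exit only bounds how long one pass holds the main loop; it
says nothing about who holds qemu_mutex meanwhile, which is why I see
the two changes as complementary.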

Later, Juan.

> Regards,
>
> Anthony Liguori
>
>> Paolo
>>