From: Juan Quintela
Date: Tue, 30 Nov 2010 14:26:43 +0100
Subject: [Qemu-devel] Re: [PATCH 09/10] Exit loop if we have been there too long
To: Anthony Liguori
Cc: Paolo Bonzini, qemu-devel@nongnu.org, "Michael S. Tsirkin"
In-Reply-To: <4CF45E3F.4040609@codemonkey.ws> (Anthony Liguori's message of "Mon, 29 Nov 2010 20:15:27 -0600")

Anthony Liguori wrote:
> On 11/24/2010 09:16 AM, Paolo Bonzini wrote:
>> On 11/24/2010 12:14 PM, Michael S. Tsirkin wrote:
>>>> The buffered_file timer runs every 100ms. And we "try" to measure
>>>> channel bandwidth from there. If we are not able to run the timer,
>>>> all the calculations are wrong, and then stalls happen.
>>>
>>> So the problem is the timer in the buffered file abstraction?
>>> Why don't we just flush out data if the buffer is full?
>>
>> It takes a lot to fill the buffer if you have many zero pages, and
>> if that happens the guest is starved by the main loop filling the
>> buffer.
>
> Sounds like the sort of thing you'd only see if you created a
> large guest that was mostly unallocated and then tried to migrate.
> That doesn't seem like a very real case to me though.

No, this is the "easy" case to reproduce. You can hit it in normal use.
An idle guest with loads of memory is the worst possible case, and it
is trivial to reproduce.

> The best approach would be to drop qemu_mutex while processing this
> code instead of having an arbitrary back-off point. The latter is
> deferring the problem to another day when it becomes the source of a
> future problem.

As I said in the other mail, you are offering me half a solution. If I
implement the qemu_mutex change (which I will do), we would still have
this problem in the main loop. That would fix the "CPU stuck for 10s"
problem, but nothing else.

Later, Juan.

> Regards,
>
> Anthony Liguori
>
>> Paolo
>>