From: Anthony Liguori
Date: Tue, 30 Nov 2010 14:23:26 -0600
Message-ID: <4CF55D3E.5010400@codemonkey.ws>
Subject: [Qemu-devel] Re: [PATCH 02/10] Add buffered_file_internal constant
To: Juan Quintela
Cc: qemu-devel@nongnu.org, "Michael S. Tsirkin"

On 11/30/2010 01:15 PM, Juan Quintela wrote:
> Anthony Liguori wrote:
>> On 11/30/2010 12:04 PM, Juan Quintela wrote:
>>> Anthony Liguori wrote:
>>>> On 11/30/2010 10:32 AM, Juan Quintela wrote:
>>>>> "Michael S. Tsirkin" wrote:
>>>>>> On Tue, Nov 30, 2010 at 04:40:41PM +0100, Juan Quintela wrote:
>>>>>>> Basically our bitmap handling code is "exponential" on memory size,
>>>>>> I didn't realize this.  What makes it exponential?
>>>>> Well, 1st of all, it is "exponential" as you measure it.
>>>>>
>>>>> stalls by default are:
>>>>>
>>>>> 1-2GB: milliseconds
>>>>> 2-4GB: 100-200ms
>>>>> 4-8GB: 1s
>>>>> 64GB: 59s
>>>>> 400GB: 24m (yes, minutes)
>>>>>
>>>>> That sounds really exponential.
>>>> How are you measuring stalls btw?
>>> At the end of ram_save_live().  This was the reason that I put the
>>> information there.
>>>
>>> For the 24min stall (I don't have that machine anymore) I had less
>>> "exact" measurements.  It was the amount that it "decided" to send in
>>> the last non-live part of memory migration.  With the stalls & zero page
>>> accounting, we just got to the point where we had basically infinite speed.
>> That's not quite guest visible.
> Humm, guest don't answer in 24mins
> monitor don't answer in 24mins
> ping don't answer in 24mins
>
> are you sure that this is not visible?  The bug report said that the guest
> had just died; it was me who waited to see that it took 24mins to end.

I'm extremely sceptical that any of your patches would address this
problem.  Even if you had to scan every page in a 400GB guest, it would
not take 24 minutes.  Something is not quite right here.  24 minutes
suggests that there's another problem that is yet to be identified.

Regards,

Anthony Liguori

>> It only is a "stall" if the guest is trying to access device emulation
>> and acquiring the qemu_mutex.  A more accurate measurement would be
>> something that measured guest availability.  For instance, a tight
>> loop of while (1) { usleep(100); gettimeofday(); } that then recorded
>> periods of unavailability > X.
> This is better, and this is what the qemu_mutex change should fix.
>
>> Of course, it's critically important that a working version of pvclock
>> be available in the guest for this to be accurate.
> If the problem is 24mins, we don't need such an "exact" version O:-)
>
> Later, Juan.
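
A minimal sketch of the guest-side availability probe described above, assuming
a plain gettimeofday()-based timer and an arbitrary 10ms reporting threshold;
neither detail is specified in the thread, and as noted above a working pvclock
in the guest is needed for the readings to be trustworthy.  For scale on the
400GB point: 400GB of 4KB pages is roughly 10^8 pages, so even at a microsecond
per page a full scan is on the order of 100 seconds, far short of 24 minutes.

/*
 * Illustrative sketch only (not from the original thread): sleep ~100us,
 * read the clock, and report any gap larger than a chosen threshold as a
 * period in which the vCPU was apparently not running.
 */
#include <stdio.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/time.h>

static uint64_t now_us(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (uint64_t)tv.tv_sec * 1000000ull + tv.tv_usec;
}

int main(void)
{
    const uint64_t threshold_us = 10000;   /* report gaps > 10ms (arbitrary) */
    uint64_t last = now_us();

    for (;;) {
        usleep(100);
        uint64_t now = now_us();
        if (now - last > threshold_us) {
            printf("stall: %llu us\n", (unsigned long long)(now - last));
        }
        last = now;
    }
    return 0;
}

Run inside the guest while a migration is in progress; any reported gap much
larger than the 100us sleep interval marks a window of guest unavailability.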