From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <51B3F706.1030204@linux.vnet.ibm.com>
Date: Sun, 09 Jun 2013 11:31:18 +0800
From: Wenchao Xia
MIME-Version: 1.0
References: <51A7036A.3050407@ozlabs.ru> <51A7049F.6040207@redhat.com> <51A70B3D.90609@ozlabs.ru> <51A71705.6060009@kamp.de> <51A74D79.7040204@redhat.com> <2765FDFA-8050-4AA3-8621-7E9EA2C89F9C@kamp.de> <51A764FC.7080705@redhat.com> <51ADF122.70307@kamp.de> <51ADF637.7060804@redhat.com> <51ADFBCE.3080200@kamp.de> <51ADFC7A.7030009@redhat.com> <51AE035A.5070301@kamp.de> <51B2EB0A.7000704@linux.vnet.ibm.com> <51B2EBA2.5060401@ozlabs.ru> <51B3E58C.50301@linux.vnet.ibm.com> <51B3E9A8.5010705@ozlabs.ru> <51B3EFFA.4040608@linux.vnet.ibm.com> <51B3F1FD.1090401@ozlabs.ru>
In-Reply-To: <51B3F1FD.1090401@ozlabs.ru>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Subject: Re: [Qemu-devel] broken incoming migration
To: Alexey Kardashevskiy
Cc: "qemu-ppc@nongnu.org", Paolo Bonzini, Peter Lieven, "qemu-devel@nongnu.org", David Gibson

On 2013-6-9 11:09, Alexey Kardashevskiy wrote:
> On 06/09/2013 01:01 PM, Wenchao Xia wrote:
>> On 2013-6-9 10:34, Alexey Kardashevskiy wrote:
>>> On
06/09/2013 12:16 PM, Wenchao Xia wrote:
>>>> On 2013-6-8 16:30, Alexey Kardashevskiy wrote:
>>>>> On 06/08/2013 06:27 PM, Wenchao Xia wrote:
>>>>>>> On 04.06.2013 16:40, Paolo Bonzini wrote:
>>>>>>>> On 04/06/2013 16:38, Peter Lieven wrote:
>>>>>>>>> On 04.06.2013 16:14, Paolo Bonzini wrote:
>>>>>>>>>> On 04/06/2013 15:52, Peter Lieven wrote:
>>>>>>>>>>> On 30.05.2013 16:41, Paolo Bonzini wrote:
>>>>>>>>>>>> On 30/05/2013 16:38, Peter Lieven wrote:
>>>>>>>>>>>>>>> You could also scan the page for nonzero values before
>>>>>>>>>>>>>>> writing it.
>>>>>>>>>>>>> I had this in mind, but then chose the other approach... turned
>>>>>>>>>>>>> out to be a bad idea.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Alexey: I will prepare a patch later today; could you then please
>>>>>>>>>>>>> verify it fixes your problem?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Paolo: would we still need the madvise, or is it enough to not
>>>>>>>>>>>>> write the zeroes?
>>>>>>>>>>>> It should be enough to not write them.
>>>>>>>>>>> Problem: checking the pages for zero allocates them, even at the
>>>>>>>>>>> source.
>>>>>>>>>> It doesn't look like it. I tried this program and top doesn't show an
>>>>>>>>>> increasing amount of reserved memory:
>>>>>>>>>>
>>>>>>>>>> #include <stdio.h>
>>>>>>>>>> #include <stdlib.h>
>>>>>>>>>> int main()
>>>>>>>>>> {
>>>>>>>>>>     char *x = malloc(500 << 20);
>>>>>>>>>>     int i, j;
>>>>>>>>>>     for (i = 0; i < 500; i += 10) {
>>>>>>>>>>         for (j = 0; j < 10 << 20; j += 4096) {
>>>>>>>>>>             *(volatile char*) (x + (i << 20) + j);
>>>>>>>>>>         }
>>>>>>>>>>         getchar();
>>>>>>>>>>     }
>>>>>>>>>> }
>>>>>>>>> Strange. We are talking about RSS size, right?
>>>>>>>> None of the three top values change, and only VIRT is >500 MB.
>>>>>>>>
>>>>>>>>> Is the malloc above using mmapped memory?
>>>>>>>> Yes.
>>>>>>>>
>>>>>>>>> Which kernel version do you use?
>>>>>>>> 3.9.
>>>>>>>>
>>>>>>>>> What avoids allocating the memory for me is the following (with
>>>>>>>>> whatever side effects it has ;-))
>>>>>>>> This would also fail to migrate any page that is swapped out, breaking
>>>>>>>> overcommit in a more subtle way. :)
>>>>>>>>
>>>>>>>> Paolo
>>>>>>> the following also does not allocate memory, but qemu does...
>>>>>>>
>>>>>> Hi, Peter
>>>>>> As the patch says:
>>>>>>
>>>>>> "not sending zero pages breaks migration if a page is zero
>>>>>> at the source but not at the destination."
>>>>>>
>>>>>> I don't understand why it would be trouble; shouldn't all pages
>>>>>> not received at the destination be treated as zero pages?
>>>>>
>>>>>
>>>>> How would the destination guest know if some page must be cleared? The
>>>>> previous patch (which Peter reverted) did not send anything for the pages
>>>>> which were zero on the source side.
>>>>>
>>>>>
>>>> If a page was not received and the destination knows that the page should
>>>> exist according to the total size, fill it with zeros at the destination;
>>>> would that solve the problem?
>>>
>>> It is _live_ migration; the source sends changes, so the same pages can
>>> change and be sent several times. So we would need to turn on tracking on
>>> the destination to know whether some page was received from the source or
>>> changed by the destination itself (by writing bios/firmware images there,
>>> etc.) and then clear the pages which were touched by the destination and
>>> were not sent by the source.
>> OK, I can understand the problem is, for example:
>> The destination boots up with 0x0000-0xFFFF filled with the bios image.
>> The source forgot to send zero pages in 0x0000-0xFFFF.
>
>
> The source did not forget; instead it zeroed these pages during its life
> and thought that they must be zeroed at the destination already (as the
> destination did not start and did not have a chance to write something there).
>
>
>> After migration the destination has 0x0000-0xFFFF dirty (different from
>> the source)
>
> Yep.
> And those pages were empty on the source, which made debugging very easy :)
>
>
>> Thanks for explaining.
>>
>> This comes down to the migration protocol: how should the destination
>> treat unsent pages? The patch causing the problem actually treats zero
>> pages as "not to be sent" at the source, but the other half is missing:
>> treating "not received" pages as zero pages at the destination. I guess
>> if that second half is added, the problem is gone:
>> after the page transfer completes, before the destination resumes,
>> fill the "not received" pages with zeros.
>
>
>
> Make a working patch, we'll discuss it :) I do not see much acceleration
> coming from there.
>
>

A 4k zero page is compressed into a header: an 8-byte flag + a 1-byte tail
(+ 1 + strlen(idstr) when the ramblock is a new one), so taking 10 bytes as
the average, the RAM-to-network ratio is roughly 4000:10 = 400:1. Then for a
typical 4GB guest, sending the zero pages takes about 10MB of network
traffic, indeed not much acceleration. I think the current method is already
good enough, unless there are other benefits to not sending zero pages.

-- 
Best Regards

Wenchao Xia
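For illustration, the destination-side half discussed in this thread (track which pages arrived, then clear the ones the source never sent before the guest resumes) might look like the sketch below. All names here (`page_received`, `page_arrived`, `zero_unreceived_pages`, `page_is_zero`) are hypothetical, not QEMU's actual migration code, which uses different structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE 4096
#define NUM_PAGES 8

static unsigned char ram[NUM_PAGES * PAGE_SIZE]; /* destination guest RAM */
static bool page_received[NUM_PAGES];            /* set when a page arrives */

/* Source side: scan a page for nonzero bytes before deciding to send it. */
static bool page_is_zero(const unsigned char *p, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (p[i])
            return false;
    return true;
}

/* Destination side: called for each page received from the migration stream. */
static void page_arrived(size_t idx, const unsigned char *data)
{
    memcpy(ram + idx * PAGE_SIZE, data, PAGE_SIZE);
    page_received[idx] = true;
}

/* The missing half: after the transfer completes and before the guest
 * resumes, clear every page the source never sent, so pages the destination
 * dirtied itself (bios/firmware images, etc.) do not survive migration. */
static void zero_unreceived_pages(void)
{
    for (size_t i = 0; i < NUM_PAGES; i++)
        if (!page_received[i])
            memset(ram + i * PAGE_SIZE, 0, PAGE_SIZE);
}
```

With this in place, a page that the destination firmware dirtied but that was zero on the source ends up zeroed after migration, matching the source.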
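The back-of-the-envelope arithmetic in the closing paragraph can also be checked mechanically. The constants below are the ones assumed in this thread (4096-byte pages, ~10 wire bytes per compressed zero page); `zero_page_traffic` is a hypothetical helper, not a QEMU function.

```c
#include <stddef.h>

/* ~10 bytes per zero page: 8-byte flags + 1-byte tail, plus the amortized
 * (1 + strlen(idstr)) cost when the ramblock is new. */
enum { PAGE_SIZE_B = 4096, ZERO_PAGE_HDR_B = 10 };

/* Wire traffic if every page of a guest were sent as a zero page. */
static size_t zero_page_traffic(size_t guest_ram_bytes)
{
    return guest_ram_bytes / PAGE_SIZE_B * ZERO_PAGE_HDR_B;
}
```

For a 4GB guest this gives 4GB / 4096 * 10 = 10,485,760 bytes, i.e. about 10MB, consistent with the ~400:1 ratio quoted above.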