From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andrew Cooper
Subject: Re: Problems using xl migrate
Date: Mon, 24 Nov 2014 14:13:24 +0000
Message-ID: <54733D04.2030104@citrix.com>
References: <54732EF5.5040607@citrix.com> <20141124140939.GA14073@zion.uk.xensource.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
In-Reply-To: <20141124140939.GA14073@zion.uk.xensource.com>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Wei Liu
Cc: George Dunlap, "xen-devel@lists.xen.org", Ian Jackson, Ian Campbell, M A Young
List-Id: xen-devel@lists.xenproject.org

On 24/11/14 14:09, Wei Liu wrote:
> On Mon, Nov 24, 2014 at 01:13:25PM +0000, Andrew Cooper wrote:
>> On 24/11/14 11:50, George Dunlap wrote:
>>> On Mon, Nov 24, 2014 at 12:07 AM, M A Young wrote:
>>>> On Sat, 22 Nov 2014, M A Young wrote:
>>>>
>>>>> While investigating a bug reported on Red Hat Bugzilla
>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1166461
>>>>> I discovered the following
>>>>>
>>>>> xl migrate --debug domid localhost does indeed fail for Xen 4.4 pv (the
>>>>> bug report is for Xen 4.3 hvm ) when xl migrate domid localhost works. There
>>>>> are actually two issues here
>>>>>
>>>>> * the segfault in libxl-save-helper --restore-domain (as reported in the
>>>>> bug above) occurs if the guest memory is 1024M (on my 4G box) and is
>>>>> presumably because the allocated memory eventually runs out
>>>> I have found a bit more out about this. The segfault is at line 1378 of
>>>> tools/libxc/xc_domain_restore.c which is
>>>> DPRINTF("************** pfn=%lx type=%lx gotcs=%08lx "
>>>> "actualcs=%08lx\n", pfn, pagebuf->pfn_types[pfn],
>>>> csum_page(region_base + (i + curbatch)*PAGE_SIZE),
>>>> csum_page(buf));
>>>> and is because pfn in pagebuf->pfn_types[pfn] is beyond the end of the
>>>> array. This occurs in the verification phase.
>>>>
>>>>> * the segfault doesn't occur if the guest memory is 128M, but the
>>>>> migration still fails. The first attached file contains the log from a run
>>>>> with xl -v migrate --debug domid localhost (with mfn and duplicated lines
>>>>> stripped out to make the size manageable).
>>>> The difference actually seems to be down to how active the VM is rather than
>>>> the memory size (my small memory test system was doing very little, my
>>>> larger system was a full OS install). In the non-segfault case the problem
>>>> was the printf and printf_info commands in the create_domain() routine in
>>>> tools/libxl/xl_cmdimpl.c . As xl migrate uses stdout to pass status messages
>>>> back from the restoring dom0, these commands cause an unexpected message. If
>>>> you move them onto stderr then the migration completes in the non-segfault
>>>> case.
>>> Good job tracking those down -- are there patches in the works?
>> The segfault for "--debug" has already been identified and a patch
>> posted by Wen Congyang
>>
>> The call to csum_page() incorrectly calculates the offset it is supposed
>> to checksum, and wanders beyond the mapping of guest space.
>>
>> Patch in 1409908261-18682-3-git-send-email-wency@cn.fujitsu.com
>>
> And the said patch has been applied (3460eeb3fc2) so we're fine.

But not backported to 4.4, which is why Michael is falling over it.

~Andrew
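
For illustration only, the non-segfault failure quoted above (diagnostic
output landing on the same stream that carries status messages) can be
sketched roughly as below. This is not the xl code: READY_BANNER,
check_ready() and receiver_start() are hypothetical names made up for the
sketch, and the banner text and the diagnostic message are assumptions. The
point is only that the sending side compares the first bytes it reads
against a fixed message, so anything printed ahead of it makes the check
fail, while the same text sent to a separate diagnostic stream is harmless.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical banner, NOT the string xl actually exchanges. */
#define READY_BANNER "receiver ready\n"

/*
 * Sending side: read exactly one banner's worth of bytes from the status
 * channel and compare. Returns 0 on a match, -1 on anything else
 * (short read or unexpected bytes).
 */
int check_ready(FILE *status)
{
    char buf[sizeof(READY_BANNER)];
    size_t want = sizeof(READY_BANNER) - 1;   /* exclude the trailing NUL */

    assert(status != NULL);
    if (fread(buf, 1, want, status) != want)
        return -1;
    return memcmp(buf, READY_BANNER, want) == 0 ? 0 : -1;
}

/*
 * Receiving side: informational output goes to `diag`, and only the
 * protocol banner goes to `status`. Passing the same stream for both
 * reproduces the kind of pollution described in the thread.
 */
void receiver_start(FILE *status, FILE *diag, const char *config_name)
{
    assert(status != NULL && diag != NULL);
    fprintf(diag, "Parsing config from %s\n", config_name);  /* illustrative message */
    fputs(READY_BANNER, status);
    fflush(status);
}
```

With separate streams, check_ready() sees the banner first and succeeds;
with one stream used for both, the diagnostic line arrives first and the
check fails, which matches the "unexpected message" behaviour described
above.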