From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andreas Olsowski
Subject: Re: pv guests die after failed migration
Date: Fri, 23 Sep 2011 11:15:48 +0200
Message-ID: <4E7C4E44.70508@leuphana.de>
References: <4E786015.80603@leuphana.de> <1316546879.5182.26.camel@dagon.hellion.org.uk> <4E7C37BD.2000706@leuphana.de> <1316764045.23371.100.camel@zakaz.uk.xensource.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Return-path:
In-Reply-To: <1316764045.23371.100.camel@zakaz.uk.xensource.com>
List-Unsubscribe: ,
List-Post:
List-Help:
List-Subscribe: ,
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Ian Campbell
Cc: "xen-devel@lists.xensource.com"
List-Id: xen-devel@lists.xenproject.org

On 09/23/2011 09:47 AM, Ian Campbell wrote:
> This seems to be taking the non-cancelled resume path, does this patch
> help at all:
>
> diff -r d7b14b76f1eb tools/libxl/libxl.c
> --- a/tools/libxl/libxl.c Thu Sep 22 14:26:08 2011 +0100
> +++ b/tools/libxl/libxl.c Fri Sep 23 08:45:28 2011 +0100
> @@ -246,7 +246,7 @@ int libxl_domain_resume(libxl_ctx *ctx,
>          rc = ERROR_NI;
>          goto out;
>      }
> -    if (xc_domain_resume(ctx->xch, domid, 0)) {
> +    if (xc_domain_resume(ctx->xch, domid, 1)) {
>          LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR,
>                          "xc_domain_resume failed for domain %u",
>                          domid);
>
> I don't think that's a solution but if this patch works then it may
> indicate a problem with xc_domain_resume_any.
>

For the current xen-4.1-testing.hg the patch had to be applied at a
different position:

--- a/tools/libxl/libxl.c Thu Sep 22 14:26:08 2011 +0100
+++ b/tools/libxl/libxl.c Fri Sep 23 08:45:28 2011 +0100
@@ -229,7 +229,7 @@
         rc = ERROR_NI;
         goto out;
     }
-    if (xc_domain_resume(ctx->xch, domid, 0)) {
+    if (xc_domain_resume(ctx->xch, domid, 1)) {
         LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR,
                         "xc_domain_resume failed for domain %u",
                         domid);

I did a clean/make/install after that; compilation worked fine.
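
For anyone following along, here is a minimal sketch of what the one-character change in the patch selects, as far as I understand it. It is a toy model, not the libxc source: the enum and the helper names are hypothetical, and the guest-visible return values (1 for a cancelled suspend, 0 for a completed one) are my assumption about how a PV guest tells the two cases apart.

```c
/* Toy model of the choice made by the third argument ("fast") of
 * xc_domain_resume(). All names below are hypothetical and exist only
 * for illustration; this is not the libxc implementation. */

enum resume_path {
    RESUME_ANY,         /* fast == 0: the non-cancelled resume path */
    RESUME_COOPERATIVE  /* fast != 0: the cancelled-suspend resume path */
};

/* Map the "fast" argument to the resume path it selects. */
enum resume_path pick_resume_path(int fast)
{
    return fast ? RESUME_COOPERATIVE : RESUME_ANY;
}

/* Assumed value the guest sees from its suspend call on each path:
 * 1 means "suspend was cancelled, carry on on the same host",
 * 0 means "suspend completed, behave as after a real migration". */
int guest_suspend_return(enum resume_path path)
{
    return path == RESUME_COOPERATIVE ? 1 : 0;
}
```

With the unpatched call (third argument 0) the model selects RESUME_ANY, which matches the non-cancelled path Ian describes; with the patch it selects RESUME_COOPERATIVE, so the guest would be told that the suspend was cancelled.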
I then tested the migration towards an unsuitable target again and it
did what you thought it might: the guest resumes correctly at the sender.

###############
root@xenturio1:~# xl -vvv migrate thishopefullywontfail xenturio2
Saving to migration stream new xl format (info 0x0/0x0/407)
migration target: Ready to receive domain.
Loading new save file incoming migration stream (new xl fmt info 0x0/0x0/407)
Savefile contains xl domain config
xc: detail: Had 0 unexplained entries in p2m table
xc: Saving memory: iter 0 (last sent 0 skipped 0): 133120/133120 100%
xc: detail: delta 9529ms, dom0 93%, target 1%, sent 449Mb/s, dirtied 1Mb/s 502 pages
xc: Saving memory: iter 1 (last sent 130592 skipped 480): 133120/133120 100%
xc: detail: delta 37ms, dom0 91%, target 2%, sent 444Mb/s, dirtied 30Mb/s 34 pages
xc: Saving memory: iter 2 (last sent 502 skipped 0): 133120/133120 100%
xc: detail: Start last iteration
libxl: debug: libxl_dom.c:384:libxl__domain_suspend_common_callback issuing PV suspend request via XenBus control node
libxl: debug: libxl_dom.c:389:libxl__domain_suspend_common_callback wait for the guest to acknowledge suspend request
libxl: debug: libxl_dom.c:434:libxl__domain_suspend_common_callback guest acknowledged suspend request
libxl: debug: libxl_dom.c:438:libxl__domain_suspend_common_callback wait for the guest to suspend
libxl: debug: libxl_dom.c:450:libxl__domain_suspend_common_callback guest has suspended
xc: detail: SUSPEND shinfo 0007fafc
xc: detail: delta 204ms, dom0 3%, target 0%, sent 4Mb/s, dirtied 25Mb/s 156 pages
xc: Saving memory: iter 3 (last sent 30 skipped 4): 133120/133120 100%
xc: detail: delta 3ms, dom0 0%, target 0%, sent 1703Mb/s, dirtied 1703Mb/s 156 pages
xc: detail: Total pages sent= 131280 (0.99x)
xc: detail: (of which 0 were fixups)
xc: detail: All memory is saved
xc: detail: Save exit rc=0
libxl: error: libxl.c:900:validate_virtual_disk failed to stat /dev/xen-data/thishopefullywontfail-root: No such file or directory
cannot add disk 0 to domain: -6
migration target: Domain creation failed (code -3).
libxl: error: libxl_utils.c:408:libxl_read_exactly file/stream truncated reading ready message from migration receiver stream
libxl: info: libxl_exec.c:72:libxl_report_child_exitstatus migration target process [16608] exited with error status 3
Migration failed, resuming at sender.
root@xenturio1:~# xl console thishopefullywontfail
PM: freeze of devices complete after 0.197 msecs
PM: late freeze of devices complete after 0.067 msecs
PM: early thaw of devices complete after 0.074 msecs
PM: thaw of devices complete after 0.077 msecs

root@thishopefullywontfail:~#
#####################

So that works.

There is no mention of the failed migration in the guest log, though.
Maybe the final patch should also log the failed migration?

With best regards,
Andreas