From mboxrd@z Thu Jan 1 00:00:00 1970
From: Brendan Cully
Subject: Re: slow live magration / xc_restore on xen4 pvops
Date: Thu, 3 Jun 2010 10:29:27 -0700
Message-ID: <20100603172927.GA4817@kremvax.cs.ubc.ca>
References: <2FD61F37AFF16D4DB46149330E4273C702FF9687@dcl-ex.dcml.docomolabs-usa.com>
 <4C0578EB.2040800@uni.leuphana.de>
 <19462.33905.936222.605434@mariner.uk.xensource.com>
 <20100602162745.GA27542@kremvax.cs.ubc.ca>
 <19463.32147.268104.94905@mariner.uk.xensource.com>
 <20100603150305.GA53591@zanzibar.domain.invalid>
 <19463.58180.892314.230322@mariner.uk.xensource.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <19463.58180.892314.230322@mariner.uk.xensource.com>
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Ian Jackson
Cc: "xen-devel@lists.xensource.com", Andreas Olsowski
List-Id: xen-devel@lists.xenproject.org

On Thursday, 03 June 2010 at 18:15, Ian Jackson wrote:
> Brendan Cully writes ("Re: [Xen-devel] slow live magration / xc_restore on xen4 pvops"):
> > The sender closes the fd, as it always has. xc_domain_restore has
> > always consumed the entire contents of the fd, because the qemu tail
> > has no length header under normal migration. There's no behavioral
> > difference here that I can see.
>
> No, that is not the case. Look for example at "save" in
> XendCheckpoint.py in xend, where the save code:
>  1. Converts the domain config to sxp and writes it to the fd
>  2. Calls xc_save (which calls xc_domain_save)
>  3. Writes the qemu save file to the fd

4. (in XendDomain) closed the fd. Again, this is the _sender_. I fail
to see your point.

> > I have no objection to a more explicit interface. The current form is
> > simply Remus trying to be as invisible as possible to the rest of the
> > tool stack.
>
> My complaint is that that is not currently the case.
>
> > 1. reads are only supposed to be able to time out after the entire
> > first checkpoint has been received (IOW this wouldn't kick in until
> > normal migration had already completed)
>
> OMG I hadn't noticed that you had introduced a static variable for
> that; I had assumed that "read_exact_timed" was roughly what it said
> on the tin.
>
> I think I shall stop now before I become more rude.

Feel free to reply if you have an actual Remus-caused regression
instead of FUD based on misreading the code. I'd certainly be
interested in fixing something real.