From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andres Lagar Cavilla
Subject: Re: Live Migration Error
Date: Mon, 16 May 2005 13:55:37 -0500
Message-ID: <4288ECA9.4000503@cs.toronto.edu>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Ian Pratt
Cc: Teemu Koponen, xen-devel@lists.xensource.com
List-Id: xen-devel@lists.xenproject.org

Hi Ian,

I got a fresh code image this morning. Live migration works fine, even
after un-tweaking the timer back to its default value. I have tested,
though not thoroughly, and I haven't run into trouble yet. I guess this
closes this chapter.

For whatever it may be worth, I have some comments regarding the
"previous" (Friday May 13) xfrd version:

- Increasing the timeout allowed live migration to complete
  successfully, but not every time; there was actually a 50% chance of
  success.
- On all successful migrations, the number of skipped pages after the
  last iteration and before domain suspend was always zero:

Saving memory pages: iter 3 0%
3: sent 0, skipped 0, 3: sent 0, skipped 0,
[DEBUG] Conn_sxpr> (AndresNfsDomain 8)[DEBUG] Conn_sxpr< err=0
[1116255361.997192] SUSPEND flags 00020004 shinfo 00000beb eip c01068fe esi 0002de60

- On all failed migrations, there was a nonzero number of said skipped
  pages (sometimes 12, sometimes 4).

Hope this somehow helps.
Keep up the excellent work.

Andres

Ian Pratt wrote:

>>Teemu saves the day!!!
>>I actually set the timeout to 100 for no particular reason
>>(originally it was 10, 20 didn't work either) Thanks Ian for
>>your suggestion as well
>>
>
>I'd be really surprised if increasing the timeout actually made a difference. Are you sure you're not just using the shadow mode fix that was checked in a couple of hours ago?
>
>Best,
>Ian
>
>
>>Cheers!!
>>Andres
>>At 02:45 PM 5/13/2005, Teemu Koponen wrote:
>>
>>>On May 13, 2005, at 20:07, Andres Lagar Cavilla wrote:
>>>
>>>Andres,
>>>
>>>>I try to do a live migration in the same physical host, i.e. xm
>>>>migrate --live 'whatever' localhost It fails with 'Error: errors:
>>>>suspend, failed, Callbak timed out'.
>>>>It seems like transfer of memory pages works until the point when the
>>>>domain needs to be suspended to do the final transfer. Funny thing is
>>>>it used to work before, gloriously, and I haven't made any
>>>>software/hardware changes. At some point a xm save command failed with
>>>>timeout, and from there on live migration fails with this message.
>>>>Non-live migration works perfectly, also between different physical
>>>>hosts. save/restore also works flawlessly.
>>>>
>>>I had similar timeout errors previously, when I was using a bit slower
>>>servers. I overcame the problem by slightly increasing the timeout
>>>value in controller.py. It seemed to provide a remedy.
>>>
>>>Teemu
>>>
>>>--
>>>
>>
>>_______________________________________________
>>Xen-devel mailing list
>>Xen-devel@lists.xensource.com
>>http://lists.xensource.com/xen-devel
>>
>>
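
For anyone wanting to check the skipped-pages observation in the reply at the top of this message against their own xfrd output, here is a minimal sketch. It is not part of Xen: the function name is hypothetical, and the "N: sent X, skipped Y" pattern is assumed from the single excerpt quoted in that reply, so the real output may vary between versions.

```python
import re

# Matches fragments such as "3: sent 0, skipped 0".
# Assumption: this format is inferred only from the excerpt in the message.
ITER_RE = re.compile(r"(\d+):\s*sent\s+(\d+),\s*skipped\s+(\d+)")


def last_skipped_before_suspend(log_text):
    """Return the skipped-page count from the last iteration report that
    appears before the first 'SUSPEND' marker, or None if there is none."""
    # Only look at the output that precedes the suspend line.
    head = log_text.split("SUSPEND", 1)[0]
    matches = ITER_RE.findall(head)
    if not matches:
        return None
    # Each match is (iteration, sent, skipped); take the last one.
    return int(matches[-1][2])


sample = (
    "Saving memory pages: iter 3 0% 3: sent 0, skipped 0, 3: sent 0, skipped 0, "
    "[DEBUG] Conn_sxpr> (AndresNfsDomain 8)[DEBUG] Conn_sxpr< err=0 "
    "[1116255361.997192] SUSPEND flags 00020004 shinfo 00000beb"
)

print(last_skipped_before_suspend(sample))
```

A nonzero result would correspond to the pattern reported for the failed migrations (12 or 4 skipped pages), and zero to the successful ones. This only automates reading the log and says nothing about the cause.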