From mboxrd@z Thu Jan 1 00:00:00 1970
From: John Byrne
Subject: Stability of migration?
Date: Tue, 13 Jun 2006 17:32:52 -0700
Message-ID: <448F5934.1010805@hp.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: xen-devel
List-Id: xen-devel@lists.xenproject.org

Hi,

With xen-unstable changeset 10333:360f9dc71f51, live migration is not
reliable. Migrating an active domain (I use a kernel build in my test)
back and forth between two machines results in either the build or the
domain crashing. I tweaked xc_linux_save.c to enable the verify pass
without outputting all the debugging messages, and the log shows that
one or two pages fail the data match.

I have yet to see a failure of the domain with non-live migration, but I
sometimes see a data mismatch on a page during the verification. That
would indicate that either suspend doesn't mean what I think it does, or
pages of a suspended VM are being altered when they shouldn't be.

So, I guess I'll start with the easy question: should non-live migration
ever have a page fail to verify? If not, how can I identify the source
of the problem?

The harder question: how do I identify the source of the corruption in
live migration?

Thanks,

John Byrne
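
P.S. To make clear what I mean by a page failing to verify, here is a rough sketch of the kind of check involved. It is an illustration only, not the code in xc_linux_save.c: the function name, the 4096-byte page size, and the idea of holding both copies as flat byte strings are all assumptions made for the example. It compares the copy of guest memory that was sent against the memory as read again after suspend, and reports which page indices differ.

```python
from typing import List

PAGE_SIZE = 4096  # assumed page size, for illustration only


def mismatched_pages(sent: bytes, current: bytes,
                     page_size: int = PAGE_SIZE) -> List[int]:
    """Return the indices of pages whose contents differ.

    `sent` is the copy that was transmitted; `current` is the same memory
    read again after the domain was suspended. Both names are hypothetical.
    """
    if len(sent) != len(current):
        raise ValueError("snapshots must be the same length")
    if len(sent) % page_size != 0:
        raise ValueError("snapshot length must be a multiple of the page size")

    bad = []
    for offset in range(0, len(sent), page_size):
        # Compare one page at a time so the failing page can be identified.
        if sent[offset:offset + page_size] != current[offset:offset + page_size]:
            bad.append(offset // page_size)
    return bad
```

If the domain is really quiescent after suspend, I would expect a check like this to return an empty list every time; any index it reports is a page that changed after it was last copied.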