From: Olaf Hering
Subject: reliable live migration of large and busy guests
Date: Tue, 6 Nov 2012 21:28:16 +0100
Message-ID: <20121106202816.GA29655@aepfle.de>
To: xen-devel@lists.xen.org

We got a customer report about live migrations of busy guests that take a very long time and then fail. The guest has 64G of memory and is busy with its set of applications, so there will always be dirty pages to transfer. While a faster network connection can help somewhat, the underlying issue is that xc_domain_save() in tools/libxc/xc_domain_save.c suspends the domain after a given number of iterations in order to transfer the remaining dirty pages.

From what I understand, this pause of the guest (I don't know how long it is actually paused) causes problems within the guest: the applications start to fail (again, no details).

Their suggestion is to add a knob to the overall live migration process to avoid the suspend. If the guest cannot be transferred within the limits of the parameters passed to xc_domain_save(), the migration would be aborted and the guest left running on the old host.

My questions are:
Has such an issue been seen elsewhere?
Should 'xm migrate --live' and 'xl migrate' get something like a --no-suspend option?

Olaf
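The behaviour described above can be modelled roughly as follows. This is a toy sketch, not code from xc_domain_save(): the function precopy(), its parameters (link_rate, dirty_rate, max_final_pages) and the no_suspend flag are all hypothetical, and the model assumes the guest dirties pages at a constant rate. It only illustrates the proposed policy: without no_suspend the guest is always suspended after the last round, however many dirty pages remain; with it, a migration that has not converged is aborted and the guest keeps running on the source host.

```c
#include <stdbool.h>

/* Hypothetical outcome of the pre-copy phase. */
enum migrate_result {
    MIGRATE_SUSPEND_AND_FINISH, /* pause guest, send remaining dirty pages */
    MIGRATE_ABORT               /* give up, guest keeps running on source */
};

/*
 * Toy model of iterative pre-copy (hypothetical, not libxc's API).
 * Each round sends the pages dirtied during the previous round; while
 * a round of N pages is on the wire (N / link_rate seconds), the guest
 * dirties N * dirty_rate / link_rate new pages, capped at its memory size.
 * Precondition: link_rate > 0.
 */
enum migrate_result precopy(unsigned long total_pages,
                            unsigned long link_rate,   /* pages/s sent */
                            unsigned long dirty_rate,  /* pages/s dirtied */
                            unsigned int max_iters,
                            unsigned long max_final_pages,
                            bool no_suspend,
                            unsigned long *final_pages)
{
    unsigned long remaining = total_pages; /* round 1 sends all memory */
    unsigned int iter;

    for (iter = 0; iter < max_iters && remaining > max_final_pages; iter++) {
        unsigned long long next =
            (unsigned long long)remaining * dirty_rate / link_rate;
        remaining = next > total_pages ? total_pages : (unsigned long)next;
    }
    *final_pages = remaining; /* pages left for the stop-and-copy step */

    /* Converged, or the current behaviour: suspend after the last round. */
    if (remaining <= max_final_pages || !no_suspend)
        return MIGRATE_SUSPEND_AND_FINISH;

    /* Proposed --no-suspend behaviour: abort instead of a long pause. */
    return MIGRATE_ABORT;
}
```

For example, a 64G guest (16777216 pages of 4 KiB) that dirties memory faster than the link can send it never converges in this model, so with no_suspend the sketch returns MIGRATE_ABORT instead of pausing the guest for a final copy of almost all of its memory.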