From: Paolo Bonzini
To: Lei Li
Cc: Andrea Arcangeli, quintela@redhat.com, qemu-devel@nongnu.org, mrhines@linux.vnet.ibm.com, mdroth@linux.vnet.ibm.com, aliguori@amazon.com, lagarcia@br.ibm.com, rcj@linux.vnet.ibm.com
Date: Fri, 22 Nov 2013 12:36:29 +0100
Subject: Re: [Qemu-devel] [PATCH 0/17 v3] Localhost migration with side channel for ram
Message-ID: <528F41BD.1050007@redhat.com>
In-Reply-To: <528F4001.3050600@linux.vnet.ibm.com>

On 22/11/2013 12:29, Lei Li wrote:
> During page flipping migration, the source guest's RAM pages are
> flipped to the destination, which is why the source guest cannot
> be resumed. AFAICT, page flipping migration may fail at the
> connection stage (including the exchange of the pipe fd) and at the
> migration registration stage (e.g. a blocker such as an unsupported
> migration device),

Unfortunately, some migration problems (e.g. misconfiguration of the
destination QEMU) cannot be detected until the device data is migrated.
This happens after RAM migration, so there is indeed a reliability
problem.
Postcopy would fix this (assuming the postcopy phase is reliable) by
migrating device data before any page flipping occurs.

Paolo

> but the source guest could be resumed in such situations, since the
> memory has not yet been flipped to the destination. Once the
> connection is successfully set up, it proceeds to transmit the RAM
> pages, which hardly fails. As for failure handling in Libvirt,
> ZhengSheng has proposed restarting the old QEMU instead of resuming
> it. I know 'hardly' is not a good answer to your concern, but it is
> the cost of the limited memory IMO.
>
> So if downtime is the key concern for the user, or if there is zero
> tolerance for restarting QEMU, page flipping migration might not be
> a good choice. From the perspective of a management app like Libvirt,
> the 'live upgrade' of QEMU will be done through localhost migration,
> and there are other migration solutions with lower downtime, such as
> real live migration and the postcopy migration that Paolo mentioned
> in the previous version [3]. Why not have more than one choice?