From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:52168)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <pbonzini@redhat.com>) id 1Vjp2Z-00028J-EI
	for qemu-devel@nongnu.org; Fri, 22 Nov 2013 06:36:57 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <pbonzini@redhat.com>) id 1Vjp2T-0000Vw-B4
	for qemu-devel@nongnu.org; Fri, 22 Nov 2013 06:36:51 -0500
Received: from mx1.redhat.com ([209.132.183.28]:1553)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <pbonzini@redhat.com>) id 1Vjp2T-0000Vk-31
	for qemu-devel@nongnu.org; Fri, 22 Nov 2013 06:36:45 -0500
Message-ID: <528F41BD.1050007@redhat.com>
Date: Fri, 22 Nov 2013 12:36:29 +0100
From: Paolo Bonzini <pbonzini@redhat.com>
MIME-Version: 1.0
References: <1385025100-3191-1-git-send-email-lilei@linux.vnet.ibm.com>
	<20131121101934.GA9135@redhat.com>
	<528F4001.3050600@linux.vnet.ibm.com>
In-Reply-To: <528F4001.3050600@linux.vnet.ibm.com>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH 0/17 v3] Localhost migration with side
 channel for ram
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Lei Li <lilei@linux.vnet.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>, quintela@redhat.com, qemu-devel@nongnu.org, mrhines@linux.vnet.ibm.com, mdroth@linux.vnet.ibm.com, aliguori@amazon.com, lagarcia@br.ibm.com, rcj@linux.vnet.ibm.com

On 22/11/2013 12:29, Lei Li wrote:
> During page flipping migration, the RAM pages of the source guest are
> flipped to the destination, which is why the source guest cannot be
> resumed. AFAICT, page flipping migration may fail at the connection
> stage (including the exchange of the pipe fd) and at the migration
> register stage (say, any blocker like an unsupported migration device),

Unfortunately, some migration problems (e.g. misconfiguration of the
destination QEMU) cannot be detected until the device data is migrated.
This happens after RAM migration, so there is indeed a reliability problem.

Postcopy would fix this (assuming the postcopy phase is reliable) by
migrating device data before any page flipping occurs.
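
To make the ordering argument concrete, here is a toy model in Python.
It is only a sketch, not QEMU code: the phase names, the two phase
orders and the run_migration() helper are all invented for
illustration. It assumes the source can resume only if a failure hits
before its RAM has been flipped away.

```python
# Toy model only: phase names and ordering are illustrative assumptions,
# not QEMU's actual migration state machine.
PAGE_FLIP_ORDER = ["connect", "ram", "device"]   # device data after RAM
POSTCOPY_ORDER = ["connect", "device", "ram"]    # device data before RAM


def run_migration(phases, failing_phase=None):
    """Walk the phases in order.

    Returns (succeeded, source_resumable). The source is considered
    resumable only if the failure happens before the "ram" phase ran,
    i.e. before its pages were flipped to the destination.
    """
    ram_flipped = False
    for phase in phases:
        if phase == failing_phase:
            return False, not ram_flipped
        if phase == "ram":
            ram_flipped = True
    # Migration completed; the source no longer owns its RAM.
    return True, False


if __name__ == "__main__":
    for name, order in (("page flipping", PAGE_FLIP_ORDER),
                        ("postcopy", POSTCOPY_ORDER)):
        ok, resumable = run_migration(order, failing_phase="device")
        print(f"{name}: succeeded={ok}, source resumable={resumable}")
```

With a failure in the device phase, the first order leaves the source
unable to resume, while the second detects the problem while the
source still has its RAM.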

Paolo

> but it could be resumed in such a situation since the memory has not
> been flipped to another content. Once the connection is successfully
> set up, it proceeds to the transmission of RAM pages, which hardly
> ever fails. As for failure handling in Libvirt, ZhengSheng has
> proposed restarting the old QEMU instead of resuming it. I know
> 'hardly' is not a good answer to your concern, but it is the cost of
> the limited memory IMO.
> 
> So if downtime is the key for the user, or if there is zero tolerance
> for restarting QEMU, page flipping migration might not be a good
> choice. From the perspective of a management app like Libvirt, the
> 'live upgrade' of QEMU will be done through localhost migration, and
> there are other migration solutions which have lower downtime, like
> real live migration and the postcopy migration that Paolo mentioned
> in the previous version [3]. Why not have more than one choice for it?