From: Juan Quintela
In-Reply-To: <20100525150157.6c8d1599@redhat.com> (Luiz Capitulino's message of "Tue, 25 May 2010 15:01:57 -0300")
References: <889abbffe3359f5160234e580cb663ec6189174e.1274796992.git.quintela@redhat.com> <20100525150157.6c8d1599@redhat.com>
Date: Tue, 25 May 2010 20:37:07 +0200
Subject: [Qemu-devel] Re: [PATCH 1/5] Exit if incoming migration fails
List-Id: qemu-devel.nongnu.org
To: Luiz Capitulino
Cc: qemu-devel@nongnu.org

Luiz Capitulino wrote:
> On Tue, 25 May 2010 16:21:01 +0200
> Juan Quintela wrote:
>
>> Signed-off-by: Juan Quintela
>> ---
>>  migration.c |   16 ++++++++++------
>>  migration.h |    2 +-
>>  vl.c        |    7 ++++++-
>>  3 files changed, 17 insertions(+), 8 deletions(-)
>>
> While I agree on the change, I have two comments:
>
> 1. By taking a look at the code I have the impression that most of the
>    fun failures will happen on the handler passed to qemu_set_fd_handler2(),
>    do you agree? Any plan to address that?

That is outgoing migration, not incoming migration. Incoming migration is
synchronous.

> 1. Is exit()ing the best thing to be done? I understand it's the easiest
>    and maybe better than nothing, but wouldn't it be better to enter in
>    paused-forever state so that clients can query and decide what to do?

For incoming migration, if it fails in the middle, all bets are off. You
are in a really inconsistent state, you can't tell which one, and if the
migration was live, the other host may be retaking the disks to continue.

In some cases, you can't do anything:
- you got passed an fd, and the fd got closed/image corrupted/...
- you got passed an exec command like "exec: gzip -d < foo.gz"
  If gzip failed once, it will fail forever.

If you are running it by hand, cursor up + enter, and you are back.
If you are using a management application, it is going to be easier to
restart the process than trying to clean up everything.

Experience shows that people really try to do weird things when the
machine is in this state.

Later, Juan.