Date: Fri, 11 Dec 2009 21:19:35 +0000
From: "Daniel P. Berrange"
Subject: Re: [Qemu-devel] RFC: exit on incoming exec migrate failure
Message-ID: <20091211211935.GO21297@redhat.com>
References: <8994198D-0AA5-45C0-8A46-375BCA34E201@hq.newdream.net>
In-Reply-To: <8994198D-0AA5-45C0-8A46-375BCA34E201@hq.newdream.net>
Reply-To: "Daniel P. Berrange"
List-Id: qemu-devel.nongnu.org
To: Andrew Farmer
Cc: qemu-devel@nongnu.org

On Wed, Dec 09, 2009 at 01:10:18PM -0800, Andrew Farmer wrote:
> Right now, if an incoming migrate through exec fails, the qemu process
> will end up chewing CPU indefinitely - it looks like it closes the
> migration FD but doesn't remove its IO handler properly. An easy way
> to reproduce this is to try launching with -incoming exec:/bin/false.
> This is obviously useless, but illustrates the issue handily.

I've hit this in real life too, with restore from a file containing the
saved state which had got corrupted/truncated.
I only discovered the failure when I wondered why QEMU was chewing 100% CPU.

> One solution might be to retry the command on migrate failure, but that
> won't really help in all circumstances (for instance, if the migration
> command is broken!), so it seems equally appropriate to just die if an
> incoming exec migration fails. The patch is trivial, and follows - does
> this look sensible?
> (I'm new to qemu development, but trying to pick it up.)

It looks like a reasonable approach to me. If we carried on running, it
would be hard for apps to determine whether migration succeeded & thus
QEMU is running, or whether it failed and is just idling. By exiting we
give the management app/user the option to retry simply by relaunching.

> diff --git a/migration-exec.c b/migration-exec.c
> index c830669..0292c19 100644
> --- a/migration-exec.c
> +++ b/migration-exec.c
> @@ -114,7 +114,7 @@ static void exec_accept_incoming_migration(void *opaque)
>      ret = qemu_loadvm_state(f);
>      if (ret < 0) {
>          fprintf(stderr, "load of migration failed\n");
> -        goto err;
> +        exit(0);
>      }
>      qemu_announce_self();
>      dprintf("successfully loaded vm state\n");
> @@ -123,7 +123,6 @@ static void exec_accept_incoming_migration(void *opaque)
>      if (autostart)
>          vm_start();
>
> -err:
>      qemu_fclose(f);
>  }

Daniel
-- 
|: Red Hat, Engineering, London   -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org  -o-  http://virt-manager.org  -o-  http://ovirt.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-  F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|
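
For illustration only, here is a rough sketch of how a management app might
use "exit on failure" to retry by relaunching. It is not taken from QEMU or
libvirt; spawn(), exited_within(), stop_child() and launch_with_retry() are
hypothetical names, and "still running after a grace period" is only a crude
stand-in for a real success check (a real manager would more likely confirm
via the monitor). Since the quoted patch calls exit(0), the exit status alone
would not signal failure, so the sketch treats any early exit as a failed
incoming migration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Start argv[0] (looked up in PATH) as a child process.
 * Returns the child's pid, or -1 if fork() failed. */
static pid_t spawn(char *const argv[])
{
    assert(argv != NULL && argv[0] != NULL);
    pid_t pid = fork();
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);                 /* exec itself failed */
    }
    return pid;
}

/* Poll the child for up to timeout_ms milliseconds.
 * Returns 1 if it exited (and was reaped), 0 if it is still
 * running, -1 on error. */
static int exited_within(pid_t pid, int timeout_ms)
{
    const struct timespec tick = { 0, 10 * 1000 * 1000 };   /* 10 ms */
    for (int waited = 0; ; waited += 10) {
        int status;
        pid_t r = waitpid(pid, &status, WNOHANG);
        if (r == pid)
            return 1;               /* child terminated */
        if (r < 0 && errno != EINTR)
            return -1;
        if (waited >= timeout_ms)
            return 0;               /* still alive after the grace period */
        nanosleep(&tick, NULL);
    }
}

/* Kill a child that is still running and reap it. */
static void stop_child(pid_t pid)
{
    kill(pid, SIGKILL);
    while (waitpid(pid, NULL, 0) < 0 && errno == EINTR)
        ;
}

/* Assumed policy: any exit within grace_ms counts as a failed
 * incoming migration, so relaunch. Returns the pid of the first
 * child that survives the grace period, or -1 after max_attempts. */
static pid_t launch_with_retry(char *const argv[], int max_attempts,
                               int grace_ms)
{
    for (int i = 0; i < max_attempts; i++) {
        pid_t pid = spawn(argv);
        if (pid < 0)
            return -1;
        if (exited_within(pid, grace_ms) == 0)
            return pid;
    }
    return -1;
}
```

In practice argv would be the full qemu command line, including the
-incoming exec:... argument; with the pre-patch behaviour the child never
exits on failure, so a loop like this could not tell failure from success.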