Date: Wed, 18 May 2016 12:13:04 +0100
From: "Dr. David Alan Gilbert"
Message-ID: <20160518111303.GB2520@work-vm>
References: <146356663919.20589.15258524664873470668.stgit@bahia.huguette.org>
In-Reply-To: <146356663919.20589.15258524664873470668.stgit@bahia.huguette.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Subject: Re: [Qemu-devel] [PATCH] migration: regain control of images when migration fails to complete
To: Greg Kurz , kwolf@redhat.com, mreitz@redhat.com, stefanha@redhat.com
Cc: Amit Shah , Juan Quintela , qemu-stable@nongnu.org, qemu-devel@nongnu.org

* Greg Kurz (gkurz@linux.vnet.ibm.com) wrote:
> We currently have an error path during migration that can cause
> the source QEMU to abort:

Hmm, wasn't there something similar recently, sorry I can't remember
the details, but cc'ing some block people who might remember.

Dave

>
> migration_thread()
>   migration_completion()
>     runstate_is_running() ----------------> true if guest is running
>     bdrv_inactivate_all() ----------------> inactivate images
>     qemu_savevm_state_complete_precopy()
>      ...
>       qemu_fflush()
>         socket_writev_buffer() --------> error because destination fails
>       qemu_fflush() -------------------> set error on migration stream
>   migration_completion() -----------------> set migrate state to FAILED
> migration_thread() -----------------------> break migration loop
>   vm_start() -----------------------------> restart guest with inactive
>                                             images
>
> and you get:
>
> qemu-system-ppc64: socket_writev_buffer: Got err=104 for (32768/18446744073709551615)
> qemu-system-ppc64: /home/greg/Work/qemu/qemu-master/block/io.c:1342:bdrv_co_do_pwritev: Assertion `!(bs->open_flags & 0x0800)' failed.
> Aborted (core dumped)
>
> If we try postcopy with a similar scenario, we also get the writev error
> message but QEMU leaves the guest paused because entered_postcopy is true.
>
> We could possibly do the same with precopy and leave the guest paused.
> But since the historical default for migration errors is to restart the
> source, this patch adds a call to bdrv_invalidate_cache_all() instead.
>
> Signed-off-by: Greg Kurz
> ---
>  migration/migration.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
>
> diff --git a/migration/migration.c b/migration/migration.c
> index 991313a8629a..5726959ddfd9 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1568,8 +1568,17 @@ static void migration_completion(MigrationState *s, int current_active_state,
>              ret = bdrv_inactivate_all();
>          }
>          if (ret >= 0) {
> +            Error *local_err = NULL;
> +
>              qemu_file_set_rate_limit(s->to_dst_file, INT64_MAX);
>              qemu_savevm_state_complete_precopy(s->to_dst_file, false);
> +
> +            if (qemu_file_get_error(s->to_dst_file)) {
> +                bdrv_invalidate_cache_all(&local_err);
> +                if (local_err) {
> +                    error_report_err(local_err);
> +                }
> +            }
>          }
>      }
>      qemu_mutex_unlock_iothread();
>
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
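
For readers following the call trace in the quoted patch, here is a toy model of the invariant it restores: if the images were inactivated and the migration stream then reports an error, the source has to take the images back before the guest is restarted. This is only a sketch. The ToySource type and every toy_* function are hypothetical stand-ins invented for illustration, not QEMU code, and the failed write is modelled as a false return value rather than the assertion failure shown in the log.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: none of these names exist in QEMU. */
typedef struct {
    bool images_inactive;  /* stands in for the flag the write path asserts on */
    bool stream_error;     /* stands in for a non-zero qemu_file_get_error() */
    bool guest_running;
} ToySource;

/* Models bdrv_inactivate_all(): the source gives up control of its images. */
static void toy_inactivate_images(ToySource *s)
{
    s->images_inactive = true;
}

/* Models bdrv_invalidate_cache_all(): the source takes control back. */
static void toy_reactivate_images(ToySource *s)
{
    s->images_inactive = false;
}

/* Models a guest write; false is where the real code would hit the assertion. */
static bool toy_guest_write(const ToySource *s)
{
    return s->guest_running && !s->images_inactive;
}

/*
 * Models a completion phase that fails on the final flush, followed by the
 * historical default of restarting the guest on the source.
 * reactivate_on_error selects between the old and the patched behaviour.
 */
static bool toy_fail_and_restart(ToySource *s, bool reactivate_on_error)
{
    toy_inactivate_images(s);
    s->stream_error = true;      /* destination went away during the flush */

    if (s->stream_error && reactivate_on_error) {
        toy_reactivate_images(s);
    }

    s->guest_running = true;     /* corresponds to vm_start() on the source */
    return toy_guest_write(s);
}
```

Calling toy_fail_and_restart() with reactivate_on_error set to false reproduces the failing path in the trace (the first guest write is rejected), while passing true corresponds to the behaviour the patch aims for.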