From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:58720) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1aUDBO-0001NV-5W for qemu-devel@nongnu.org; Fri, 12 Feb 2016 07:50:47 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1aUDBK-0006RE-4W for qemu-devel@nongnu.org; Fri, 12 Feb 2016 07:50:46 -0500 Received: from mx1.redhat.com ([209.132.183.28]:53899) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1aUDBJ-0006R1-TA for qemu-devel@nongnu.org; Fri, 12 Feb 2016 07:50:42 -0500 Date: Fri, 12 Feb 2016 12:50:36 +0000 From: "Dr. David Alan Gilbert" Message-ID: <20160212125035.GA4920@work-vm> References: <1455259174-3384-1-git-send-email-den@openvz.org> <1455259174-3384-3-git-send-email-den@openvz.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1455259174-3384-3-git-send-email-den@openvz.org> Subject: Re: [Qemu-devel] [PATCH 2/2] migration: move bdrv_invalidate_cache_all of of coroutine context List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: "Denis V. Lunev" Cc: Amit Shah , Paolo Bonzini , qemu-devel@nongnu.org, Juan Quintela * Denis V. Lunev (den@openvz.org) wrote: > There is a possibility to hit an assert in qcow2_get_specific_info that > s->qcow_version is undefined. This happens when VM in starting from > suspended state, i.e. it processes incoming migration, and in the same > time 'info block' is called. > > The problem is that qcow2_invalidate_cache() closes the image and > memset()s BDRVQcowState in the middle. > > The patch moves processing of bdrv_invalidate_cache_all out of > coroutine context for postcopy migration to avoid that. This function > is called with the following stack: > process_incoming_migration_co > qemu_loadvm_state > qemu_loadvm_state_main > loadvm_process_command > loadvm_postcopy_handle_run > > Signed-off-by: Denis V. Lunev > CC: Paolo Bonzini > CC: Juan Quintela > CC: Amit Shah I've tested my normal postcopy smoke test and it seems to work, so Tested-by: Dr. David Alan Gilbert however, can you add a comment before loadvm_postcopy_handle_run_bh to explain why it's a bh, otherwise at some point someone might put it back. I'll admit to not really understanding what the difference is between bh and coroutine context; I'd thought if it was all in the main thread stuff was safe. Dave > --- > migration/savevm.c | 27 +++++++++++++++++---------- > 1 file changed, 17 insertions(+), 10 deletions(-) > > diff --git a/migration/savevm.c b/migration/savevm.c > index 94f2894..8415fd9 100644 > --- a/migration/savevm.c > +++ b/migration/savevm.c > @@ -1496,18 +1496,10 @@ static int loadvm_postcopy_handle_listen(MigrationIncomingState *mis) > return 0; > } > > -/* After all discards we can start running and asking for pages */ > -static int loadvm_postcopy_handle_run(MigrationIncomingState *mis) > +static void loadvm_postcopy_handle_run_bh(void *opaque) > { > - PostcopyState ps = postcopy_state_set(POSTCOPY_INCOMING_RUNNING); > Error *local_err = NULL; > > - trace_loadvm_postcopy_handle_run(); > - if (ps != POSTCOPY_INCOMING_LISTENING) { > - error_report("CMD_POSTCOPY_RUN in wrong postcopy state (%d)", ps); > - return -1; > - } > - > /* TODO we should move all of this lot into postcopy_ram.c or a shared code > * in migration.c > */ > @@ -1519,7 +1511,6 @@ static int loadvm_postcopy_handle_run(MigrationIncomingState *mis) > bdrv_invalidate_cache_all(&local_err); > if (local_err) { > error_report_err(local_err); > - return -1; > } > > trace_loadvm_postcopy_handle_run_cpu_sync(); > @@ -1534,6 +1525,22 @@ static int loadvm_postcopy_handle_run(MigrationIncomingState *mis) > /* leave it paused and let management decide when to start the CPU */ > runstate_set(RUN_STATE_PAUSED); > } > +} > + > +/* After all discards we can start running and asking for pages */ > +static int loadvm_postcopy_handle_run(MigrationIncomingState *mis) > +{ > + PostcopyState ps = postcopy_state_set(POSTCOPY_INCOMING_RUNNING); > + QEMUBH *bh; > + > + trace_loadvm_postcopy_handle_run(); > + if (ps != POSTCOPY_INCOMING_LISTENING) { > + error_report("CMD_POSTCOPY_RUN in wrong postcopy state (%d)", ps); > + return -1; > + } > + > + bh = qemu_bh_new(loadvm_postcopy_handle_run_bh, NULL); > + qemu_bh_schedule(bh); > > /* We need to finish reading the stream from the package > * and also stop reading anything more from the stream that loaded the > -- > 2.1.4 > > -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK