From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: Amit Shah <amit.shah@redhat.com>
Cc: "Denis V. Lunev" <den@openvz.org>,
	Peter Maydell <peter.maydell@linaro.org>,
	Paolo Bonzini <pbonzini@redhat.com>,
	qemu list <qemu-devel@nongnu.org>,
	Juan Quintela <quintela@redhat.com>
Subject: Re: [Qemu-devel] [PULL 2/5] migration: move bdrv_invalidate_cache_all of of coroutine context
Date: Mon, 7 Mar 2016 12:49:11 +0000
Message-ID: <20160307124911.GB2253@work-vm>
In-Reply-To: <33f7c8c309e6625942e6b8548faa96606a6f99b1.1456212545.git.amit.shah@redhat.com>

* Amit Shah (amit.shah@redhat.com) wrote:
> From: "Denis V. Lunev" <den@openvz.org>
> 
> It is possible to hit an assert in qcow2_get_specific_info because
> s->qcow_version is undefined. This happens when the VM is starting from
> a suspended state, i.e. it is processing an incoming migration, and
> 'info block' is called at the same time.
> 
> The problem is that qcow2_invalidate_cache() closes the image and
> memset()s BDRVQcowState in the middle.
> 
> The patch moves processing of bdrv_invalidate_cache_all out of
> coroutine context for postcopy migration to avoid that. This function
> is called with the following stack:
>   process_incoming_migration_co
>   qemu_loadvm_state
>   qemu_loadvm_state_main
>   loadvm_process_command
>   loadvm_postcopy_handle_run
> 
> Signed-off-by: Denis V. Lunev <den@openvz.org>
> Tested-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Hmm; actually, this segfaults in a variety of different ways.
There are two problems:

   a) +    bh = qemu_bh_new(loadvm_postcopy_handle_run_bh, NULL);
      That's the easy one: the NULL should be 'mis', because the bh
      callback expects its opaque argument to be a MigrationIncomingState,
      so it segfaults fairly reliably in qemu_bh_delete(mis->bh).

   b) The harder problem is that there's a race where qemu_bh_delete
      segfaults, and I'm not 100% sure why yet. It only happens sometimes
      (i.e. run virt-test, leave it, and it occasionally hits it).
      From the core it looks like mis->bh is corrupt (0x10101010...),
      so maybe mis has been freed at that point?
      I suspect this is the postcopy_ram_listen_thread freeing
      mis at the end of it, but I don't know yet.
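
For (a), here is a minimal standalone sketch of the fix. It does not use
QEMU's real QEMUBH API: DemoBH, DemoIncomingState, demo_bh_new, demo_bh_run
and demo_bh_delete are made-up stand-ins, and demo_bh_run only imitates the
main loop dispatching a scheduled bottom half. It also assumes the handler
stores the bottom half in mis->bh, as the qemu_bh_delete(mis->bh) call
suggests. It says nothing about the race in (b).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration only; NOT QEMU's real types. */
typedef void BHFunc(void *opaque);

typedef struct DemoBH {
    BHFunc *cb;      /* callback to run later */
    void *opaque;    /* handed back to the callback unchanged */
} DemoBH;

typedef struct DemoIncomingState {
    DemoBH *bh;      /* kept so the callback can delete it */
    int ran;         /* set once the callback has executed */
} DemoIncomingState;

DemoBH *demo_bh_new(BHFunc *cb, void *opaque)
{
    DemoBH *bh = calloc(1, sizeof(*bh));
    if (bh) {
        bh->cb = cb;
        bh->opaque = opaque;
    }
    return bh;
}

void demo_bh_delete(DemoBH *bh)
{
    free(bh);
}

/* Imitates the main loop invoking a scheduled bottom half. */
void demo_bh_run(DemoBH *bh)
{
    bh->cb(bh->opaque);
}

void demo_handle_run_bh(void *opaque)
{
    DemoIncomingState *mis = opaque;

    /* If the creator had passed NULL as opaque, this trips here
     * instead of crashing later in the mis->bh dereference below. */
    assert(mis != NULL);

    mis->ran = 1;
    demo_bh_delete(mis->bh);
    mis->bh = NULL;
}

int demo_handle_run(DemoIncomingState *mis)
{
    /* Pass 'mis', not NULL, so the callback gets the state it expects. */
    mis->bh = demo_bh_new(demo_handle_run_bh, mis);
    if (!mis->bh) {
        return -1;
    }
    demo_bh_run(mis->bh);
    return 0;
}
```

Calling demo_handle_run() on a zero-initialised DemoIncomingState should
leave 'ran' set and 'bh' cleared; swapping 'mis' for NULL in demo_bh_new()
makes the assert in the callback fire.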

Dave

> CC: Paolo Bonzini <pbonzini@redhat.com>
> CC: Juan Quintela <quintela@redhat.com>
> CC: Amit Shah <amit.shah@redhat.com>
> Message-Id: <1455259174-3384-3-git-send-email-den@openvz.org>
> Signed-off-by: Amit Shah <amit.shah@redhat.com>
> ---
>  migration/savevm.c | 27 +++++++++++++++++----------
>  1 file changed, 17 insertions(+), 10 deletions(-)
> 
> diff --git a/migration/savevm.c b/migration/savevm.c
> index 94f2894..8415fd9 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -1496,18 +1496,10 @@ static int loadvm_postcopy_handle_listen(MigrationIncomingState *mis)
>      return 0;
>  }
>  
> -/* After all discards we can start running and asking for pages */
> -static int loadvm_postcopy_handle_run(MigrationIncomingState *mis)
> +static void loadvm_postcopy_handle_run_bh(void *opaque)
>  {
> -    PostcopyState ps = postcopy_state_set(POSTCOPY_INCOMING_RUNNING);
>      Error *local_err = NULL;
>  
> -    trace_loadvm_postcopy_handle_run();
> -    if (ps != POSTCOPY_INCOMING_LISTENING) {
> -        error_report("CMD_POSTCOPY_RUN in wrong postcopy state (%d)", ps);
> -        return -1;
> -    }
> -
>      /* TODO we should move all of this lot into postcopy_ram.c or a shared code
>       * in migration.c
>       */
> @@ -1519,7 +1511,6 @@ static int loadvm_postcopy_handle_run(MigrationIncomingState *mis)
>      bdrv_invalidate_cache_all(&local_err);
>      if (local_err) {
>          error_report_err(local_err);
> -        return -1;
>      }
>  
>      trace_loadvm_postcopy_handle_run_cpu_sync();
> @@ -1534,6 +1525,22 @@ static int loadvm_postcopy_handle_run(MigrationIncomingState *mis)
>          /* leave it paused and let management decide when to start the CPU */
>          runstate_set(RUN_STATE_PAUSED);
>      }
> +}
> +
> +/* After all discards we can start running and asking for pages */
> +static int loadvm_postcopy_handle_run(MigrationIncomingState *mis)
> +{
> +    PostcopyState ps = postcopy_state_set(POSTCOPY_INCOMING_RUNNING);
> +    QEMUBH *bh;
> +
> +    trace_loadvm_postcopy_handle_run();
> +    if (ps != POSTCOPY_INCOMING_LISTENING) {
> +        error_report("CMD_POSTCOPY_RUN in wrong postcopy state (%d)", ps);
> +        return -1;
> +    }
> +
> +    bh = qemu_bh_new(loadvm_postcopy_handle_run_bh, NULL);
> +    qemu_bh_schedule(bh);
>  
>      /* We need to finish reading the stream from the package
>       * and also stop reading anything more from the stream that loaded the
> -- 
> 2.5.0
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
