From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:55484)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <kwolf@redhat.com>) id 1XTry1-0006yK-Fw
	for qemu-devel@nongnu.org; Tue, 16 Sep 2014 08:34:51 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <kwolf@redhat.com>) id 1XTrxv-0000Iq-BB
	for qemu-devel@nongnu.org; Tue, 16 Sep 2014 08:34:45 -0400
Received: from mx1.redhat.com ([209.132.183.28]:50464)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <kwolf@redhat.com>) id 1XTrxv-0000HZ-1d
	for qemu-devel@nongnu.org; Tue, 16 Sep 2014 08:34:39 -0400
Date: Tue, 16 Sep 2014 14:34:31 +0200
From: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20140916123431.GB4886@noname.str.redhat.com>
References: <5416C46D.7040105@ozlabs.ru> <541826CA.7050607@ozlabs.ru>
	<541828BF.8090301@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <541828BF.8090301@redhat.com>
Subject: Re: [Qemu-devel] migration: qemu-coroutine-lock.c:141:
 qemu_co_mutex_unlock: Assertion `mutex->locked == 1' failed
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: Alexey Kardashevskiy <aik@ozlabs.ru>, "qemu-devel@nongnu.org" <qemu-devel@nongnu.org>, Max Reitz <mreitz@redhat.com>, Stefan Hajnoczi <stefanha@redhat.com>, "Dr. David Alan Gilbert" <dgilbert@redhat.com>

On 16.09.2014 at 14:10, Paolo Bonzini wrote:
> On 16/09/2014 14:02, Alexey Kardashevskiy wrote:
> > I am having problems when migrating a guest via libvirt like this:
> > 
> > virsh migrate --live --persistent --undefinesource --copy-storage-all
> > --verbose --desturi qemu+ssh://legkvm/system --domain chig1
> > 
> > The XML used to create the guest is at the end of this mail.
> > 
> > I see an NBD FLUSH command after the destination QEMU has received
> > EOF for the migration stream, and this produces a crash in
> > qcow2_co_flush_to_os() as s->lock is false or s->l2_table_cache is NULL.
> > 
> 
> Max, Kevin, could the fix be something like this?
> 
> diff --git a/block/qcow2.c b/block/qcow2.c
> index 0daf25c..e7459ea 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -1442,6 +1442,7 @@ static void qcow2_invalidate_cache(BlockDriverState *bs, Error **errp)
>          memcpy(&aes_decrypt_key, &s->aes_decrypt_key, sizeof(aes_decrypt_key));
>      }
>  
> +    qemu_co_mutex_lock(&s->lock);
>      qcow2_close(bs);
>  
>      bdrv_invalidate_cache(bs->file, &local_err);
> @@ -1455,6 +1456,7 @@ static void qcow2_invalidate_cache(BlockDriverState *bs, Error **errp)
>  
>      ret = qcow2_open(bs, options, flags, &local_err);
>      QDECREF(options);
> +    qemu_co_mutex_unlock(&s->lock);
>      if (local_err) {
>          error_setg(errp, "Could not reopen qcow2 layer: %s",
>                     error_get_pretty(local_err));
> 
> On top of this, *_invalidate_cache needs to be marked as coroutine_fn.

I think bdrv_invalidate_cache() really needs to call bdrv_drain_all()
before starting to reopen stuff. There could be requests in flight
without holding the lock and if you can indeed reopen their BDS under
their feet without breaking things (I doubt it), that would be pure
luck.
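
The ordering suggested above (drain first, reopen afterwards) can be
sketched as a small self-contained toy model. This is not QEMU code and
not a proposed patch: ToyBDS and the toy_* helpers are hypothetical
names made up for illustration, and the "drain" here merely completes
pending requests in a loop. The only point is that no request is still
in flight by the time the driver state is torn down and rebuilt.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a block device; NOT QEMU's BlockDriverState. */
typedef struct ToyBDS {
    int in_flight;   /* requests submitted but not yet completed */
    bool open;       /* false while the driver state is torn down */
    int generation;  /* bumped on every close/reopen cycle */
} ToyBDS;

/* Complete one pending request. A request touches driver state, so it
 * must never run while the device is closed. */
static void toy_complete_one(ToyBDS *bs)
{
    assert(bs->open);
    assert(bs->in_flight > 0);
    bs->in_flight--;
}

/* Toy drain: keep completing requests until none are in flight. */
static void toy_drain_all(ToyBDS *bs)
{
    while (bs->in_flight > 0) {
        toy_complete_one(bs);
    }
}

/* Toy invalidate: drain before the close/reopen, so that no request can
 * observe the half-torn-down state in between. */
static void toy_invalidate_cache(ToyBDS *bs)
{
    toy_drain_all(bs);
    assert(bs->in_flight == 0);

    bs->open = false;   /* "close": state is invalid from here ... */
    bs->generation++;
    bs->open = true;    /* ... until the "reopen" has finished */
}
```

Calling toy_invalidate_cache() on a ToyBDS with three requests in
flight leaves it open with in_flight == 0 and generation incremented
by one. Without the drain step, a request completing between the close
and the reopen would trip the assert(bs->open) in toy_complete_one(),
which is the toy analogue of the crash reported above.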

Kevin