From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:51514)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <aik@ozlabs.ru>) id 1XU910-0003OS-Ol
	for qemu-devel@nongnu.org; Wed, 17 Sep 2014 02:47:06 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <aik@ozlabs.ru>) id 1XU90q-0002WQ-Ff
	for qemu-devel@nongnu.org; Wed, 17 Sep 2014 02:46:58 -0400
Received: from mail-pa0-f53.google.com ([209.85.220.53]:56663)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <aik@ozlabs.ru>) id 1XU90q-0002VC-9J
	for qemu-devel@nongnu.org; Wed, 17 Sep 2014 02:46:48 -0400
Received: by mail-pa0-f53.google.com with SMTP id rd3so1519109pab.40
	for <qemu-devel@nongnu.org>; Tue, 16 Sep 2014 23:46:44 -0700 (PDT)
Message-ID: <54192E4A.8020401@ozlabs.ru>
Date: Wed, 17 Sep 2014 16:46:34 +1000
From: Alexey Kardashevskiy <aik@ozlabs.ru>
MIME-Version: 1.0
References: <5416C46D.7040105@ozlabs.ru> <541826CA.7050607@ozlabs.ru>
	<541828BF.8090301@redhat.com>
	<20140916123431.GB4886@noname.str.redhat.com>
In-Reply-To: <20140916123431.GB4886@noname.str.redhat.com>
Content-Type: text/plain; charset=koi8-r
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] migration: qemu-coroutine-lock.c:141:
 qemu_co_mutex_unlock: Assertion `mutex->locked == 1' failed
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Kevin Wolf <kwolf@redhat.com>, Paolo Bonzini <pbonzini@redhat.com>
Cc: Max Reitz <mreitz@redhat.com>, "qemu-devel@nongnu.org" <qemu-devel@nongnu.org>, Stefan Hajnoczi <stefanha@redhat.com>, "Dr. David Alan Gilbert" <dgilbert@redhat.com>

On 09/16/2014 10:34 PM, Kevin Wolf wrote:
> Am 16.09.2014 um 14:10 hat Paolo Bonzini geschrieben:
>> Il 16/09/2014 14:02, Alexey Kardashevskiy ha scritto:
>>> I am having problems when I migrate a guest via libvirt like this:
>>>
>>> virsh migrate --live --persistent --undefinesource --copy-storage-all
>>> --verbose --desturi qemu+ssh://legkvm/system --domain chig1
>>>
>>> The XML used to create the guest is at the end of this mail.
>>>
>>> I see an NBD FLUSH command after the destination QEMU has received EOF on
>>> the migration stream, and this produces a crash in qcow2_co_flush_to_os()
>>> because s->lock.locked is false or s->l2_table_cache is NULL.
>>>
>>
>> Max, Kevin, could the fix be something like this?
>>
>> diff --git a/block/qcow2.c b/block/qcow2.c
>> index 0daf25c..e7459ea 100644
>> --- a/block/qcow2.c
>> +++ b/block/qcow2.c
>> @@ -1442,6 +1442,7 @@ static void qcow2_invalidate_cache(BlockDriverState *bs, Error **errp)
>>          memcpy(&aes_decrypt_key, &s->aes_decrypt_key, sizeof(aes_decrypt_key));
>>      }
>>  
>> +    qemu_co_mutex_lock(&s->lock);
>>      qcow2_close(bs);
>>  
>>      bdrv_invalidate_cache(bs->file, &local_err);
>> @@ -1455,6 +1456,7 @@ static void qcow2_invalidate_cache(BlockDriverState *bs, Error **errp)
>>  
>>      ret = qcow2_open(bs, options, flags, &local_err);
>>      QDECREF(options);
>> +    qemu_co_mutex_unlock(&s->lock);
>>      if (local_err) {
>>          error_setg(errp, "Could not reopen qcow2 layer: %s",
>>                     error_get_pretty(local_err));
>>
>> On top of this, *_invalidate_cache needs to be marked as coroutine_fn.
> 
> I think bdrv_invalidate_cache() really needs to call bdrv_drain_all()
> before starting to reopen stuff. There could be requests in flight that
> do not hold the lock, and if you can indeed reopen their BDS under their
> feet without breaking things (I doubt it), that would be pure luck.


I tried the patch below and it did not help, so I assume I did it wrong.
Could you please explain in more detail? Thanks!


diff --git a/block.c b/block.c
index 2df600e..ecc876d 100644
--- a/block.c
+++ b/block.c
@@ -5038,11 +5038,16 @@ void bdrv_invalidate_cache(BlockDriverState *bs, Error **errp)
         return;
     }

+    bdrv_drain_all();
+
     if (bs->drv->bdrv_invalidate_cache) {
         bs->drv->bdrv_invalidate_cache(bs, &local_err);
     } else if (bs->file) {
         bdrv_invalidate_cache(bs->file, &local_err);
     }
+
+    bdrv_drain_all();
+
     if (local_err) {
         error_propagate(errp, local_err);
         return;


-- 
Alexey