Message-ID: <54192E4A.8020401@ozlabs.ru>
Date: Wed, 17 Sep 2014 16:46:34 +1000
From: Alexey Kardashevskiy
To: Kevin Wolf, Paolo Bonzini
Cc: Max Reitz, "qemu-devel@nongnu.org", Stefan Hajnoczi, "Dr. David Alan Gilbert"
Subject: Re: [Qemu-devel] migration: qemu-coroutine-lock.c:141: qemu_co_mutex_unlock: Assertion `mutex->locked == 1' failed
In-Reply-To: <20140916123431.GB4886@noname.str.redhat.com>
References: <5416C46D.7040105@ozlabs.ru> <541826CA.7050607@ozlabs.ru> <541828BF.8090301@redhat.com> <20140916123431.GB4886@noname.str.redhat.com>

On 09/16/2014 10:34 PM, Kevin Wolf wrote:
> On 16.09.2014 at 14:10, Paolo Bonzini wrote:
>> On 16/09/2014 14:02, Alexey Kardashevskiy wrote:
>>> I am having problems when migrating a guest via libvirt like this:
>>>
>>> virsh migrate --live --persistent --undefinesource --copy-storage-all
>>> --verbose --desturi qemu+ssh://legkvm/system --domain chig1
>>>
>>> The XML used to create the guest is at the end of this mail.
>>>
>>> I see an NBD FLUSH command after the destination QEMU has received EOF
>>> on the migration stream, and this produces a crash in
>>> qcow2_co_flush_to_os() because s->lock is not locked or
>>> s->l2_table_cache is NULL.
>>>
>>
>> Max, Kevin, could the fix be something like this?
>>
>> diff --git a/block/qcow2.c b/block/qcow2.c
>> index 0daf25c..e7459ea 100644
>> --- a/block/qcow2.c
>> +++ b/block/qcow2.c
>> @@ -1442,6 +1442,7 @@ static void qcow2_invalidate_cache(BlockDriverState *bs, Error **errp)
>>          memcpy(&aes_decrypt_key, &s->aes_decrypt_key, sizeof(aes_decrypt_key));
>>      }
>>
>> +    qemu_co_mutex_lock(&s->lock);
>>      qcow2_close(bs);
>>
>>      bdrv_invalidate_cache(bs->file, &local_err);
>> @@ -1455,6 +1456,7 @@ static void qcow2_invalidate_cache(BlockDriverState *bs, Error **errp)
>>
>>      ret = qcow2_open(bs, options, flags, &local_err);
>>      QDECREF(options);
>> +    qemu_co_mutex_unlock(&s->lock);
>>      if (local_err) {
>>          error_setg(errp, "Could not reopen qcow2 layer: %s",
>>                     error_get_pretty(local_err));
>>
>> On top of this, *_invalidate_cache needs to be marked as coroutine_fn.
>
> I think bdrv_invalidate_cache() really needs to call bdrv_drain_all()
> before starting to reopen stuff. There could be requests in flight
> without holding the lock, and if you can indeed reopen their BDS under
> their feet without breaking things (I doubt it), that would be pure
> luck.

I tried the patch below and it did not help, so I assume I did it wrong.
Could you please explain in more detail? Thanks!

diff --git a/block.c b/block.c
index 2df600e..ecc876d 100644
--- a/block.c
+++ b/block.c
@@ -5038,11 +5038,16 @@ void bdrv_invalidate_cache(BlockDriverState *bs, Error **errp)
         return;
     }

+    bdrv_drain_all();
+
     if (bs->drv->bdrv_invalidate_cache) {
         bs->drv->bdrv_invalidate_cache(bs, &local_err);
     } else if (bs->file) {
         bdrv_invalidate_cache(bs->file, &local_err);
     }
+
+    bdrv_drain_all();
+
     if (local_err) {
         error_propagate(errp, local_err);
         return;

--
Alexey
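The drain-before-reopen idea Kevin describes can be illustrated with a toy model. This is a minimal sketch, not QEMU code: `ModelImage`, its fields and all `model_*` functions are hypothetical stand-ins for the qcow2 state, `bdrv_drain_all()` and `bdrv_invalidate_cache()`, and the point where a coroutine would be resumed is made explicit by running the pending request by hand. It shows how a request that runs between close and reopen finds a NULL cache, as in the report above, and that draining first only protects requests that were already submitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical toy model of a qcow2-like image; NOT QEMU's data structures. */
typedef struct ModelImage {
    int *l2_cache;          /* stands in for s->l2_table_cache; NULL while closed */
    int pending_flushes;    /* requests submitted but not yet completed */
    int completed_flushes;  /* requests that finished successfully */
} ModelImage;

/* Hypothetical analogue of qcow2_open(): allocate the cache. */
static bool model_open(ModelImage *img)
{
    img->l2_cache = calloc(16, sizeof(int));
    return img->l2_cache != NULL;
}

/* Hypothetical analogue of qcow2_close(): free the cache. */
static void model_close(ModelImage *img)
{
    free(img->l2_cache);
    img->l2_cache = NULL;
}

/* Runs one pending flush. Like a flush in the report, it needs the cache;
 * it returns false where a real driver would dereference a NULL cache. */
static bool model_run_one_flush(ModelImage *img)
{
    if (img->pending_flushes == 0) {
        return true;
    }
    if (img->l2_cache == NULL) {
        return false;                 /* the crash case from the report */
    }
    img->l2_cache[0]++;               /* pretend to write back the cache */
    img->pending_flushes--;
    img->completed_flushes++;
    return true;
}

/* Hypothetical analogue of bdrv_drain_all(): run until nothing is in flight. */
static bool model_drain(ModelImage *img)
{
    while (img->pending_flushes > 0) {
        if (!model_run_one_flush(img)) {
            return false;
        }
    }
    assert(img->pending_flushes == 0);
    return true;
}

/* Drain first, then close and reopen. A request submitted AFTER the drain
 * could still run between close and open; draining does not prevent that. */
static bool model_invalidate_cache(ModelImage *img)
{
    if (!model_drain(img)) {
        return false;
    }
    model_close(img);
    return model_open(img);
}

/* No drain: the in-flight request is (deterministically, for the model)
 * resumed while the image is closed, and sees a NULL cache. */
static bool model_invalidate_cache_no_drain(ModelImage *img)
{
    model_close(img);
    bool flush_ok = model_run_one_flush(img);   /* simulated interleaving */
    bool reopened = model_open(img);
    return flush_ok && reopened;
}
```

For example, with two pending flushes `model_invalidate_cache()` completes both before closing, while `model_invalidate_cache_no_drain()` with one pending flush returns false because that flush runs while the cache is NULL. Nothing in the model stops a new request from being submitted after the drain and before the reopen, so draining alone does not close that window.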