From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:55484)
    by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
    id 1XTry1-0006yK-Fw
    for qemu-devel@nongnu.org; Tue, 16 Sep 2014 08:34:51 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
    (envelope-from )
    id 1XTrxv-0000Iq-BB
    for qemu-devel@nongnu.org; Tue, 16 Sep 2014 08:34:45 -0400
Received: from mx1.redhat.com ([209.132.183.28]:50464)
    by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from )
    id 1XTrxv-0000HZ-1d
    for qemu-devel@nongnu.org; Tue, 16 Sep 2014 08:34:39 -0400
Date: Tue, 16 Sep 2014 14:34:31 +0200
From: Kevin Wolf
Message-ID: <20140916123431.GB4886@noname.str.redhat.com>
References: <5416C46D.7040105@ozlabs.ru> <541826CA.7050607@ozlabs.ru>
    <541828BF.8090301@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <541828BF.8090301@redhat.com>
Subject: Re: [Qemu-devel] migration: qemu-coroutine-lock.c:141:
    qemu_co_mutex_unlock: Assertion `mutex->locked == 1' failed
To: Paolo Bonzini
Cc: Alexey Kardashevskiy, "qemu-devel@nongnu.org", Max Reitz,
    Stefan Hajnoczi, "Dr. David Alan Gilbert"

On 16.09.2014 at 14:10, Paolo Bonzini wrote:
> On 16/09/2014 14:02, Alexey Kardashevskiy wrote:
> > I am having problems when migrate a guest via libvirt like this:
> >
> > virsh migrate --live --persistent --undefinesource --copy-storage-all
> > --verbose --desturi qemu+ssh://legkvm/system --domain chig1
> >
> > The XML used to create the guest is at the end of this mail.
> >
> > I see NBD FLUSH command after the destination QEMU received EOF for
> > migration stream and this produces a crash in qcow2_co_flush_to_os() as
> > s->lock is false or s->l2_table_cache is NULL.
> >
>
> Max, Kevin, could the fix be something like this?
>
> diff --git a/block/qcow2.c b/block/qcow2.c
> index 0daf25c..e7459ea 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -1442,6 +1442,7 @@ static void qcow2_invalidate_cache(BlockDriverState *bs, Error **errp)
>          memcpy(&aes_decrypt_key, &s->aes_decrypt_key, sizeof(aes_decrypt_key));
>      }
>
> +    qemu_co_mutex_lock(&s->lock);
>      qcow2_close(bs);
>
>      bdrv_invalidate_cache(bs->file, &local_err);
> @@ -1455,6 +1456,7 @@ static void qcow2_invalidate_cache(BlockDriverState *bs, Error **errp)
>
>      ret = qcow2_open(bs, options, flags, &local_err);
>      QDECREF(options);
> +    qemu_co_mutex_unlock(&s->lock);
>      if (local_err) {
>          error_setg(errp, "Could not reopen qcow2 layer: %s",
>                     error_get_pretty(local_err));
>
> On top of this, *_invalidate_cache needs to be marked as coroutine_fn.

I think bdrv_invalidate_cache() really needs to call bdrv_drain_all()
before starting to reopen stuff. There could be requests in flight
without holding the lock, and if you can indeed reopen their BDS under
their feet without breaking things (I doubt it), that would be pure luck.

Kevin
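
For illustration only, here is a toy C model of the drain-before-reopen
ordering suggested above. ToyBDS and every toy_* function are made up for
this sketch and are not QEMU APIs. In particular, a real drain would
presumably run the event loop until pending requests finish on their own,
whereas this model simply completes them directly.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a block device state; NOT QEMU's BlockDriverState. */
typedef struct ToyBDS {
    int in_flight;   /* requests started but not yet completed */
    int generation;  /* bumped each time the metadata is reopened */
} ToyBDS;

/* A request begins and starts relying on the current metadata. */
static void toy_request_start(ToyBDS *bs)
{
    bs->in_flight++;
}

/* A request finishes and no longer touches the metadata. */
static void toy_request_complete(ToyBDS *bs)
{
    assert(bs->in_flight > 0);
    bs->in_flight--;
}

/* Hypothetical drain: finish every pending request before returning. */
static void toy_drain(ToyBDS *bs)
{
    while (bs->in_flight > 0) {
        toy_request_complete(bs);
    }
}

/*
 * Close and reopen the metadata. This is only safe when nothing is in
 * flight, so the model refuses (returns false) instead of pulling the
 * state out from under a running request.
 */
static bool toy_reopen(ToyBDS *bs)
{
    if (bs->in_flight != 0) {
        return false;
    }
    bs->generation++;
    return true;
}

/* The ordering under discussion: drain first, then reopen. */
static bool toy_invalidate_cache(ToyBDS *bs)
{
    toy_drain(bs);
    return toy_reopen(bs);
}
```

In this model, calling toy_reopen() while requests are pending is
rejected, while toy_invalidate_cache() succeeds because it drains first.
A lock taken only around the reopen would not give the same guarantee for
requests that never hold that lock.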