From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:58930)
	by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1XTsFJ-0005t5-Pi for qemu-devel@nongnu.org;
	Tue, 16 Sep 2014 08:52:43 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from ) id 1XTsFD-00058d-L0 for qemu-devel@nongnu.org;
	Tue, 16 Sep 2014 08:52:37 -0400
Received: from mx1.redhat.com ([209.132.183.28]:5250)
	by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1XTsFD-000583-CP for qemu-devel@nongnu.org;
	Tue, 16 Sep 2014 08:52:31 -0400
Date: Tue, 16 Sep 2014 14:52:23 +0200
From: Kevin Wolf
Message-ID: <20140916125223.GC4886@noname.str.redhat.com>
References: <5416C46D.7040105@ozlabs.ru> <541826CA.7050607@ozlabs.ru>
	<541828BF.8090301@redhat.com>
	<20140916123431.GB4886@noname.str.redhat.com>
	<54182EAE.4000802@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <54182EAE.4000802@redhat.com>
Subject: Re: [Qemu-devel] migration: qemu-coroutine-lock.c:141:
	qemu_co_mutex_unlock: Assertion `mutex->locked == 1' failed
To: Paolo Bonzini
Cc: Alexey Kardashevskiy, "qemu-devel@nongnu.org", Max Reitz,
	Stefan Hajnoczi, "Dr. David Alan Gilbert"

On 16.09.2014 at 14:35, Paolo Bonzini wrote:
> On 16/09/2014 14:34, Kevin Wolf wrote:
> > I think bdrv_invalidate_cache() really needs to call bdrv_drain_all()
> > before starting to reopen stuff. There could be requests in flight
> > without holding the lock and if you can indeed reopen their BDS under
> > their feet without breaking things (I doubt it), that would be pure
> > luck.
>
> But even that's not enough without a lock if .bdrv_invalidate_cache (the
> callback) is called from a coroutine. As soon as it yields, another
> request can come in, for example from the NBD server.

Yes, that's true.
We can't fix this problem in qcow2, though, because it's a more general
one. I think we must make sure that bdrv_invalidate_cache() doesn't
yield.

There are two ways to do that: either forbid running
bdrv_invalidate_cache() in a coroutine and move the problem to the
caller (where and why is it even called from a coroutine?), or possibly
create a new coroutine for the driver callback and run that in a nested
event loop that only handles bdrv_invalidate_cache() callbacks, so that
the NBD server doesn't get a chance to process new requests in this
thread.

Forbidding it to run in a coroutine sounds easier, but I don't yet see
which caller would have to be fixed.

Kevin
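
To make the second option above more concrete, here is a toy model in
plain C. It is not QEMU code and uses no QEMU API: ToyLoop, toy_post(),
toy_run() and toy_invalidate_cache() are all made-up names, and the
"event loop" is just an array of tagged events. It only illustrates the
idea of a nested loop that dispatches invalidate-cache callbacks and
leaves NBD requests queued until the outer loop runs again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names exist in QEMU. */
typedef enum { EV_INVALIDATE_CB, EV_NBD_REQUEST } EventKind;

typedef struct {
    EventKind kind;
    bool done;                      /* set once the event was dispatched */
} Event;

#define MAX_EVENTS 8

typedef struct {
    Event events[MAX_EVENTS];
    size_t count;
    bool invalidating;              /* true while invalidation is running */
    int nbd_seen_during_invalidate; /* non-zero would model the race */
} ToyLoop;

/* Queue a new event. */
static void toy_post(ToyLoop *loop, EventKind kind)
{
    assert(loop->count < MAX_EVENTS);
    loop->events[loop->count].kind = kind;
    loop->events[loop->count].done = false;
    loop->count++;
}

/* Count events that have not been dispatched yet. */
static size_t toy_pending(const ToyLoop *loop)
{
    size_t n = 0;
    for (size_t i = 0; i < loop->count; i++) {
        if (!loop->events[i].done) {
            n++;
        }
    }
    return n;
}

/*
 * Dispatch pending events once. With only_invalidate set, everything
 * except invalidate-cache callbacks is skipped and stays queued.
 * Returns the number of events dispatched.
 */
static int toy_run(ToyLoop *loop, bool only_invalidate)
{
    int dispatched = 0;
    for (size_t i = 0; i < loop->count; i++) {
        Event *ev = &loop->events[i];
        if (ev->done) {
            continue;
        }
        if (only_invalidate && ev->kind != EV_INVALIDATE_CB) {
            continue;
        }
        if (ev->kind == EV_NBD_REQUEST && loop->invalidating) {
            loop->nbd_seen_during_invalidate++;
        }
        ev->done = true;
        dispatched++;
    }
    return dispatched;
}

/*
 * Model of the nested loop: the driver callback's completion is posted
 * as an event, and only that class of event is processed until none
 * are left.
 */
static void toy_invalidate_cache(ToyLoop *loop)
{
    loop->invalidating = true;
    toy_post(loop, EV_INVALIDATE_CB);
    while (toy_run(loop, true) > 0) {
        /* keep going until no invalidate callbacks remain */
    }
    loop->invalidating = false;
}
```

In this model, an NBD request posted before toy_invalidate_cache() is
still pending when it returns, and is only dispatched by a later
toy_run(loop, false) from the outer loop. Whether the real main loop
can be filtered this cheaply is exactly the open question.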