Message-ID: <541BE7E8.5060504@ozlabs.ru>
Date: Fri, 19 Sep 2014 18:23:04 +1000
From: Alexey Kardashevskiy
References: <5416C46D.7040105@ozlabs.ru> <541826CA.7050607@ozlabs.ru> <541828BF.8090301@redhat.com> <20140917090615.GB10699@stefanha-thinkpad.redhat.com> <54195395.9010201@redhat.com> <5419902B.1030309@ozlabs.ru> <541A50F6.4060703@ozlabs.ru> <541AAC64.4020006@redhat.com>
In-Reply-To: <541AAC64.4020006@redhat.com>
Subject: Re: [Qemu-devel] migration: qemu-coroutine-lock.c:141: qemu_co_mutex_unlock: Assertion `mutex->locked == 1' failed
To: Paolo Bonzini, Stefan Hajnoczi
Cc: Kevin Wolf, "qemu-devel@nongnu.org", Max Reitz, Stefan Hajnoczi, "Dr. David Alan Gilbert"

On 09/18/2014 07:56 PM, Paolo Bonzini wrote:
> On 18/09/2014 05:26, Alexey Kardashevskiy wrote:
>> On 09/18/2014 01:07 AM, Stefan Hajnoczi wrote:
>>> On Wed, Sep 17, 2014 at 2:44 PM, Alexey Kardashevskiy wrote:
>>>> On 09/17/2014 07:25 PM, Paolo Bonzini wrote:
>>>> btw any better idea of a hack to try? Testers are pushing me - they want to
>>>> upgrade the broken setup and I am blocking them :) Thanks!
>>>
>>> Paolo's qemu_co_mutex_lock(&s->lock) idea in qcow2_invalidate_cache()
>>> is good. Have you tried that patch?
>>
>>
>> Yes, did not help.
>>
>>>
>>> I haven't checked the qcow2 code whether that works properly across
>>> bdrv_close() (is the lock freed?) but in principle that's how you
>>> protect against concurrent I/O.
>>
>> I thought we have to avoid qemu_coroutine_yield() in this particular case.
>> I fail to see how the locks may help if we still do yield. But the whole
>> thing is already way beyond my understanding :) For example - how many
>> BlockDriverState things are layered here? NBD -> QCOW2 -> RAW?
>
> No, this is an NBD server. So we have three users of the same QCOW2
> image: migration, NBD server and virtio disk (not active while the bug
> happens, and thus not depicted):
>
>
>     NBD server -> QCOW2 <- migration
>                     |
>                     v
>                    File
>
> The problem is that the NBD server accesses the QCOW2 image while
> migration does qcow2_invalidate_cache.

Ufff. Cool. Anyway, the qemu_co_mutex_lock(&s->lock) hack does not work:
qcow2_close() clears the lock, so the qemu_co_mutex_unlock(&s->lock) that
follows fails. Moving the lock to BlockDriverState caused weird side
effects, debugging...


-- 
Alexey
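
For illustration, a minimal standalone C sketch of the failure mode described
above. It is not QEMU code: ToyMutex, ToyState, ToyOuter and the toy_*
functions are hypothetical stand-ins, and it assumes that close/reopen wipes
the whole per-image state, including the lock embedded in it.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for a coroutine mutex: only the locked flag. */
typedef struct {
    bool locked;
} ToyMutex;

/* Hypothetical per-image driver state with the lock embedded in it. */
typedef struct {
    ToyMutex lock;
    int cached_entries;
} ToyState;

/* Hypothetical outer object whose lock lives outside the wiped state. */
typedef struct {
    ToyMutex lock;
    ToyState state;
} ToyOuter;

static void toy_lock(ToyMutex *m)
{
    assert(!m->locked);     /* no recursive locking in this sketch */
    m->locked = true;
}

/* Returns false where an assertion on the locked flag would fire. */
static bool toy_unlock(ToyMutex *m)
{
    if (!m->locked) {
        return false;
    }
    m->locked = false;
    return true;
}

/* Assumption: close and open both reset the entire state. */
static void toy_close(ToyState *s) { memset(s, 0, sizeof(*s)); }
static void toy_open(ToyState *s)  { memset(s, 0, sizeof(*s)); }

/* Lock inside the state: the close in the middle clears the flag. */
static bool toy_invalidate_inner_lock(ToyState *s)
{
    toy_lock(&s->lock);
    toy_close(s);
    toy_open(s);
    return toy_unlock(&s->lock);
}

/* Lock outside the state: the flag survives close and reopen. */
static bool toy_invalidate_outer_lock(ToyOuter *o)
{
    toy_lock(&o->lock);
    toy_close(&o->state);
    toy_open(&o->state);
    return toy_unlock(&o->lock);
}
```

In this toy model toy_invalidate_inner_lock() reports a failed unlock, while
toy_invalidate_outer_lock() does not. That only shows that the flag survives
when it lives outside the wiped state; it says nothing about the side effects
seen when the lock was moved in the real code.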