From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:37775) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1YKzMK-0006dl-Ac for qemu-devel@nongnu.org; Mon, 09 Feb 2015 20:11:25 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1YKzMF-0000C4-FY for qemu-devel@nongnu.org; Mon, 09 Feb 2015 20:11:24 -0500 Received: from szxga02-in.huawei.com ([119.145.14.65]:12833) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1YKzME-0000Bo-TZ for qemu-devel@nongnu.org; Mon, 09 Feb 2015 20:11:19 -0500 Message-ID: <54D95A06.4090500@huawei.com> Date: Tue, 10 Feb 2015 09:08:22 +0800 From: Bin Wu MIME-Version: 1.0 References: <1423302708-7900-1-git-send-email-wu.wubin@huawei.com> <1423302708-7900-2-git-send-email-wu.wubin@huawei.com> <20150209081252.GA29334@ad.nay.redhat.com> <54D87F8A.7000509@huawei.com> <20150209101219.GB3963@noname.str.redhat.com> In-Reply-To: <20150209101219.GB3963@noname.str.redhat.com> Content-Type: text/plain; charset="ISO-8859-1" Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] [PATCH] qemu-coroutine-lock: fix co_queue multi-adding bug List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Kevin Wolf Cc: pbonzini@redhat.com, Fam Zheng , qemu-devel@nongnu.org, stefanha@redhat.com On 2015/2/9 18:12, Kevin Wolf wrote: > Am 09.02.2015 um 10:36 hat Bin Wu geschrieben: >> On 2015/2/9 16:12, Fam Zheng wrote: >>> On Sat, 02/07 17:51, w00214312 wrote: >>>> From: Bin Wu >>>> >>>> When a coroutine holds a lock, other coroutines who want to get >>>> the lock must wait on a co_queue by adding themselves to the >>>> CoQueue. However, if a waiting coroutine is woken up with the >>>> lock still be holding by other coroutine, this waiting coroutine >>> >>> Could you explain who wakes up the waiting coroutine? Maybe the bug is that >>> it shouldn't be awaken in the first place. >>> >> >> During the mirror phase with nbd devices, if we send a cancel command or >> physical network breaks down, the source qemu process will receive a readable >> event and the main loop will invoke nbd_reply_ready to deal with it. This >> function finds the connection is down and then goes into >> nbd_teardown_connection. nbd_teardown_connection wakes up all working coroutines >> by nbd_recv_coroutines_enter_all. These coroutines include the one which holds >> the sending lock, the ones which wait for the lock, and the ones which wait for >> receiving messages. > > This is the bug. It's not allowed to reenter a coroutine if you don't > know its state. NBD needs a fix, not the the generic coroutine > infrastructure. > > If we want to change anything in the lock implementation, it should be > adding an assertion to catch such violations of the rule. (Untested, but > I think the assertion should hold true.) > > Kevin > > diff --git a/qemu-coroutine-lock.c b/qemu-coroutine-lock.c > index e4860ae..25fc111 100644 > --- a/qemu-coroutine-lock.c > +++ b/qemu-coroutine-lock.c > @@ -123,9 +123,8 @@ void coroutine_fn qemu_co_mutex_lock(CoMutex *mutex) > > trace_qemu_co_mutex_lock_entry(mutex, self); > > - while (mutex->locked) { > - qemu_co_queue_wait(&mutex->queue); > - } > + qemu_co_queue_wait(&mutex->queue); > + assert(!mutex->locked); > > mutex->locked = true; > > . > I see. paolo's patch in his first reply can fix the bug in NBD (not in coroutine). I will complete that patch to do some tests and send another version latter. -- Bin Wu