From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:37775)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <wu.wubin@huawei.com>) id 1YKzMK-0006dl-Ac
	for qemu-devel@nongnu.org; Mon, 09 Feb 2015 20:11:25 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <wu.wubin@huawei.com>) id 1YKzMF-0000C4-FY
	for qemu-devel@nongnu.org; Mon, 09 Feb 2015 20:11:24 -0500
Received: from szxga02-in.huawei.com ([119.145.14.65]:12833)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <wu.wubin@huawei.com>) id 1YKzME-0000Bo-TZ
	for qemu-devel@nongnu.org; Mon, 09 Feb 2015 20:11:19 -0500
Message-ID: <54D95A06.4090500@huawei.com>
Date: Tue, 10 Feb 2015 09:08:22 +0800
From: Bin Wu <wu.wubin@huawei.com>
MIME-Version: 1.0
References: <1423302708-7900-1-git-send-email-wu.wubin@huawei.com>
	<1423302708-7900-2-git-send-email-wu.wubin@huawei.com>
	<20150209081252.GA29334@ad.nay.redhat.com>
	<54D87F8A.7000509@huawei.com>
	<20150209101219.GB3963@noname.str.redhat.com>
In-Reply-To: <20150209101219.GB3963@noname.str.redhat.com>
Content-Type: text/plain; charset="ISO-8859-1"
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH] qemu-coroutine-lock: fix co_queue
	multi-adding bug
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Kevin Wolf <kwolf@redhat.com>
Cc: pbonzini@redhat.com, Fam Zheng <famz@redhat.com>, qemu-devel@nongnu.org, stefanha@redhat.com

On 2015/2/9 18:12, Kevin Wolf wrote:
> Am 09.02.2015 um 10:36 hat Bin Wu geschrieben:
>> On 2015/2/9 16:12, Fam Zheng wrote:
>>> On Sat, 02/07 17:51, w00214312 wrote:
>>>> From: Bin Wu <wu.wubin@huawei.com>
>>>>
>>>> When a coroutine holds a lock, other coroutines who want to get
>>>> the lock must wait on a co_queue by adding themselves to the
>>>> CoQueue. However, if a waiting coroutine is woken up with the
>>>> lock still be holding by other coroutine, this waiting coroutine
>>>
>>> Could you explain who wakes up the waiting coroutine? Maybe the bug is that
>>> it shouldn't be awaken in the first place.
>>>
>>
>> During the mirror phase with nbd devices, if we send a cancel command or
>> physical network breaks down, the source qemu process will receive a readable
>> event and the main loop will invoke nbd_reply_ready to deal with it. This
>> function finds the connection is down and then goes into
>> nbd_teardown_connection. nbd_teardown_connection wakes up all working coroutines
>> by nbd_recv_coroutines_enter_all. These coroutines include the one which holds
>> the sending lock, the ones which wait for the lock, and the ones which wait for
>> receiving messages.
> 
> This is the bug. It's not allowed to reenter a coroutine if you don't
> know its state. NBD needs a fix, not the the generic coroutine
> infrastructure.
> 
> If we want to change anything in the lock implementation, it should be
> adding an assertion to catch such violations of the rule. (Untested, but
> I think the assertion should hold true.)
> 
> Kevin
> 
> diff --git a/qemu-coroutine-lock.c b/qemu-coroutine-lock.c
> index e4860ae..25fc111 100644
> --- a/qemu-coroutine-lock.c
> +++ b/qemu-coroutine-lock.c
> @@ -123,9 +123,8 @@ void coroutine_fn qemu_co_mutex_lock(CoMutex *mutex)
>  
>      trace_qemu_co_mutex_lock_entry(mutex, self);
>  
> -    while (mutex->locked) {
> -        qemu_co_queue_wait(&mutex->queue);
> -    }
> +    qemu_co_queue_wait(&mutex->queue);
> +    assert(!mutex->locked);
>  
>      mutex->locked = true;
> 
> .
> 

I see. paolo's patch in his first reply can fix the bug in NBD (not in
coroutine). I will complete that patch to do some tests and send another version
latter.

-- 
Bin Wu