From: Paolo Bonzini
Date: Mon, 09 Feb 2015 10:23:07 +0100
Subject: Re: [Qemu-devel] [PATCH] fix the co_queue multi-adding bug
Message-ID: <54D87C7B.9010600@redhat.com>
In-Reply-To: <1423302708-7900-1-git-send-email-wu.wubin@huawei.com>
To: w00214312, qemu-devel@nongnu.org
Cc: kwolf@redhat.com, famz@redhat.com, stefanha@redhat.com

On 07/02/2015 10:51, w00214312 wrote:
> From: Bin Wu
> 
> When we test drive_mirror between different hosts over NBD devices,
> we find that the qemu process sometimes crashes during the cancel
> phase. The stack in the crash core file, shown below, indicates that
> a coroutine re-enter error occurs:

This bug can probably be fixed simply by delaying the setting of
recv_coroutine.

What are the symptoms if you apply only your "qemu-coroutine-lock: fix
co_queue multi-adding bug" patch, but not "qemu-coroutine: fix
qemu_co_queue_run_restart error"?

Can you try the patch below? (Compile-tested only.)

diff --git a/block/nbd-client.c b/block/nbd-client.c
index 6e1c97c..23d6a71 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -104,10 +104,21 @@ static int nbd_co_send_request(NbdClientSession *s,
     QEMUIOVector *qiov, int offset)
 {
     AioContext *aio_context;
-    int rc, ret;
+    int rc, ret, i;
 
     qemu_co_mutex_lock(&s->send_mutex);
+
+    for (i = 0; i < MAX_NBD_REQUESTS; i++) {
+        if (s->recv_coroutine[i] == NULL) {
+            s->recv_coroutine[i] = qemu_coroutine_self();
+            break;
+        }
+    }
+
+    assert(i < MAX_NBD_REQUESTS);
+    request->handle = INDEX_TO_HANDLE(s, i);
     s->send_coroutine = qemu_coroutine_self();
+
     aio_context = bdrv_get_aio_context(s->bs);
     aio_set_fd_handler(aio_context, s->sock,
                        nbd_reply_ready, nbd_restart_write, s);
@@ -164,8 +175,6 @@ static void nbd_co_receive_reply(NbdClientSession *s,
 static void nbd_coroutine_start(NbdClientSession *s,
     struct nbd_request *request)
 {
-    int i;
-
     /* Poor man semaphore.  The free_sema is locked when no other request
      * can be accepted, and unlocked after receiving one reply.  */
     if (s->in_flight >= MAX_NBD_REQUESTS - 1) {
@@ -174,15 +183,7 @@ static void nbd_coroutine_start(NbdClientSession *s,
     }
     s->in_flight++;
 
-    for (i = 0; i < MAX_NBD_REQUESTS; i++) {
-        if (s->recv_coroutine[i] == NULL) {
-            s->recv_coroutine[i] = qemu_coroutine_self();
-            break;
-        }
-    }
-
-    assert(i < MAX_NBD_REQUESTS);
-    request->handle = INDEX_TO_HANDLE(s, i);
+    /* s->recv_coroutine[i] is set as soon as we get the send_mutex. */
 }
 
 static void nbd_coroutine_end(NbdClientSession *s,
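
To make the failure mode concrete, here is a standalone toy program,
not QEMU code: it uses ucontext(3) directly (the same primitive as
QEMU's ucontext coroutine backend), and every name in it is made up.
It mimics what can happen when the same request coroutine is woken
twice, as when it is visible in recv_coroutine[] while still parked
on the send_mutex queue:

/* Toy model of a double wake-up.  enter_co() plays the role of
 * qemu_coroutine_enter(); the asserts stand in for QEMU's
 * re-entrance and use-after-termination checks. */
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <ucontext.h>

static ucontext_t main_ctx, co_ctx;
static bool co_running, co_finished;

static void enter_co(void)
{
    assert(!co_running);   /* like "Co-routine re-entered recursively" */
    assert(!co_finished);  /* entering a dead coroutine: use after free */
    co_running = true;
    swapcontext(&main_ctx, &co_ctx);
}

static void yield_co(void)
{
    co_running = false;
    swapcontext(&co_ctx, &main_ctx);
}

static void request_co(void)
{
    printf("request sent, waiting for reply\n");
    yield_co();                      /* parked until a reply arrives */
    printf("reply handled, request done\n");
    co_running = false;
    co_finished = true;              /* returning ends the coroutine */
}

int main(void)
{
    static char stack[64 * 1024];

    getcontext(&co_ctx);
    co_ctx.uc_stack.ss_sp = stack;
    co_ctx.uc_stack.ss_size = sizeof(stack);
    co_ctx.uc_link = &main_ctx;
    makecontext(&co_ctx, request_co, 0);

    enter_co();   /* submit the request; coroutine yields */
    enter_co();   /* reply path wakes it once: fine, it finishes */
    enter_co();   /* duplicate wake-up: assertion fires, like the
                   * abort seen while cancelling the mirror job */
    return 0;
}

The point of the patch above is to rule out the third enter_co() by
construction: a coroutine becomes visible in s->recv_coroutine[] only
once it already owns send_mutex, so the reply path can never enter a
coroutine that is still queued waiting for the lock.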