From: Paolo Bonzini
Date: Mon, 09 Feb 2015 10:23:07 +0100
Subject: Re: [Qemu-devel] [PATCH] fix the co_queue multi-adding bug
Message-ID: <54D87C7B.9010600@redhat.com>
In-Reply-To: <1423302708-7900-1-git-send-email-wu.wubin@huawei.com>
To: w00214312, qemu-devel@nongnu.org
Cc: kwolf@redhat.com, famz@redhat.com, stefanha@redhat.com

On 07/02/2015 10:51, w00214312 wrote:
> From: Bin Wu
> 
> When we test drive_mirror between different hosts over NBD devices,
> we find that the qemu process sometimes crashes during the cancel
> phase. The stack in the crash core file, shown below, indicates that
> a coroutine re-enter error occurs:

This bug can probably be fixed simply by delaying the setting of
recv_coroutine.

What are the symptoms if you apply only your "qemu-coroutine-lock: fix
co_queue multi-adding bug" patch, but not "qemu-coroutine: fix
qemu_co_queue_run_restart error"?

Can you try the patch below? (Compile-tested only.)

diff --git a/block/nbd-client.c b/block/nbd-client.c
index 6e1c97c..23d6a71 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -104,10 +104,21 @@ static int nbd_co_send_request(NbdClientSession *s,
     QEMUIOVector *qiov, int offset)
 {
     AioContext *aio_context;
-    int rc, ret;
+    int rc, ret, i;
 
     qemu_co_mutex_lock(&s->send_mutex);
+
+    for (i = 0; i < MAX_NBD_REQUESTS; i++) {
+        if (s->recv_coroutine[i] == NULL) {
+            s->recv_coroutine[i] = qemu_coroutine_self();
+            break;
+        }
+    }
+
+    assert(i < MAX_NBD_REQUESTS);
+    request->handle = INDEX_TO_HANDLE(s, i);
     s->send_coroutine = qemu_coroutine_self();
+
     aio_context = bdrv_get_aio_context(s->bs);
     aio_set_fd_handler(aio_context, s->sock,
                        nbd_reply_ready, nbd_restart_write, s);
@@ -164,8 +175,6 @@ static void nbd_co_receive_reply(NbdClientSession *s,
 static void nbd_coroutine_start(NbdClientSession *s,
     struct nbd_request *request)
 {
-    int i;
-
     /* Poor man semaphore.  The free_sema is locked when no other request
      * can be accepted, and unlocked after receiving one reply.  */
     if (s->in_flight >= MAX_NBD_REQUESTS - 1) {
@@ -174,15 +183,7 @@ static void nbd_coroutine_start(NbdClientSession *s,
     }
     s->in_flight++;
 
-    for (i = 0; i < MAX_NBD_REQUESTS; i++) {
-        if (s->recv_coroutine[i] == NULL) {
-            s->recv_coroutine[i] = qemu_coroutine_self();
-            break;
-        }
-    }
-
-    assert(i < MAX_NBD_REQUESTS);
-    request->handle = INDEX_TO_HANDLE(s, i);
+    /* s->recv_coroutine[i] is set as soon as we get the send_mutex. */
 }
 
 static void nbd_coroutine_end(NbdClientSession *s,
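
To make the failure mode concrete, here is a standalone toy program,
not QEMU code: it uses ucontext(3) directly (the same primitive as
QEMU's ucontext coroutine backend), and every name in it is made up.
It mimics what can happen when the same request coroutine is woken
twice, as when it is visible in recv_coroutine[] while still parked
on the send_mutex queue:

/* Toy model of a double wake-up.  enter_co() plays the role of
 * qemu_coroutine_enter(); the asserts stand in for QEMU's
 * re-entrance and use-after-termination checks. */
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <ucontext.h>

static ucontext_t main_ctx, co_ctx;
static bool co_running, co_finished;

static void enter_co(void)
{
    assert(!co_running);   /* like "Co-routine re-entered recursively" */
    assert(!co_finished);  /* entering a dead coroutine: use after free */
    co_running = true;
    swapcontext(&main_ctx, &co_ctx);
}

static void yield_co(void)
{
    co_running = false;
    swapcontext(&co_ctx, &main_ctx);
}

static void request_co(void)
{
    printf("request sent, waiting for reply\n");
    yield_co();                      /* parked until a reply arrives */
    printf("reply handled, request done\n");
    co_running = false;
    co_finished = true;              /* returning ends the coroutine */
}

int main(void)
{
    static char stack[64 * 1024];

    getcontext(&co_ctx);
    co_ctx.uc_stack.ss_sp = stack;
    co_ctx.uc_stack.ss_size = sizeof(stack);
    co_ctx.uc_link = &main_ctx;
    makecontext(&co_ctx, request_co, 0);

    enter_co();   /* submit the request; coroutine yields */
    enter_co();   /* reply path wakes it once: fine, it finishes */
    enter_co();   /* duplicate wake-up: assertion fires, like the
                   * abort seen while cancelling the mirror job */
    return 0;
}

The point of the patch above is to rule out the third enter_co() by
construction: a coroutine becomes visible in s->recv_coroutine[] only
once it already owns send_mutex, so the reply path can never enter a
coroutine that is still queued waiting for the lock.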