All of lore.kernel.org
 help / color / mirror / Atom feed
From: Paolo Bonzini <pbonzini@redhat.com>
To: Bin Wu <wu.wubin@huawei.com>, qemu-devel@nongnu.org
Cc: kwolf@redhat.com, famz@redhat.com, stefanha@redhat.com
Subject: Re: [Qemu-devel] [PATCH v2] nbd: fix the co_queue multi-adding bug
Date: Tue, 10 Feb 2015 10:06:57 +0100	[thread overview]
Message-ID: <54D9CA31.80807@redhat.com> (raw)
In-Reply-To: <1423552846-3896-1-git-send-email-wu.wubin@huawei.com>



On 10/02/2015 08:20, Bin Wu wrote:
> From: Bin Wu <wu.wubin@huawei.com>
> 
> When we tested the VM migartion between different hosts with NBD
> devices, we found if we sent a cancel command after the drive_mirror
> was just started, a coroutine re-enter error would occur. The stack
> was as follow:
> 
> (gdb) bt
> 00)  0x00007fdfc744d885 in raise () from /lib64/libc.so.6
> 01)  0x00007fdfc744ee61 in abort () from /lib64/libc.so.6
> 02)  0x00007fdfca467cc5 in qemu_coroutine_enter (co=0x7fdfcaedb400, opaque=0x0)
> at qemu-coroutine.c:118
> 03)  0x00007fdfca467f6c in qemu_co_queue_run_restart (co=0x7fdfcaedb400) at
> qemu-coroutine-lock.c:59
> 04)  0x00007fdfca467be5 in coroutine_swap (from=0x7fdfcaf3c4e8,
> to=0x7fdfcaedb400) at qemu-coroutine.c:96
> 05)  0x00007fdfca467cea in qemu_coroutine_enter (co=0x7fdfcaedb400, opaque=0x0)
> at qemu-coroutine.c:123
> 06)  0x00007fdfca467f6c in qemu_co_queue_run_restart (co=0x7fdfcaedbdc0) at
> qemu-coroutine-lock.c:59
> 07)  0x00007fdfca467be5 in coroutine_swap (from=0x7fdfcaf3c4e8,
> to=0x7fdfcaedbdc0) at qemu-coroutine.c:96
> 08)  0x00007fdfca467cea in qemu_coroutine_enter (co=0x7fdfcaedbdc0, opaque=0x0)
> at qemu-coroutine.c:123
> 09)  0x00007fdfca4a1fa4 in nbd_recv_coroutines_enter_all (s=0x7fdfcaef7dd0) at
> block/nbd-client.c:41
> 10) 0x00007fdfca4a1ff9 in nbd_teardown_connection (client=0x7fdfcaef7dd0) at
> block/nbd-client.c:50
> 11) 0x00007fdfca4a20f0 in nbd_reply_ready (opaque=0x7fdfcaef7dd0) at
> block/nbd-client.c:92
> 12) 0x00007fdfca45ed80 in aio_dispatch (ctx=0x7fdfcae15e90) at aio-posix.c:144
> 13) 0x00007fdfca45ef1b in aio_poll (ctx=0x7fdfcae15e90, blocking=false) at
> aio-posix.c:222
> 14) 0x00007fdfca448c34 in aio_ctx_dispatch (source=0x7fdfcae15e90, callback=0x0,
> user_data=0x0) at async.c:212
> 15) 0x00007fdfc8f2f69a in g_main_context_dispatch () from
> /usr/lib64/libglib-2.0.so.0
> 16) 0x00007fdfca45c391 in glib_pollfds_poll () at main-loop.c:190
> 17) 0x00007fdfca45c489 in os_host_main_loop_wait (timeout=1483677098) at
> main-loop.c:235
> 18) 0x00007fdfca45c57b in main_loop_wait (nonblocking=0) at main-loop.c:484
> 19) 0x00007fdfca25f403 in main_loop () at vl.c:2249
> 20) 0x00007fdfca266fc2 in main (argc=42, argv=0x7ffff517d638,
> envp=0x7ffff517d790) at vl.c:4814
> 
> We find the nbd_recv_coroutines_enter_all function (triggered by a cancel
> command or a network connection breaking down) will enter a coroutine which
> is waiting for the sending lock. If the lock is still held by another coroutine,
> the entering coroutine will be added into the co_queue again. Latter, when the
> lock is released, a coroutine re-enter error will occur.
> 
> This bug can be fixed simply by delaying the setting of recv_coroutine as
> suggested by paolo. After applying this patch, we have tested the cancel
> operation in mirror phase looply for more than 5 hous and everything is fine.
> Without this patch, a coroutine re-enter error will occur in 5 minutes.
> 
> Signed-off-by: Bn Wu <wu.wubin@huawei.com>
> ---
> v2: fix the coroutine re-enter bug in NBD code, not in coroutine infrastructure
> as suggested by paolo and kevin.
> ---
>  block/nbd-client.c | 25 +++++++++++++------------
>  1 file changed, 13 insertions(+), 12 deletions(-)
> 
> diff --git a/block/nbd-client.c b/block/nbd-client.c
> index 28bfb62..1b0f7c7 100644
> --- a/block/nbd-client.c
> +++ b/block/nbd-client.c
> @@ -104,10 +104,21 @@ static int nbd_co_send_request(NbdClientSession *s,
>      QEMUIOVector *qiov, int offset)
>  {
>      AioContext *aio_context;
> -    int rc, ret;
> +    int rc, ret, i;
>  
>      qemu_co_mutex_lock(&s->send_mutex);
> +
> +    for (i = 0; i < MAX_NBD_REQUESTS; i++) {
> +        if (s->recv_coroutine[i] == NULL) {
> +            s->recv_coroutine[i] = qemu_coroutine_self();
> +            break;
> +        }
> +    }
> +
> +    assert(i < MAX_NBD_REQUESTS);
> +    request->handle = INDEX_TO_HANDLE(s, i);
>      s->send_coroutine = qemu_coroutine_self();
> +
>      aio_context = bdrv_get_aio_context(s->bs);
>      aio_set_fd_handler(aio_context, s->sock,
>                         nbd_reply_ready, nbd_restart_write, s);
> @@ -164,8 +175,6 @@ static void nbd_co_receive_reply(NbdClientSession *s,
>  static void nbd_coroutine_start(NbdClientSession *s,
>     struct nbd_request *request)
>  {
> -    int i;
> -
>      /* Poor man semaphore.  The free_sema is locked when no other request
>       * can be accepted, and unlocked after receiving one reply.  */
>      if (s->in_flight >= MAX_NBD_REQUESTS - 1) {
> @@ -174,15 +183,7 @@ static void nbd_coroutine_start(NbdClientSession *s,
>      }
>      s->in_flight++;
>  
> -    for (i = 0; i < MAX_NBD_REQUESTS; i++) {
> -        if (s->recv_coroutine[i] == NULL) {
> -            s->recv_coroutine[i] = qemu_coroutine_self();
> -            break;
> -        }
> -    }
> -
> -    assert(i < MAX_NBD_REQUESTS);
> -    request->handle = INDEX_TO_HANDLE(s, i);
> +    /* s->recv_coroutine[i] is set as soon as we get the send_lock.  */
>  }
>  
>  static void nbd_coroutine_end(NbdClientSession *s,
> 

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>

  reply	other threads:[~2015-02-10  9:13 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-02-10  7:20 [Qemu-devel] [PATCH v2] nbd: fix the co_queue multi-adding bug Bin Wu
2015-02-10  9:06 ` Paolo Bonzini [this message]
2015-02-12 17:03 ` Stefan Hajnoczi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=54D9CA31.80807@redhat.com \
    --to=pbonzini@redhat.com \
    --cc=famz@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=stefanha@redhat.com \
    --cc=wu.wubin@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.