qemu-devel.nongnu.org archive mirror
From: Stefan Hajnoczi <stefanha@gmail.com>
To: Bin Wu <wu.wubin@huawei.com>
Cc: kwolf@redhat.com, pbonzini@redhat.com, famz@redhat.com,
	qemu-devel@nongnu.org, stefanha@redhat.com
Subject: Re: [Qemu-devel] [PATCH v2] nbd: fix the co_queue multi-adding bug
Date: Thu, 12 Feb 2015 17:03:40 +0000
Message-ID: <20150212170340.GG4054@stefanha-thinkpad.redhat.com>
In-Reply-To: <1423552846-3896-1-git-send-email-wu.wubin@huawei.com>

On Tue, Feb 10, 2015 at 03:20:46PM +0800, Bin Wu wrote:
> From: Bin Wu <wu.wubin@huawei.com>
> 
> When we tested VM migration between different hosts with NBD
> devices, we found that sending a cancel command right after
> drive_mirror had started caused a coroutine re-enter error. The stack
> trace was as follows:
> 
> (gdb) bt
> 00)  0x00007fdfc744d885 in raise () from /lib64/libc.so.6
> 01)  0x00007fdfc744ee61 in abort () from /lib64/libc.so.6
> 02)  0x00007fdfca467cc5 in qemu_coroutine_enter (co=0x7fdfcaedb400, opaque=0x0)
> at qemu-coroutine.c:118
> 03)  0x00007fdfca467f6c in qemu_co_queue_run_restart (co=0x7fdfcaedb400) at
> qemu-coroutine-lock.c:59
> 04)  0x00007fdfca467be5 in coroutine_swap (from=0x7fdfcaf3c4e8,
> to=0x7fdfcaedb400) at qemu-coroutine.c:96
> 05)  0x00007fdfca467cea in qemu_coroutine_enter (co=0x7fdfcaedb400, opaque=0x0)
> at qemu-coroutine.c:123
> 06)  0x00007fdfca467f6c in qemu_co_queue_run_restart (co=0x7fdfcaedbdc0) at
> qemu-coroutine-lock.c:59
> 07)  0x00007fdfca467be5 in coroutine_swap (from=0x7fdfcaf3c4e8,
> to=0x7fdfcaedbdc0) at qemu-coroutine.c:96
> 08)  0x00007fdfca467cea in qemu_coroutine_enter (co=0x7fdfcaedbdc0, opaque=0x0)
> at qemu-coroutine.c:123
> 09)  0x00007fdfca4a1fa4 in nbd_recv_coroutines_enter_all (s=0x7fdfcaef7dd0) at
> block/nbd-client.c:41
> 10) 0x00007fdfca4a1ff9 in nbd_teardown_connection (client=0x7fdfcaef7dd0) at
> block/nbd-client.c:50
> 11) 0x00007fdfca4a20f0 in nbd_reply_ready (opaque=0x7fdfcaef7dd0) at
> block/nbd-client.c:92
> 12) 0x00007fdfca45ed80 in aio_dispatch (ctx=0x7fdfcae15e90) at aio-posix.c:144
> 13) 0x00007fdfca45ef1b in aio_poll (ctx=0x7fdfcae15e90, blocking=false) at
> aio-posix.c:222
> 14) 0x00007fdfca448c34 in aio_ctx_dispatch (source=0x7fdfcae15e90, callback=0x0,
> user_data=0x0) at async.c:212
> 15) 0x00007fdfc8f2f69a in g_main_context_dispatch () from
> /usr/lib64/libglib-2.0.so.0
> 16) 0x00007fdfca45c391 in glib_pollfds_poll () at main-loop.c:190
> 17) 0x00007fdfca45c489 in os_host_main_loop_wait (timeout=1483677098) at
> main-loop.c:235
> 18) 0x00007fdfca45c57b in main_loop_wait (nonblocking=0) at main-loop.c:484
> 19) 0x00007fdfca25f403 in main_loop () at vl.c:2249
> 20) 0x00007fdfca266fc2 in main (argc=42, argv=0x7ffff517d638,
> envp=0x7ffff517d790) at vl.c:4814
> 
> We found that the nbd_recv_coroutines_enter_all function (triggered by a cancel
> command or a broken network connection) enters a coroutine that
> is waiting for the send lock. If the lock is still held by another coroutine,
> the entered coroutine is added to the co_queue again. Later, when the
> lock is released, a coroutine re-enter error occurs.
> 
> This bug can be fixed simply by delaying the setting of recv_coroutine, as
> suggested by Paolo. After applying this patch, we tested the cancel
> operation during the mirror phase in a loop for more than 5 hours without any
> failure. Without this patch, a coroutine re-enter error occurs within 5 minutes.
> 
> Signed-off-by: Bin Wu <wu.wubin@huawei.com>
> ---
> v2: fix the coroutine re-enter bug in the NBD code, not in the coroutine
> infrastructure, as suggested by Paolo and Kevin.
> ---
>  block/nbd-client.c | 25 +++++++++++++------------
>  1 file changed, 13 insertions(+), 12 deletions(-)

Thanks, applied to my block tree:
https://github.com/stefanha/qemu/commits/block

Stefan
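
For readers following the thread, the double-queueing described in the
commit message can be modelled with a small toy program. This is only a
sketch and not the patch itself: ToyCoroutine, ToyMutex, toy_lock and
simulate_send are hypothetical names invented for illustration, and the
model assumes that a coroutine woken while the send lock is still held
simply retries the lock and is queued a second time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_WAITERS 8

/* Toy stand-in for a coroutine: counts how often it was put on a wait queue. */
typedef struct {
    int times_queued;            /* > 1 means the same coroutine was queued twice */
} ToyCoroutine;

/* Toy stand-in for a coroutine mutex with its queue of waiters. */
typedef struct {
    bool locked;
    ToyCoroutine *waiters[MAX_WAITERS];
    size_t n_waiters;
} ToyMutex;

/* Try to take the lock; on contention, queue the caller as a waiter. */
static bool toy_lock(ToyMutex *m, ToyCoroutine *co)
{
    if (!m->locked) {
        m->locked = true;
        return true;
    }
    assert(m->n_waiters < MAX_WAITERS);
    m->waiters[m->n_waiters++] = co;
    co->times_queued++;
    return false;
}

/*
 * Model one request coroutine "b" while another coroutine holds the send
 * lock, followed by a teardown that enters every registered coroutine.
 * Returns how many times "b" ended up on the lock's wait queue.
 */
int simulate_send(bool register_before_lock)
{
    ToyMutex send_mutex = { .locked = true };  /* held by another coroutine */
    ToyCoroutine b = { 0 };
    ToyCoroutine *recv_slot = NULL;            /* models one recv_coroutine entry */

    if (register_before_lock) {
        recv_slot = &b;                        /* old order: register first */
    }
    if (toy_lock(&send_mutex, &b)) {
        recv_slot = &b;                        /* delayed order: register once the lock is held */
    }

    /* Teardown: enter every registered coroutine. */
    if (recv_slot != NULL) {
        /* The woken coroutine retries the lock, which is still held. */
        toy_lock(&send_mutex, recv_slot);
    }
    return b.times_queued;
}
```

In this model simulate_send(true) leaves the coroutine on the wait queue
twice, which corresponds to the multi-adding described above, while
simulate_send(false) leaves it queued only once because the teardown
never sees a coroutine that is still waiting for the lock.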
