From: Kevin Wolf <kwolf@redhat.com>
To: Bin Wu <wu.wubin@huawei.com>
Cc: pbonzini@redhat.com, famz@redhat.com, qemu-devel@nongnu.org,
	stefanha@redhat.com
Subject: Re: [Qemu-devel] [PATCH v3] qemu-coroutine: segfault when restarting co_queue
Date: Tue, 10 Feb 2015 11:32:59 +0100
Message-ID: <20150210103259.GC5202@noname.str.redhat.com>
In-Reply-To: <1423545380-8748-1-git-send-email-wu.wubin@huawei.com>

On 10.02.2015 at 06:16, Bin Wu wrote:
> From: Bin Wu <wu.wubin@huawei.com>
> 
> We tested migrating VMs together with their disk images using
> drive_mirror. During migration, two VMs copied large files to each other.
> During the test, a segfault occurred. The stack trace was as follows:
> 
> 00)  0x00007fa5a0c63fc5 in qemu_co_queue_run_restart (co=0x7fa5a1798648) at
> qemu-coroutine-lock.c:66
> 01)  0x00007fa5a0c63bed in coroutine_swap (from=0x7fa5a178f160,
> to=0x7fa5a1798648) at qemu-coroutine.c:97
> 02)  0x00007fa5a0c63dbf in qemu_coroutine_yield () at qemu-coroutine.c:140
> 03)  0x00007fa5a0c9e474 in nbd_co_receive_reply (s=0x7fa5a1a3cfd0,
> request=0x7fa28c2ffa10, reply=0x7fa28c2ffa30, qiov=0x0, offset=0) at
> block/nbd-client.c:165
> 04)  0x00007fa5a0c9e8b5 in nbd_co_writev_1 (client=0x7fa5a1a3cfd0,
> sector_num=8552704, nb_sectors=2040, qiov=0x7fa5a1757468, offset=0) at
> block/nbd-client.c:262
> 05)  0x00007fa5a0c9e9dd in nbd_client_session_co_writev (client=0x7fa5a1a3cfd0,
> sector_num=8552704, nb_sectors=2048, qiov=0x7fa5a1757468) at
> block/nbd-client.c:296
> 06)  0x00007fa5a0c9dda1 in nbd_co_writev (bs=0x7fa5a198fcb0, sector_num=8552704,
> nb_sectors=2048, qiov=0x7fa5a1757468) at block/nbd.c:291
> 07)  0x00007fa5a0c509a4 in bdrv_aligned_pwritev (bs=0x7fa5a198fcb0,
> req=0x7fa28c2ffbb0, offset=4378984448, bytes=1048576, qiov=0x7fa5a1757468,
> flags=0) at block.c:3321
> 08)  0x00007fa5a0c50f3f in bdrv_co_do_pwritev (bs=0x7fa5a198fcb0,
> offset=4378984448, bytes=1048576, qiov=0x7fa5a1757468, flags=(unknown: 0)) at
> block.c:3447
> 09)  0x00007fa5a0c51007 in bdrv_co_do_writev (bs=0x7fa5a198fcb0,
> sector_num=8552704, nb_sectors=2048, qiov=0x7fa5a1757468, flags=(unknown: 0)) at
> block.c:3471
> 10) 0x00007fa5a0c51074 in bdrv_co_writev (bs=0x7fa5a198fcb0, sector_num=8552704,
> nb_sectors=2048, qiov=0x7fa5a1757468) at block.c:3480
> 11) 0x00007fa5a0c652ec in raw_co_writev (bs=0x7fa5a198c110, sector_num=8552704,
> nb_sectors=2048, qiov=0x7fa5a1757468) at block/raw_bsd.c:62
> 12) 0x00007fa5a0c509a4 in bdrv_aligned_pwritev (bs=0x7fa5a198c110,
> req=0x7fa28c2ffe30, offset=4378984448, bytes=1048576, qiov=0x7fa5a1757468,
> flags=0) at block.c:3321
> 13) 0x00007fa5a0c50f3f in bdrv_co_do_pwritev (bs=0x7fa5a198c110,
> offset=4378984448, bytes=1048576, qiov=0x7fa5a1757468, flags=(unknown: 0)) at
> block.c:3447
> 14) 0x00007fa5a0c51007 in bdrv_co_do_writev (bs=0x7fa5a198c110,
> sector_num=8552704, nb_sectors=2048, qiov=0x7fa5a1757468, flags=(unknown: 0)) at
> block.c:3471
> 15) 0x00007fa5a0c542b3 in bdrv_co_do_rw (opaque=0x7fa5a17a0000) at block.c:4706
> 16) 0x00007fa5a0c64e6e in coroutine_trampoline (i0=-1585909408, i1=32677) at
> coroutine-ucontext.c:121
> 17) 0x00007fa59dc5aa50 in __correctly_grouped_prefixwc () from /lib64/libc.so.6
> 18) 0x0000000000000000 in ?? ()
> 
> After analyzing the stack and reviewing the code, we found that
> qemu_co_queue_run_restart should not be called from coroutine_swap, which
> can be invoked by either qemu_coroutine_enter or qemu_coroutine_yield. Only
> qemu_coroutine_enter needs to restart the co_queue.
> 
> The error scenario is as follows: coroutine C1 enters C2, C2 yields
> back to C1, then C1 terminates and its coroutine memory
> becomes invalid. After a while, the C2 coroutine is entered again.
> At this point, C1 is passed as a parameter to
> qemu_co_queue_run_restart. Therefore, qemu_co_queue_run_restart
> accesses invalid memory and a segfault occurs.
> 
> The qemu_co_queue_run_restart function re-enters coroutines waiting
> in the co_queue. However, this function should only be used in the
> qemu_coroutine_enter context. Only in this context, when the current
> coroutine gets execution control again (after the execution of
> qemu_coroutine_switch), can we restart the target coroutine, because the
> target coroutine has either yielded back to the current coroutine or
> terminated.
> 
> At first we wanted to put qemu_co_queue_run_restart in qemu_coroutine_enter,
> but we found that we cannot access the target coroutine if it has terminated.
> 
> Signed-off-by: Bin Wu <wu.wubin@huawei.com>
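
The sequence described in the quoted commit message (C1 enters C2, C2 yields
back, C1 terminates, C2 is entered again) can be replayed with a toy model in
plain C. This is only a sketch under stated assumptions: ToyCo and every
toy_* function are hypothetical names that do not exist in QEMU, there is no
real context switching, and a "terminated" flag stands in for freed memory so
that the example itself never touches invalid memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model; this is not QEMU's Coroutine type. */
typedef struct ToyCo {
    bool terminated;          /* stands in for "coroutine memory is invalid" */
    struct ToyCo *caller;     /* coroutine that entered us most recently */
    struct ToyCo *pending_to; /* 'to' argument of the swap we are suspended in */
} ToyCo;

/* 'from' enters 'to': 'to' remembers whom to yield back to. */
static void toy_enter(ToyCo *from, ToyCo *to)
{
    to->caller = from;
}

/* 'self' yields: it stays suspended inside a swap whose target is its caller. */
static void toy_yield(ToyCo *self)
{
    assert(self->caller != NULL);
    self->pending_to = self->caller;
    self->caller = NULL;
}

/* Mark a coroutine as finished; real code would release its memory here. */
static void toy_terminate(ToyCo *co)
{
    co->terminated = true;
}

/*
 * When 'self' resumes from its yield, a restart performed on the swap
 * target saved at yield time would touch that coroutine. Report whether
 * the saved target has terminated in the meantime.
 */
static bool toy_resume_hits_dead_target(const ToyCo *self)
{
    return self->pending_to != NULL && self->pending_to->terminated;
}

/* Replays the four steps from the commit message. */
static bool toy_replay_scenario(void)
{
    ToyCo c1 = { false, NULL, NULL };
    ToyCo c2 = { false, NULL, NULL };
    ToyCo other = { false, NULL, NULL };

    toy_enter(&c1, &c2);    /* C1 enters C2 */
    toy_yield(&c2);         /* C2 yields back to C1 */
    toy_terminate(&c1);     /* C1 terminates */
    toy_enter(&other, &c2); /* after a while, C2 is entered again */

    return toy_resume_hits_dead_target(&c2);
}
```

In this model toy_replay_scenario() returns true: the swap target that C2
saved when it yielded is still C1, which has terminated by the time C2 resumes.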

As I replied for v2, I'll send a series with a simpler fix and some
cleanup that should result in a nicer design in the end.

Kevin
