From: Wen Congyang <wency@cn.fujitsu.com>
To: Bin Wu <wu.wubin@huawei.com>, Stefan Hajnoczi <stefanha@gmail.com>
Cc: kwolf@redhat.com, famz@redhat.com, boby.chen@huawei.com,
	subo7@huawei.com, kathy.wangting@huawei.com,
	rudy.zhangmin@huawei.com, qemu-devel@nongnu.org,
	arei.gonglei@huawei.com, stefanha@redhat.com,
	pbonzini@redhat.com, bruce.fon@huawei.com
Subject: Re: [Qemu-devel] [PATCH v2] qemu-coroutine: segfault when restarting co_queue
Date: Tue, 10 Feb 2015 12:49:05 +0800
Message-ID: <54D98DC1.2010603@cn.fujitsu.com>
In-Reply-To: <54D97F85.4070200@huawei.com>

On 02/10/2015 11:48 AM, Bin Wu wrote:
> On 2015/2/10 11:16, Wen Congyang wrote:
>> On 02/09/2015 10:48 PM, Stefan Hajnoczi wrote:
>>> On Mon, Feb 09, 2015 at 02:50:39PM +0800, Bin Wu wrote:
>>>> From: Bin Wu <wu.wubin@huawei.com>
>>>>
>>>> We tested migrating VMs together with their disk images using drive_mirror.
>>>> During the migration, two VMs copied large files to each other. During the
>>>> test, a segfault occurred. The stack was as follows:
>>>>
>>>> (gdb) bt
>>>> qemu-coroutine-lock.c:66
>>>> to=0x7fa5a1798648) at qemu-coroutine.c:97
>>>> request=0x7fa28c2ffa10, reply=0x7fa28c2ffa30, qiov=0x0, offset=0) at
>>>> block/nbd-client.c:165
>>>> sector_num=8552704, nb_sectors=2040, qiov=0x7fa5a1757468, offset=0) at
>>>> block/nbd-client.c:262
>>>> sector_num=8552704, nb_sectors=2048, qiov=0x7fa5a1757468) at
>>>> block/nbd-client.c:296
>>>> nb_sectors=2048, qiov=0x7fa5a1757468) at block/nbd.c:291
>>>> req=0x7fa28c2ffbb0, offset=4378984448, bytes=1048576, qiov=0x7fa5a1757468,
>>>> flags=0) at block.c:3321
>>>> offset=4378984448, bytes=1048576, qiov=0x7fa5a1757468, flags=(unknown: 0)) at
>>>> block.c:3447
>>>> sector_num=8552704, nb_sectors=2048, qiov=0x7fa5a1757468, flags=(unknown: 0)) at
>>>> block.c:3471
>>>> nb_sectors=2048, qiov=0x7fa5a1757468) at block.c:3480
>>>> nb_sectors=2048, qiov=0x7fa5a1757468) at block/raw_bsd.c:62
>>>> req=0x7fa28c2ffe30, offset=4378984448, bytes=1048576, qiov=0x7fa5a1757468,
>>>> flags=0) at block.c:3321
>>>> offset=4378984448, bytes=1048576, qiov=0x7fa5a1757468, flags=(unknown: 0)) at
>>>> block.c:3447
>>>> sector_num=8552704, nb_sectors=2048, qiov=0x7fa5a1757468, flags=(unknown: 0)) at
>>>> block.c:3471
>>>> coroutine-ucontext.c:121
>>>
>>> This backtrace is incomplete.  Where are the function names?  The
>>> parameter lists appear incomplete too.
>>>
>>>> After analyzing the stack and reviewing the code, we found that
>>>> qemu_co_queue_run_restart should not be called in coroutine_swap, which
>>>> is invoked by both qemu_coroutine_enter and qemu_coroutine_yield. Only
>>>> qemu_coroutine_enter needs to restart the co_queue.
>>>>
>>>> The error scenario is as follows: coroutine C1 enters C2, C2 yields
>>>> back to C1, then C1 terminates and its memory becomes invalid. A while
>>>> later, C2 is entered again. C2 resumes inside the coroutine_swap call it
>>>> used to yield, where C1 is still the 'to' argument, so C1 is passed to
>>>> qemu_co_queue_run_restart. qemu_co_queue_run_restart therefore
>>>> accesses invalid memory and a segfault occurs.
>>>>
>>>> The qemu_co_queue_run_restart function re-enters coroutines waiting
>>>> in the co_queue. However, this function should only be used in the
>>>> qemu_coroutine_enter context. Only in this context, when the current
>>>> coroutine regains control (after qemu_coroutine_switch returns), can
>>>> we restart the target coroutine, because the target coroutine has
>>>> either yielded back to the current coroutine or terminated.
>>>>
>>>> At first we wanted to put qemu_co_queue_run_restart in qemu_coroutine_enter,
>>>> but we found that we cannot access the target coroutine once it has terminated.
>>>
>>> This example captures the scenario you describe:
>>>
>>> diff --git a/qemu-coroutine.c b/qemu-coroutine.c
>>> index 525247b..883cbf5 100644
>>> --- a/qemu-coroutine.c
>>> +++ b/qemu-coroutine.c
>>> @@ -103,7 +103,10 @@ static void coroutine_swap(Coroutine *from, Coroutine *to)
>>>  {
>>>      CoroutineAction ret;
>>>  
>>> +    fprintf(stderr, "> %s from %p to %p\n", __func__, from, to);
>>>      ret = qemu_coroutine_switch(from, to, COROUTINE_YIELD);
>>> +    fprintf(stderr, "< %s from %p to %p switch %s\n", __func__, from, to,
>>> +            ret == COROUTINE_YIELD ? "yield" : "terminate");
>>>  
>>>      qemu_co_queue_run_restart(to);
>>>  
>>> @@ -111,6 +114,7 @@ static void coroutine_swap(Coroutine *from, Coroutine *to)
>>>      case COROUTINE_YIELD:
>>>          return;
>>>      case COROUTINE_TERMINATE:
>>> +        fprintf(stderr, "coroutine_delete %p\n", to);
>>>          trace_qemu_coroutine_terminate(to);
>>>          coroutine_delete(to);
>>>          return;
>>> diff --git a/tests/test-coroutine.c b/tests/test-coroutine.c
>>> index 27d1b6f..d44c428 100644
>>> --- a/tests/test-coroutine.c
>>> +++ b/tests/test-coroutine.c
>>> @@ -13,6 +13,7 @@
>>>  
>>>  #include <glib.h>
>>>  #include "block/coroutine.h"
>>> +#include "block/coroutine_int.h"
>>>  
>>>  /*
>>>   * Check that qemu_in_coroutine() works
>>> @@ -122,6 +123,35 @@ static void test_yield(void)
>>>      g_assert_cmpint(i, ==, 5); /* coroutine must yield 5 times */
>>>  }
>>>  
>>> +static void coroutine_fn c2_fn(void *opaque)
>>> +{
>>> +    fprintf(stderr, "c2 Part 1\n");
>>> +    qemu_coroutine_yield();
>>> +    fprintf(stderr, "c2 Part 2\n");
>>> +}
>>> +
>>> +static void coroutine_fn c1_fn(void *opaque)
>>> +{
>>> +    Coroutine *c2 = opaque;
>>> +
>>> +    fprintf(stderr, "c1 Part 1\n");
>>> +    qemu_coroutine_enter(c2, NULL);
>>> +    fprintf(stderr, "c1 Part 2\n");
>>> +}
>>> +
>>> +static void test_co_queue(void)
>>> +{
>>> +    Coroutine *c1;
>>> +    Coroutine *c2;
>>> +
>>> +    c1 = qemu_coroutine_create(c1_fn);
>>> +    c2 = qemu_coroutine_create(c2_fn);
>>> +
>>> +    qemu_coroutine_enter(c1, c2);
>>> +    memset(c1, 0xff, sizeof(Coroutine));
>>> +    qemu_coroutine_enter(c2, NULL);
>>> +}
>>> +
>>>  /*
>>>   * Check that creation, enter, and return work
>>>   */
>>> @@ -343,6 +373,7 @@ static void perf_cost(void)
>>>  int main(int argc, char **argv)
>>>  {
>>>      g_test_init(&argc, &argv, NULL);
>>> +    g_test_add_func("/basic/co_queue", test_co_queue);
>>>      g_test_add_func("/basic/lifecycle", test_lifecycle);
>>>      g_test_add_func("/basic/yield", test_yield);
>>>      g_test_add_func("/basic/nesting", test_nesting);
>>>
>>> Here is the output (with printfs in coroutine_swap):
>>>
>>> -> coroutine_swap from MAIN to C1
>>> c1 Part 1
>>> -> coroutine_swap from C1 to C2
>>> c2 Part 1
>>> -> coroutine_swap from C2 to C1
>>> <- coroutine_swap from C1 to C2 switch yield
>>> c1 Part 2
>>> <- coroutine_swap from MAIN to C1 switch terminate
>>> coroutine_delete C1
>>> -> coroutine_swap from MAIN to C2
>>> <- coroutine_swap from C2 to C1 switch yield  !!!
>>> c2 Part 2
>>> <- coroutine_swap from MAIN to C2 switch terminate
>>> coroutine_delete C2
>>>
>>> I have marked the problematic line with "!!!".  The to=C1 variable is
>>> used after C1 has been deleted.
>>
>> How do I build the test program? With "make test"? I get the following errors:
>> make test
>> make -C tests/tcg test
>> make[1]: Entering directory '/work/src/qemu/tests/tcg'
>>   CC    test_path.o
>> In file included from /work/src/qemu/include/qemu-common.h:43:0,
>>                  from /work/src/qemu/util/cutils.c:24,
>>                  from test_path.c:3:
>> /work/src/qemu/include/glib-compat.h:19:18: fatal error: glib.h: No such file or directory
>>  #include <glib.h>
>>                   ^
>> compilation terminated.
>> /work/src/qemu/rules.mak:57: recipe for target 'test_path.o' failed
>> make[1]: *** [test_path.o] Error 1
>> make[1]: Leaving directory '/work/src/qemu/tests/tcg'
>> Makefile:428: recipe for target 'test' failed
>> make: *** [test] Error 2
>>
>> Am I missing something?
>>
>> Thanks
>> Wen Congyang
>>
> 
> First, you need to install the glib2-devel package, which provides glib.h. Then
> you can run "make check" in the source tree to compile and run all tests (or
> run "make check-help" for more options).

I have installed glib2-devel, and "make check" works, but "make test" still fails...

Thanks
Wen Congyang

> 
>>>
>>> The test crashes since it writes 0xff to C1 after it has terminated.
>>>
>>>> Signed-off-by: Bin Wu <wu.wubin@huawei.com>
>>>> ---
>>>>  qemu-coroutine.c | 16 ++++++++++------
>>>>  1 file changed, 10 insertions(+), 6 deletions(-)
>>>>
>>>> diff --git a/qemu-coroutine.c b/qemu-coroutine.c
>>>> index 525247b..cc0bdfa 100644
>>>> --- a/qemu-coroutine.c
>>>> +++ b/qemu-coroutine.c
>>>> @@ -99,29 +99,31 @@ static void coroutine_delete(Coroutine *co)
>>>>      qemu_coroutine_delete(co);
>>>>  }
>>>>  
>>>> -static void coroutine_swap(Coroutine *from, Coroutine *to)
>>>> +static CoroutineAction coroutine_swap(Coroutine *from, Coroutine *to)
>>>>  {
>>>>      CoroutineAction ret;
>>>>  
>>>>      ret = qemu_coroutine_switch(from, to, COROUTINE_YIELD);
>>>>  
>>>> -    qemu_co_queue_run_restart(to);
>>>> -
>>>>      switch (ret) {
>>>>      case COROUTINE_YIELD:
>>>> -        return;
>>>> +        break;
>>>>      case COROUTINE_TERMINATE:
>>>>          trace_qemu_coroutine_terminate(to);
>>>> +        qemu_co_queue_run_restart(to);
>>>>          coroutine_delete(to);
>>>> -        return;
>>>> +        break;
>>>>      default:
>>>>          abort();
>>>>      }
>>>> +
>>>> +    return ret;
>>>>  }
>>>>  
>>>>  void qemu_coroutine_enter(Coroutine *co, void *opaque)
>>>>  {
>>>>      Coroutine *self = qemu_coroutine_self();
>>>> +    CoroutineAction ret;
>>>>  
>>>>      trace_qemu_coroutine_enter(self, co, opaque);
>>>>  
>>>> @@ -132,7 +134,9 @@ void qemu_coroutine_enter(Coroutine *co, void *opaque)
>>>>  
>>>>      co->caller = self;
>>>>      co->entry_arg = opaque;
>>>> -    coroutine_swap(self, co);
>>>> +    ret = coroutine_swap(self, co);
>>>> +    if (ret == COROUTINE_YIELD)
>>>> +        qemu_co_queue_run_restart(co);
>>>>  }
>>>
>>> Your fix looks correct, although QEMU coding style requires {} around the if body.
>>>
>>> I tried to think of a simpler solution that keeps a single
>>> qemu_co_queue_run_restart() call but was unable to find one.
>>>
>>> Please send another revision with a test-coroutine.c test case so we can
>>> protect against regressions.
>>>
>>> Thanks,
>>> Stefan
>>>
>>
>>
>> .
>>
> 
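The stale 'to' pointer in Stefan's trace can be reproduced outside QEMU. What follows is a hypothetical toy model, not QEMU code, assuming POSIX ucontext (getcontext/makecontext/swapcontext, as provided by glibc). Co, co_enter, co_yield_back, swap, run_restart and run_scenario are illustrative stand-ins for qemu_coroutine_enter(), qemu_coroutine_yield(), coroutine_swap() and qemu_co_queue_run_restart(). Instead of freeing a terminated coroutine it sets a 'deleted' flag, so a restart that is handed a dead coroutine is counted rather than crashing. With old_placement set it mirrors the original coroutine_swap(); otherwise it follows the placement in the v2 patch.

```c
/* Toy model (hypothetical names, not QEMU's API) of enter/yield/swap. */
#include <assert.h>
#include <stdlib.h>
#include <ucontext.h>

typedef enum { CO_YIELD, CO_TERMINATE } CoAction;
typedef struct Co {
    ucontext_t ctx;
    struct Co *caller;
    void (*entry)(void *);
    void *arg;
    int deleted;                /* set instead of freeing, so stale use is visible */
    char stack[64 * 1024];
} Co;

static Co main_co;              /* the "MAIN" (non-coroutine) context */
static Co *current = &main_co;
static CoAction pending;        /* action delivered to the context being resumed */
static int old_placement;       /* 1: restart inside swap (v1 code), 0: patched */
static int stale_uses;          /* restarts that were handed a deleted coroutine */

/* Stand-in for qemu_co_queue_run_restart(): only records stale arguments. */
static void run_restart(Co *co) { stale_uses += co->deleted; }

static CoAction co_switch(Co *from, Co *to, CoAction action)
{
    current = to;
    pending = action;
    swapcontext(&from->ctx, &to->ctx);
    return pending;             /* written by whoever switched back to 'from' */
}

static void trampoline(void)
{
    Co *self = current;
    self->entry(self->arg);
    co_switch(self, self->caller, CO_TERMINATE);   /* never resumed */
}

static CoAction swap(Co *from, Co *to)
{
    CoAction ret = co_switch(from, to, CO_YIELD);
    if (old_placement || ret == CO_TERMINATE) {
        run_restart(to);        /* old: always; patched: only before deletion */
    }
    if (ret == CO_TERMINATE) {
        to->deleted = 1;        /* stands in for coroutine_delete() */
    }
    return ret;
}

static void co_enter(Co *co, void *arg)
{
    co->caller = current;
    co->arg = arg;
    if (swap(co->caller, co) == CO_YIELD && !old_placement) {
        run_restart(co);        /* patched: only the enter path restarts */
    }
}

static void co_yield_back(void)
{
    Co *self = current;
    Co *to = self->caller;
    assert(to != NULL);         /* yielding to no one is a bug */
    self->caller = NULL;
    swap(self, to);
}

static Co *co_create(void (*entry)(void *))
{
    Co *co = calloc(1, sizeof(*co));
    if (!co || getcontext(&co->ctx) != 0) {
        abort();
    }
    co->ctx.uc_stack.ss_sp = co->stack;
    co->ctx.uc_stack.ss_size = sizeof(co->stack);
    co->ctx.uc_link = NULL;
    co->entry = entry;
    makecontext(&co->ctx, trampoline, 0);
    return co;
}

static void c2_fn(void *arg) { (void)arg; co_yield_back(); }
static void c1_fn(void *arg) { co_enter(arg, NULL); }
/* Replays the C1/C2 scenario; returns the number of stale restarts. */
static int run_scenario(int use_old_placement)
{
    old_placement = use_old_placement;
    stale_uses = 0;
    Co *c1 = co_create(c1_fn);
    Co *c2 = co_create(c2_fn);
    co_enter(c1, c2);           /* C1 enters C2, C2 yields to C1, C1 terminates */
    co_enter(c2, NULL);         /* C2 resumes inside its old swap(C2 -> C1) frame */
    free(c1);
    free(c2);
    return stale_uses;
}
```

In this model, run_scenario(1) counts one stale restart: C2 resumes in the frame of its earlier yield, whose 'to' is the already deleted C1. run_scenario(0) counts none, because the patched placement restarts only from the enter path after the callee yielded, or just before deleting a coroutine that terminated.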

Thread overview: 10+ messages
2015-02-09  6:50 [Qemu-devel] [PATCH v2] qemu-coroutine: segfault when restarting co_queue Bin Wu
2015-02-09  9:09 ` Paolo Bonzini
2015-02-10  0:55   ` Bin Wu
2015-02-09  9:42 ` Kevin Wolf
2015-02-09 14:48 ` Stefan Hajnoczi
2015-02-10  0:51   ` Bin Wu
2015-02-10  3:16   ` Wen Congyang
2015-02-10  3:48     ` Bin Wu
2015-02-10  4:49       ` Wen Congyang [this message]
2015-02-10 10:13   ` Kevin Wolf
