qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Kevin Wolf <kwolf@redhat.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: lersek@redhat.com, qemu-devel@nongnu.org, stefanha@redhat.com,
	rjones@redhat.com
Subject: Re: [Qemu-devel] [PATCH] AioContext: fix broken ctx->dispatching optimization
Date: Wed, 15 Jul 2015 22:14:35 +0200	[thread overview]
Message-ID: <20150715201435.GF3582@noname.redhat.com> (raw)
In-Reply-To: <1436980403-25898-1-git-send-email-pbonzini@redhat.com>

Am 15.07.2015 um 19:13 hat Paolo Bonzini geschrieben:
> This patch rewrites the ctx->dispatching optimization, which was the cause
> of some mysterious hangs that could be reproduced on aarch64 KVM only.
> The hangs were indirectly caused by aio_poll() and in particular by
> flash memory updates's call to blk_write(), which invokes aio_poll().
> Fun stuff: they had an extremely short race window, so much that
> adding all kind of tracing to either the kernel or QEMU made it
> go away (a single printf made it half as reproducible).
> [...]

Awesome commit message. :-)

Took a while to read, think about and understand everything, but you did
a great job on describing the bug. In the end, all the information I
needed was there.

I'm not sure if you want to trust my review, considering that I failed
the first time around, but for the most part, this looks good to me.

> diff --git a/aio-win32.c b/aio-win32.c
> index 233d8f5..ae7c6cf 100644
> --- a/aio-win32.c
> +++ b/aio-win32.c
> @@ -318,11 +313,11 @@ bool aio_poll(AioContext *ctx, bool blocking)
>      first = true;
>  
>      /* wait until next event */
> -    while (count > 0) {
> +    do {
>          HANDLE event;
>          int ret;
>  
> -        timeout = blocking
> +        timeout = blocking && !have_select_revents
>              ? qemu_timeout_ns_to_ms(aio_compute_timeout(ctx)) : 0;
>          if (timeout) {
>              aio_context_release(ctx);

Here I'm not sure why you switched to a do..while loop? It's not obvious
to me how the change from aio_set_dispatching() to ctx->notify_me is
related to this.

Does it mean that we can call WaitForMultipleObjects() with nCount == 0?
This seems to be forbidden.

> diff --git a/tests/test-aio.c b/tests/test-aio.c
> index a7cb5c9..217e337 100644
> --- a/tests/test-aio.c
> +++ b/tests/test-aio.c
> @@ -97,14 +97,6 @@ static void event_ready_cb(EventNotifier *e)
>  
>  /* Tests using aio_*.  */
>  
> -static void test_notify(void)
> -{
> -    g_assert(!aio_poll(ctx, false));
> -    aio_notify(ctx);
> -    g_assert(!aio_poll(ctx, true));
> -    g_assert(!aio_poll(ctx, false));
> -}
> -
>  typedef struct {
>      QemuMutex start_lock;
>      bool thread_acquired;

Okay. We don't actually notify any more, so this test hangs now.
Removing seems fine.

> @@ -331,7 +323,7 @@ static void test_wait_event_notifier(void)
>      EventNotifierTestData data = { .n = 0, .active = 1 };
>      event_notifier_init(&data.e, false);
>      aio_set_event_notifier(ctx, &data.e, event_ready_cb);
> -    g_assert(!aio_poll(ctx, false));
> +    while (aio_poll(ctx, false));
>      g_assert_cmpint(data.n, ==, 0);
>      g_assert_cmpint(data.active, ==, 1);
>  
> @@ -356,7 +348,7 @@ static void test_flush_event_notifier(void)
>      EventNotifierTestData data = { .n = 0, .active = 10, .auto_set = true };
>      event_notifier_init(&data.e, false);
>      aio_set_event_notifier(ctx, &data.e, event_ready_cb);
> -    g_assert(!aio_poll(ctx, false));
> +    while (aio_poll(ctx, false));
>      g_assert_cmpint(data.n, ==, 0);
>      g_assert_cmpint(data.active, ==, 10);

These two hunks I don't understand. Where would an event come from that
needs to be handled here?

With these hunks reverted, the test still passes for me.

> @@ -494,14 +486,6 @@ static void test_timer_schedule(void)
>   *   works well, and that's what I am using.
>   */
>  
> -static void test_source_notify(void)
> -{
> -    while (g_main_context_iteration(NULL, false));
> -    aio_notify(ctx);
> -    g_assert(g_main_context_iteration(NULL, true));
> -    g_assert(!g_main_context_iteration(NULL, false));
> -}

Same as above, good.

>  static void test_source_flush(void)
>  {
>      g_assert(!g_main_context_iteration(NULL, false));
> @@ -669,7 +653,7 @@ static void test_source_wait_event_notifier(void)
>      EventNotifierTestData data = { .n = 0, .active = 1 };
>      event_notifier_init(&data.e, false);
>      aio_set_event_notifier(ctx, &data.e, event_ready_cb);
> -    g_assert(g_main_context_iteration(NULL, false));
> +    while (g_main_context_iteration(NULL, false));
>      g_assert_cmpint(data.n, ==, 0);
>      g_assert_cmpint(data.active, ==, 1);
>  
> @@ -694,7 +678,7 @@ static void test_source_flush_event_notifier(void)
>      EventNotifierTestData data = { .n = 0, .active = 10, .auto_set = true };
>      event_notifier_init(&data.e, false);
>      aio_set_event_notifier(ctx, &data.e, event_ready_cb);
> -    g_assert(g_main_context_iteration(NULL, false));
> +    while (g_main_context_iteration(NULL, false));
>      g_assert_cmpint(data.n, ==, 0);
>      g_assert_cmpint(data.active, ==, 10);

Okay, this one is confusing. Took me a while to realise that aio_poll()
returns false when the only event is an aio_notify(), whereas
g_main_context_iteration() returns true in the same case.

With this information, I understand that what has changed is that the
return value of g_main_context_iteration() changes from true before this
patch (had the aio_notify() from aio_set_fd_handler() pending) to false
after the patch (aio_notify() doesn't inject an event any more).

This should mean that like above we can assert that the first iteration
returns false, i.e. reverse the assertion (and indeed, with this
change the test still passes for me).

Kevin

  parent reply	other threads:[~2015-07-15 20:14 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-07-15 17:13 [Qemu-devel] [PATCH] AioContext: fix broken ctx->dispatching optimization Paolo Bonzini
2015-07-15 18:00 ` Laszlo Ersek
2015-07-15 20:14 ` Kevin Wolf [this message]
2015-07-15 22:59   ` Paolo Bonzini
2015-07-16  9:14     ` Kevin Wolf
2015-07-16  9:34       ` Paolo Bonzini
2015-07-16  5:29 ` Fam Zheng
2015-07-16  9:14   ` Paolo Bonzini
2015-07-16  9:16   ` Stefan Hajnoczi
2015-07-16 10:17 ` Stefan Hajnoczi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150715201435.GF3582@noname.redhat.com \
    --to=kwolf@redhat.com \
    --cc=lersek@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=rjones@redhat.com \
    --cc=stefanha@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).