All of lore.kernel.org
 help / color / mirror / Atom feed
From: Paolo Bonzini <pbonzini@redhat.com>
To: "Richard W.M. Jones" <rjones@redhat.com>
Cc: kwolf@redhat.com, Marc Zyngier <Marc.Zyngier@arm.com>,
	lersek@redhat.com, qemu-devel@nongnu.org, stefanha@redhat.com
Subject: Re: [Qemu-devel] [PATCH v2 0/3] AioContext: ctx->dispatching is dead, all hail ctx->notify_me
Date: Fri, 17 Jul 2015 11:30:38 +0200	[thread overview]
Message-ID: <55A8CB3E.5090404@redhat.com> (raw)
In-Reply-To: <55A8883D.1010207@redhat.com>



On 17/07/2015 06:44, Paolo Bonzini wrote:
> 
> 
> On 16/07/2015 21:05, Richard W.M. Jones wrote:
>>
>> Sorry to spoil things, but I'm still seeing this bug, although it is
>> now a lot less frequent with your patch.  I would estimate it happens
>> more often than 1 in 5 runs with qemu.git, and probably 1 in 200 runs
>> with qemu.git + the v2 patch series.
>>
>> It's the exact same hang in both cases.
>>
>> Is it possible that this patch doesn't completely close any race?
>>
>> Still, it is an improvement, so there is that.
> 
> Would seem at first glance like a different bug.
> 
> Interestingly, adding some "tracing" (qemu_clock_get_ns) makes the bug
> more likely: now it reproduces in about 10 tries.  Of course :) adding
> other kinds of tracing instead make it go away again (>50 tries).
> 
> Perhaps this:
> 
>    i/o thread         vcpu thread                   worker thread
>    ---------------------------------------------------------------------
>    lock_iothread
>    notify_me = 1
>    ...
>    unlock_iothread
>                       lock_iothread
>                       notify_me = 3
>                       ppoll
>                       notify_me = 1
>                                                      bh->scheduled = 1
>                                                      event_notifier_set
>                       event_notifier_test_and_clear
>    ppoll
>     ^^ hang
> 	
> In the exact shape above, it doesn't seem too likely to happen, but
> perhaps there's another simpler case.  Still, the bug exists.
> 
> The above is not really related to notify_me.  Here the notification is
> not being optimized away!  So I wonder if this one has been there forever.
> 
> Fam suggested putting the event_notifier_test_and_clear before
> aio_bh_poll(), but it does not work.  I'll look more close
> 
> However, an unconditional event_notifier_test_and_clear is pretty
> expensive.  On one hand, obviously correctness comes first.  On the
> other hand, an expensive operation at the wrong place can mask the race
> very easily; I'll let the fix run for a while, but I'm not sure if a
> successful test really says anything useful.

So it may not be useful, but still successful test is successful. :)
The following patch, which also includes the delta between v2 and v3
of this series, survived 674 reboots before hitting a definitely
unrelated problem:

error: kvm run failed Function not implemented
PC=00000000bf671210  SP=00000000c00001f0
X00=000000000a003e70 X01=0000000000000000 X02=00000000bf680548 X03=0000000000000030
X04=00000000bbb5fc18 X05=00000000004b7770 X06=00000000bf721930 X07=000000000000009a
X08=00000000bf716858 X09=0000000000000090 X10=0000000000000000 X11=0000000000000046
X12=00000000a007e03a X13=0000000000000000 X14=0000000000000000 X15=0000000000000000
X16=00000000bf716df0 X17=0000000000000000 X18=0000000000000000 X19=00000000bf6f5f18
X20=0000000000000000 X21=0000000000000000 X22=0000000000000000 X23=0000000000000000
X24=0000000000000000 X25=0000000000000000 X26=0000000000000000 X27=0000000000000000
X28=0000000000000000 X29=0000000000000000 X30=0000000000000000 PSTATE=60000305 (flags -ZC-)

For the record, this is the kvm_run struct:

$6 = {request_interrupt_window = 0 '\000', padding1 = "\000\000\000\000\000\000", exit_reason = 0, 
  ready_for_interrupt_injection = 0 '\000', if_flag = 0 '\000', flags = 0, cr8 = 0, apic_base = 0, {hw = {
      hardware_exit_reason = 150994968}, fail_entry = {hardware_entry_failure_reason = 150994968}, ex = {
      exception = 150994968, error_code = 0}, io = {direction = 24 '\030', size = 0 '\000', port = 2304, 
      count = 0, data_offset = 144}, debug = {arch = {<No data fields>}}, mmio = {phys_addr = 150994968, 
      data = "\220\000\000\000\000\000\000", len = 4, is_write = 0 '\000'}, hypercall = {nr = 150994968, 
      args = {144, 4, 0, 0, 0, 0}, ret = 0, longmode = 0, pad = 0}, tpr_access = {rip = 150994968, 
      is_write = 144, pad = 0}, s390_sieic = {icptcode = 24 '\030', ipa = 2304, ipb = 0}, 
    s390_reset_flags = 150994968, s390_ucontrol = {trans_exc_code = 150994968, pgm_code = 144}, dcr = {
      dcrn = 150994968, data = 0, is_write = 144 '\220'}, internal = {suberror = 150994968, ndata = 0, 
      data = {144, 4, 0 <repeats 14 times>}}, osi = {gprs = {150994968, 144, 4, 0 <repeats 29 times>}}, 
    papr_hcall = {nr = 150994968, ret = 144, args = {4, 0, 0, 0, 0, 0, 0, 0, 0}}, s390_tsch = {
      subchannel_id = 24, subchannel_nr = 2304, io_int_parm = 0, io_int_word = 144, ipb = 0, 
      dequeued = 4 '\004'}, epr = {epr = 150994968}, system_event = {type = 150994968, flags = 144}, 
    s390_stsi = {addr = 150994968, ar = 144 '\220', reserved = 0 '\000', fc = 0 '\000', sel1 = 0 '\000', 
      sel2 = 0}, 
    padding = "\030\000\000\t\000\000\000\000\220\000\000\000\000\000\000\000\004", '\000' <repeats 238 times>}, kvm_valid_regs = 0, kvm_dirty_regs = 0, s = {regs = {<No data fields>}, 
    padding = '\000' <repeats 2047 times>}}

Marc, does it ring any bell?

Paolo

diff --git a/aio-posix.c b/aio-posix.c
index 268d14d..d2011d0 100644
--- a/aio-posix.c
+++ b/aio-posix.c
@@ -273,6 +273,13 @@ bool aio_poll(AioContext *ctx, bool blocking)
         aio_context_acquire(ctx);
     }
 
+    /* This should be optimized... */
+    event_notifier_test_and_clear(&ctx->notifier);
+
+    if (blocking) {
+        atomic_sub(&ctx->notify_me, 2);
+    }
+
     /* if we have any readable fds, dispatch event */
     if (ret > 0) {
         for (i = 0; i < npfd; i++) {
@@ -283,10 +290,6 @@ bool aio_poll(AioContext *ctx, bool blocking)
     npfd = 0;
     ctx->walking_handlers--;
 
-    if (blocking) {
-        atomic_sub(&ctx->notify_me, 2);
-    }
-
     /* Run dispatch even if there were no readable fds to run timers */
     if (aio_dispatch(ctx)) {
         progress = true;
diff --git a/aio-win32.c b/aio-win32.c
index 2bfd5f8..33809fd 100644
--- a/aio-win32.c
+++ b/aio-win32.c
@@ -326,6 +326,10 @@ bool aio_poll(AioContext *ctx, bool blocking)
         if (timeout) {
             aio_context_acquire(ctx);
         }
+
+        /* This should be optimized... */
+        event_notifier_test_and_clear(&ctx->notifier);
+
         if (blocking) {
             assert(first);
             atomic_sub(&ctx->notify_me, 2);
diff --git a/async.c b/async.c
index 9204907..120e183 100644
--- a/async.c
+++ b/async.c
@@ -202,6 +202,9 @@ aio_ctx_check(GSource *source)
     AioContext *ctx = (AioContext *) source;
     QEMUBH *bh;
 
+    /* This should be optimized... */
+    event_notifier_test_and_clear(&ctx->notifier);
+
     atomic_and(&ctx->notify_me, ~1);
     for (bh = ctx->first_bh; bh; bh = bh->next) {
         if (!bh->deleted && bh->scheduled) {
@@ -280,6 +280,10 @@ static void aio_rfifolock_cb(void *opaque)
     aio_notify(opaque);
 }
 
+static void event_notifier_dummy_cb(EventNotifier *e)
+{
+}
+
 AioContext *aio_context_new(Error **errp)
 {
     int ret;
@@ -292,7 +296,7 @@ AioContext *aio_context_new(Error **errp)
         return NULL;
     }
     g_source_set_can_recurse(&ctx->source, true);
-    aio_set_event_notifier(ctx, &ctx->notifier, NULL);
+    aio_set_event_notifier(ctx, &ctx->notifier, event_notifier_dummy_cb);
     ctx->thread_pool = NULL;
     qemu_mutex_init(&ctx->bh_lock);
     rfifolock_init(&ctx->lock, aio_rfifolock_cb, ctx);

  reply	other threads:[~2015-07-17  9:30 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-07-16  9:56 [Qemu-devel] [PATCH v2 0/3] AioContext: ctx->dispatching is dead, all hail ctx->notify_me Paolo Bonzini
2015-07-16  9:56 ` [Qemu-devel] [PATCH v2 1/3] tests: remove irrelevant assertions from test-aio Paolo Bonzini
2015-07-16  9:56 ` [Qemu-devel] [PATCH v2 2/3] aio-win32: reorganize polling loop Paolo Bonzini
2015-07-16  9:56 ` [Qemu-devel] [PATCH v2 3/3] AioContext: fix broken ctx->dispatching optimization Paolo Bonzini
2015-07-17  2:25   ` Fam Zheng
2015-07-17  2:27     ` Paolo Bonzini
2015-07-17  4:17   ` Paolo Bonzini
2015-07-17  8:39     ` Stefan Hajnoczi
2015-07-16 11:07 ` [Qemu-devel] [PATCH v2 0/3] AioContext: ctx->dispatching is dead, all hail ctx->notify_me Kevin Wolf
2015-07-16 12:44 ` Richard W.M. Jones
2015-07-16 19:05 ` Richard W.M. Jones
2015-07-16 22:06   ` Paolo Bonzini
2015-07-17  0:17     ` Paolo Bonzini
2015-07-17  4:44   ` Paolo Bonzini
2015-07-17  9:30     ` Paolo Bonzini [this message]
2015-07-17 12:58       ` Richard W.M. Jones
2015-07-17 13:02         ` Paolo Bonzini
2015-07-17 13:28       ` Marc Zyngier
2015-07-17 13:39         ` Laszlo Ersek
2015-07-17 13:48           ` Marc Zyngier
2015-07-17 13:53             ` Richard W.M. Jones
2015-07-17 14:03               ` Marc Zyngier
2015-07-17 13:57             ` Laszlo Ersek
2015-07-17 14:04         ` Paolo Bonzini
2015-07-17 14:18           ` Marc Zyngier
  -- strict thread matches above, loose matches on Subject: below --
2015-07-18 20:21 Paolo Bonzini
2015-07-19 10:08 ` Richard W.M. Jones
2015-07-20 16:17 ` Stefan Hajnoczi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=55A8CB3E.5090404@redhat.com \
    --to=pbonzini@redhat.com \
    --cc=Marc.Zyngier@arm.com \
    --cc=kwolf@redhat.com \
    --cc=lersek@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=rjones@redhat.com \
    --cc=stefanha@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.