From: Fam Zheng <famz@redhat.com>
To: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	Stefan Hajnoczi <stefanha@gmail.com>,
	qemu-devel <qemu-devel@nongnu.org>, Kevin Wolf <kwolf@redhat.com>,
	qemu block <qemu-block@nongnu.org>, Max Reitz <mreitz@redhat.com>
Subject: Re: [Qemu-devel] [Qemu-block] [PATCH for-2.9-rc5 v2] block: Drain BH in bdrv_drained_begin
Date: Mon, 17 Apr 2017 11:33:34 +0800
Message-ID: <20170417033334.GA6717@lemon.lan>
In-Reply-To: <20170416093743.GA24137@stefanha-x1.localdomain>

On Sun, 04/16 10:37, Stefan Hajnoczi wrote:
> On Sat, Apr 15, 2017 at 01:10:02AM +0800, Paolo Bonzini wrote:
> > 
> > 
> > On 14/04/2017 16:51, Stefan Hajnoczi wrote:
> > > On Fri, Apr 14, 2017 at 9:02 AM, Fam Zheng <famz@redhat.com> wrote:
> > >> @@ -398,11 +399,15 @@ void bdrv_drain_all(void);
> > >>           */                                                \
> > >>          assert(!bs_->wakeup);                              \
> > >>          bs_->wakeup = true;                                \
> > >> -        while ((cond)) {                                   \
> > >> -            aio_context_release(ctx_);                     \
> > >> -            aio_poll(qemu_get_aio_context(), true);        \
> > >> -            aio_context_acquire(ctx_);                     \
> > >> -            waited_ = true;                                \
> > >> +        while (busy_) {                                    \
> > >> +            if ((cond)) {                                  \
> > >> +                waited_ = busy_ = true;                    \
> > >> +                aio_context_release(ctx_);                 \
> > >> +                aio_poll(qemu_get_aio_context(), true);    \
> > >> +                aio_context_acquire(ctx_);                 \
> > >> +            } else {                                       \
> > >> +                busy_ = aio_poll(ctx_, false);             \
> > >> +            }                                              \
> > > 
> > > Wait, I'm confused.  The current thread is not in the BDS AioContext.
> > > We're not allowed to call aio_poll(ctx_, false).
> > 
> > It's pretty ugly indeed.  Strictly from a thread-safety point of view,
> > everything that aio_poll calls will acquire the AioContext, so that is
> > safe and in fact the release/acquire pair can even be hoisted outside
> > the "if".
> > 
> > If we did that for blocking=true in both I/O and main thread, then that
> > would be racy.  This is the scenario mentioned in the commit message for
> > c9d1a56 ("block: only call aio_poll on the current thread's AioContext",
> > 2016-10-28).
> > 
> > If only one thread has blocking=true, it's subject to races too.  In
> > this case, the I/O thread may fail to be woken by iothread_stop's
> > aio_notify.  However, by the time iothread_stop is called there should
> > be no BlockDriverStates (and thus no BDRV_POLL_WHILE running the above
> > code) for the I/O thread's AioContext.
> 
> The scenario I have in mind is:
> 
> BDRV_POLL_WHILE() is called from the IOThread and the main loop also
> invokes BDRV_POLL_WHILE().
> 
> The IOThread is blocked in aio_poll(ctx, true).
> 
> The main loop calls aio_poll(ctx, false) and can therefore steal the
> event/completion condition.
> 
> I *think* ppoll() will return in both threads and they will race to
> invoke handlers, but I'm not 100% sure that two BDRV_POLL_WHILE() calls
> in parallel are safe.
> 
> What do you think?

BDRV_POLL_WHILE in both IOThread and main loop has aio_context_acquire(ctx)
around it; in the branch where the main loop calls aio_poll(ctx, false), there
is also no aio_context_release(ctx). So I think it is protected by the
AioContext lock, and is safe.
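
To make the pattern explicit, here is a minimal sketch (not code from the
patch) of the main loop caller I have in mind; bdrv_drained_begin() is what
ends up invoking BDRV_POLL_WHILE():

    AioContext *ctx = bdrv_get_aio_context(bs);

    aio_context_acquire(ctx);
    bdrv_drained_begin(bs);   /* BDRV_POLL_WHILE(bs, ...) runs with ctx held */
    /* ... the BDS is quiesced here ... */
    bdrv_drained_end(bs);
    aio_context_release(ctx);

The only place the lock is dropped is around aio_poll(qemu_get_aio_context(),
true) in the hunk quoted above, so the aio_poll(ctx_, false) in the else
branch is always reached with ctx held.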

I tested your suggested "aio_poll(qemu_get_aio_context(), false)", but it
doesn't work. The new hunk is:

diff --git a/include/block/block.h b/include/block/block.h
index 97d4330..0e77522 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -381,12 +381,13 @@ void bdrv_drain_all(void);
 
 #define BDRV_POLL_WHILE(bs, cond) ({                       \
     bool waited_ = false;                                  \
+    bool busy_ = true;                                     \
     BlockDriverState *bs_ = (bs);                          \
     AioContext *ctx_ = bdrv_get_aio_context(bs_);          \
     if (aio_context_in_iothread(ctx_)) {                   \
-        while ((cond)) {                                   \
-            aio_poll(ctx_, true);                          \
-            waited_ = true;                                \
+        while ((cond) || busy_) {                          \
+            busy_ = aio_poll(ctx_, (cond));                \
+            waited_ |= !!(cond);                           \
         }                                                  \
     } else {                                               \
         assert(qemu_get_current_aio_context() ==           \
@@ -398,11 +399,12 @@ void bdrv_drain_all(void);
          */                                                \
         assert(!bs_->wakeup);                              \
         bs_->wakeup = true;                                \
-        while ((cond)) {                                   \
-            aio_context_release(ctx_);                     \
-            aio_poll(qemu_get_aio_context(), true);        \
-            aio_context_acquire(ctx_);                     \
-            waited_ = true;                                \
+        while ((cond) || busy_) {                          \
+                aio_context_release(ctx_);                 \
+                busy_ = aio_poll(qemu_get_aio_context(),   \
+                                 (cond));                  \
+                aio_context_acquire(ctx_);                 \
+                waited_ |= !!(cond);                       \
         }                                                  \
         bs_->wakeup = false;                               \
     }                                                      \

I get a coroutine re-enter crash (with the same drive-mirror -> ready -> reset
reproducer), which I don't fully understand yet:

#0  0x00007f09fdec091f in raise () at /lib64/libc.so.6
#1  0x00007f09fdec251a in abort () at /lib64/libc.so.6
#2  0x0000563e2017db9b in qemu_aio_coroutine_enter (ctx=0x563e21f52020, co=0x563e22cf1840) at /stor/work/qemu/util/qemu-coroutine.c:114
#3  0x0000563e2015ee82 in aio_co_enter (ctx=0x563e21f52020, co=0x563e22cf1840) at /stor/work/qemu/util/async.c:472
#4  0x0000563e2007241a in bdrv_coroutine_enter (bs=0x563e21fde470, co=0x563e22cf1840) at /stor/work/qemu/block.c:4333
#5  0x0000563e200745e6 in block_job_enter (job=0x563e22e76000) at /stor/work/qemu/blockjob.c:535
#6  0x0000563e20074554 in block_job_resume (job=0x563e22e76000) at /stor/work/qemu/blockjob.c:521
#7  0x0000563e20073372 in block_job_drained_end (opaque=0x563e22e76000) at /stor/work/qemu/blockjob.c:80
#8  0x0000563e200c0578 in blk_root_drained_end (child=0x563e21f9c8f0) at /stor/work/qemu/block/block-backend.c:1945
#9  0x0000563e200ce7af in bdrv_parent_drained_end (bs=0x563e21fde470) at /stor/work/qemu/block/io.c:64
#10 0x0000563e200cefe3 in bdrv_drained_end (bs=0x563e21fde470) at /stor/work/qemu/block/io.c:245
#11 0x0000563e200cf06f in bdrv_drain (bs=0x563e21fde470) at /stor/work/qemu/block/io.c:270
#12 0x0000563e200bf485 in blk_drain (blk=0x563e21f8eec0) at /stor/work/qemu/block/block-backend.c:1383
#13 0x0000563e1fd069aa in virtio_blk_reset (vdev=0x563e23afd950) at /stor/work/qemu/hw/block/virtio-blk.c:708
#14 0x0000563e1fd4c2fd in virtio_reset (opaque=0x563e23afd950) at /stor/work/qemu/hw/virtio/virtio.c:1200
#15 0x0000563e1ffdea74 in virtio_bus_reset (bus=0x563e23afd8d8) at /stor/work/qemu/hw/virtio/virtio-bus.c:96
#16 0x0000563e1ffdc609 in virtio_pci_reset (qdev=0x563e23af5440) at /stor/work/qemu/hw/virtio/virtio-pci.c:1873
#17 0x0000563e1fe994e9 in device_reset (dev=0x563e23af5440) at /stor/work/qemu/hw/core/qdev.c:1169
#18 0x0000563e1fe9735b in qdev_reset_one (dev=0x563e23af5440, opaque=0x0) at /stor/work/qemu/hw/core/qdev.c:310
#19 0x0000563e1fe97f5f in qdev_walk_children (dev=0x563e23af5440, pre_devfn=0x0, pre_busfn=0x0, post_devfn=0x563e1fe9733f <qdev_reset_one>, post_busfn=0x563e1fe97362 <qbus_reset_one>, opaque=0x0) at /stor/work/qemu/hw/core/qdev.c:625
#20 0x0000563e1fe9c125 in qbus_walk_children (bus=0x563e2216e1f0, pre_devfn=0x0, pre_busfn=0x0, post_devfn=0x563e1fe9733f <qdev_reset_one>, post_busfn=0x563e1fe97362 <qbus_reset_one>, opaque=0x0) at /stor/work/qemu/hw/core/bus.c:59
#21 0x0000563e1fe97f23 in qdev_walk_children (dev=0x563e2216c790, pre_devfn=0x0, pre_busfn=0x0, post_devfn=0x563e1fe9733f <qdev_reset_one>, post_busfn=0x563e1fe97362 <qbus_reset_one>, opaque=0x0) at /stor/work/qemu/hw/core/qdev.c:617
#22 0x0000563e1fe9c125 in qbus_walk_children (bus=0x563e21fe9ae0, pre_devfn=0x0, pre_busfn=0x0, post_devfn=0x563e1fe9733f <qdev_reset_one>, post_busfn=0x563e1fe97362 <qbus_reset_one>, opaque=0x0) at /stor/work/qemu/hw/core/bus.c:59
#23 0x0000563e1fe97475 in qbus_reset_all (bus=0x563e21fe9ae0) at /stor/work/qemu/hw/core/qdev.c:336
#24 0x0000563e1fe97498 in qbus_reset_all_fn (opaque=0x563e21fe9ae0) at /stor/work/qemu/hw/core/qdev.c:342
#25 0x0000563e1fe9ca6b in qemu_devices_reset () at /stor/work/qemu/hw/core/reset.c:69
#26 0x0000563e1fd6377d in pc_machine_reset () at /stor/work/qemu/hw/i386/pc.c:2234
#27 0x0000563e1fe0f4d4 in qemu_system_reset (report=true) at /stor/work/qemu/vl.c:1697
#28 0x0000563e1fe0f8fe in main_loop_should_exit () at /stor/work/qemu/vl.c:1865
#29 0x0000563e1fe0f9cb in main_loop () at /stor/work/qemu/vl.c:1902
#30 0x0000563e1fe17716 in main (argc=20, argv=0x7ffde0a17168, envp=0x7ffde0a17210) at /stor/work/qemu/vl.c:4709

(gdb) info thread
  Id   Target Id         Frame 
* 1    Thread 0x7f0a04396c00 (LWP 6192) "qemu-system-x86" 0x0000563e2017db9b in qemu_aio_coroutine_enter (ctx=0x563e21f52020, co=0x563e22cf1840)
    at /stor/work/qemu/util/qemu-coroutine.c:114
  2    Thread 0x7f09f535e700 (LWP 6193) "qemu-system-x86" 0x00007f0a022d2c7d in nanosleep () from /lib64/libpthread.so.0
  3    Thread 0x7f09f4b5d700 (LWP 6194) "qemu-system-x86" 0x0000563e2015e98c in aio_notify (ctx=0x563e21f52020) at /stor/work/qemu/util/async.c:341
  4    Thread 0x7f09ef117700 (LWP 6196) "qemu-system-x86" 0x00007f0a022cf460 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0

(gdb) t 3
[Switching to thread 3 (Thread 0x7f09f4b5d700 (LWP 6194))]
#0  0x0000563e2015e98c in aio_notify (ctx=0x563e21f52020) at /stor/work/qemu/util/async.c:341
341	    if (ctx->notify_me) {
(gdb) bt
#0  0x0000563e2015e98c in aio_notify (ctx=0x563e21f52020) at /stor/work/qemu/util/async.c:341
#1  0x0000563e2015ea10 in aio_timerlist_notify (opaque=0x563e21f52020, type=QEMU_CLOCK_REALTIME) at /stor/work/qemu/util/async.c:356
#2  0x0000563e2016032f in timerlist_notify (timer_list=0x563e21f521e0) at /stor/work/qemu/util/qemu-timer.c:281
#3  0x0000563e20160656 in timerlist_rearm (timer_list=0x563e21f521e0) at /stor/work/qemu/util/qemu-timer.c:404
#4  0x0000563e20160729 in timer_mod_ns (ts=0x7f09f00014c0, expire_time=176119278967) at /stor/work/qemu/util/qemu-timer.c:432
#5  0x0000563e20160807 in timer_mod (ts=0x7f09f00014c0, expire_time=176119278967) at /stor/work/qemu/util/qemu-timer.c:462
#6  0x0000563e2017f00a in co_aio_sleep_ns (ctx=0x563e21f52020, type=QEMU_CLOCK_REALTIME, ns=100000000) at /stor/work/qemu/util/qemu-coroutine-sleep.c:38
#7  0x0000563e20074941 in block_job_sleep_ns (job=0x563e22e76000, type=QEMU_CLOCK_REALTIME, ns=100000000) at /stor/work/qemu/blockjob.c:636
#8  0x0000563e200cbba1 in mirror_run (opaque=0x563e22e76000) at /stor/work/qemu/block/mirror.c:881
#9  0x0000563e20073b1e in block_job_co_entry (opaque=0x563e22e76000) at /stor/work/qemu/blockjob.c:282
#10 0x0000563e2017f0bb in coroutine_trampoline (i0=583997504, i1=22078) at /stor/work/qemu/util/coroutine-ucontext.c:79
#11 0x00007f09fded5bf0 in __start_context () at /lib64/libc.so.6
#12 0x00007ffde0a14cd0 in  ()
#13 0x0000000000000000 in  ()
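
For reference, the abort in frame #2 of the main thread is the recursive
re-entry guard at the top of qemu_aio_coroutine_enter() (roughly, from
memory):

    /* util/qemu-coroutine.c */
    if (co->caller) {
        fprintf(stderr, "Co-routine re-entered recursively\n");
        abort();
    }

So it looks like block_job_enter() in the main loop is entering the mirror
job's coroutine while it is still running block_job_sleep_ns() in thread 3
and has not yielded yet, but I haven't worked out how the new polling allows
that to happen.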

