All of lore.kernel.org
 help / color / mirror / Atom feed
From: Kevin Wolf <kwolf@redhat.com>
To: Sergio Lopez <slp@redhat.com>
Cc: qemu-block@nongnu.org, qemu-devel@nongnu.org,
	Stefan Hajnoczi <stefanha@redhat.com>,
	Max Reitz <mreitz@redhat.com>
Subject: Re: [PATCH] virtio-blk: Disable request queuing while switching contexts
Date: Tue, 2 Jun 2020 15:04:33 +0200	[thread overview]
Message-ID: <20200602130433.GH5940@linux.fritz.box> (raw)
In-Reply-To: <20200602121835.kfld5taju25p6njn@dritchie>

[-- Attachment #1: Type: text/plain, Size: 10919 bytes --]

Am 02.06.2020 um 14:18 hat Sergio Lopez geschrieben:
> On Tue, Jun 02, 2020 at 01:23:14PM +0200, Kevin Wolf wrote:
> > Am 02.06.2020 um 10:11 hat Sergio Lopez geschrieben:
> > > Disable request queuing while switching contexts on
> > > virtio_blk_data_plane_[start|stop](), preventing requests from getting
> > > queued on the wrong context.
> > >
> > > Placing requests on the wrong context may lead to them being wrongly
> > > accessed in parallel from different threads, potentially leading to
> > > multiple issues.
> > >
> > > For example, stopping and resuming a VM multiple times while the guest
> > > is generating I/O on a virtio_blk device can trigger a crash with a
> > > stack tracing looking like this one:
> > >
> > > <------>
> > >  Thread 2 (Thread 0x7ff736765700 (LWP 1062503)):
> > >  #0  0x00005567a13b99d6 in iov_memset
> > >      (iov=0x6563617073206f4e, iov_cnt=1717922848, offset=516096, fillc=0, bytes=7018105756081554803)
> > >      at util/iov.c:69
> > >  #1  0x00005567a13bab73 in qemu_iovec_memset
> > >      (qiov=0x7ff73ec99748, offset=516096, fillc=0, bytes=7018105756081554803) at util/iov.c:530
> > >  #2  0x00005567a12f411c in qemu_laio_process_completion (laiocb=0x7ff6512ee6c0) at block/linux-aio.c:86
> > >  #3  0x00005567a12f42ff in qemu_laio_process_completions (s=0x7ff7182e8420) at block/linux-aio.c:217
> > >  #4  0x00005567a12f480d in ioq_submit (s=0x7ff7182e8420) at block/linux-aio.c:323
> > >  #5  0x00005567a12f43d9 in qemu_laio_process_completions_and_submit (s=0x7ff7182e8420)
> > >      at block/linux-aio.c:236
> > >  #6  0x00005567a12f44c2 in qemu_laio_poll_cb (opaque=0x7ff7182e8430) at block/linux-aio.c:267
> > >  #7  0x00005567a13aed83 in run_poll_handlers_once (ctx=0x5567a2b58c70, timeout=0x7ff7367645f8)
> > >      at util/aio-posix.c:520
> > >  #8  0x00005567a13aee9f in run_poll_handlers (ctx=0x5567a2b58c70, max_ns=16000, timeout=0x7ff7367645f8)
> > >      at util/aio-posix.c:562
> > >  #9  0x00005567a13aefde in try_poll_mode (ctx=0x5567a2b58c70, timeout=0x7ff7367645f8)
> > >      at util/aio-posix.c:597
> > >  #10 0x00005567a13af115 in aio_poll (ctx=0x5567a2b58c70, blocking=true) at util/aio-posix.c:639
> > >  #11 0x00005567a109acca in iothread_run (opaque=0x5567a2b29760) at iothread.c:75
> > >  #12 0x00005567a13b2790 in qemu_thread_start (args=0x5567a2b694c0) at util/qemu-thread-posix.c:519
> > >  #13 0x00007ff73eedf2de in start_thread () at /lib64/libpthread.so.0
> > >  #14 0x00007ff73ec10e83 in clone () at /lib64/libc.so.6
> > >
> > >  Thread 1 (Thread 0x7ff743986f00 (LWP 1062500)):
> > >  #0  0x00005567a13b99d6 in iov_memset
> > >      (iov=0x6563617073206f4e, iov_cnt=1717922848, offset=516096, fillc=0, bytes=7018105756081554803)
> > >      at util/iov.c:69
> > >  #1  0x00005567a13bab73 in qemu_iovec_memset
> > >      (qiov=0x7ff73ec99748, offset=516096, fillc=0, bytes=7018105756081554803) at util/iov.c:530
> > >  #2  0x00005567a12f411c in qemu_laio_process_completion (laiocb=0x7ff6512ee6c0) at block/linux-aio.c:86
> > >  #3  0x00005567a12f42ff in qemu_laio_process_completions (s=0x7ff7182e8420) at block/linux-aio.c:217
> > >  #4  0x00005567a12f480d in ioq_submit (s=0x7ff7182e8420) at block/linux-aio.c:323
> > >  #5  0x00005567a12f4a2f in laio_do_submit (fd=19, laiocb=0x7ff5f4ff9ae0, offset=472363008, type=2)
> > >      at block/linux-aio.c:375
> > >  #6  0x00005567a12f4af2 in laio_co_submit
> > >      (bs=0x5567a2b8c460, s=0x7ff7182e8420, fd=19, offset=472363008, qiov=0x7ff5f4ff9ca0, type=2)
> > >      at block/linux-aio.c:394
> > >  #7  0x00005567a12f1803 in raw_co_prw
> > >      (bs=0x5567a2b8c460, offset=472363008, bytes=20480, qiov=0x7ff5f4ff9ca0, type=2)
> > >      at block/file-posix.c:1892
> > >  #8  0x00005567a12f1941 in raw_co_pwritev
> > >      (bs=0x5567a2b8c460, offset=472363008, bytes=20480, qiov=0x7ff5f4ff9ca0, flags=0)
> > >      at block/file-posix.c:1925
> > >  #9  0x00005567a12fe3e1 in bdrv_driver_pwritev
> > >      (bs=0x5567a2b8c460, offset=472363008, bytes=20480, qiov=0x7ff5f4ff9ca0, qiov_offset=0, flags=0)
> > >      at block/io.c:1183
> > >  #10 0x00005567a1300340 in bdrv_aligned_pwritev
> > >      (child=0x5567a2b5b070, req=0x7ff5f4ff9db0, offset=472363008, bytes=20480, align=512, qiov=0x7ff72c0425b8, qiov_offset=0, flags=0) at block/io.c:1980
> > >  #11 0x00005567a1300b29 in bdrv_co_pwritev_part
> > >      (child=0x5567a2b5b070, offset=472363008, bytes=20480, qiov=0x7ff72c0425b8, qiov_offset=0, flags=0)
> > >      at block/io.c:2137
> > >  #12 0x00005567a12baba1 in qcow2_co_pwritev_task
> > >      (bs=0x5567a2b92740, file_cluster_offset=472317952, offset=487305216, bytes=20480, qiov=0x7ff72c0425b8, qiov_offset=0, l2meta=0x0) at block/qcow2.c:2444
> > >  #13 0x00005567a12bacdb in qcow2_co_pwritev_task_entry (task=0x5567a2b48540) at block/qcow2.c:2475
> > >  #14 0x00005567a13167d8 in aio_task_co (opaque=0x5567a2b48540) at block/aio_task.c:45
> > >  #15 0x00005567a13cf00c in coroutine_trampoline (i0=738245600, i1=32759) at util/coroutine-ucontext.c:115
> > >  #16 0x00007ff73eb622e0 in __start_context () at /lib64/libc.so.6
> > >  #17 0x00007ff6626f1350 in  ()
> > >  #18 0x0000000000000000 in  ()
> > > <------>
> > >
> > > This is also known to cause crashes with this message (assertion
> > > failed):
> > >
> > >  aio_co_schedule: Co-routine was already scheduled in 'aio_co_schedule'
> > >
> > > RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1812765
> > > Signed-off-by: Sergio Lopez <slp@redhat.com>
> > > ---
> > >  hw/block/dataplane/virtio-blk.c | 10 ++++++++++
> > >  1 file changed, 10 insertions(+)
> > >
> > > diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
> > > index 1b52e8159c..f1c7ba69c0 100644
> > > --- a/hw/block/dataplane/virtio-blk.c
> > > +++ b/hw/block/dataplane/virtio-blk.c
> > > @@ -214,12 +214,17 @@ int virtio_blk_data_plane_start(VirtIODevice *vdev)
> > >      vblk->dataplane_started = true;
> > >      trace_virtio_blk_data_plane_start(s);
> > >
> > > +    /* Prevent requests from getting queued on the old context */
> > > +    blk_set_disable_request_queuing(s->conf->conf.blk, true);
> > > +
> > >      r = blk_set_aio_context(s->conf->conf.blk, s->ctx, &local_err);
> > >      if (r < 0) {
> > >          error_report_err(local_err);
> > >          goto fail_guest_notifiers;
> > >      }
> > >
> > > +    blk_set_disable_request_queuing(s->conf->conf.blk, false);
> > > +
> >
> > Why are we even getting new requests that could possibly be queued? This
> > is in virtio_blk_data_plane_start/stop(), so clearly the device should
> > be idle and the virtio queues shouldn't be processed at all at the same
> > time?
> 
> Requests are processed and queued by virtio_blk_dma_restart_bh(),
> which is a BH registered by virtio_blk_dma_restart_cb(), which in turn
> is a callback for VM state change events registered with
> qemu_add_vm_change_state_handler(virtio_blk_dma_restart_cb, s).
> 
> This is the back trace of a request being queued this way:
> 
> (gdb) bt
> #0  0x000055cd35ce7490 in virtio_blk_submit_multireq (blk=0x55cd38cd24d0, mrb=0x7ffd25b365c0)
>     at /root/src/qemu/hw/block/virtio-blk.c:448
> #1  0x000055cd35ce7d5f in virtio_blk_handle_request (req=0x7f5a4c022c80, mrb=0x7ffd25b365c0)
>     at /root/src/qemu/hw/block/virtio-blk.c:686
> #2  0x000055cd35ce8602 in virtio_blk_dma_restart_bh (opaque=0x55cd38ccf4c0)
>     at /root/src/qemu/hw/block/virtio-blk.c:836
> #3  0x000055cd360dc4f5 in aio_bh_call (bh=0x55cd3880b300) at /root/src/qemu/util/async.c:164
> #4  0x000055cd360dc4f5 in aio_bh_poll (ctx=ctx@entry=0x55cd3780bff0) at /root/src/qemu/util/async.c:164
> #5  0x000055cd360dfd29 in aio_poll (ctx=ctx@entry=0x55cd3780bff0, blocking=blocking@entry=true)
>     at /root/src/qemu/util/aio-posix.c:650
> #6  0x000055cd3603a645 in bdrv_do_drained_begin
>     (poll=true, ignore_bds_parents=false, parent=0x0, recursive=false, bs=0x55cd3788d550)
>     at /root/src/qemu/block/io.c:429
> #7  0x000055cd3603a645 in bdrv_do_drained_begin
>     (bs=0x55cd3788d550, recursive=<optimized out>, parent=0x0, ignore_bds_parents=<optimized out>, poll=<optimized out>) at /root/src/qemu/block/io.c:395
> #8  0x000055cd35fde969 in bdrv_set_aio_context_ignore
>     (bs=bs@entry=0x55cd3788d550, new_context=new_context@entry=0x55cd37810e40, ignore=ignore@entry=0x7ffd25b36890) at /root/src/qemu/block.c:6309
> #9  0x000055cd35fdef33 in bdrv_child_try_set_aio_context
>     (bs=bs@entry=0x55cd3788d550, ctx=0x55cd37810e40, ignore_child=<optimized out>, errp=<optimized out>)
>     at /root/src/qemu/block.c:6431
> #10 0x000055cd3602919b in blk_do_set_aio_context
>     (blk=0x55cd38cd24d0, new_context=0x55cd37810e40, update_root_node=<optimized out>, errp=<optimized out>) at /root/src/qemu/block/block-backend.c:2016
> #11 0x000055cd35cef54f in virtio_blk_data_plane_start (vdev=<optimized out>)
>     at /root/src/qemu/hw/block/dataplane/virtio-blk.c:220

Ugh. Thanks, I see now what's going on.

We were relying on blk_set_aio_context() to complete the requests, and
since BlockBackend queues the requests instead of completing them during
drained_begin, we end up executing the requests after drained_end in the
main loop context, but the block nodes have meanwhile moved to the
iothread context.

Of course, this was already a bad hack before, witnessed by the FIXME
comment in virtio_blk_dma_restart_cb().

> I'm not familiar with the code, but I guess we need this BH to ensure
> we process the virtqueue after resuming.

The BH was introduced in commit 213189ab65d. The commit message has a
very detailed problem description, but to summarise, if I understand
correctly, the main purpose is that we don't end up stopping the VM due
to an I/O error while we are only in the middle of sending VM state
handlers events that the VM is about to be continued.

> We could perhaps schedule the BH in a different way, or add a flag to
> VirtIOBlock so virtio_blk_data_plane_[start|stop] would instruct
> virtio_blk_dma_restart_bh() to avoid processing the virtqueue,
> rescheduling itself instead. How does this would look to you?

Yes, I think if dataplane is enabled, we should really run the restarted
requests in the iothread. virtio_blk_dma_restart_cb() is called too
early, before the dataplane is running, so I guess we need some way to
tell the dataplane that it should run the request restarting logic of
virtio_blk_dma_restart_bh() after it has started (no additional BH
needed there). Your flag sounds like the obvious way to get this.

In fact, would it hurt to do this just unconditionally without a flag?
s->rq should be NULL when there is nothing to restart.

And then we can skip the same logic in virtio_blk_dma_restart_cb() if we
know that a dataplane is going to be started.

Kevin

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

  reply	other threads:[~2020-06-02 13:09 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-02  8:11 [PATCH] virtio-blk: Disable request queuing while switching contexts Sergio Lopez
2020-06-02 11:23 ` Kevin Wolf
2020-06-02 12:18   ` Sergio Lopez
2020-06-02 13:04     ` Kevin Wolf [this message]
2020-06-02 14:51       ` Sergio Lopez
2020-06-02 15:30       ` Stefan Hajnoczi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200602130433.GH5940@linux.fritz.box \
    --to=kwolf@redhat.com \
    --cc=mreitz@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=slp@redhat.com \
    --cc=stefanha@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.