From: Kevin Wolf <kwolf@redhat.com>
To: qemu-devel@nongnu.org
Cc: kwolf@redhat.com
Subject: [Qemu-devel] [PULL 16/62] docs/multiple-iothreads.txt: add documentation on IOThread programming
Date: Fri, 8 Aug 2014 19:39:17 +0200
Message-ID: <1407519603-6635-17-git-send-email-kwolf@redhat.com>
In-Reply-To: <1407519603-6635-1-git-send-email-kwolf@redhat.com>
From: Stefan Hajnoczi <stefanha@redhat.com>
This document explains how IOThreads and the main loop are related,
especially how to write code that can run in an IOThread. Currently
only virtio-blk-data-plane uses these techniques. The next obvious
target is virtio-scsi; there has also been work on virtio-net.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
---
docs/multiple-iothreads.txt | 134 ++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 134 insertions(+)
create mode 100644 docs/multiple-iothreads.txt
diff --git a/docs/multiple-iothreads.txt b/docs/multiple-iothreads.txt
new file mode 100644
index 0000000..40b8419
--- /dev/null
+++ b/docs/multiple-iothreads.txt
@@ -0,0 +1,134 @@
+Copyright (c) 2014 Red Hat Inc.
+
+This work is licensed under the terms of the GNU GPL, version 2 or later. See
+the COPYING file in the top-level directory.
+
+
+This document explains the IOThread feature and how to write code that runs
+outside the QEMU global mutex.
+
+The main loop and IOThreads
+---------------------------
+QEMU is an event-driven program that can do several things at once using an
+event loop. The VNC server and the QMP monitor are both processed from the
+same event loop, which monitors their file descriptors until they become
+readable and then invokes a callback.
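A minimal sketch of the "monitor file descriptors, then invoke a callback" pattern, assuming a POSIX host and using poll(). This is not QEMU's main-loop.c or AioContext code; ToyLoop, toy_set_fd_handler(), toy_loop_iterate() and PipeReader are hypothetical names used only for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

#define TOY_MAX_FDS 16

/* Hypothetical callback type: invoked when the fd becomes readable. */
typedef void ToyFdHandler(void *opaque);

typedef struct {
    struct pollfd pfds[TOY_MAX_FDS];
    ToyFdHandler *handlers[TOY_MAX_FDS];
    void *opaques[TOY_MAX_FDS];
    int nfds;
} ToyLoop;

/* Register a read handler for fd; returns -1 if the table is full. */
static int toy_set_fd_handler(ToyLoop *loop, int fd, ToyFdHandler *cb,
                              void *opaque)
{
    if (loop->nfds == TOY_MAX_FDS) {
        return -1;
    }
    loop->pfds[loop->nfds].fd = fd;
    loop->pfds[loop->nfds].events = POLLIN;
    loop->pfds[loop->nfds].revents = 0;
    loop->handlers[loop->nfds] = cb;
    loop->opaques[loop->nfds] = opaque;
    loop->nfds++;
    return 0;
}

/* One iteration: wait up to timeout_ms for readable fds, then call the
 * handler of every fd that is ready. Returns the number of handlers run. */
static int toy_loop_iterate(ToyLoop *loop, int timeout_ms)
{
    int ran = 0;

    if (poll(loop->pfds, loop->nfds, timeout_ms) <= 0) {
        return 0;
    }
    for (int i = 0; i < loop->nfds; i++) {
        if (loop->pfds[i].revents & POLLIN) {
            loop->handlers[i](loop->opaques[i]);
            ran++;
        }
    }
    return ran;
}

/* Example handler: consume one byte from a pipe and count the event. */
typedef struct {
    int fd;
    int events;
} PipeReader;

static void pipe_reader_cb(void *opaque)
{
    PipeReader *r = opaque;
    char c;

    if (read(r->fd, &c, 1) == 1) {
        r->events++;
    }
}
```

The real event loops also handle writable fds, timers and BHs; this sketch only dispatches readable fds.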
+
+The default event loop is called the main loop (see main-loop.c). It is
+possible to create additional event loop threads using -object
+iothread,id=my-iothread.
+
+Side note: The main loop and IOThreads are both event loops, but their code is
+not completely shared. It is worth remembering that, although they are
+conceptually similar, they are currently not interchangeable.
+
+Why IOThreads are useful
+------------------------
+IOThreads allow the user to control the placement of work. The main loop is a
+scalability bottleneck on hosts with many CPUs. Work can be spread across
+several IOThreads instead of just one main loop. When set up correctly, this
+can improve I/O latency and reduce jitter seen by the guest.
+
+The main loop is also deeply associated with the QEMU global mutex, which is a
+scalability bottleneck in itself. vCPU threads and the main loop use the QEMU
+global mutex to serialize execution of QEMU code. This mutex is necessary
+because a lot of QEMU's code historically was not thread-safe.
+
+The fact that all I/O processing is done in a single main loop and that the
+QEMU global mutex is contended by all vCPU threads and the main loop explains
+why it is desirable to place work into IOThreads.
+
+The experimental virtio-blk data-plane implementation has been benchmarked and
+shows these effects:
+ftp://public.dhe.ibm.com/linux/pdfs/KVM_Virtualized_IO_Performance_Paper.pdf
+
+How to program for IOThreads
+----------------------------
+The main difference between legacy code and new code that can run in an
+IOThread is that the new code deals explicitly with the event loop object,
+AioContext (see include/block/aio.h). Code that only works in the main loop
+implicitly uses the main loop's AioContext. Code that supports running in
+IOThreads must be aware of its AioContext.
+
+AioContext supports the following services:
+ * File descriptor monitoring (read/write/error on POSIX hosts)
+ * Event notifiers (inter-thread signalling)
+ * Timers
+ * Bottom Halves (BHs), i.e. deferred callbacks
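Event notifiers are what let one thread wake up another thread's event loop. A rough model, assuming a POSIX pipe as the signalling channel (QEMU's own EventNotifier may use eventfd instead); ToyNotifier, its functions and signaller_thread() are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <pthread.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical pipe-based notifier: any thread may "set" it; the event
 * loop thread polls rfd and then calls test_and_clear. */
typedef struct {
    int rfd;
    int wfd;
} ToyNotifier;

static int toy_notifier_init(ToyNotifier *n)
{
    int fds[2];

    if (pipe(fds) < 0) {
        return -errno;
    }
    /* Non-blocking so that set() never blocks and clear() can drain. */
    fcntl(fds[0], F_SETFL, O_NONBLOCK);
    fcntl(fds[1], F_SETFL, O_NONBLOCK);
    n->rfd = fds[0];
    n->wfd = fds[1];
    return 0;
}

/* Signal the notifier from any thread. A full pipe already means
 * "signalled", so a failed write (EAGAIN) is deliberately ignored. */
static void toy_notifier_set(ToyNotifier *n)
{
    char c = 1;
    ssize_t ret = write(n->wfd, &c, 1);
    (void)ret;
}

/* Drain all pending signals; returns 1 if the notifier was set. */
static int toy_notifier_test_and_clear(ToyNotifier *n)
{
    char buf[64];
    int was_set = 0;

    while (read(n->rfd, buf, sizeof(buf)) > 0) {
        was_set = 1;
    }
    return was_set;
}

/* Another thread signalling the loop thread. */
static void *signaller_thread(void *opaque)
{
    toy_notifier_set(opaque);
    return NULL;
}
```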
+
+There are several old APIs that use the main loop AioContext:
+ * LEGACY qemu_aio_set_fd_handler() - monitor a file descriptor
+ * LEGACY qemu_aio_set_event_notifier() - monitor an event notifier
+ * LEGACY timer_new_ms() - create a timer
+ * LEGACY qemu_bh_new() - create a BH
+ * LEGACY qemu_aio_wait() - run an event loop iteration
+
+Since they implicitly work on the main loop, they cannot be used in code that
+runs in an IOThread. They might cause a crash or deadlock if called from an
+IOThread, since the QEMU global mutex is not held.
+
+Instead, use the AioContext functions directly (see include/block/aio.h):
+ * aio_set_fd_handler() - monitor a file descriptor
+ * aio_set_event_notifier() - monitor an event notifier
+ * aio_timer_new() - create a timer
+ * aio_bh_new() - create a BH
+ * aio_poll() - run an event loop iteration
+
+The AioContext can be obtained from the IOThread using
+iothread_get_aio_context() or for the main loop using qemu_get_aio_context().
+Code that takes an AioContext argument works both in IOThreads and in the main
+loop, depending on which AioContext instance the caller passes in.
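To make the difference between the legacy and the AioContext-based APIs concrete, here is a hypothetical model in which a context only tracks BHs. toy_bh_new() takes the context explicitly, in the spirit of aio_bh_new(), while toy_legacy_bh_new() hard-codes the main loop's context, in the spirit of qemu_bh_new(). None of these toy functions are QEMU APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

typedef void ToyBHFunc(void *opaque);

/* Hypothetical bottom half: a deferred callback owned by one context. */
typedef struct ToyBH {
    ToyBHFunc *cb;
    void *opaque;
    bool scheduled;
    struct ToyBH *next;
} ToyBH;

/* Hypothetical stand-in for AioContext: here it only tracks BHs. */
typedef struct {
    ToyBH *bhs;
} ToyContext;

/* Stand-in for the main loop's context. */
static ToyContext toy_main_ctx;

/* New-style API: the caller says which context the BH belongs to. */
static ToyBH *toy_bh_new(ToyContext *ctx, ToyBHFunc *cb, void *opaque)
{
    ToyBH *bh = calloc(1, sizeof(*bh));

    if (!bh) {
        return NULL;
    }
    bh->cb = cb;
    bh->opaque = opaque;
    bh->next = ctx->bhs;
    ctx->bhs = bh;
    return bh;
}

/* Legacy-style API: implicitly uses the main loop context, so IOThread
 * code calling it would get a BH that runs in the wrong thread. */
static ToyBH *toy_legacy_bh_new(ToyBHFunc *cb, void *opaque)
{
    return toy_bh_new(&toy_main_ctx, cb, opaque);
}

static void toy_bh_schedule(ToyBH *bh)
{
    bh->scheduled = true;
}

/* One non-blocking iteration: run every scheduled BH of this context. */
static bool toy_poll(ToyContext *ctx)
{
    bool progress = false;

    for (ToyBH *bh = ctx->bhs; bh; bh = bh->next) {
        if (bh->scheduled) {
            bh->scheduled = false;
            bh->cb(bh->opaque);
            progress = true;
        }
    }
    return progress;
}

static void count_cb(void *opaque)
{
    (*(int *)opaque)++;
}

/* Context-aware "device" code: works with whichever context it is given. */
static ToyBH *device_start(ToyContext *ctx, int *counter)
{
    ToyBH *bh = toy_bh_new(ctx, count_cb, counter);

    if (bh) {
        toy_bh_schedule(bh);
    }
    return bh;
}
```

Because device_start() never refers to the main loop directly, the same code works whether the caller passes the main loop's context or an IOThread's.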
+
+How to synchronize with an IOThread
+-----------------------------------
+AioContext is not thread-safe, so some rules must be followed when using file
+descriptors, event notifiers, timers, or BHs across threads:
+
+1. AioContext functions can be called safely from file descriptor, event
+notifier, timer, or BH callbacks invoked by the AioContext. No locking is
+necessary.
+
+2. Other threads wishing to access the AioContext must use
+aio_context_acquire()/aio_context_release() for mutual exclusion. Once the
+context is acquired, no other thread can access it or run event loop iterations
+in this AioContext.
+
+aio_context_acquire()/aio_context_release() calls may be nested. This
+means you can call them if you're not sure whether #1 applies.
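The nesting rule means acquire/release behaves like a lock that the owning thread may take recursively. A sketch of that behaviour, assuming a POSIX recursive mutex as the underlying lock (QEMU's actual implementation differs); ToyContextLock and helper_touch_state() are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>

/* Hypothetical model of a context lock that the same thread may take
 * more than once. */
typedef struct {
    pthread_mutex_t lock;
    int depth;              /* nesting depth; only touched by the owner */
} ToyContextLock;

static void toy_ctx_lock_init(ToyContextLock *c)
{
    pthread_mutexattr_t attr;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    pthread_mutex_init(&c->lock, &attr);
    pthread_mutexattr_destroy(&attr);
    c->depth = 0;
}

static void toy_ctx_acquire(ToyContextLock *c)
{
    pthread_mutex_lock(&c->lock);
    c->depth++;
}

static void toy_ctx_release(ToyContextLock *c)
{
    c->depth--;
    pthread_mutex_unlock(&c->lock);
}

/* A helper that cannot know whether its caller already holds the context
 * (i.e. whether rule #1 applies): it simply acquires and releases. */
static int helper_touch_state(ToyContextLock *c, int *state)
{
    int depth_seen;

    toy_ctx_acquire(c);
    (*state)++;
    depth_seen = c->depth;
    toy_ctx_release(c);
    return depth_seen;
}
```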
+
+There is currently no lock ordering rule if a thread needs to acquire multiple
+AioContexts simultaneously. Therefore, it is only safe for code holding the
+QEMU global mutex to acquire other AioContexts.
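A sketch of why this rule avoids lock-ordering deadlocks, modelling the QEMU global mutex and two AioContexts with plain pthread mutexes. All names are hypothetical; the assert() encodes the rule that only the global mutex holder may take several contexts at once, so at most one thread at a time ever holds more than one context.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the QEMU global mutex and an AioContext. */
static pthread_mutex_t toy_global_mutex = PTHREAD_MUTEX_INITIALIZER;
static _Thread_local bool toy_global_held;   /* per-thread ownership flag */

typedef struct {
    pthread_mutex_t lock;
} ToyCtx;

static void toy_global_lock(void)
{
    pthread_mutex_lock(&toy_global_mutex);
    toy_global_held = true;
}

static void toy_global_unlock(void)
{
    toy_global_held = false;
    pthread_mutex_unlock(&toy_global_mutex);
}

/* Acquire two contexts. Callers must hold the global mutex, so two
 * multi-context holders can never wait on each other in opposite orders. */
static void toy_acquire_pair(ToyCtx *a, ToyCtx *b)
{
    assert(toy_global_held);
    pthread_mutex_lock(&a->lock);
    pthread_mutex_lock(&b->lock);
}

static void toy_release_pair(ToyCtx *a, ToyCtx *b)
{
    pthread_mutex_unlock(&b->lock);
    pthread_mutex_unlock(&a->lock);
}
```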
+
+Side note: the best way to schedule a function call across threads is to create
+a BH in the target AioContext beforehand and then call qemu_bh_schedule(). No
+acquire/release or locking is needed for the qemu_bh_schedule() call. But be
+sure to acquire the AioContext for aio_bh_new() if necessary.
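The cross-thread scheduling pattern can be modelled as follows, assuming a BH is an atomic "scheduled" flag plus a wakeup pipe belonging to the target loop. This is an illustrative sketch, not QEMU's implementation; ToyBH, toy_bh_schedule(), toy_ctx_poll() and the other names are hypothetical. The BH is created before other threads can see it, so scheduling only touches the atomic flag and the pipe and needs no lock.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <poll.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>
#include <unistd.h>

typedef void ToyBHFunc(void *opaque);

typedef struct ToyBH {
    ToyBHFunc *cb;
    void *opaque;
    atomic_bool scheduled;
    struct ToyCtx *ctx;
    struct ToyBH *next;
} ToyBH;

/* Hypothetical context: a BH list plus a pipe used to wake its loop. */
typedef struct ToyCtx {
    ToyBH *bhs;          /* only modified when creating BHs */
    int wake_rfd, wake_wfd;
} ToyCtx;

static int toy_ctx_init(ToyCtx *ctx)
{
    int fds[2];

    if (pipe(fds) < 0) {
        return -1;
    }
    fcntl(fds[0], F_SETFL, O_NONBLOCK);
    fcntl(fds[1], F_SETFL, O_NONBLOCK);
    ctx->bhs = NULL;
    ctx->wake_rfd = fds[0];
    ctx->wake_wfd = fds[1];
    return 0;
}

/* Create the BH beforehand, while holding the target context if needed. */
static ToyBH *toy_bh_new(ToyCtx *ctx, ToyBHFunc *cb, void *opaque)
{
    ToyBH *bh = calloc(1, sizeof(*bh));

    if (!bh) {
        return NULL;
    }
    bh->cb = cb;
    bh->opaque = opaque;
    atomic_init(&bh->scheduled, false);
    bh->ctx = ctx;
    bh->next = ctx->bhs;
    ctx->bhs = bh;
    return bh;
}

/* Callable from any thread without a lock: set the flag atomically and
 * wake the target loop only on the false -> true transition. */
static void toy_bh_schedule(ToyBH *bh)
{
    if (!atomic_exchange(&bh->scheduled, true)) {
        char c = 1;
        ssize_t r = write(bh->ctx->wake_wfd, &c, 1);
        (void)r;
    }
}

/* One iteration in the target thread: wait for a wakeup, drain the pipe,
 * then run each BH whose flag was set. Returns the number of BHs run. */
static int toy_ctx_poll(ToyCtx *ctx, int timeout_ms)
{
    struct pollfd pfd = { .fd = ctx->wake_rfd, .events = POLLIN };
    char buf[64];
    int ran = 0;

    if (poll(&pfd, 1, timeout_ms) > 0) {
        while (read(ctx->wake_rfd, buf, sizeof(buf)) > 0) {
            /* discard wakeup bytes */
        }
    }
    for (ToyBH *bh = ctx->bhs; bh; bh = bh->next) {
        if (atomic_exchange(&bh->scheduled, false)) {
            bh->cb(bh->opaque);
            ran++;
        }
    }
    return ran;
}

static void count_cb(void *opaque)
{
    (*(int *)opaque)++;
}

static void *scheduler_thread(void *opaque)
{
    toy_bh_schedule(opaque);   /* no acquire/release needed */
    return NULL;
}
```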
+
+The relationship between AioContext and the block layer
+-------------------------------------------------------
+AioContext originates from the QEMU block layer, where it provides a scoped
+way of running event loop iterations until all work is done. This feature is
+used to complete all in-flight block I/O requests (see bdrv_drain_all()).
+Nowadays AioContext is a generic event loop that can be used by any QEMU
+subsystem.
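The "run iterations until all work is done" idea behind bdrv_drain_all() can be sketched as a loop around a single-iteration function. In this hypothetical model each request needs a fixed number of iterations to complete; ToyCtx, toy_submit(), toy_poll() and toy_drain() are illustrative names, not QEMU functions.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_REQS 8

/* Hypothetical context: request i completes after steps_left[i]
 * further loop iterations. */
typedef struct {
    int steps_left[TOY_MAX_REQS];
    int nreqs;
    int in_flight;
    int iterations;
} ToyCtx;

static void toy_submit(ToyCtx *ctx, int steps)
{
    assert(ctx->nreqs < TOY_MAX_REQS && steps > 0);
    ctx->steps_left[ctx->nreqs++] = steps;
    ctx->in_flight++;
}

/* One event loop iteration: each pending request makes one step of
 * progress. Returns true if anything happened. */
static bool toy_poll(ToyCtx *ctx)
{
    bool progress = false;

    ctx->iterations++;
    for (int i = 0; i < ctx->nreqs; i++) {
        if (ctx->steps_left[i] > 0) {
            progress = true;
            if (--ctx->steps_left[i] == 0) {
                ctx->in_flight--;   /* request completed */
            }
        }
    }
    return progress;
}

/* Drain: keep iterating the context until no request is in flight. */
static void toy_drain(ToyCtx *ctx)
{
    while (ctx->in_flight > 0) {
        toy_poll(ctx);
    }
}
```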
+
+AioContext support is integrated into the block layer. Each BlockDriverState
+is associated with an AioContext using bdrv_set_aio_context() and
+bdrv_get_aio_context(). This allows block layer code to process I/O inside the
+right AioContext. Other subsystems may wish to follow a similar approach.
+
+Block layer code must therefore expect to run in an IOThread and avoid using
+old APIs that implicitly use the main loop. See the "How to program for
+IOThreads" section above for information on how to do that.
+
+If main loop code such as a QMP function wishes to access a BlockDriverState,
+it must first call aio_context_acquire(bdrv_get_aio_context(bs)) to ensure
+that the IOThread does not run in parallel, and call aio_context_release()
+when done.
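A hypothetical sketch of this access pattern: the BlockDriverState stand-in records its context, and main-loop code looks that context up and holds it while touching shared state. A pthread mutex stands in for aio_context_acquire()/aio_context_release(), and moving a real BlockDriverState between contexts involves more steps (see bdrv_detach_aio_context()/bdrv_attach_aio_context()) than the plain pointer assignment shown here. ToyCtx, ToyBDS and toy_query_reads() are illustrative names only.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for AioContext and BlockDriverState. */
typedef struct {
    pthread_mutex_t lock;   /* models aio_context_acquire()/release() */
} ToyCtx;

typedef struct {
    ToyCtx *ctx;            /* the context this device's I/O runs in */
    long reads;             /* state that the IOThread also updates */
} ToyBDS;

static void toy_ctx_acquire(ToyCtx *ctx) { pthread_mutex_lock(&ctx->lock); }
static void toy_ctx_release(ToyCtx *ctx) { pthread_mutex_unlock(&ctx->lock); }

static ToyCtx *toy_bds_get_ctx(ToyBDS *bs) { return bs->ctx; }
static void toy_bds_set_ctx(ToyBDS *bs, ToyCtx *ctx) { bs->ctx = ctx; }

/* Main-loop code (think: a QMP command) reading BDS state. It looks up the
 * BDS's current context and holds it so the IOThread cannot run meanwhile. */
static long toy_query_reads(ToyBDS *bs)
{
    ToyCtx *ctx = toy_bds_get_ctx(bs);
    long reads;

    toy_ctx_acquire(ctx);
    reads = bs->reads;
    toy_ctx_release(ctx);
    return reads;
}
```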
+
+Long-running jobs (usually in the form of coroutines) are best scheduled in the
+BlockDriverState's AioContext to avoid the need to acquire/release around each
+bdrv_*() call. Be aware that there is currently no mechanism to get notified
+when bdrv_set_aio_context() moves this BlockDriverState to a different
+AioContext (see bdrv_detach_aio_context()/bdrv_attach_aio_context()), so you
+may need to add this if you want to support long-running jobs.
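One possible shape for such a notification mechanism is a pair of detach/attach callbacks that the context switch invokes around the move. Everything below (ToyCtxNotifier, toy_bds_add_ctx_notifier(), toy_bds_set_ctx() and the ToyJob example) is hypothetical and only illustrates the idea.

```c
#include <assert.h>
#include <stddef.h>

typedef struct ToyCtx {
    int id;
} ToyCtx;

/* Hypothetical callback pair a long-running job could register so that it
 * is told when its BDS moves to another context. */
typedef struct {
    void (*detach)(void *opaque);                  /* old context going away */
    void (*attach)(ToyCtx *new_ctx, void *opaque); /* new context in place */
    void *opaque;
} ToyCtxNotifier;

#define TOY_MAX_NOTIFIERS 4

typedef struct {
    ToyCtx *ctx;
    ToyCtxNotifier notifiers[TOY_MAX_NOTIFIERS];
    int nnotifiers;
} ToyBDS;

static int toy_bds_add_ctx_notifier(ToyBDS *bs, ToyCtxNotifier n)
{
    if (bs->nnotifiers == TOY_MAX_NOTIFIERS) {
        return -1;
    }
    bs->notifiers[bs->nnotifiers++] = n;
    return 0;
}

/* Move the BDS: detach users from the old context, switch, re-attach. */
static void toy_bds_set_ctx(ToyBDS *bs, ToyCtx *new_ctx)
{
    for (int i = 0; i < bs->nnotifiers; i++) {
        bs->notifiers[i].detach(bs->notifiers[i].opaque);
    }
    bs->ctx = new_ctx;
    for (int i = 0; i < bs->nnotifiers; i++) {
        bs->notifiers[i].attach(new_ctx, bs->notifiers[i].opaque);
    }
}

/* Example job: remembers which context its work should run in. */
typedef struct {
    ToyCtx *running_in;
    int paused;
} ToyJob;

static void job_detach(void *opaque)
{
    ToyJob *job = opaque;

    job->paused = 1;          /* stop issuing work in the old context */
    job->running_in = NULL;
}

static void job_attach(ToyCtx *new_ctx, void *opaque)
{
    ToyJob *job = opaque;

    job->running_in = new_ctx;  /* resume in the new context */
    job->paused = 0;
}
```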
--
1.8.3.1