* [Qemu-devel] [RFC] dataplane: IOThreads and writing dataplane-capable code
@ 2014-05-08 10:16 Stefan Hajnoczi
  2014-05-08 11:33 ` Fam Zheng
  ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Stefan Hajnoczi @ 2014-05-08 10:16 UTC (permalink / raw)
To: qemu-devel; +Cc: Kevin Wolf, Paolo Bonzini, Fam Zheng, Max Reitz

Here is background on the latest dataplane work in my "[PATCH v2 00/25]
dataplane: use QEMU block layer" series. It's necessary for anyone who wants
to build on top of it. Please leave feedback or questions and I'll submit a
docs/ patch with the final version of this document.


This document explains the IOThread feature and how to write code that runs
outside the QEMU global mutex.

The main loop and IOThreads
---------------------------
QEMU is an event-driven program that can do several things at once using an
event loop. The VNC server and the QMP monitor are both processed from the
same event loop, which monitors their file descriptors until they become
readable and then invokes a callback.

The default event loop is called the main loop (see main-loop.c). It is
possible to create additional event loop threads using -object
iothread,id=my-iothread.

Side note: The main loop and IOThread are both event loops, but their code is
not shared completely. Sometimes it is useful to remember that although they
are conceptually similar they are currently not interchangeable.

Why IOThreads are useful
------------------------
IOThreads allow the user to control the placement of work. The main loop is a
scalability bottleneck on hosts with many CPUs. Work can be spread across
several IOThreads instead of just one main loop. When set up correctly this
can improve I/O latency and reduce jitter seen by the guest.

The main loop is also deeply associated with the QEMU global mutex, which is a
scalability bottleneck in itself. vCPU threads and the main loop use the QEMU
global mutex to serialize execution of QEMU code.
This mutex is necessary because a lot of QEMU's code historically was not
thread-safe.

The fact that all I/O processing is done in a single main loop and that the
QEMU global mutex is contended by all vCPU threads and the main loop explains
why it is desirable to place work into IOThreads.

The experimental virtio-blk data-plane implementation has been benchmarked and
shows these effects:
ftp://public.dhe.ibm.com/linux/pdfs/KVM_Virtualized_IO_Performance_Paper.pdf

How to program for IOThreads
----------------------------
The main difference between legacy code and new code that can run in an
IOThread is dealing explicitly with the event loop object, AioContext
(see include/block/aio.h). Code that only works in the main loop
implicitly uses the main loop's AioContext. Code that supports running
in IOThreads must be aware of its AioContext.

AioContext supports the following services:
 * File descriptor monitoring (read/write/error)
 * Event notifiers (inter-thread signalling)
 * Timers
 * Bottom Halves (BH) deferred callbacks

There are several old APIs that use the main loop AioContext:
 * LEGACY qemu_aio_set_fd_handler() - monitor a file descriptor
 * LEGACY qemu_aio_set_event_notifier() - monitor an event notifier
 * LEGACY timer_new_ms() - create a timer
 * LEGACY qemu_bh_new() - create a BH
 * LEGACY qemu_aio_wait() - run an event loop iteration

Since they implicitly work on the main loop they cannot be used in code that
runs in an IOThread. They might cause a crash or deadlock if called from an
IOThread since the QEMU global mutex is not held.

Instead, use the AioContext functions directly (see include/block/aio.h):
 * aio_set_fd_handler() - monitor a file descriptor
 * aio_set_event_notifier() - monitor an event notifier
 * aio_timer_new() - create a timer
 * aio_bh_new() - create a BH
 * aio_poll() - run an event loop iteration

The AioContext can be obtained from the IOThread using
iothread_get_aio_context() or for the main loop using qemu_get_aio_context().
This way your code works both in IOThreads and the main loop.

How to synchronize with an IOThread
-----------------------------------
AioContext is not thread-safe, so some rules must be followed when using file
descriptors, event notifiers, timers, or BHs across threads:

1. AioContext functions can be called safely from file descriptor, event
   notifier, timer, or BH callbacks invoked by the AioContext. No locking is
   necessary.

2. Other threads wishing to access the AioContext must use
   aio_context_acquire()/aio_context_release() for mutual exclusion. Once the
   context is acquired no other thread can access it or run event loop
   iterations in this AioContext.

aio_context_acquire()/aio_context_release() calls may be nested. This means
you can call them if you're not sure whether #1 applies.

Side note: the best way to schedule a function call across threads is to
create a BH in the target AioContext beforehand and then call
qemu_bh_schedule(). No acquire/release or locking is needed for the
qemu_bh_schedule() call. But be sure to acquire the AioContext for
aio_bh_new() if necessary.

The relationship between AioContext and the block layer
-------------------------------------------------------
The AioContext originates from the QEMU block layer because it provides a
scoped way of running event loop iterations until all work is done. This
feature is used to complete all in-flight block I/O requests (see
bdrv_drain_all()). Nowadays AioContext is a generic event loop that can be
used by any QEMU subsystem.

The block layer has support for AioContext integrated. Each BlockDriverState
is associated with an AioContext using bdrv_set_aio_context() and
bdrv_get_aio_context(). This allows block layer code to process I/O inside the
right AioContext. Other subsystems may wish to follow a similar approach.
If main loop code such as a QMP function wishes to access a BlockDriverState,
it must first call aio_context_acquire(bdrv_get_aio_context(bs)) to ensure
that the IOThread does not run in parallel.

Long-running jobs (usually in the form of coroutines) are best scheduled in
the BlockDriverState's AioContext to avoid the need to acquire/release around
each bdrv_*() call. Be aware that there is currently no mechanism to get
notified when bdrv_set_aio_context() moves this BlockDriverState to a
different AioContext (see bdrv_detach_aio_context()/bdrv_attach_aio_context()),
so you may need to add this if you want to support long-running jobs.

^ permalink raw reply	[flat|nested] 11+ messages in thread
* Re: [Qemu-devel] [RFC] dataplane: IOThreads and writing dataplane-capable code
  2014-05-08 10:16 [Qemu-devel] [RFC] dataplane: IOThreads and writing dataplane-capable code Stefan Hajnoczi
@ 2014-05-08 11:33 ` Fam Zheng
  2014-05-08 11:56   ` Stefan Hajnoczi
  2014-05-08 13:08 ` Kevin Wolf
  2014-05-08 13:44 ` Dr. David Alan Gilbert
  2 siblings, 1 reply; 11+ messages in thread
From: Fam Zheng @ 2014-05-08 11:33 UTC (permalink / raw)
To: Stefan Hajnoczi; +Cc: Kevin Wolf, Paolo Bonzini, qemu-devel, Max Reitz

Great document, thanks for writing this! I have a few questions below.

On Thu, 05/08 12:16, Stefan Hajnoczi wrote:

<snip>

> The default event loop is called the main loop (see main-loop.c). It is
> possible to create additional event loop threads using -object
> iothread,id=my-iothread.

Is dataplane the only user for this now?

<snip>

> The AioContext can be obtained from the IOThread using
> iothread_get_aio_context() or for the main loop using qemu_get_aio_context().
> This way your code works both in IOThreads or the main loop.

I think such code knows about its iothread, so iothread_get_aio_context is
enough, why need to mention (using) qemu_get_aio_context here?

<snip>

> Nowadays AioContext is a generic event loop that can be
> used by any QEMU subsystem.

There was a concern about lock ordering; currently we only acquire contexts
from the main loop and vCPU threads, so we're safe. Do we enforce this rule?
If we use this reentrant lock in other parts of QEMU, what are the rules then?

> If main loop code such as a QMP function wishes to access a BlockDriverState it
> must first call aio_context_acquire(bdrv_get_aio_context(bs)) to ensure the
> IOThread does not run in parallel.

Does it imply that adding aio_context_acquire and aio_context_release
protection inside a bdrv_* function makes it (at least in the sense of
IOThreads) thread-safe?

> Long-running jobs (usually in the form of coroutines) are best scheduled in the
> BlockDriverState's AioContext [...] so you
> may need to add this if you want to support long-running jobs.

Is block job a case of this? Looks like a subtask of adding support of block
jobs in dataplane.

Thanks,
Fam
* Re: [Qemu-devel] [RFC] dataplane: IOThreads and writing dataplane-capable code
  2014-05-08 11:33 ` Fam Zheng
@ 2014-05-08 11:56   ` Stefan Hajnoczi
  2014-05-08 12:12     ` Paolo Bonzini
  0 siblings, 1 reply; 11+ messages in thread
From: Stefan Hajnoczi @ 2014-05-08 11:56 UTC (permalink / raw)
To: Fam Zheng
Cc: Kevin Wolf, Paolo Bonzini, qemu-devel, Stefan Hajnoczi, Max Reitz

On Thu, May 8, 2014 at 1:33 PM, Fam Zheng <famz@redhat.com> wrote:
> On Thu, 05/08 12:16, Stefan Hajnoczi wrote:

<snip>

>> The default event loop is called the main loop (see main-loop.c). It is
>> possible to create additional event loop threads using -object
>> iothread,id=my-iothread.
>
> Is dataplane the only user for this now?

Yes, and neither dataplane (x-data-plane=on) nor IOThread (-object
iothread,id=<name>) are finalized.

There was a discussion about -object and QOM on the mailing list a
while back. We reached the conclusion that -object shouldn't be a
supported command-line interface, it should be used for testing,
development, etc. So an -iothread option still needs to be added.

<snip>

>> The AioContext can be obtained from the IOThread using
>> iothread_get_aio_context() or for the main loop using qemu_get_aio_context().
>> This way your code works both in IOThreads or the main loop.
>
> I think such code knows about its iothread, so iothread_get_aio_context is
> enough, why need to mention (using) qemu_get_aio_context here?

I want to encourage people to write code that works in *any* AioContext,
not just IOThreads and not just the main loop. When someone has written
code that works in IOThreads it will probably look like this:

  void do_my_thing(MyObject *obj, AioContext *aio_context);

If you want to call do_my_thing() from the main loop you need to know
about qemu_get_aio_context() so you can call it:

  do_my_thing(obj, qemu_get_aio_context());

<snip>

> There was a concern about lock ordering, currently we only acquire contexts
> from main loop and vCPU threads, so we're safe. Do we enforce this rule? If we
> use this reentrant lock in others parts of QEMU, what are the rules then?

AioContext acquire/release is a reentrant lock (RFifoLock). This is
useful since it makes it easier to write composable code that doesn't
deadlock if called inside a context that already has the AioContext
acquired.

Regarding lock ordering, there is currently no reason for IOThread code
(virtio-blk data-plane) to acquire another AioContext. It only needs its
own BlockDriverState's AioContext. That's why we don't need to worry
about lock ordering problems - only the main loop will acquire another
AioContext.

I will add a note about lock ordering.

> Does it imply that adding aio_context_acquire and aio_context_release
> protection inside a bdrv_* function makes it (at least in the sense of
> IOThreads) thread-safe?

Yes. The AioContext lock is what protects the BlockDriverState.

> Is block job a case of this? Looks like a subtask of adding support of block
> jobs in dataplane.

Yes, they are currently not available when x-data-plane=on is used
because it sets bdrv_in_use.

Stefan
* Re: [Qemu-devel] [RFC] dataplane: IOThreads and writing dataplane-capable code
  2014-05-08 11:56   ` Stefan Hajnoczi
@ 2014-05-08 12:12     ` Paolo Bonzini
  0 siblings, 0 replies; 11+ messages in thread
From: Paolo Bonzini @ 2014-05-08 12:12 UTC (permalink / raw)
To: Stefan Hajnoczi, Fam Zheng
Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi, Max Reitz

On 08/05/2014 13:56, Stefan Hajnoczi wrote:
>>> Is dataplane the only user for this now?
>
> Yes, and neither dataplane (x-data-plane=on) nor IOThread (-object
> iothread,id=<name>) are finalized.
>
> There was a discussion about -object and QOM on the mailing list a
> while back. We reached the conclusion that -object shouldn't be a
> supported command-line interface, it should be used for testing,
> development, etc. So an -iothread option still needs to be added.

Actually I think that wasn't the conclusion. "-object" is a supported
command-line interface; we also support hotplug/unplug nowadays for it,
and the implementation makes QMP entirely typesafe, unlike netdev_add
and device_add. We're using it for iothreads and virtio-rng backends,
and we'll add memory backends in 2.1.

However, the agreement was that "QMP methods" are the preferred
interface to work with objects. Properties and qom-get/qom-set are not
the way to build a command-line interface.

Paolo
* Re: [Qemu-devel] [RFC] dataplane: IOThreads and writing dataplane-capable code
  2014-05-08 10:16 [Qemu-devel] [RFC] dataplane: IOThreads and writing dataplane-capable code Stefan Hajnoczi
  2014-05-08 11:33 ` Fam Zheng
@ 2014-05-08 13:08 ` Kevin Wolf
  2014-05-08 13:10   ` Paolo Bonzini
  2014-05-08 13:44 ` Dr. David Alan Gilbert
  2 siblings, 1 reply; 11+ messages in thread
From: Kevin Wolf @ 2014-05-08 13:08 UTC (permalink / raw)
To: Stefan Hajnoczi; +Cc: Paolo Bonzini, Fam Zheng, qemu-devel, Max Reitz

On 08.05.2014 12:16, Stefan Hajnoczi wrote:
> Side note: The main loop and IOThread are both event loops but their code is
> not shared completely. Sometimes it is useful to remember that although they
> are conceptually similar they are currently not interchangeable.

We need to be careful with the terminology. The choice made here, that
the main loop thread is not an IOThread, is somewhat unfortunate,
because traditionally, the "I/O thread" has been what the main loop
thread is called (in contrast to vcpu threads).

Kevin
* Re: [Qemu-devel] [RFC] dataplane: IOThreads and writing dataplane-capable code
  2014-05-08 13:08 ` Kevin Wolf
@ 2014-05-08 13:10   ` Paolo Bonzini
  2014-05-08 13:18     ` Stefan Hajnoczi
  0 siblings, 1 reply; 11+ messages in thread
From: Paolo Bonzini @ 2014-05-08 13:10 UTC (permalink / raw)
To: Kevin Wolf, Stefan Hajnoczi; +Cc: Fam Zheng, qemu-devel, Max Reitz

On 08/05/2014 15:08, Kevin Wolf wrote:
> We need to be careful with the terminology. The choice made here, that
> the main loop thread is not an IOThread, is somewhat unfortunate,
> because traditionally, the "I/O thread" has been what the main loop
> thread is called (in contrast to vcpu threads).

Note that the main loop thread could definitely be an IOThread, since it
has an AioContext and the IOThread is just a QOM veneer for AioContext.
It's just not done yet.

Paolo
* Re: [Qemu-devel] [RFC] dataplane: IOThreads and writing dataplane-capable code
  2014-05-08 13:10   ` Paolo Bonzini
@ 2014-05-08 13:18     ` Stefan Hajnoczi
  0 siblings, 0 replies; 11+ messages in thread
From: Stefan Hajnoczi @ 2014-05-08 13:18 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Kevin Wolf, Fam Zheng, qemu-devel, Stefan Hajnoczi, Max Reitz

On Thu, May 08, 2014 at 03:10:48PM +0200, Paolo Bonzini wrote:
> Note that the main loop thread could definitely be an IOThread,
> since it has an AioContext and the IOThread is just a QOM veneer for
> AioContext. It's just not done yet.

Yes, it's definitely a goal to unify the event loop implementations in
QEMU.

The main loop is more than just an AioContext, it's a glib event loop.
The AioContext is a custom event loop that can be added to a glib event
loop (as a GSource) but also supports BHs and high-resolution timers.

Stefan
* Re: [Qemu-devel] [RFC] dataplane: IOThreads and writing dataplane-capable code
  2014-05-08 10:16 [Qemu-devel] [RFC] dataplane: IOThreads and writing dataplane-capable code Stefan Hajnoczi
  2014-05-08 11:33 ` Fam Zheng
  2014-05-08 13:08 ` Kevin Wolf
@ 2014-05-08 13:44 ` Dr. David Alan Gilbert
  2014-05-08 14:42   ` Stefan Hajnoczi
  2 siblings, 1 reply; 11+ messages in thread
From: Dr. David Alan Gilbert @ 2014-05-08 13:44 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: Kevin Wolf, Paolo Bonzini, Fam Zheng, qemu-devel, Max Reitz

* Stefan Hajnoczi (stefanha@redhat.com) wrote:

<snip>

> How to synchronize with an IOThread
> -----------------------------------
> AioContext is not thread-safe so some rules must be followed when using file
> descriptors, event notifiers, timers, or BHs across threads:

<snip>

How do these IOThreads pause during migration?
Are they paused by the 'qemu_mutex_lock_iothread' that the migration thread
calls?

Dave
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
* Re: [Qemu-devel] [RFC] dataplane: IOThreads and writing dataplane-capable code
  2014-05-08 13:44 ` Dr. David Alan Gilbert
@ 2014-05-08 14:42   ` Stefan Hajnoczi
  2014-05-08 18:58     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 11+ messages in thread
From: Stefan Hajnoczi @ 2014-05-08 14:42 UTC (permalink / raw)
To: Dr. David Alan Gilbert
Cc: Kevin Wolf, Fam Zheng, qemu-devel, Max Reitz, Stefan Hajnoczi,
	Paolo Bonzini

On Thu, May 8, 2014 at 3:44 PM, Dr. David Alan Gilbert
<dgilbert@redhat.com> wrote:

<snip>

> How do these IOThreads pause during migration?
> Are they paused by the 'qemu_mutex_lock_iothread' that the migration thread
> calls?

Currently the only IOThread user is virtio-blk data-plane. It has a VM
state change listener registered that will stop using the IOThread
during migration. In the future we'll have to do more than that:

It is possible to suspend all IOThreads simply by looping over IOThread
objects and calling aio_context_acquire() on their AioContext. You can
release the AioContexts when you are done. This would be suitable for a
"stop the world" operation for migration hand-over.

For smaller one-off operations like block-migration.c it may also make
sense to acquire/release the AioContext. But that's not necessary today
since dataplane is disabled during migration.

Stefan
* Re: [Qemu-devel] [RFC] dataplane: IOThreads and writing dataplane-capable code
From: Dr. David Alan Gilbert @ 2014-05-08 18:58 UTC
To: Stefan Hajnoczi
Cc: Kevin Wolf, Fam Zheng, qemu-devel, Max Reitz, Stefan Hajnoczi, Paolo Bonzini

* Stefan Hajnoczi (stefanha@gmail.com) wrote:
> On Thu, May 8, 2014 at 3:44 PM, Dr. David Alan Gilbert
> <dgilbert@redhat.com> wrote:
> > * Stefan Hajnoczi (stefanha@redhat.com) wrote:
> >
> > <snip>
> >
> > How do these IOThreads pause during migration?
> > Are they paused by the 'qemu_mutex_lock_iothread' that the migration
> > thread calls?
>
> Currently the only IOThread user is virtio-blk data-plane. It has a
> VM state change listener registered that will stop using the IOThread
> during migration.
>
> In the future we'll have to do more than that:
> It is possible to suspend all IOThreads simply by looping over
> IOThread objects and calling aio_context_acquire() on their
> AioContext. You can release the AioContexts when you are done. This
> would be suitable for a "stop the world" operation for migration
> hand-over.

That worries me for two reasons:

1) I'm assuming there is some subtlety so that it doesn't deadlock when
another thread is trying to get a couple of contexts.

2) The migration code that has to pause everything is reasonably time
critical (OK, not super critical - but it worries me if it gains more than
a few ms). Doing something to each thread in series, where each thread
might have to finish up a transaction, sounds like the delays could add up
to quite a large total.

> For smaller one-off operations like block-migration.c it may also make
> sense to acquire/release the AioContext. But that's not necessary
> today since dataplane is disabled during migration.

I guess it's probably right to hide this behind some interface on the Aio
stuff that migration can call, and it can worry about speed, locking
order, etc.

I also wonder whether we'd end up wanting some IOThreads to continue -
e.g. could we be using them for transport of the migration stream, or are
they strictly for the guest's use?

Dave
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
* Re: [Qemu-devel] [RFC] dataplane: IOThreads and writing dataplane-capable code
From: Stefan Hajnoczi @ 2014-05-09 8:20 UTC
To: Dr. David Alan Gilbert
Cc: Kevin Wolf, Fam Zheng, qemu-devel, Max Reitz, Stefan Hajnoczi, Paolo Bonzini

On Thu, May 08, 2014 at 07:58:11PM +0100, Dr. David Alan Gilbert wrote:
> * Stefan Hajnoczi (stefanha@gmail.com) wrote:
>
> <snip>
>
> > In the future we'll have to do more than that:
> > It is possible to suspend all IOThreads simply by looping over
> > IOThread objects and calling aio_context_acquire() on their
> > AioContext. You can release the AioContexts when you are done. This
> > would be suitable for a "stop the world" operation for migration
> > hand-over.
>
> That worries me for two reasons:
> 1) I'm assuming there is some subtlety so that it doesn't deadlock when
> another thread is trying to get a couple of contexts.

Only the main loop acquires contexts, that's why there is no lock
ordering problem.

> 2) The migration code that has to pause everything is reasonably time
> critical (OK, not super critical - but it worries me if it gains more
> than a few ms). Doing something to each thread in series, where each
> thread might have to finish up a transaction, sounds like the delays
> could add up to quite a large total.

It's no different from today where we need to bdrv_drain_all();
bdrv_flush_all(). That's a synchronous operation that can take a while.

> > For smaller one-off operations like block-migration.c it may also make
> > sense to acquire/release the AioContext. But that's not necessary
> > today since dataplane is disabled during migration.
>
> I guess it's probably right to hide this behind some interface on the
> Aio stuff that migration can call, and it can worry about speed, locking
> order, etc.
>
> I also wonder whether we'd end up wanting some IOThreads to continue -
> e.g. could we be using them for transport of the migration stream, or
> are they strictly for the guest's use?

IOThreads are just threads running AioContext event loops. They are
generic and could be used for I/O-intensive work like migration or the
VNC server.

Stefan
Thread overview: 11+ messages
2014-05-08 10:16 [Qemu-devel] [RFC] dataplane: IOThreads and writing dataplane-capable code Stefan Hajnoczi
2014-05-08 11:33 ` Fam Zheng
2014-05-08 11:56   ` Stefan Hajnoczi
2014-05-08 12:12     ` Paolo Bonzini
2014-05-08 13:08 ` Kevin Wolf
2014-05-08 13:10   ` Paolo Bonzini
2014-05-08 13:18   ` Stefan Hajnoczi
2014-05-08 13:44 ` Dr. David Alan Gilbert
2014-05-08 14:42   ` Stefan Hajnoczi
2014-05-08 18:58     ` Dr. David Alan Gilbert
2014-05-09  8:20       ` Stefan Hajnoczi