qemu-devel.nongnu.org archive mirror
* [Qemu-devel] [RFC] dataplane: IOThreads and writing dataplane-capable code
@ 2014-05-08 10:16 Stefan Hajnoczi
  2014-05-08 11:33 ` Fam Zheng
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Stefan Hajnoczi @ 2014-05-08 10:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Paolo Bonzini, Fam Zheng, Max Reitz

Here is background on the latest dataplane work in my "[PATCH v2 00/25]
dataplane: use QEMU block layer" series.  It's necessary for anyone who wants
to build on top of it.  Please leave feedback or questions and I'll submit a
docs/ patch with the final version of this document.


This document explains the IOThread feature and how to write code that runs
outside the QEMU global mutex.

The main loop and IOThreads
---------------------------
QEMU is an event-driven program that can do several things at once using an
event loop.  The VNC server and the QMP monitor are both processed from the
same event loop, which monitors their file descriptors and invokes a callback
when they become readable.

The default event loop is called the main loop (see main-loop.c).  It is
possible to create additional event loop threads using -object
iothread,id=my-iothread.

Side note: The main loop and IOThread are both event loops but their code is
not shared completely.  Sometimes it is useful to remember that although they
are conceptually similar they are currently not interchangeable.

Why IOThreads are useful
------------------------
IOThreads allow the user to control the placement of work.  The main loop is a
scalability bottleneck on hosts with many CPUs.  Work can be spread across
several IOThreads instead of just one main loop.  When set up correctly this
can improve I/O latency and reduce jitter seen by the guest.

The main loop is also deeply associated with the QEMU global mutex, which is a
scalability bottleneck in itself.  vCPU threads and the main loop use the QEMU
global mutex to serialize execution of QEMU code.  This mutex is necessary
because a lot of QEMU's code historically was not thread-safe.

The fact that all I/O processing is done in a single main loop and that the
QEMU global mutex is contended by all vCPU threads and the main loop explains
why it is desirable to place work into IOThreads.

The experimental virtio-blk data-plane implementation has been benchmarked and
shows these effects:
ftp://public.dhe.ibm.com/linux/pdfs/KVM_Virtualized_IO_Performance_Paper.pdf

How to program for IOThreads
----------------------------
The main difference between legacy code and new code that can run in an
IOThread is dealing explicitly with the event loop object, AioContext
(see include/block/aio.h).  Code that only works in the main loop
implicitly uses the main loop's AioContext.  Code that supports running
in IOThreads must be aware of its AioContext.

AioContext supports the following services:
 * File descriptor monitoring (read/write/error)
 * Event notifiers (inter-thread signalling)
 * Timers
 * Bottom Halves (BH) deferred callbacks

There are several old APIs that use the main loop AioContext:
 * LEGACY qemu_aio_set_fd_handler() - monitor a file descriptor
 * LEGACY qemu_aio_set_event_notifier() - monitor an event notifier
 * LEGACY timer_new_ms() - create a timer
 * LEGACY qemu_bh_new() - create a BH
 * LEGACY qemu_aio_wait() - run an event loop iteration

Since they implicitly work on the main loop they cannot be used in code that
runs in an IOThread.  They might cause a crash or deadlock if called from an
IOThread since the QEMU global mutex is not held.

Instead, use the AioContext functions directly (see include/block/aio.h):
 * aio_set_fd_handler() - monitor a file descriptor
 * aio_set_event_notifier() - monitor an event notifier
 * aio_timer_new() - create a timer
 * aio_bh_new() - create a BH
 * aio_poll() - run an event loop iteration

The AioContext can be obtained from the IOThread using
iothread_get_aio_context() or for the main loop using qemu_get_aio_context().
This way your code works in both IOThreads and the main loop.
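
For illustration, a sketch of AioContext-aware code in this style.  MyObject,
my_fd_read() and my_object_start() are invented names; only the aio_*() call
uses the real API from include/block/aio.h:

    /* Hypothetical object that monitors a file descriptor in a
     * caller-chosen event loop.
     */
    typedef struct MyObject {
        AioContext *ctx;    /* event loop this object is bound to */
        int fd;             /* file descriptor to monitor */
    } MyObject;

    static void my_fd_read(void *opaque)
    {
        /* Invoked by whichever thread runs obj->ctx's event loop */
    }

    void my_object_start(MyObject *obj, AioContext *ctx)
    {
        obj->ctx = ctx;
        aio_set_fd_handler(obj->ctx, obj->fd, my_fd_read, NULL, obj);
    }

A main loop caller passes qemu_get_aio_context() while dataplane code passes
iothread_get_aio_context(iothread), so the same code serves both.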

How to synchronize with an IOThread
-----------------------------------
AioContext is not thread-safe so some rules must be followed when using file
descriptors, event notifiers, timers, or BHs across threads:

1. AioContext functions can be called safely from file descriptor, event
notifier, timer, or BH callbacks invoked by the AioContext.  No locking is
necessary.

2. Other threads wishing to access the AioContext must use
aio_context_acquire()/aio_context_release() for mutual exclusion.  Once the
context is acquired no other thread can access it or run event loop iterations
in this AioContext.

aio_context_acquire()/aio_context_release() calls may be nested.  This
means you can call them if you're not sure whether #1 applies.

Side note: the best way to schedule a function call across threads is to create
a BH in the target AioContext beforehand and then call qemu_bh_schedule().  No
acquire/release or locking is needed for the qemu_bh_schedule() call.  But be
sure to acquire the AioContext for aio_bh_new() if necessary.
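
As a sketch of this pattern (my_cb() and my_bh_setup() are invented names;
the locking follows rule #2 above):

    static void my_cb(void *opaque)
    {
        /* Runs in the thread that iterates the target AioContext */
    }

    QEMUBH *my_bh_setup(AioContext *ctx, void *opaque)
    {
        QEMUBH *bh;

        /* Rule #2: we may be an outside thread, so take the lock for
         * aio_bh_new().  Acquire/release nests, so this is safe even
         * if rule #1 already applies.
         */
        aio_context_acquire(ctx);
        bh = aio_bh_new(ctx, my_cb, opaque);
        aio_context_release(ctx);
        return bh;
    }

    /* Later, from any thread, no locking needed: qemu_bh_schedule(bh); */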

The relationship between AioContext and the block layer
-------------------------------------------------------
The AioContext originates from the QEMU block layer because it provides a
scoped way of running event loop iterations until all work is done.  This
feature is used to complete all in-flight block I/O requests (see
bdrv_drain_all()).  Nowadays AioContext is a generic event loop that can be
used by any QEMU subsystem.

The block layer has support for AioContext integrated.  Each BlockDriverState
is associated with an AioContext using bdrv_set_aio_context() and
bdrv_get_aio_context().  This allows block layer code to process I/O inside the
right AioContext.  Other subsystems may wish to follow a similar approach.

If main loop code such as a QMP function wishes to access a BlockDriverState it
must first call aio_context_acquire(bdrv_get_aio_context(bs)) to ensure the
IOThread does not run in parallel.
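
For example, such a function might look like this (qmp_flush_disk() is an
invented name; bdrv_flush() stands in for whatever bdrv_*() calls the
function needs):

    /* Sketch: main loop code accessing a possibly-dataplane
     * BlockDriverState.
     */
    void qmp_flush_disk(BlockDriverState *bs, Error **errp)
    {
        AioContext *ctx = bdrv_get_aio_context(bs);

        aio_context_acquire(ctx);  /* keep the IOThread from running
                                      in parallel */
        bdrv_flush(bs);
        aio_context_release(ctx);
    }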

Long-running jobs (usually in the form of coroutines) are best scheduled in the
BlockDriverState's AioContext to avoid the need to acquire/release around each
bdrv_*() call.  Be aware that there is currently no mechanism to get notified
when bdrv_set_aio_context() moves this BlockDriverState to a different
AioContext (see bdrv_detach_aio_context()/bdrv_attach_aio_context()), so you
may need to add this if you want to support long-running jobs.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] [RFC] dataplane: IOThreads and writing dataplane-capable code
  2014-05-08 10:16 [Qemu-devel] [RFC] dataplane: IOThreads and writing dataplane-capable code Stefan Hajnoczi
@ 2014-05-08 11:33 ` Fam Zheng
  2014-05-08 11:56   ` Stefan Hajnoczi
  2014-05-08 13:08 ` Kevin Wolf
  2014-05-08 13:44 ` Dr. David Alan Gilbert
  2 siblings, 1 reply; 11+ messages in thread
From: Fam Zheng @ 2014-05-08 11:33 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Kevin Wolf, Paolo Bonzini, qemu-devel, Max Reitz

Great document, thanks for writing this! I have a few questions below.

On Thu, 05/08 12:16, Stefan Hajnoczi wrote:
> Here is background on the latest dataplane work in my "[PATCH v2 00/25]
> dataplane: use QEMU block layer" series.  It's necessary for anyone who wants
> to build on top of it.  Please leave feedback or questions and I'll submit a
> docs/ patch with the final version of this document.
> 
> 
> This document explains the IOThread feature and how to write code that runs
> outside the QEMU global mutex.
> 
> The main loop and IOThreads
> ---------------------------
> QEMU is an event-driven program that can do several things at once using an
> event loop.  The VNC server and the QMP monitor are both processed from the
> same event loop which monitors their file descriptors until they become
> readable and then invokes a callback.
> 
> The default event loop is called the main loop (see main-loop.c).  It is
> possible to create additional event loop threads using -object
> iothread,id=my-iothread.

Is dataplane the only user for this now?

> 
> Side note: The main loop and IOThread are both event loops but their code is
> not shared completely.  Sometimes it is useful to remember that although they
> are conceptually similar they are currently not interchangeable.
> 
> Why IOThreads are useful
> ------------------------
> IOThreads allow the user to control the placement of work.  The main loop is a
> scalability bottleneck on hosts with many CPUs.  Work can be spread across
> several IOThreads instead of just one main loop.  When set up correctly this
> can improve I/O latency and reduce jitter seen by the guest.
> 
> The main loop is also deeply associated with the QEMU global mutex, which is a
> scalability bottleneck in itself.  vCPU threads and the main loop use the QEMU
> global mutex to serialize execution of QEMU code.  This mutex is necessary
> because a lot of QEMU's code historically was not thread-safe.
> 
> The fact that all I/O processing is done in a single main loop and that the
> QEMU global mutex is contended by all vCPU threads and the main loop explain
> why it is desirable to place work into IOThreads.
> 
> The experimental virtio-blk data-plane implementation has been benchmarked and
> shows these effects:
> ftp://public.dhe.ibm.com/linux/pdfs/KVM_Virtualized_IO_Performance_Paper.pdf
> 
> How to program for IOThreads
> ----------------------------
> The main difference between legacy code and new code that can run in an
> IOThread is dealing explicitly with the event loop object, AioContext
> (see include/block/aio.h).  Code that only works in the main loop
> implicitly uses the main loop's AioContext.  Code that supports running
> in IOThreads must be aware of its AioContext.
> 
> AioContext supports the following services:
>  * File descriptor monitoring (read/write/error)
>  * Event notifiers (inter-thread signalling)
>  * Timers
>  * Bottom Halves (BH) deferred callbacks
> 
> There are several old APIs that use the main loop AioContext:
>  * LEGACY qemu_aio_set_fd_handler() - monitor a file descriptor
>  * LEGACY qemu_aio_set_event_notifier() - monitor an event notifier
>  * LEGACY timer_new_ms() - create a timer
>  * LEGACY qemu_bh_new() - create a BH
>  * LEGACY qemu_aio_wait() - run an event loop iteration
> 
> Since they implicitly work on the main loop they cannot be used in code that
> runs in an IOThread.  They might cause a crash or deadlock if called from an
> IOThread since the QEMU global mutex is not held.
> 
> Instead, use the AioContext functions directly (see include/block/aio.h):
>  * aio_set_fd_handler() - monitor a file descriptor
>  * aio_set_event_notifier() - monitor an event notifier
>  * aio_timer_new() - create a timer
>  * aio_bh_new() - create a BH
>  * aio_poll() - run an event loop iteration
> 
> The AioContext can be obtained from the IOThread using
> iothread_get_aio_context() or for the main loop using qemu_get_aio_context().
> This way your code works both in IOThreads or the main loop.

I think such code already knows about its IOThread, so
iothread_get_aio_context() is enough.  Why mention qemu_get_aio_context() here?

> 
> How to synchronize with an IOThread
> -----------------------------------
> AioContext is not thread-safe so some rules must be followed when using file
> descriptors, event notifiers, timers, or BHs across threads:
> 
> 1. AioContext functions can be called safely from file descriptor, event
> notifier, timer, or BH callbacks invoked by the AioContext.  No locking is
> necessary.
> 
> 2. Other threads wishing to access the AioContext must use
> aio_context_acquire()/aio_context_release() for mutual exclusion.  Once the
> context is acquired no other thread can access it or run event loop iterations
> in this AioContext.
> 
> aio_context_acquire()/aio_context_release() calls may be nested.  This
> means you can call them if you're not sure whether #1 applies.
> 
> Side note: the best way to schedule a function call across threads is to create
> a BH in the target AioContext beforehand and then call qemu_bh_schedule().  No
> acquire/release or locking is needed for the qemu_bh_schedule() call.  But be
> sure to acquire the AioContext for aio_bh_new() if necessary.
> 
> The relationship between AioContext and the block layer
> -------------------------------------------------------
> The AioContext originates from the QEMU block layer because it provides a
> scoped way of running event loop iterations until all work is done.  This
> feature is used to complete all in-flight block I/O requests (see
> bdrv_drain_all()).  Nowadays AioContext is a generic event loop that can be
> used by any QEMU subsystem.

There was a concern about lock ordering.  Currently we only acquire contexts
from the main loop and vCPU threads, so we're safe.  Do we enforce this rule?
If we use this reentrant lock in other parts of QEMU, what are the rules then?

> 
> The block layer has support for AioContext integrated.  Each BlockDriverState
> is associated with an AioContext using bdrv_set_aio_context() and
> bdrv_get_aio_context().  This allows block layer code to process I/O inside the
> right AioContext.  Other subsystems may wish to follow a similar approach.
> 
> If main loop code such as a QMP function wishes to access a BlockDriverState it
> must first call aio_context_acquire(bdrv_get_aio_context(bs)) to ensure the
> IOThread does not run in parallel.

Does it imply that adding aio_context_acquire and aio_context_release
protection inside a bdrv_* function makes it (at least in the sense of
IOThreads) thread-safe?

> 
> Long-running jobs (usually in the form of coroutines) are best scheduled in the
> BlockDriverState's AioContext to avoid the need to acquire/release around each
> bdrv_*() call.  Be aware that there is currently no mechanism to get notified
> when bdrv_set_aio_context() moves this BlockDriverState to a different
> AioContext (see bdrv_detach_aio_context()/bdrv_attach_aio_context()), so you
> may need to add this if you want to support long-running jobs.

Are block jobs a case of this?  That looks like a subtask of adding block job
support to dataplane.

Thanks,
Fam


* Re: [Qemu-devel] [RFC] dataplane: IOThreads and writing dataplane-capable code
  2014-05-08 11:33 ` Fam Zheng
@ 2014-05-08 11:56   ` Stefan Hajnoczi
  2014-05-08 12:12     ` Paolo Bonzini
  0 siblings, 1 reply; 11+ messages in thread
From: Stefan Hajnoczi @ 2014-05-08 11:56 UTC (permalink / raw)
  To: Fam Zheng
  Cc: Kevin Wolf, Paolo Bonzini, qemu-devel, Stefan Hajnoczi, Max Reitz

On Thu, May 8, 2014 at 1:33 PM, Fam Zheng <famz@redhat.com> wrote:
> On Thu, 05/08 12:16, Stefan Hajnoczi wrote:
>> Here is background on the latest dataplane work in my "[PATCH v2 00/25]
>> dataplane: use QEMU block layer" series.  It's necessary for anyone who wants
>> to build on top of it.  Please leave feedback or questions and I'll submit a
>> docs/ patch with the final version of this document.
>>
>>
>> This document explains the IOThread feature and how to write code that runs
>> outside the QEMU global mutex.
>>
>> The main loop and IOThreads
>> ---------------------------
>> QEMU is an event-driven program that can do several things at once using an
>> event loop.  The VNC server and the QMP monitor are both processed from the
>> same event loop which monitors their file descriptors until they become
>> readable and then invokes a callback.
>>
>> The default event loop is called the main loop (see main-loop.c).  It is
>> possible to create additional event loop threads using -object
>> iothread,id=my-iothread.
>
> Is dataplane the only user for this now?

Yes, and neither dataplane (x-data-plane=on) nor IOThread (-object
iothread,id=<name>) is finalized.

There was a discussion about -object and QOM on the mailing list a
while back.  We reached the conclusion that -object shouldn't be a
supported command-line interface, it should be used for testing,
development, etc.  So an -iothread option still needs to be added.

>>
>> Side note: The main loop and IOThread are both event loops but their code is
>> not shared completely.  Sometimes it is useful to remember that although they
>> are conceptually similar they are currently not interchangeable.
>>
>> Why IOThreads are useful
>> ------------------------
>> IOThreads allow the user to control the placement of work.  The main loop is a
>> scalability bottleneck on hosts with many CPUs.  Work can be spread across
>> several IOThreads instead of just one main loop.  When set up correctly this
>> can improve I/O latency and reduce jitter seen by the guest.
>>
>> The main loop is also deeply associated with the QEMU global mutex, which is a
>> scalability bottleneck in itself.  vCPU threads and the main loop use the QEMU
>> global mutex to serialize execution of QEMU code.  This mutex is necessary
>> because a lot of QEMU's code historically was not thread-safe.
>>
>> The fact that all I/O processing is done in a single main loop and that the
>> QEMU global mutex is contended by all vCPU threads and the main loop explain
>> why it is desirable to place work into IOThreads.
>>
>> The experimental virtio-blk data-plane implementation has been benchmarked and
>> shows these effects:
>> ftp://public.dhe.ibm.com/linux/pdfs/KVM_Virtualized_IO_Performance_Paper.pdf
>>
>> How to program for IOThreads
>> ----------------------------
>> The main difference between legacy code and new code that can run in an
>> IOThread is dealing explicitly with the event loop object, AioContext
>> (see include/block/aio.h).  Code that only works in the main loop
>> implicitly uses the main loop's AioContext.  Code that supports running
>> in IOThreads must be aware of its AioContext.
>>
>> AioContext supports the following services:
>>  * File descriptor monitoring (read/write/error)
>>  * Event notifiers (inter-thread signalling)
>>  * Timers
>>  * Bottom Halves (BH) deferred callbacks
>>
>> There are several old APIs that use the main loop AioContext:
>>  * LEGACY qemu_aio_set_fd_handler() - monitor a file descriptor
>>  * LEGACY qemu_aio_set_event_notifier() - monitor an event notifier
>>  * LEGACY timer_new_ms() - create a timer
>>  * LEGACY qemu_bh_new() - create a BH
>>  * LEGACY qemu_aio_wait() - run an event loop iteration
>>
>> Since they implicitly work on the main loop they cannot be used in code that
>> runs in an IOThread.  They might cause a crash or deadlock if called from an
>> IOThread since the QEMU global mutex is not held.
>>
>> Instead, use the AioContext functions directly (see include/block/aio.h):
>>  * aio_set_fd_handler() - monitor a file descriptor
>>  * aio_set_event_notifier() - monitor an event notifier
>>  * aio_timer_new() - create a timer
>>  * aio_bh_new() - create a BH
>>  * aio_poll() - run an event loop iteration
>>
>> The AioContext can be obtained from the IOThread using
>> iothread_get_aio_context() or for the main loop using qemu_get_aio_context().
>> This way your code works both in IOThreads or the main loop.
>
> I think such code knows about its iothread, so iothread_get_aio_context is
> enough, why need to mention (using) qemu_get_aio_context here?

I want to encourage people to write code that works in *any*
AioContext, not just IOThreads and not just the main loop.  When
someone has written code that works in IOThreads it will probably look
like this:

void do_my_thing(MyObject *obj, AioContext *aio_context);

If you want to call do_my_thing() from the main loop you need to know
about qemu_get_aio_context() so you can call it:

do_my_thing(obj, qemu_get_aio_context());

>>
>> How to synchronize with an IOThread
>> -----------------------------------
>> AioContext is not thread-safe so some rules must be followed when using file
>> descriptors, event notifiers, timers, or BHs across threads:
>>
>> 1. AioContext functions can be called safely from file descriptor, event
>> notifier, timer, or BH callbacks invoked by the AioContext.  No locking is
>> necessary.
>>
>> 2. Other threads wishing to access the AioContext must use
>> aio_context_acquire()/aio_context_release() for mutual exclusion.  Once the
>> context is acquired no other thread can access it or run event loop iterations
>> in this AioContext.
>>
>> aio_context_acquire()/aio_context_release() calls may be nested.  This
>> means you can call them if you're not sure whether #1 applies.
>>
>> Side note: the best way to schedule a function call across threads is to create
>> a BH in the target AioContext beforehand and then call qemu_bh_schedule().  No
>> acquire/release or locking is needed for the qemu_bh_schedule() call.  But be
>> sure to acquire the AioContext for aio_bh_new() if necessary.
>>
>> The relationship between AioContext and the block layer
>> -------------------------------------------------------
>> The AioContext originates from the QEMU block layer because it provides a
>> scoped way of running event loop iterations until all work is done.  This
>> feature is used to complete all in-flight block I/O requests (see
>> bdrv_drain_all()).  Nowadays AioContext is a generic event loop that can be
>> used by any QEMU subsystem.
>
> There was a concern about lock ordering, currently we only acquire contexts
> from main loop and vCPU threads, so we're safe. Do we enforce this rule? If we
> use this reentrant lock in others parts of QEMU, what are the rules then?

AioContext acquire/release is a reentrant lock (RFifoLock).  This is
useful since it makes it easier to write composable code that doesn't
deadlock if called inside a context that already has the AioContext
acquired.

Regarding lock ordering, there is currently no reason for IOThread
code (virtio-blk data-plane) to acquire another AioContext.  It only
needs its own BlockDriverState AioContext.  That's why we don't need
to worry about lock ordering problems - only the main loop will
acquire another AioContext.

I will add a note about lock ordering.

>>
>> The block layer has support for AioContext integrated.  Each BlockDriverState
>> is associated with an AioContext using bdrv_set_aio_context() and
>> bdrv_get_aio_context().  This allows block layer code to process I/O inside the
>> right AioContext.  Other subsystems may wish to follow a similar approach.
>>
>> If main loop code such as a QMP function wishes to access a BlockDriverState it
>> must first call aio_context_acquire(bdrv_get_aio_context(bs)) to ensure the
>> IOThread does not run in parallel.
>
> Does it imply that adding aio_context_acquire and aio_context_release
> protection inside a bdrv_* function makes it (at least in the sense of
> IOThreads) thread-safe?

Yes.  The AioContext lock is what protects the BlockDriverState.

>>
>> Long-running jobs (usually in the form of coroutines) are best scheduled in the
>> BlockDriverState's AioContext to avoid the need to acquire/release around each
>> bdrv_*() call.  Be aware that there is currently no mechanism to get notified
>> when bdrv_set_aio_context() moves this BlockDriverState to a different
>> AioContext (see bdrv_detach_aio_context()/bdrv_attach_aio_context()), so you
>> may need to add this if you want to support long-running jobs.
>
> Is block job a case of this? Looks like a subtask of adding support of block
> jobs in dataplane.

Yes, they are currently not available when x-data-plane=on is used
because it sets bdrv_in_use.

Stefan


* Re: [Qemu-devel] [RFC] dataplane: IOThreads and writing dataplane-capable code
  2014-05-08 11:56   ` Stefan Hajnoczi
@ 2014-05-08 12:12     ` Paolo Bonzini
  0 siblings, 0 replies; 11+ messages in thread
From: Paolo Bonzini @ 2014-05-08 12:12 UTC (permalink / raw)
  To: Stefan Hajnoczi, Fam Zheng
  Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi, Max Reitz

Il 08/05/2014 13:56, Stefan Hajnoczi ha scritto:
>> > Is dataplane the only user for this now?
> Yes, and neither dataplane (x-data-plane=on) nor IOThread (-object
> iothread,id=<name>) are finalized.
>
> There was a discussion about -object and QOM on the mailing list a
> while back.  We reached the conclusion that -object shouldn't be a
> supported command-line interface, it should be used for testing,
> development, etc.  So an -iothread option still needs to be added.

Actually I think that wasn't the conclusion.

"-object" is a supported command-line interface; we also support 
hotplug/unplug nowadays for it, and the implementation makes QMP 
entirely typesafe unlike netdev_add and device_add.  We're using it for 
iothreads and virtio-rng backends, and we'll add memory backends in 2.1.

However, the agreement was that "QMP methods" are the preferred 
interface to work with objects.  Properties and qom-get/qom-set are not 
the way to build a command-line interface.

Paolo


* Re: [Qemu-devel] [RFC] dataplane: IOThreads and writing dataplane-capable code
  2014-05-08 10:16 [Qemu-devel] [RFC] dataplane: IOThreads and writing dataplane-capable code Stefan Hajnoczi
  2014-05-08 11:33 ` Fam Zheng
@ 2014-05-08 13:08 ` Kevin Wolf
  2014-05-08 13:10   ` Paolo Bonzini
  2014-05-08 13:44 ` Dr. David Alan Gilbert
  2 siblings, 1 reply; 11+ messages in thread
From: Kevin Wolf @ 2014-05-08 13:08 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Paolo Bonzini, Fam Zheng, qemu-devel, Max Reitz

Am 08.05.2014 um 12:16 hat Stefan Hajnoczi geschrieben:
> Side note: The main loop and IOThread are both event loops but their code is
> not shared completely.  Sometimes it is useful to remember that although they
> are conceptually similar they are currently not interchangeable.

We need to be careful with the terminology. The choice made here, that
the main loop thread is not an IOThread, is somewhat unfortunate,
because traditionally, the "I/O thread" has been what the main loop
thread is called (in contrast to vcpu threads).

Kevin


* Re: [Qemu-devel] [RFC] dataplane: IOThreads and writing dataplane-capable code
  2014-05-08 13:08 ` Kevin Wolf
@ 2014-05-08 13:10   ` Paolo Bonzini
  2014-05-08 13:18     ` Stefan Hajnoczi
  0 siblings, 1 reply; 11+ messages in thread
From: Paolo Bonzini @ 2014-05-08 13:10 UTC (permalink / raw)
  To: Kevin Wolf, Stefan Hajnoczi; +Cc: Fam Zheng, qemu-devel, Max Reitz

Il 08/05/2014 15:08, Kevin Wolf ha scritto:
> Am 08.05.2014 um 12:16 hat Stefan Hajnoczi geschrieben:
>> Side note: The main loop and IOThread are both event loops but their code is
>> not shared completely.  Sometimes it is useful to remember that although they
>> are conceptually similar they are currently not interchangeable.
>
> We need to be careful with the terminology. The choice made here, that
> the main loop thread is not an IOThread, is somewhat unfortunate,
> because traditionally, the "I/O thread" has been what the main loop
> thread is called (in contrast to vcpu threads).

Note that the main loop thread could definitely be an IOThread, since it
has an AioContext and the IOThread is just a QOM veneer for AioContext.
It's just not done yet.

Paolo


* Re: [Qemu-devel] [RFC] dataplane: IOThreads and writing dataplane-capable code
  2014-05-08 13:10   ` Paolo Bonzini
@ 2014-05-08 13:18     ` Stefan Hajnoczi
  0 siblings, 0 replies; 11+ messages in thread
From: Stefan Hajnoczi @ 2014-05-08 13:18 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Kevin Wolf, Fam Zheng, qemu-devel, Stefan Hajnoczi, Max Reitz

On Thu, May 08, 2014 at 03:10:48PM +0200, Paolo Bonzini wrote:
> Il 08/05/2014 15:08, Kevin Wolf ha scritto:
> >Am 08.05.2014 um 12:16 hat Stefan Hajnoczi geschrieben:
> >>Side note: The main loop and IOThread are both event loops but their code is
> >>not shared completely.  Sometimes it is useful to remember that although they
> >>are conceptually similar they are currently not interchangeable.
> >
> >We need to be careful with the terminology. The choice made here, that
> >the main loop thread is not an IOThread, is somewhat unfortunate,
> >because traditionally, the "I/O thread" has been what the main loop
> >thread is called (in contrast to vcpu threads).
> 
> Note that the main loop thread could definitely be an IOThread,
> since it has an AioContext and the IOThread is just a QOM veneer for
> AioContext.  It's just not done yet.

Yes, it's definitely a goal to unify the event loop implementations in
QEMU.  The main loop is more than just an AioContext, it's a glib event
loop.  The AioContext is a custom event loop that can be added to a glib
event loop (as a GSource) but also supports BHs and high-resolution
timers.

Stefan


* Re: [Qemu-devel] [RFC] dataplane: IOThreads and writing dataplane-capable code
  2014-05-08 10:16 [Qemu-devel] [RFC] dataplane: IOThreads and writing dataplane-capable code Stefan Hajnoczi
  2014-05-08 11:33 ` Fam Zheng
  2014-05-08 13:08 ` Kevin Wolf
@ 2014-05-08 13:44 ` Dr. David Alan Gilbert
  2014-05-08 14:42   ` Stefan Hajnoczi
  2 siblings, 1 reply; 11+ messages in thread
From: Dr. David Alan Gilbert @ 2014-05-08 13:44 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Paolo Bonzini, Fam Zheng, qemu-devel, Max Reitz

* Stefan Hajnoczi (stefanha@redhat.com) wrote:

<snip>

> How to synchronize with an IOThread
> -----------------------------------
> AioContext is not thread-safe so some rules must be followed when using file
> descriptors, event notifiers, timers, or BHs across threads:
> 
> 1. AioContext functions can be called safely from file descriptor, event
> notifier, timer, or BH callbacks invoked by the AioContext.  No locking is
> necessary.
> 
> 2. Other threads wishing to access the AioContext must use
> aio_context_acquire()/aio_context_release() for mutual exclusion.  Once the
> context is acquired no other thread can access it or run event loop iterations
> in this AioContext.
> 
> aio_context_acquire()/aio_context_release() calls may be nested.  This
> means you can call them if you're not sure whether #1 applies.
> 
> Side note: the best way to schedule a function call across threads is to create
> a BH in the target AioContext beforehand and then call qemu_bh_schedule().  No
> acquire/release or locking is needed for the qemu_bh_schedule() call.  But be
> sure to acquire the AioContext for aio_bh_new() if necessary.

How do these IOThreads pause during migration?
Are they paused by the 'qemu_mutex_lock_iothread' that the migration thread calls?

Dave
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [RFC] dataplane: IOThreads and writing dataplane-capable code
  2014-05-08 13:44 ` Dr. David Alan Gilbert
@ 2014-05-08 14:42   ` Stefan Hajnoczi
  2014-05-08 18:58     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 11+ messages in thread
From: Stefan Hajnoczi @ 2014-05-08 14:42 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Kevin Wolf, Fam Zheng, qemu-devel, Max Reitz, Stefan Hajnoczi,
	Paolo Bonzini

On Thu, May 8, 2014 at 3:44 PM, Dr. David Alan Gilbert
<dgilbert@redhat.com> wrote:
> * Stefan Hajnoczi (stefanha@redhat.com) wrote:
>
> <snip>
>
>> How to synchronize with an IOThread
>> -----------------------------------
>> AioContext is not thread-safe so some rules must be followed when using file
>> descriptors, event notifiers, timers, or BHs across threads:
>>
>> 1. AioContext functions can be called safely from file descriptor, event
>> notifier, timer, or BH callbacks invoked by the AioContext.  No locking is
>> necessary.
>>
>> 2. Other threads wishing to access the AioContext must use
>> aio_context_acquire()/aio_context_release() for mutual exclusion.  Once the
>> context is acquired no other thread can access it or run event loop iterations
>> in this AioContext.
>>
>> aio_context_acquire()/aio_context_release() calls may be nested.  This
>> means you can call them if you're not sure whether #1 applies.
>>
>> Side note: the best way to schedule a function call across threads is to create
>> a BH in the target AioContext beforehand and then call qemu_bh_schedule().  No
>> acquire/release or locking is needed for the qemu_bh_schedule() call.  But be
>> sure to acquire the AioContext for aio_bh_new() if necessary.
>
> How do these IOThreads pause during migration?
> Are they paused by the 'qemu_mutex_lock_iothread' that the migration thread calls?

Currently the only IOThread user is virtio-blk data-plane.  It has a
VM state change listener registered that will stop using the IOThread
during migration.

In the future we'll have to do more than that:
It is possible to suspend all IOThreads simply by looping over
IOThread objects and calling aio_context_acquire() on their
AioContext.  You can release the AioContexts when you are done.  This
would be suitable for a "stop the world" operation for migration
hand-over.

For smaller one-off operations like block-migration.c it may also make
sense to acquire/release the AioContext.  But that's not necessary
today since dataplane is disabled during migration.

Stefan


* Re: [Qemu-devel] [RFC] dataplane: IOThreads and writing dataplane-capable code
  2014-05-08 14:42   ` Stefan Hajnoczi
@ 2014-05-08 18:58     ` Dr. David Alan Gilbert
  2014-05-09  8:20       ` Stefan Hajnoczi
  0 siblings, 1 reply; 11+ messages in thread
From: Dr. David Alan Gilbert @ 2014-05-08 18:58 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Fam Zheng, qemu-devel, Max Reitz, Stefan Hajnoczi,
	Paolo Bonzini

* Stefan Hajnoczi (stefanha@gmail.com) wrote:
> On Thu, May 8, 2014 at 3:44 PM, Dr. David Alan Gilbert
> <dgilbert@redhat.com> wrote:
> > * Stefan Hajnoczi (stefanha@redhat.com) wrote:
> >
> > <snip>
> >
> >> How to synchronize with an IOThread
> >> -----------------------------------
> >> AioContext is not thread-safe so some rules must be followed when using file
> >> descriptors, event notifiers, timers, or BHs across threads:
> >>
> >> 1. AioContext functions can be called safely from file descriptor, event
> >> notifier, timer, or BH callbacks invoked by the AioContext.  No locking is
> >> necessary.
> >>
> >> 2. Other threads wishing to access the AioContext must use
> >> aio_context_acquire()/aio_context_release() for mutual exclusion.  Once the
> >> context is acquired no other thread can access it or run event loop iterations
> >> in this AioContext.
> >>
> >> aio_context_acquire()/aio_context_release() calls may be nested.  This
> >> means you can call them if you're not sure whether #1 applies.
> >>
> >> Side note: the best way to schedule a function call across threads is to create
> >> a BH in the target AioContext beforehand and then call qemu_bh_schedule().  No
> >> acquire/release or locking is needed for the qemu_bh_schedule() call.  But be
> >> sure to acquire the AioContext for aio_bh_new() if necessary.
> >
> > How do these IOThreads pause during migration?
> > Are they paused by the 'qemu_mutex_lock_iothread' that the migration thread calls?
> 
> Currently the only IOThread user is virtio-blk data-plane.  It has a
> VM state change listener registered that will stop using the IOThread
> during migration.
> 
> In the future we'll have to do more than that:
> It is possible to suspend all IOThreads simply by looping over
> IOThread objects and calling aio_context_acquire() on their
> AioContext.  You can release the AioContexts when you are done.  This
> would be suitable for a "stop the world" operation for migration
> hand-over.

That worries me for two reasons:
   1) I'm assuming there is some subtlety so that it doesn't deadlock when
     another thread is trying to get a couple of contexts.
   2) The migration code that has to pause everything is reasonably time
      critical (OK, not super critical - but it worries me if it gains more than a
      few ms).  Doing something to each thread in series, where each thread
      might have to finish up a transaction, sounds like it could add up to
      quite a large delay.

> For smaller one-off operations like block-migration.c it may also make
> sense to acquire/release the AioContext.  But that's not necessary
> today since dataplane is disabled during migration.

I guess it's probably right to hide this behind some interface on the Aio stuff
that migration can call, which can worry about speed, locking order, etc.

I also wonder: would we end up wanting some IOThreads to continue - e.g. could we
be using them for transport of the migration stream, or are they strictly for the
guest's use?

Dave
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [RFC] dataplane: IOThreads and writing dataplane-capable code
  2014-05-08 18:58     ` Dr. David Alan Gilbert
@ 2014-05-09  8:20       ` Stefan Hajnoczi
  0 siblings, 0 replies; 11+ messages in thread
From: Stefan Hajnoczi @ 2014-05-09  8:20 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Kevin Wolf, Fam Zheng, qemu-devel, Max Reitz, Stefan Hajnoczi,
	Paolo Bonzini

On Thu, May 08, 2014 at 07:58:11PM +0100, Dr. David Alan Gilbert wrote:
> * Stefan Hajnoczi (stefanha@gmail.com) wrote:
> > On Thu, May 8, 2014 at 3:44 PM, Dr. David Alan Gilbert
> > <dgilbert@redhat.com> wrote:
> > > * Stefan Hajnoczi (stefanha@redhat.com) wrote:
> > >
> > > <snip>
> > >
> > >> How to synchronize with an IOThread
> > >> -----------------------------------
> > >> AioContext is not thread-safe so some rules must be followed when using file
> > >> descriptors, event notifiers, timers, or BHs across threads:
> > >>
> > >> 1. AioContext functions can be called safely from file descriptor, event
> > >> notifier, timer, or BH callbacks invoked by the AioContext.  No locking is
> > >> necessary.
> > >>
> > >> 2. Other threads wishing to access the AioContext must use
> > >> aio_context_acquire()/aio_context_release() for mutual exclusion.  Once the
> > >> context is acquired no other thread can access it or run event loop iterations
> > >> in this AioContext.
> > >>
> > >> aio_context_acquire()/aio_context_release() calls may be nested.  This
> > >> means you can call them if you're not sure whether #1 applies.
> > >>
> > >> Side note: the best way to schedule a function call across threads is to create
> > >> a BH in the target AioContext beforehand and then call qemu_bh_schedule().  No
> > >> acquire/release or locking is needed for the qemu_bh_schedule() call.  But be
> > >> sure to acquire the AioContext for aio_bh_new() if necessary.
> > >
> > > How do these IOThreads pause during migration?
> > > Are they paused by the 'qemu_mutex_lock_iothread' that the migration thread calls?
> > 
> > Currently the only IOThread user is virtio-blk data-plane.  It has a
> > VM state change listener registered that will stop using the IOThread
> > during migration.
> > 
> > In the future we'll have to do more than that:
> > It is possible to suspend all IOThreads simply by looping over
> > IOThread objects and calling aio_context_acquire() on their
> > AioContext.  You can release the AioContexts when you are done.  This
> > would be suitable for a "stop the world" operation for migration
> > hand-over.
> 
> That worries me for two reasons:
>    1) I'm assuming there is some subtlety so that it doesn't deadlock when
>      another thread is trying to get a couple of contexts.

Only the main loop acquires contexts, that's why there is no lock
ordering problem.

>    2) The migration code that has to pause everything is reasonably time
>       critical (OK, not super critical - but it worries me if it gains more than a
>       few ms).  Doing something to each thread in series, where each thread
>       might have to finish up a transaction, sounds like it could add up to
>       quite a large delay.

It's no different from today where we need to bdrv_drain_all();
bdrv_flush_all().  That's a synchronous operation that can take a while.

> > For smaller one-off operations like block-migration.c it may also make
> > sense to acquire/release the AioContext.  But that's not necessary
> > today since dataplane is disabled during migration.
> 
> I guess it's probably right to hide this behind some interface on the Aio stuff
> that migration can call, which can worry about speed, locking order, etc.
> 
> I also wonder: would we end up wanting some IOThreads to continue - e.g. could we
> be using them for transport of the migration stream, or are they strictly for the
> guest's use?

IOThreads are just threads running AioContext event loops.  They are
generic and could be used for I/O intensive stuff like migration
or the VNC server.

Stefan


end of thread, other threads:[~2014-05-09  8:20 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-05-08 10:16 [Qemu-devel] [RFC] dataplane: IOThreads and writing dataplane-capable code Stefan Hajnoczi
2014-05-08 11:33 ` Fam Zheng
2014-05-08 11:56   ` Stefan Hajnoczi
2014-05-08 12:12     ` Paolo Bonzini
2014-05-08 13:08 ` Kevin Wolf
2014-05-08 13:10   ` Paolo Bonzini
2014-05-08 13:18     ` Stefan Hajnoczi
2014-05-08 13:44 ` Dr. David Alan Gilbert
2014-05-08 14:42   ` Stefan Hajnoczi
2014-05-08 18:58     ` Dr. David Alan Gilbert
2014-05-09  8:20       ` Stefan Hajnoczi
