From: Stefan Hajnoczi <stefanha@gmail.com>
To: Wenchao Xia <xiawenc@linux.vnet.ibm.com>
Cc: Kevin Wolf <kwolf@redhat.com>,
Ping Fan Liu <pingfank@linux.vnet.ibm.com>,
Alex Bligh <alex@alex.org.uk>,
Michael Roth <mdroth@linux.vnet.ibm.com>,
qemu-devel <qemu-devel@nongnu.org>,
Anthony Liguori <anthony@codemonkey.ws>,
Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [Qemu-devel] [RFC] Convert AioContext to Gsource sub classes
Date: Thu, 22 Aug 2013 13:40:51 +0200 [thread overview]
Message-ID: <20130822114051.GB27613@stefanha-thinkpad.redhat.com> (raw)
In-Reply-To: <52148975.1060102@linux.vnet.ibm.com>
On Wed, Aug 21, 2013 at 05:33:41PM +0800, Wenchao Xia wrote:
> On 2013-08-21 16:45, Stefan Hajnoczi wrote:
> >On Tue, Aug 20, 2013 at 05:59:58PM +0800, Wenchao Xia wrote:
> >>On 2013-08-16 16:12, Wenchao Xia wrote:
> >>>On 2013-08-16 15:15, Wenchao Xia wrote:
> >>>>On 2013-08-16 00:32, Michael Roth wrote:
> >>>>>Quoting Michael Roth (2013-08-15 10:23:20)
> >>>>>>Quoting Wenchao Xia (2013-08-13 03:44:39)
> >>>>>>>On 2013-08-13 01:01, Michael Roth wrote:
> >>>>>>>>Quoting Paolo Bonzini (2013-08-12 02:30:28)
> >>>>>>>>>>1) rename AioContext to AioSource.
> >>>>>>>>>> This is my major purpose, which declares that it is not a
> >>>>>>>>>>"context" concept, and that GMainContext is the entity
> >>>>>>>>>>representing the thread's activity.
> >>>>>>>>>
> >>>>>>>>>Note that the nested event loops in QEMU are _very_ different from
> >>>>>>>>>glib nested event loops. In QEMU, nested event loops only run block
> >>>>>>>>>layer events. In glib, they run all events. That's why you need
> >>>>>>>>>AioContext.
> >>>>>>>>
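
(For illustration only - a rough sketch of the difference; the
surrounding loops and flags are made up, aio_poll() and
g_main_context_iteration() are the real entry points:)

    /* QEMU-style nested loop: only block layer events make progress */
    while (!request_done) {
        aio_poll(aio_context, true);    /* BHs, aio fd handlers only */
    }

    /* glib-style nested loop: every attached source may fire */
    while (!request_done) {
        g_main_context_iteration(gctx, TRUE);   /* timers, VNC, ... */
    }
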
> >>>>>>>>Would it be possible to use glib for our nested loops as well by just
> >>>>>>>>setting a higher priority for the AioContext GSource?
> >>>>>>>>
> >>>>>>>>Stefan and I were considering how we could make use of his "drop
> >>>>>>>>ioflush" patches to use a common mechanism to register fd events, but
> >>>>>>>>still allow us to distinguish between AioContext and non-AioContext
> >>>>>>>>for nested loops. I was originally thinking of using prepare()
> >>>>>>>>functions
> >>>>>>>>to filter out non-AioContext events, but that requires implementing
> >>>>>>>>our GSources with that in mind, not making use of pre-baked ones
> >>>>>>>>like GIOChannel's, and it bakes block stuff into every event source
> >>>>>>>>implementation.
> >>>>>>>>
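
(Something along these lines, I think - untested, and the priority
value is a placeholder:)

    GSource *src = aio_get_g_source(aio_context);

    /* One glib iteration dispatches only the highest-priority sources
     * that are ready, so a nested loop would prefer block events. */
    g_source_set_priority(src, G_PRIORITY_HIGH);
    g_source_attach(src, g_main_context_default());
    g_source_unref(src);
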
> >>>>>>> Besides priority, g_source_set_can_recurse() can also help.
> >>>>>>> Thinking about it more deeply, I found a harder problem:
> >>>>>>>g_main_context_acquire() and g_main_context_release(). On release,
> >>>>>>>pending BH/IO callbacks need to be cleared, but this action can't
> >>>>>>>be triggered automatically when the user calls g_main_context_release().
> >>>>>>
> >>>>>>I don't understand why this is a requirement.
> >>>>>>g_main_context_acquire/release ensure that only one thread attempts
> >>>>>>to iterate the main loop at a time. This isn't currently an issue in
> >>>>>>qemu, and if we re-implemented qemu_aio_wait() to use the same glib
> >>>>>>interfaces, the tracking of in-flight requests would be moved to the
> >>>>>>block layer via Stefan's 'drop io_flush' patches, which moves that
> >>>>>>block-specific logic out of the main loop/AioContext GSource by
> >>>>>>design. Are there other areas where you see this as a problem?
> >>>>>
> >>>>>I think I understand better what you're referring to: you mean that
> >>>>>if qemu_aio_wait() was called, and was implemented to essentially call
> >>>>>g_main_context_iteration(), then after, say, 1 iteration, the
> >>>>>iothread/dataplane thread might acquire the main loop and dispatch
> >>>>>block/non-block events before qemu_aio_wait() returned? The simple
> >>>>>approach would be to have qemu_aio_wait() call
> >>>>>g_main_context_acquire/release
> >>>>>at the start and end of the function, which would ensure that this
> >>>>>never happens.
> >>>>>
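
(A sketch of that suggestion - untested, the body is only illustrative:)

    bool qemu_aio_wait(GMainContext *ctx)
    {
        bool progress;

        /* keep other threads out of the loop for the whole iteration */
        g_main_context_acquire(ctx);
        progress = g_main_context_iteration(ctx, TRUE);
        g_main_context_release(ctx);

        return progress;
    }
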
> >>>> qemu_aio_wait() is relatively simple to resolve with
> >>>>g_main_context_acquire(), but it also shows the additional code needed
> >>>>for a thread switch:
> >>>>(I guess the plan is to implement qemu_aio_wait() roughly as follows)
> >>>>qemu_aio_wait(GMainContext *ctx)
> >>>>{
> >>>>    /* dispatch only block layer (high priority) sources */
> >>>>    return g_main_context_iteration(ctx, TRUE);
> >>>>}
> >>>>at the caller:
> >>>>{
> >>>>    while (qemu_aio_wait(ctx)) {
> >>>>        ;
> >>>>    }
> >>>>    g_main_context_release(ctx);
> >>>>}
> >>>> The above code shows that, in a *ctx switch operation, there is
> >>>>more than glib's own event loop API involved: qemu_aio_wait(). So
> >>>>this raises a question: what data structure should be used to
> >>>>represent the context concept and control the thread switching
> >>>>behavior? It is better to have a clear layer, GMainContext or
> >>>>GlibQContext, instead of GMainContext plus a custom function. There
> >>>>are at least two kinds of callers: the nested user (the block
> >>>>layer) and the flat user (above the block layer).
> >>>> In my opinion, this problem is brought in by the GSource
> >>>>AioContext. Take the call path of bdrv_aio_readv(*bdrv_cb) on a raw
> >>>>Linux file as an example; the following async operations are
> >>>>involved for AioContext:
> >>>>1 bdrv_cb() will be executed in bdrv_co_em_bh().
> >>>>2 bdrv_co_io_em_complete() will be executed in event_notifier_ready().
> >>>>3 aio_worker() will be executed in worker_thread().
> >>>> Operations 2 and 3 are tracked by the block layer's queue after
> >>>>Stefan's dropping io_flush() series.
> >>>> Now, if we stick to GMainContext to represent the context concept,
> >>>>then when thread B wants to acquire the GMainContext used by thread
> >>>>A, all work set up by A should be finished before B acquires it;
> >>>>otherwise B will execute some functions supposed to run in A. The
> >>>>work refers to the three kinds of async functions above.
> >>>> We can solve this issue by different means:
> >>>>1 The event loop API doesn't guarantee that work set up by thread A
> >>>>will always be finished in A. This puts a limitation on the caller
> >>>>to consider thread switching. I talked with Stefan on IRC, and it
> >>>>seems possible for the nested user (the block layer), but I still
> >>>>want to avoid this for the flat user above the block layer.
> >>>>2 Ask the caller to finish all pending operations.
> >>>> 2.1 Use the glib GMainContext API plus a custom API such as
> >>>>aio_wait(). This is bad in that details under GMainContext are
> >>>>exposed to the caller. Since operation 1 mentioned before is not
> >>>>tracked yet, to make sure bdrv_cb() is called in the thread that set
> >>>>it up, 1 needs to be added to the tracking queue, or the call chain
> >>>>of aio_wait() must check the queue plus check operation 1. Perhaps
> >>>>add a custom function the caller is asked to call, such as
> >>>>context_work_flush()?
> >>> If a well-named API that does the flush work is present, using the
> >>>glib API plus it also seems OK, and avoids a wrapper. I guess
> >>>bdrv_drain_all(GMainContext *ctx, ...) could do it.
> >>>
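
(A sketch of that handoff with today's APIs - note that bdrv_drain_all()
currently takes no arguments, so the ctx parameter above would be an
extension, and "running" is just an illustrative flag:)

    /* thread A: flush block layer work, then give up the loop */
    bdrv_drain_all();                /* wait for in-flight requests */
    g_main_context_release(ctx);

    /* thread B: take over the loop */
    if (g_main_context_acquire(ctx)) {
        while (running) {
            g_main_context_iteration(ctx, TRUE);
        }
        g_main_context_release(ctx);
    }
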
> >> I haven't found a good answer in GStreamer, but I want to show
> >>some ideas based on my understanding.
> >>
> >>The following is a brief picture of the current event loop in qemu;
> >>Alex's timers for AioContext are also drawn here:
> >>
> >>            ========================
> >>            || I/O thread in vl.c ||
> >>            ========================
> >>                       |
> >>             run loop  |
> >>                       |
> >>====================   |
> >>|| other          ||   |  qemu_set_fd_handler2()  =====================
> >>||                ||------------------------------||   Main_loop     ||
> >>|| (vnc,migration)||   |                          =====================
> >>====================   |  GLib                              |
> >>                       |  event loop API                    |
> >>  qemu_set_fd_handler()|                                    |
> >> ----------------------                   ====================
> >> |                                        ||  GMainContext  ||
> >> |                                        ====================
> >>==========   (should it be removed?)               |
> >>|| hw   ||------------------------                 | GSource
> >>==========      |              |                   | Attach
> >>                | main_loop_tlg|                   |
> >>   qemu_bh_***()|              |                   |
> >>                |              |                   |
> >>            ====|==============|===================|==========
> >>            ||  |              |                   |        ||
> >>===========  ||  ======   ==================   ======       ||
> >>|| block ||----| BH |    | TimerListGroup |   | Fd |        ||
> >>===========  ||  ======   ==================   ======       ||
> >>   qemu_bh_***()                                             ||
> >>   qemu_aio_wait()            AioContext                     ||
> >>   qemu_aio_set_fd_handler()  (block layer's event collection)
> >>            ==================================================
> >>
> >>
> >> The main issue here is that the components are tightly bound
> >>together and no clear layer represents the thread and event loop
> >>API. Block and hw code both interact with AioContext, so GMainContext
> >>and AioContext are both playing the role. I hope to form a library
> >>for block, so I need to pick one of them to provide the event loop;
> >>the choices seem to be:
> >>1 GMainContext.
> >>2 AioContext.
> >>3 An encapsulation, such as GlibQContext.
> >>
> >> 1) and 2) would not be perfect, since a non-standard glib event loop
> >>will be exposed; 3) would show a unified interface similar to the glib
> >>main loop, but needs more code adjustment. After some thinking, I
> >>guess AioContext is the easiest way: it represents the block layer's
> >>own event loop, gives up using the glib event loop at this level, and
> >>just adds custom APIs such as block_iterate(). Briefly, bdrv_read
> >>would become:
> >>int bdrv_read(AioContext *ctx, ....);
> >
> >I don't understand why you want to add AioContext *ctx to bdrv_read().
> >The synchronous APIs already work fine since no event loop integration
> >is necessary at all (the event loop is hidden inside the synchronous
> >function).
> >
> OK... I used a wrong example. It should be bdrv_aio_readv(). At this
> level, do you think AioContext * should be used instead of
> GMainContext *?
Maybe but...
> >Since AioContext provides a GSource, integrating with an application's
> >glib event loop should also be easy. The only hard part is timers,
> >since we use nanosecond timers - there we should just round up to
> >millisecond granularity in libqblock. The nanosecond timers aren't
> >critical in libqblock, only for running guests.
...we already have a GSource that applications can integrate into their
event loop. Why do you want to modify the code instead of just exposing
the GSource?
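
For example (a sketch; error handling omitted):

    GSource *source = aio_get_g_source(aio_context);

    /* block layer events now run as part of the application's loop */
    g_source_attach(source, g_main_context_default());
    g_source_unref(source);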
Stefan