From: Wenchao Xia
Date: Wed, 21 Aug 2013 17:33:41 +0800
Message-ID: <52148975.1060102@linux.vnet.ibm.com>
In-Reply-To: <20130821084553.GB10379@stefanha-thinkpad.redhat.com>
Subject: Re: [Qemu-devel] [RFC] Convert AioContext to Gsource sub classes
To: Stefan Hajnoczi
Cc: Kevin Wolf, Ping Fan Liu, Alex Bligh, Michael Roth, qemu-devel,
 Anthony Liguori, Paolo Bonzini

On 2013-8-21 16:45, Stefan Hajnoczi wrote:
> On Tue, Aug 20, 2013 at 05:59:58PM +0800, Wenchao Xia wrote:
>> On 2013-8-16 16:12, Wenchao Xia wrote:
>>> On 2013-8-16 15:15, Wenchao Xia wrote:
>>>> On 2013-8-16 0:32, Michael Roth wrote:
>>>>> Quoting Michael Roth (2013-08-15 10:23:20)
>>>>>> Quoting Wenchao Xia (2013-08-13 03:44:39)
>>>>>>> On 2013-8-13 1:01, Michael Roth wrote:
>>>>>>>> Quoting Paolo Bonzini (2013-08-12 02:30:28)
>>>>>>>>>> 1) Rename AioContext to AioSource.
>>>>>>>>>> This is my major purpose, which declares it is not a "context"
>>>>>>>>>> concept, and GMainContext is the entity that represents the
>>>>>>>>>> thread's activity.
>>>>>>>>>
>>>>>>>>> Note that the nested event loops in QEMU are _very_ different from
>>>>>>>>> glib nested event loops.  In QEMU, nested event loops only run
>>>>>>>>> block layer events.  In glib, they run all events.  That's why you
>>>>>>>>> need AioContext.
>>>>>>>>
>>>>>>>> Would it be possible to use glib for our nested loops as well, by
>>>>>>>> just setting a higher priority for the AioContext GSource?
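Just to make the priority idea above concrete, here is a minimal sketch of
attaching an AioContext's GSource at a higher-than-default priority with
plain glib calls.  The helper name and the G_PRIORITY_HIGH value are only
placeholders, not an actual proposal:

    #include <glib.h>

    /* Sketch: attach the AioContext GSource ahead of default-priority
     * sources, so that a nested iteration tends to dispatch block-layer
     * events first.  The priority value chosen here is arbitrary. */
    static void attach_aio_source(GSource *aio_source, GMainContext *ctx)
    {
        g_source_set_priority(aio_source, G_PRIORITY_HIGH);
        g_source_attach(aio_source, ctx);
    }

Whether that alone gives the "block events only" behaviour of a nested
qemu_aio_wait() loop is exactly the question discussed below.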
>>>>>>>> Stefan and I were considering how we could make use of his "drop
>>>>>>>> ioflush" patches to use a common mechanism to register fd events,
>>>>>>>> but still allow us to distinguish between AioContext and
>>>>>>>> non-AioContext for nested loops.  I was originally thinking of
>>>>>>>> using prepare() functions to filter out non-AioContext events, but
>>>>>>>> that requires we implement our GSources with that in mind, and not
>>>>>>>> make use of pre-baked ones like GIOChannel's, and it bakes block
>>>>>>>> stuff into every event source implementation.
>>>>>>>>
>>>>>>> Besides priority, g_source_set_can_recurse() can also help.
>>>>>>> Thinking about it more deeply, I found a harder problem:
>>>>>>> g_main_context_acquire() and g_main_context_release().  On release,
>>>>>>> pending BH/IO callbacks need to be cleared, but this action can't
>>>>>>> be triggered automatically when the user calls
>>>>>>> g_main_context_release().
>>>>>>
>>>>>> I don't understand why this is a requirement;
>>>>>> g_main_context_acquire/release ensure that only one thread attempts
>>>>>> to iterate the main loop at a time.  This isn't currently an issue
>>>>>> in QEMU, and if we re-implemented qemu_aio_wait() to use the same
>>>>>> glib interfaces, the tracking of in-flight requests would be moved
>>>>>> to the block layer via Stefan's 'drop io_flush' patches, which move
>>>>>> that block-specific logic out of the main loop/AioContext GSource by
>>>>>> design.  Are there other areas where you see this as a problem?
>>>>>
>>>>> I think I understand better what you're referring to: you mean that
>>>>> if qemu_aio_wait() were called, and were implemented to essentially
>>>>> call g_main_context_iteration(), then after, say, one iteration, the
>>>>> iothread/dataplane thread might acquire the main loop and dispatch
>>>>> block/non-block events after qemu_aio_wait() returned?  The simple
>>>>> approach would be to have qemu_aio_wait() call
>>>>> g_main_context_acquire/release at the start and end of the function,
>>>>> which would ensure that this never happens.
>>>>>
>>>> qemu_aio_wait() is relatively simple to resolve with
>>>> g_main_context_acquire(), but it also shows the additional code needed
>>>> for a thread switch (I guess the following is the plan to implement
>>>> qemu_aio_wait()):
>>>>
>>>> qemu_aio_wait(GMainContext *ctx)
>>>> {
>>>>     return g_main_context(ctx, PRI_BLOCK);
>>>> }
>>>>
>>>> at the caller:
>>>> {
>>>>     while (qemu_aio_wait(ctx)) {
>>>>         ;
>>>>     }
>>>>     g_main_context_release(ctx);
>>>> }
>>>>
>>>> The above code shows that, in a *ctx switch operation, more than
>>>> glib's own event loop API is involved: qemu_aio_wait().  So it raises
>>>> a question: what data structure should be used to represent the
>>>> context concept and control the thread-switching behavior?  It is
>>>> better to have a clear layer, GMainContext or GlibQContext, instead of
>>>> GMainContext plus a custom function.  The callers fall into at least
>>>> two groups: the nested user (block layer) and the flat user above the
>>>> block layer.
>>>> In my opinion, this problem is brought in by the GSource AioContext.
>>>> Take the call path of bdrv_aio_readv(*bdrv_cb) on a raw Linux file as
>>>> an example; the following async operations are involved for
>>>> AioContext:
>>>> 1 bdrv_cb() will be executed in bdrv_co_em_bh().
>>>> 2 bdrv_co_io_em_complete() will be executed in event_notifier_ready().
>>>> 3 aio_worker() will be executed in worker_thread().
>>>> Operations 2 and 3 are tracked by the block layer's queue after
>>>> Stefan's 'drop io_flush()' series.
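For comparison, the acquire/release wrapping mentioned above, written
against the real glib calls, might look roughly like this.  This is only a
sketch (the function name is made up), and note that
g_main_context_iteration() takes no priority argument, so it dispatches
whatever sources are ready, not just block-layer ones:

    #include <glib.h>

    /* Sketch: one nested wait guarded by ownership of the context.
     * g_main_context_acquire() returns FALSE if another thread already
     * owns ctx. */
    static gboolean qemu_aio_wait_sketch(GMainContext *ctx)
    {
        gboolean progress;

        if (!g_main_context_acquire(ctx)) {
            return FALSE;
        }
        /* one blocking iteration: poll, then dispatch ready sources */
        progress = g_main_context_iteration(ctx, TRUE);
        g_main_context_release(ctx);
        return progress;
    }

Even with this wrapping, the thread-switch problem described above remains:
releasing ctx says nothing about BHs or worker completions that were set up
while this thread owned it.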
>>>> Now, if we stick to GMainContext to represent the context concept,
>>>> then when thread B wants to acquire the GMainContext used by thread
>>>> A, all work set up by A should be finished before B acquires it;
>>>> otherwise B will execute some functions that were supposed to run in
>>>> A.  The work refers to the three kinds of async operations above.
>>>> We can solve this issue by different means:
>>>> 1 The event loop API doesn't guarantee that work set up by thread A
>>>> will always be finished in A.  This puts a limitation on the caller
>>>> to consider thread switching.  I talked with Stefan on IRC and think
>>>> it is possible for the nested user (block layer), but I still want to
>>>> avoid this for the flat user above the block layer.
>>>> 2 Ask the caller to finish all pending operations.
>>>> 2.1 The glib GMainContext API plus a custom API such as aio_wait().
>>>> This is bad in that detail under GMainContext is exposed to the
>>>> caller.  Since operation 1 mentioned before is not tracked yet, to
>>>> make sure bdrv_cb() is called in the thread that set it up, 1 needs
>>>> to be added to the tracking queue, or the call chain of aio_wait()
>>>> needs to check the queue plus check operation 1.  Perhaps add a
>>>> custom function that the caller is asked to call, such as
>>>> context_work_flush()?
>>>
>>> If a well-named API doing that flush work is present, using the glib
>>> API plus it seems OK too, and it avoids a wrapper.  I guess
>>> bdrv_drain_all(GMainContext *ctx, ...) could do it.
>>>
>> I haven't found a good answer in GStreamer, but I want to show some
>> ideas from my understanding.
>>
>> The following is a brief picture of the current event loop in QEMU;
>> Alex's timers for AioContext are also drawn here:
>>
>>    ========================
>>    || I/O thread in vl.c ||
>>    ========================
>>               |
>>               | run loop
>>               v
>>    ======================   qemu_set_fd_handler2()   ===============
>>    ||       other      ||--------------------------->|| main_loop ||
>>    || (vnc, migration) ||                            ===============
>>    ======================                                   |
>>                                                             | GLib
>>                                                             | event loop API
>>                                                             v
>>    ==========       qemu_set_fd_handler()            ====================
>>    ||  hw  ||--------------------------------------->||  GMainContext  ||
>>    ==========       (should it be removed?)          ====================
>>        |              |                                      |
>>        | qemu_bh_***()| main_loop_tlg                        | GSource
>>        |              |                                      | attach
>>    ====|==============|======================================|==========
>>    ||  v              v                                      v         ||
>>    ||  ======    ==================    ======                          ||
>>    ||  | BH |    | TimerListGroup |    | Fd |                          ||
>>    ||  ======    ==================    ======                          ||
>>    ||                                                                  ||
>>    ||                           AioContext                             ||
>>    ||               (block layer's event collection)                   ||
>>    ======================================================================
>>          ^
>>          | qemu_bh_***()
>>          | qemu_aio_wait()
>>          | qemu_aio_set_fd_handler()
>>    ===========
>>    || block ||
>>    ===========
>>
>> The main issue here is that the components are tightly bound together
>> and no clear layer represents the thread and event loop API.  Block and
>> hw code interact with AioContext, so both GMainContext and AioContext
>> are playing the role.  I hope to form a library for block, so I need to
>> pick one of them to provide the event loop; the choices seem to be:
>> 1 GMainContext.
>> 2 AioContext.
>> 3 An encapsulation, such as GlibQContext.
>>
>> 1) and 2) would not be perfect, since a non-standard glib event loop
>> would be exposed; 3) would show a unified interface similar to the glib
>> main loop, but needs more code adjustment.  After some thinking, I
>> guess AioContext is the easiest way: it represents the block layer's
>> own event loop, gives up using the glib event loop at this level, and
>> just adds a custom API such as block_iterate().
>> Briefly, bdrv_read would become:
>>   int bdrv_read(AioContext *ctx, ....);
>
> I don't understand why you want to add AioContext *ctx to bdrv_read().
> The synchronous APIs already work fine since no event loop integration
> is necessary at all (the event loop is hidden inside the synchronous
> function).
>
OK... I used a wrong example; it should be bdrv_aio_readv().  At this
level, do you think AioContext * should be used instead of GMainContext *?

> Since AioContext provides a GSource, integrating with an application's
> glib event loop should also be easy.  The only hard part is timers,
> since we use nanosecond timers - there we should just round up to
> millisecond granularity in libqblock.  The nanosecond timers aren't
> critical in libqblock, only for running guests.
>
> Stefan
>

-- 
Best Regards

Wenchao Xia