Date: Fri, 16 Aug 2013 16:12:06 +0800
From: Wenchao Xia
Subject: Re: [Qemu-devel] [RFC] Convert AioContext to Gsource sub classes
To: Michael Roth
Cc: Kevin Wolf, Anthony Liguori, Ping Fan Liu, Stefan Hajnoczi, qemu-devel, Paolo Bonzini
Message-ID: <520DDED6.2060401@linux.vnet.ibm.com>
In-Reply-To: <520DD198.60708@linux.vnet.ibm.com>

On 2013-08-16 15:15, Wenchao Xia wrote:
> On 2013-08-16 00:32, Michael Roth wrote:
>> Quoting Michael Roth (2013-08-15 10:23:20)
>>> Quoting Wenchao Xia (2013-08-13 03:44:39)
>>>> On 2013-08-13 01:01, Michael Roth wrote:
>>>>> Quoting Paolo Bonzini (2013-08-12 02:30:28)
>>>>>>> 1) Rename AioContext to AioSource.
>>>>>>> This is my major purpose: it declares that this is not a "context" concept, and that GMainContext is the entity that represents the thread's activity.
>>>>>>
>>>>>> Note that the nested event loops in QEMU are _very_ different from glib nested event loops. In QEMU, nested event loops only run block layer events. In glib, they run all events. That's why you need AioContext.
>>>>>
>>>>> Would it be possible to use glib for our nested loops as well by just setting a higher priority for the AioContext GSource?
>>>>>
>>>>> Stefan and I were considering how we could make use of his "drop io_flush" patches to use a common mechanism to register fd events, but still allow us to distinguish between AioContext and non-AioContext events for nested loops.
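
[Editor's sketch of the "higher priority" idea above, not code from this thread: the AioContext's GSource could be attached to a GMainContext at an elevated priority. QEMU_PRIORITY_BLOCK is a hypothetical constant (the name floated later in this thread); aio_get_g_source() is the accessor QEMU provides for the AioContext's GSource; in glib, lower numeric values mean higher priority.]

    #include <glib.h>
    #include "block/aio.h"                 /* AioContext, aio_get_g_source() */

    /* Hypothetical priority: anything below G_PRIORITY_DEFAULT runs first. */
    #define QEMU_PRIORITY_BLOCK (G_PRIORITY_DEFAULT - 10)

    static void attach_aio_context(AioContext *aio_ctx, GMainContext *g_ctx)
    {
        GSource *src = aio_get_g_source(aio_ctx);    /* returns a new reference */

        /* Block-layer events get dispatched ahead of default-priority sources,
         * so a nested loop limited to this priority would see only them. */
        g_source_set_priority(src, QEMU_PRIORITY_BLOCK);
        g_source_attach(src, g_ctx);
        g_source_unref(src);                         /* g_ctx keeps its own ref */
    }

[A nested wait could then restrict dispatch to that priority; see the iteration sketch appended after this message.]
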
>>>>> I was originally thinking of using prepare() functions to filter out non-AioContext events, but that requires that we implement our GSources with that in mind rather than making use of pre-baked ones like GIOChannel's, and it bakes block stuff into every event source implementation.
>>>>
>>>> Besides priority, g_source_set_can_recurse() can also help.
>>>> Thinking about it more deeply, I found a harder problem: g_main_context_acquire() and g_main_context_release(). On release, pending BH/IO callbacks need to be cleared, but this action can't be triggered automatically when the user calls g_main_context_release().
>>>
>>> I don't understand why this is a requirement; g_main_context_acquire/release ensure that only one thread attempts to iterate the main loop at a time. This isn't currently an issue in QEMU, and if we re-implemented qemu_aio_wait() to use the same glib interfaces, the tracking of in-flight requests would be moved to the block layer via Stefan's 'drop io_flush' patches, which move that block-specific logic out of the main loop/AioContext GSource by design. Are there other areas where you see this as a problem?
>>
>> I think I understand better what you're referring to: you mean that if qemu_aio_wait() were called, and were implemented to essentially call g_main_context_iteration(), then after, say, one iteration, the iothread/dataplane thread might acquire the main loop and dispatch block/non-block events before qemu_aio_wait() returned? The simple approach would be to have qemu_aio_wait() call g_main_context_acquire/release at the start and end of the function, which would ensure that this never happens.
>>
> qemu_aio_wait() is relatively simple to resolve with g_main_context_acquire(), but it also shows the additional code needed for a thread switch (I guess the following is the plan to implement qemu_aio_wait()):
>
>     qemu_aio_wait(GMainContext *ctx)
>     {
>         return g_main_context(ctx, PRI_BLOCK);
>     }
>
> and at the caller:
>
>     {
>         while (qemu_aio_wait(ctx)) {
>             ;
>         }
>         g_main_context_release(ctx);
>     }
>
> The above code shows that, in the *ctx switch operation, more than glib's own event loop API is involved: qemu_aio_wait(). So it raises a question: what data structure should be used to represent the context concept and control the thread-switching behavior? It is better to have a clear layer, GMainContext or GlibQContext, instead of GMainContext plus a custom function. There are at least two kinds of callers: the nested user (the block layer) and the flat user above the block layer.
> In my opinion, this problem is brought in by the GSource AioContext. Take the call path of bdrv_aio_readv(*bdrv_cb) on a raw Linux file as an example; the following async operations are involved for the AioContext:
> 1 bdrv_cb() will be executed in bdrv_co_em_bh().
> 2 bdrv_co_io_em_complete() will be executed in event_notifier_ready().
> 3 aio_worker() will be executed in worker_thread().
> Operations 2 and 3 are tracked by the block layer's queue after Stefan's 'drop io_flush' series.
> Now, if we stick with GMainContext to represent the context concept, then when thread B wants to acquire the GMainContext used by thread A, all work set up by A should be finished before B acquires it; otherwise B will execute functions that were supposed to run in A. The work refers to the three kinds of async operations above.
> We can address this issue in different ways:
> 1 The event loop API doesn't guarantee that work set up by thread A will always be finished in A.
>   This puts the burden of considering thread switching on the caller. I talked about it on IRC with Stefan, who thinks it is possible for the nested user (the block layer), but I still want to avoid it for the flat user above the block layer.
> 2 Ask the caller to finish all pending operations.
> 2.1 The glib GMainContext API plus a custom API such as aio_wait(). This is bad in that detail underneath GMainContext is exposed to the caller. Since operation 1 mentioned before is not tracked yet, to make sure bdrv_cb() is called in the thread that set it up, either 1 needs to be added to the tracking queue, or the call chain of aio_wait() needs to check the queue plus check operation 1. Perhaps add a custom function that the caller is asked to invoke, such as context_work_flush()? If a well-named API that does the flush work is present, using the glib API plus it seems OK too, and it can avoid a wrapper. I guess bdrv_drain_all(GMainContext *ctx, ...) could do it.
> 2.2 Use a GlibQContext API. In this way every operation is hidden, and its release() finishes the pending work. This way a clear layer representing the event loop can be exposed.
>
> A clear layer representing the event loop will keep caller code clean, which is especially helpful when trying to expose the block layer with AIO as a public library.
> Personally I hope not to use AioContext as the context concept; adding iterate APIs around it will be tricky, since it is actually a GSource, and my intuition is that it will bring trouble when integrating with glib's event loop. So I hope to rename AioContext to AioSource, or break it up as a common case in QContext. Then I would get bdrv_read(GlibQContext *ctx).
>
> After a talk with Stefan, I plan to read the GStreamer documentation to see how it deals with this kind of problem.
>
>>>
>>>> For the above reason, I tend to think that maybe we should wrap all of glib's main loop into a custom encapsulation, such as QContext, and add the aio poll logic in q_context_release(). Use QContext * in every caller to hide GMainContext *, so that the QContext layer plays the role of a clear event loop API.
>>>>
>>>>> Priorities didn't cross my mind though, but it seems pretty straightforward...
>>>>>
>>>>> AioContext could then just be a container of sorts for managing bottom halves and AioContext FDs and binding them to the proper GMainContext/MainLoop, but the underlying GSources could still be driven by a normal glib-based main loop, just with a specific priority in the nested case.
>>>>>
>>>>>>
>>>>>>> 2) Break AioSource into FdSource and BhSource.
>>>>>>> This makes the custom code smaller and simpler: one GSource for one kind of job. It is not necessary, but IMHO it will make things clearer when more things are added to the main loop: add a new GSource subclass, and avoid always having a relationship with AioContext.
>>>>>>
>>>>>> But this is only complicating things, since users rely on both file-descriptor APIs and bottom half APIs.
>>>>>
>>>>> Taking things a step further, maybe AioContext can stop being a block-specific construct and actually be the "QContext" we've discussed in the past for managing multiple event loops. All the block stuff would be hidden away in the GSource priority.
>>>>>
>>>>> For instance,
>>>>>
>>>>> #ifndef _WIN32
>>>>>
>>>>> qemu_aio_set_fd_handler(fd, ...):
>>>>>     aio_set_fd_handler(qemu_aio_context, fd, ..., QEMU_PRIORITY_BLOCK)
>>>>>
>>>>> qemu_set_fd_handler(fd, ...):
>>>>>     aio_set_fd_handler(qemu_aio_context, fd, ..., G_PRIORITY_DEFAULT)
>>>>>
>>>>> #else
>>>>>
>>>>> qemu_add_wait_object(fd, ...):
>>>>>     add_wait_object(qemu_aio_context, fd, ...)
>>>>>
>>>>> qemu_set_fd_handler(fd, ...):
>>>>>     set_socket_handler(qemu_aio_context, fd, ..., G_PRIORITY_DEFAULT)
>>>>>
>>>>> #endif
>>>>>
>>>>> qemu_bh_schedule:
>>>>>     bh_schedule(qemu_aio_context, ...)
>>>>>
>>>>> etc...
>>>>>
>>>>> I'll be sending patches this week for moving add_wait_object/qemu_set_fd_handler to GSources. The non-global ones use GMainContext * to specify a non-default thread/context, but that can be easily changed, or we can just do aioctx->g_main_context at the call sites. There are some nice possibilities in using the former, though: avoiding O(n) lookups for stuff like finding the GSource for a particular event/event type, for instance by storing pointers to the GSource or using some kind of hashmap lookup. But it's probably better to discuss that aspect with some context, so I'll try to get those patches out soon.
>>>>>
>>>>>>
>>>>>>>>> More reasons:
>>>>>>>>> When I think about how to bind library code to a thread context, it may be necessary to add the context concept into the API of block.c. If I use AioContext, a wrapper API will be needed to run the event loop. But if I get glib's GMainContext, things become simple.
>>>>>>
>>>>>> You already have it, because AioContext is a GSource. You do not need to expose the AioContext, except as a GSource.
>>>>>
>>>> I think exposing GMainContext * or QContext * is better than GSource *:
>>>>
>>>>     int bdrv_read(GMainContext *ctx,
>>>>                   BlockDriverState *bs,
>>>>                   int64_t sector_num,
>>>>                   uint8_t *buf,
>>>>                   int nb_sectors)
>>>>
>>>>>> Paolo
>>>>>
>>>>
>>>>
>>>> --
>>>> Best Regards
>>>>
>>>> Wenchao Xia
>>
>
>

-- 
Best Regards

Wenchao Xia
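
[Appended for illustration: a minimal sketch of the priority-restricted iteration that the qemu_aio_wait(GMainContext *) pseudocode above hints at, built from glib's public prepare/query/check/dispatch phases. This is not code from the thread or from QEMU; QEMU_PRIORITY_BLOCK and aio_iterate() are hypothetical names, and error handling (for example a failed g_main_context_acquire()) is omitted.]

    #include <glib.h>

    #define QEMU_PRIORITY_BLOCK (G_PRIORITY_DEFAULT - 10)   /* hypothetical value */

    /* Run one iteration of ctx, dispatching only sources at block priority or
     * higher (numerically <= QEMU_PRIORITY_BLOCK).  Returns TRUE if anything
     * was dispatched, so a caller can loop: while (aio_iterate(ctx, TRUE)); */
    static gboolean aio_iterate(GMainContext *ctx, gboolean blocking)
    {
        GPollFD *fds = NULL;
        gint n_alloc = 0, n_fds, timeout, unused_prio;
        gboolean dispatched = FALSE;

        g_main_context_acquire(ctx);               /* one iterating thread at a time */
        g_main_context_prepare(ctx, &unused_prio);

        /* Only poll fds belonging to sources at block priority or higher;
         * grow the GPollFD array until query() fits. */
        while ((n_fds = g_main_context_query(ctx, QEMU_PRIORITY_BLOCK,
                                             &timeout, fds, n_alloc)) > n_alloc) {
            fds = g_renew(GPollFD, fds, n_fds);
            n_alloc = n_fds;
        }

        g_poll(fds, n_fds, blocking ? timeout : 0);

        /* check() flags ready sources whose priority is numerically
         * <= QEMU_PRIORITY_BLOCK; dispatch() then runs only those. */
        if (g_main_context_check(ctx, QEMU_PRIORITY_BLOCK, fds, n_fds)) {
            g_main_context_dispatch(ctx);
            dispatched = TRUE;
        }

        g_main_context_release(ctx);
        g_free(fds);
        return dispatched;
    }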