Message-ID: <520DD198.60708@linux.vnet.ibm.com>
Date: Fri, 16 Aug 2013 15:15:36 +0800
From: Wenchao Xia
In-Reply-To: <20130815163207.16563.45553@loki>
Subject: Re: [Qemu-devel] [RFC] Convert AioContext to Gsource sub classes
To: Michael Roth
Cc: Kevin Wolf, Anthony Liguori, Ping Fan Liu, Stefan Hajnoczi,
 qemu-devel, Paolo Bonzini

On 2013-8-16 0:32, Michael Roth wrote:
> Quoting Michael Roth (2013-08-15 10:23:20)
>> Quoting Wenchao Xia (2013-08-13 03:44:39)
>>> On 2013-8-13 1:01, Michael Roth wrote:
>>>> Quoting Paolo Bonzini (2013-08-12 02:30:28)
>>>>>> 1) Rename AioContext to AioSource.
>>>>>>    This is my major purpose: to declare that it is not a "context"
>>>>>>    concept; GMainContext is the entity that represents the thread's
>>>>>>    activity.
>>>>>
>>>>> Note that the nested event loops in QEMU are _very_ different from
>>>>> glib nested event loops. In QEMU, nested event loops only run block
>>>>> layer events. In glib, they run all events. That's why you need
>>>>> AioContext.
>>>>
>>>> Would it be possible to use glib for our nested loops as well by just
>>>> setting a higher priority for the AioContext GSource?
>>>>
>>>> Stefan and I were considering how we could make use of his "drop
>>>> io_flush" patches to use a common mechanism to register fd events, but
>>>> still allow us to distinguish between AioContext and non-AioContext
>>>> events for nested loops. I was originally thinking of using prepare()
>>>> functions to filter out non-AioContext events, but that requires that
>>>> we implement our GSources with that in mind, and not make use of
>>>> pre-baked ones like GIOChannel's, and it bakes block stuff into every
>>>> event source implementation.
>>>
>>> Besides priority, g_source_set_can_recurse() can also help.
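
(To make that concrete: a minimal sketch of attaching the AioContext's
GSource with both knobs set. aio_get_g_source() exists in QEMU today; the
QEMU_PRIORITY_BLOCK value and the helper name are assumptions of mine:)

  #include <glib.h>
  #include "block/aio.h"   /* aio_get_g_source() */

  /* assumed priority value; anything above G_PRIORITY_DEFAULT would do */
  #define QEMU_PRIORITY_BLOCK (G_PRIORITY_DEFAULT - 10)

  static void attach_aio_source(AioContext *aio_context, GMainContext *gctx)
  {
      GSource *src = aio_get_g_source(aio_context); /* takes a reference */

      /* block events win over default-priority sources */
      g_source_set_priority(src, QEMU_PRIORITY_BLOCK);
      /* without this, a nested loop cannot dispatch the source while the
         outer loop is still inside its dispatch */
      g_source_set_can_recurse(src, TRUE);
      g_source_attach(src, gctx);
      g_source_unref(src); /* gctx now holds its own reference */
  }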
>>> Thinking about it more deeply, I found a harder problem:
>>> g_main_context_acquire() and g_main_context_release(). On release,
>>> pending BH/IO callbacks need to be cleared, but this action can't be
>>> triggered automatically when the user calls g_main_context_release().
>>
>> I don't understand why this is a requirement; g_main_context_acquire/release
>> ensure that only one thread attempts to iterate the main loop at a time.
>> This isn't currently an issue in QEMU, and if we re-implemented
>> qemu_aio_wait() to use the same glib interfaces, the tracking of in-flight
>> requests would be moved to the block layer via Stefan's 'drop io_flush'
>> patches, which move that block-specific logic out of the main
>> loop/AioContext GSource by design. Are there other areas where you see
>> this as a problem?
>
> I think I understand better what you're referring to: you mean that
> if qemu_aio_wait() was called, and was implemented to essentially call
> g_main_context_iterate(), then after, say, 1 iteration, the
> iothread/dataplane thread might acquire the main loop and dispatch
> block/non-block events between calls, after qemu_aio_wait() returned?
> The simple approach would be to have qemu_aio_wait() call
> g_main_context_acquire/release at the start and end of the function,
> which would ensure that this never happens.
>

qemu_aio_wait() is relatively simple to resolve with g_main_context_acquire(),
but it also shows the additional code needed for a thread switch (I guess the
following is the plan to implement qemu_aio_wait()):

  qemu_aio_wait(GMainContext *ctx)
  {
      /* pseudocode: iterate ctx once, dispatching block-priority events only */
      return g_main_context_iterate(ctx, PRI_BLOCK);
  }

at the caller:

  {
      while (qemu_aio_wait(ctx)) {
          ;
      }
      g_main_context_release(ctx);
  }

The above code shows that, in a *ctx switch operation, more than glib's own
event loop API is involved: qemu_aio_wait(). So it raises a question: what
data structure should be used to represent the context concept and control
the thread-switching behavior? It is better to have a clear layer,
GMainContext or GlibQContext, instead of GMainContext plus custom functions.
There are at least two kinds of caller to consider: the nested user (the
block layer) and the flat user (code above the block layer).

In my opinion, this problem is brought in by the GSource AioContext. Take the
call path of bdrv_aio_readv(*bdrv_cb) on a raw Linux file as an example; the
following async operations are involved for AioContext:
1 bdrv_cb() will be executed in bdrv_co_em_bh().
2 bdrv_co_io_em_complete() will be executed in event_notifier_ready().
3 aio_worker() will be executed in worker_thread().
Operations 2 and 3 are tracked by the block layer's queue after Stefan's
'drop io_flush' series. Now, if we stick to GMainContext to represent the
context concept, then when thread B wants to acquire the GMainContext used by
thread A, all work set up by A should be finished before B acquires it;
otherwise B will execute some functions that were supposed to run in A. The
work refers to the three kinds of async functions above. This issue could be
solved by different means:

1 The event loop API doesn't guarantee that work set up by thread A will
always be finished in A. This puts a limitation on the caller to consider
thread switching. I talked with Stefan on IRC, and he thinks it is possible
for the nested user (the block layer), but I still want to avoid it for the
flat user above the block layer.

2 Ask the caller to finish all pending operations.

2.1 glib's GMainContext API plus a custom API such as aio_wait(). The bad
part is that detail underneath GMainContext is exposed to the caller. Since
operation 1 mentioned before is not tracked yet, to make sure bdrv_cb() is
called in the thread that set it up, operation 1 needs to be added to the
tracking queue, or the call chain of aio_wait() must check that queue plus
operation 1. Perhaps add a custom function, say context_work_flush(), that
the caller is asked to call? (A sketch follows.)
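
(To make 2.1 concrete, here is a minimal sketch of such a release helper.
q_context_release() and context_work_flush() are only the hypothetical names
from this discussion, and qemu_aio_wait(ctx) is the guessed
GMainContext-based variant above; none of these exist today:)

  void q_context_release(GMainContext *ctx)
  {
      /* drain the block-priority work that was set up in this thread */
      while (qemu_aio_wait(ctx)) {
          ;
      }
      /* hypothetical: run the pending BHs/callbacks of operation 1 so
         they are not picked up by the next owner of ctx */
      context_work_flush(ctx);
      g_main_context_release(ctx);
  }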
2.2 Use the GlibQContext API. In this way every operation is hidden, and in
its release() the pending work will be finished. This way, a clear layer
representing the event loop can be exposed.

A clear layer representing the event loop will keep caller code clean, which
is especially helpful when trying to expose the block layer with AIO as a
public library. Personally, I hope not to use AioContext as the context
concept; adding iterate APIs around it would be tricky, since it is actually
a GSource, and my intuition says it will bring trouble when integrating with
glib's event loop. So I hope to rename AioContext to AioSource, or break it
up as a common case in QContext. Then I get bdrv_read(GlibQContext *ctx).
After a talk with Stefan, I plan to read the gstream doc to see how it deals
with such problems.

>>
>>> For the above reason, I tend to think maybe we should wrap all of
>>> glib's mainloop into a custom encapsulation, such as QContext. Add the
>>> aio poll logic in q_context_release(), and use QContext * in every
>>> caller to hide GMainContext *, so the QContext layer plays the role of
>>> a clear event loop API.
>>>
>>>> Priorities didn't cross my mind though, but it seems pretty
>>>> straightforward...
>>>>
>>>> AioContext could then just be a container of sorts for managing
>>>> bottom-halves and AioContext FDs and binding them to the proper
>>>> GMainContext/MainLoop, but the underlying GSources could
>>>> still be driven by a normal glib-based mainloop, just with a specific
>>>> priority in the nested case.
>>>>
>>>>>
>>>>>> 2) Break AioSource into FdSource and BhSource.
>>>>>>    This makes the custom code smaller and simpler, one GSource for
>>>>>>    one kind of job. It is not necessary, but IMHO it will make
>>>>>>    things clear when adding more things into the main loop: add a
>>>>>>    new GSource subclass, and avoid always having a relationship
>>>>>>    with AioContext.
>>>>>
>>>>> But this is only complicating things, since users rely on both
>>>>> file-descriptor APIs and bottom half APIs.
>>>>
>>>> Taking things a step further, maybe AioContext can stop being a
>>>> block-specific construct, and actually be the "QContext" we've
>>>> discussed in the past for managing multiple event loops. All
>>>> the block stuff would be hidden away in the GSource priority.
>>>>
>>>> For instance,
>>>>
>>>> #ifndef _WIN32
>>>>
>>>> qemu_aio_set_fd_handler(fd, ...):
>>>>     aio_set_fd_handler(qemu_aio_context, fd, ..., QEMU_PRIORITY_BLOCK)
>>>>
>>>> qemu_set_fd_handler(fd, ...):
>>>>     aio_set_fd_handler(qemu_aio_context, fd, ..., G_PRIORITY_DEFAULT)
>>>>
>>>> #else
>>>>
>>>> qemu_add_wait_object(fd, ...):
>>>>     add_wait_object(qemu_aio_context, fd, ...)
>>>>
>>>> qemu_set_fd_handler(fd, ...):
>>>>     set_socket_handler(qemu_aio_context, fd, ..., G_PRIORITY_DEFAULT)
>>>>
>>>> #endif
>>>>
>>>> qemu_bh_schedule:
>>>>     bh_schedule(qemu_aio_context, ...)
>>>>
>>>> etc...
>>>>
>>>> I'll be sending patches this week for moving
>>>> add_wait_object/qemu_set_fd_handler to GSources; the non-global ones
>>>> use a GMainContext * to specify a non-default thread/context, but that
>>>> can be easily changed, or we can just do aioctx->g_main_context at the
>>>> call sites.
>>>>
>>>> There are some nice possibilities in using the former though: avoiding
>>>> O(n) lookups for stuff like finding the GSource for a particular
>>>> event/event type, for instance, by storing pointers to the GSource or
>>>> some kind of hashmap lookup. But it's probably better to discuss that
>>>> aspect with some context, so I'll try to get those patches out soon.
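
(As an aside on the priority idea: glib's split iteration API can already
drive a nested loop that dispatches only sources above a priority cutoff,
which is roughly what "hiding the block stuff in the GSource priority"
needs. A minimal sketch of mine, where the QEMU_PRIORITY_BLOCK value and
the fixed-size fd array are assumptions:)

  #include <glib.h>

  #define QEMU_PRIORITY_BLOCK (G_PRIORITY_DEFAULT - 10) /* assumed value */

  /* One iteration dispatching only sources at block priority or higher.
     The calling thread is assumed to own ctx via g_main_context_acquire(). */
  static gboolean iterate_block_events(GMainContext *ctx, gboolean blocking)
  {
      GPollFD fds[64]; /* a real implementation would grow this on demand */
      gint max_priority, timeout, nfds;
      gboolean ready;

      g_main_context_prepare(ctx, &max_priority);
      nfds = g_main_context_query(ctx, QEMU_PRIORITY_BLOCK, &timeout,
                                  fds, G_N_ELEMENTS(fds));
      nfds = MIN(nfds, (gint) G_N_ELEMENTS(fds)); /* sketch-only clamp */
      g_poll(fds, nfds, blocking ? timeout : 0);
      ready = g_main_context_check(ctx, QEMU_PRIORITY_BLOCK, fds, nfds);
      if (ready) {
          g_main_context_dispatch(ctx);
      }
      return ready;
  }

qemu_aio_wait() could then be this with blocking=TRUE, while the outer,
flat loop keeps using plain g_main_context_iteration().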
>>>>
>>>>>
>>>>>>>> More reasons:
>>>>>>>> When I think about how to bind library code to a thread context,
>>>>>>>> the context concept may need to be added into the API of block.c.
>>>>>>>> If I use AioContext, a wrapper API will be needed to run the event
>>>>>>>> loop. But if I get glib's GMainContext, things become simple.
>>>>>
>>>>> You already have it, because AioContext is a GSource. You do not need
>>>>> to expose the AioContext, except as a GSource.
>>>>>
>>> I think exposing GMainContext * or QContext * is better than
>>> GSource *.
>>>
>>> int bdrv_read(GMainContext *ctx,
>>>               BlockDriverState *bs,
>>>               int64_t sector_num,
>>>               uint8_t *buf,
>>>               int nb_sectors)
>>>
>>>>> Paolo
>>>>
>>>
>>>
>>> --
>>> Best Regards
>>>
>>> Wenchao Xia
>

--
Best Regards

Wenchao Xia