Date: Fri, 16 Aug 2013 16:12:06 +0800
From: Wenchao Xia
Subject: Re: [Qemu-devel] [RFC] Convert AioContext to Gsource sub classes
To: Michael Roth
Cc: Kevin Wolf, Anthony Liguori, Ping Fan Liu, Stefan Hajnoczi, qemu-devel, Paolo Bonzini
Message-ID: <520DDED6.2060401@linux.vnet.ibm.com>
In-Reply-To: <520DD198.60708@linux.vnet.ibm.com>

On 2013-08-16 15:15, Wenchao Xia wrote:
> On 2013-08-16 00:32, Michael Roth wrote:
>> Quoting Michael Roth (2013-08-15 10:23:20)
>>> Quoting Wenchao Xia (2013-08-13 03:44:39)
>>>> On 2013-08-13 01:01, Michael Roth wrote:
>>>>> Quoting Paolo Bonzini (2013-08-12 02:30:28)
>>>>>>> 1) Rename AioContext to AioSource.
>>>>>>> This is my major purpose: it declares that this is not a "context" concept, and that GMainContext is the entity that represents the thread's activity.
>>>>>>
>>>>>> Note that the nested event loops in QEMU are _very_ different from glib nested event loops. In QEMU, nested event loops only run block layer events. In glib, they run all events. That's why you need AioContext.
>>>>>
>>>>> Would it be possible to use glib for our nested loops as well by just setting a higher priority for the AioContext GSource?
>>>>>
>>>>> Stefan and I were considering how we could make use of his "drop io_flush" patches to use a common mechanism to register fd events, but still allow us to distinguish between AioContext and non-AioContext events for nested loops.
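
[Editor's sketch of the "higher priority" idea above, not code from this thread: the AioContext's GSource could be attached to a GMainContext at an elevated priority. QEMU_PRIORITY_BLOCK is a hypothetical constant (the name floated later in this thread); aio_get_g_source() is the accessor QEMU provides for the AioContext's GSource; in glib, lower numeric values mean higher priority.]

    #include <glib.h>
    #include "block/aio.h"                 /* AioContext, aio_get_g_source() */

    /* Hypothetical priority: anything below G_PRIORITY_DEFAULT runs first. */
    #define QEMU_PRIORITY_BLOCK (G_PRIORITY_DEFAULT - 10)

    static void attach_aio_context(AioContext *aio_ctx, GMainContext *g_ctx)
    {
        GSource *src = aio_get_g_source(aio_ctx);    /* returns a new reference */

        /* Block-layer events get dispatched ahead of default-priority sources,
         * so a nested loop limited to this priority would see only them. */
        g_source_set_priority(src, QEMU_PRIORITY_BLOCK);
        g_source_attach(src, g_ctx);
        g_source_unref(src);                         /* g_ctx keeps its own ref */
    }

[A nested wait could then restrict dispatch to that priority; see the iteration sketch appended after this message.]
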
>>>>> I was originally thinking of using prepare() functions to filter out non-AioContext events, but that requires that we implement our GSources with that in mind rather than making use of pre-baked ones like GIOChannel's, and it bakes block stuff into every event source implementation.
>>>>
>>>> Besides priority, g_source_set_can_recurse() can also help.
>>>> Thinking about it more deeply, I found a harder problem: g_main_context_acquire() and g_main_context_release(). On release, pending BH/IO callbacks need to be cleared, but this action can't be triggered automatically when the user calls g_main_context_release().
>>>
>>> I don't understand why this is a requirement; g_main_context_acquire/release ensure that only one thread attempts to iterate the main loop at a time. This isn't currently an issue in QEMU, and if we re-implemented qemu_aio_wait() to use the same glib interfaces, the tracking of in-flight requests would be moved to the block layer via Stefan's 'drop io_flush' patches, which move that block-specific logic out of the main loop/AioContext GSource by design. Are there other areas where you see this as a problem?
>>
>> I think I understand better what you're referring to: you mean that if qemu_aio_wait() were called, and were implemented to essentially call g_main_context_iteration(), then after, say, one iteration, the iothread/dataplane thread might acquire the main loop and dispatch block/non-block events before qemu_aio_wait() returned? The simple approach would be to have qemu_aio_wait() call g_main_context_acquire/release at the start and end of the function, which would ensure that this never happens.
>>
> qemu_aio_wait() is relatively simple to resolve with g_main_context_acquire(), but it also shows the additional code needed for a thread switch (I guess the following is the plan to implement qemu_aio_wait()):
>
>     qemu_aio_wait(GMainContext *ctx)
>     {
>         return g_main_context(ctx, PRI_BLOCK);
>     }
>
> and at the caller:
>
>     {
>         while (qemu_aio_wait(ctx)) {
>             ;
>         }
>         g_main_context_release(ctx);
>     }
>
> The above code shows that, in the *ctx switch operation, more than glib's own event loop API is involved: qemu_aio_wait(). So it raises a question: what data structure should be used to represent the context concept and control the thread-switching behavior? It is better to have a clear layer, GMainContext or GlibQContext, instead of GMainContext plus a custom function. There are at least two kinds of callers: the nested user (the block layer) and the flat user above the block layer.
> In my opinion, this problem is brought in by the GSource AioContext. Take the call path of bdrv_aio_readv(*bdrv_cb) on a raw Linux file as an example; the following async operations are involved for the AioContext:
> 1 bdrv_cb() will be executed in bdrv_co_em_bh().
> 2 bdrv_co_io_em_complete() will be executed in event_notifier_ready().
> 3 aio_worker() will be executed in worker_thread().
> Operations 2 and 3 are tracked by the block layer's queue after Stefan's 'drop io_flush' series.
> Now, if we stick with GMainContext to represent the context concept, then when thread B wants to acquire the GMainContext used by thread A, all work set up by A should be finished before B acquires it; otherwise B will execute functions that were supposed to run in A. The work refers to the three kinds of async operations above.
> We can address this issue in different ways:
> 1 The event loop API doesn't guarantee that work set up by thread A will always be finished in A.
>   This puts the burden of considering thread switching on the caller. I talked about it on IRC with Stefan, who thinks it is possible for the nested user (the block layer), but I still want to avoid it for the flat user above the block layer.
> 2 Ask the caller to finish all pending operations.
> 2.1 The glib GMainContext API plus a custom API such as aio_wait(). This is bad in that detail underneath GMainContext is exposed to the caller. Since operation 1 mentioned before is not tracked yet, to make sure bdrv_cb() is called in the thread that set it up, either 1 needs to be added to the tracking queue, or the call chain of aio_wait() needs to check the queue plus check operation 1. Perhaps add a custom function that the caller is asked to invoke, such as context_work_flush()? If a well-named API that does the flush work is present, using the glib API plus it seems OK too, and it can avoid a wrapper. I guess bdrv_drain_all(GMainContext *ctx, ...) could do it.
> 2.2 Use a GlibQContext API. In this way every operation is hidden, and its release() finishes the pending work. This way a clear layer representing the event loop can be exposed.
>
> A clear layer representing the event loop will keep caller code clean, which is especially helpful when trying to expose the block layer with AIO as a public library.
> Personally I hope not to use AioContext as the context concept; adding iterate APIs around it will be tricky, since it is actually a GSource, and my intuition is that it will bring trouble when integrating with glib's event loop. So I hope to rename AioContext to AioSource, or break it up as a common case in QContext. Then I would get bdrv_read(GlibQContext *ctx).
>
> After a talk with Stefan, I plan to read the GStreamer documentation to see how it deals with this kind of problem.
>
>>>
>>>> For the above reason, I tend to think that maybe we should wrap all of glib's main loop into a custom encapsulation, such as QContext, and add the aio poll logic in q_context_release(). Use QContext * in every caller to hide GMainContext *, so that the QContext layer plays the role of a clear event loop API.
>>>>
>>>>> Priorities didn't cross my mind though, but it seems pretty straightforward...
>>>>>
>>>>> AioContext could then just be a container of sorts for managing bottom halves and AioContext FDs and binding them to the proper GMainContext/MainLoop, but the underlying GSources could still be driven by a normal glib-based main loop, just with a specific priority in the nested case.
>>>>>
>>>>>>
>>>>>>> 2) Break AioSource into FdSource and BhSource.
>>>>>>> This makes the custom code smaller and simpler: one GSource for one kind of job. It is not necessary, but IMHO it will make things clearer when more things are added to the main loop: add a new GSource subclass, and avoid always having a relationship with AioContext.
>>>>>>
>>>>>> But this is only complicating things, since users rely on both file-descriptor APIs and bottom half APIs.
>>>>>
>>>>> Taking things a step further, maybe AioContext can stop being a block-specific construct and actually be the "QContext" we've discussed in the past for managing multiple event loops. All the block stuff would be hidden away in the GSource priority.
>>>>>
>>>>> For instance,
>>>>>
>>>>> #ifndef _WIN32
>>>>>
>>>>> qemu_aio_set_fd_handler(fd, ...):
>>>>>     aio_set_fd_handler(qemu_aio_context, fd, ..., QEMU_PRIORITY_BLOCK)
>>>>>
>>>>> qemu_set_fd_handler(fd, ...):
>>>>>     aio_set_fd_handler(qemu_aio_context, fd, ..., G_PRIORITY_DEFAULT)
>>>>>
>>>>> #else
>>>>>
>>>>> qemu_add_wait_object(fd, ...):
>>>>>     add_wait_object(qemu_aio_context, fd, ...)
>>>>>
>>>>> qemu_set_fd_handler(fd, ...):
>>>>>     set_socket_handler(qemu_aio_context, fd, ..., G_PRIORITY_DEFAULT)
>>>>>
>>>>> #endif
>>>>>
>>>>> qemu_bh_schedule:
>>>>>     bh_schedule(qemu_aio_context, ...)
>>>>>
>>>>> etc...
>>>>>
>>>>> I'll be sending patches this week for moving add_wait_object/qemu_set_fd_handler to GSources. The non-global ones use GMainContext * to specify a non-default thread/context, but that can be easily changed, or we can just do aioctx->g_main_context at the call sites. There are some nice possibilities in using the former, though: avoiding O(n) lookups for stuff like finding the GSource for a particular event/event type, for instance by storing pointers to the GSource or using some kind of hashmap lookup. But it's probably better to discuss that aspect with some context, so I'll try to get those patches out soon.
>>>>>
>>>>>>
>>>>>>>>> More reasons:
>>>>>>>>> When I think about how to bind library code to a thread context, it may be necessary to add the context concept into the API of block.c. If I use AioContext, a wrapper API will be needed to run the event loop. But if I get glib's GMainContext, things become simple.
>>>>>>
>>>>>> You already have it, because AioContext is a GSource. You do not need to expose the AioContext, except as a GSource.
>>>>>
>>>> I think exposing GMainContext * or QContext * is better than GSource *:
>>>>
>>>>     int bdrv_read(GMainContext *ctx,
>>>>                   BlockDriverState *bs,
>>>>                   int64_t sector_num,
>>>>                   uint8_t *buf,
>>>>                   int nb_sectors)
>>>>
>>>>>> Paolo
>>>>>
>>>>
>>>>
>>>> --
>>>> Best Regards
>>>>
>>>> Wenchao Xia
>>
>
>

-- 
Best Regards

Wenchao Xia
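
[Appended for illustration: a minimal sketch of the priority-restricted iteration that the qemu_aio_wait(GMainContext *) pseudocode above hints at, built from glib's public prepare/query/check/dispatch phases. This is not code from the thread or from QEMU; QEMU_PRIORITY_BLOCK and aio_iterate() are hypothetical names, and error handling (for example a failed g_main_context_acquire()) is omitted.]

    #include <glib.h>

    #define QEMU_PRIORITY_BLOCK (G_PRIORITY_DEFAULT - 10)   /* hypothetical value */

    /* Run one iteration of ctx, dispatching only sources at block priority or
     * higher (numerically <= QEMU_PRIORITY_BLOCK).  Returns TRUE if anything
     * was dispatched, so a caller can loop: while (aio_iterate(ctx, TRUE)); */
    static gboolean aio_iterate(GMainContext *ctx, gboolean blocking)
    {
        GPollFD *fds = NULL;
        gint n_alloc = 0, n_fds, timeout, unused_prio;
        gboolean dispatched = FALSE;

        g_main_context_acquire(ctx);               /* one iterating thread at a time */
        g_main_context_prepare(ctx, &unused_prio);

        /* Only poll fds belonging to sources at block priority or higher;
         * grow the GPollFD array until query() fits. */
        while ((n_fds = g_main_context_query(ctx, QEMU_PRIORITY_BLOCK,
                                             &timeout, fds, n_alloc)) > n_alloc) {
            fds = g_renew(GPollFD, fds, n_fds);
            n_alloc = n_fds;
        }

        g_poll(fds, n_fds, blocking ? timeout : 0);

        /* check() flags ready sources whose priority is numerically
         * <= QEMU_PRIORITY_BLOCK; dispatch() then runs only those. */
        if (g_main_context_check(ctx, QEMU_PRIORITY_BLOCK, fds, n_fds)) {
            g_main_context_dispatch(ctx);
            dispatched = TRUE;
        }

        g_main_context_release(ctx);
        g_free(fds);
        return dispatched;
    }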