From: Wenchao Xia
Date: Wed, 21 Aug 2013 17:33:41 +0800
Message-ID: <52148975.1060102@linux.vnet.ibm.com>
In-Reply-To: <20130821084553.GB10379@stefanha-thinkpad.redhat.com>
Subject: Re: [Qemu-devel] [RFC] Convert AioContext to Gsource sub classes
To: Stefan Hajnoczi
Cc: Kevin Wolf, Ping Fan Liu, Alex Bligh, Michael Roth, qemu-devel,
 Anthony Liguori, Paolo Bonzini

On 2013-8-21 16:45, Stefan Hajnoczi wrote:
> On Tue, Aug 20, 2013 at 05:59:58PM +0800, Wenchao Xia wrote:
>> On 2013-8-16 16:12, Wenchao Xia wrote:
>>> On 2013-8-16 15:15, Wenchao Xia wrote:
>>>> On 2013-8-16 0:32, Michael Roth wrote:
>>>>> Quoting Michael Roth (2013-08-15 10:23:20)
>>>>>> Quoting Wenchao Xia (2013-08-13 03:44:39)
>>>>>>> On 2013-8-13 1:01, Michael Roth wrote:
>>>>>>>> Quoting Paolo Bonzini (2013-08-12 02:30:28)
>>>>>>>>>> 1) Rename AioContext to AioSource.
>>>>>>>>>> This is my major purpose, which declares it is not a "context"
>>>>>>>>>> concept, and GMainContext is the entity that represents the
>>>>>>>>>> thread's activity.
>>>>>>>>>
>>>>>>>>> Note that the nested event loops in QEMU are _very_ different from
>>>>>>>>> glib nested event loops.  In QEMU, nested event loops only run
>>>>>>>>> block layer events.  In glib, they run all events.  That's why you
>>>>>>>>> need AioContext.
>>>>>>>>
>>>>>>>> Would it be possible to use glib for our nested loops as well, by
>>>>>>>> just setting a higher priority for the AioContext GSource?
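Just to make the priority idea above concrete, here is a minimal sketch of
attaching an AioContext's GSource at a higher-than-default priority with
plain glib calls.  The helper name and the G_PRIORITY_HIGH value are only
placeholders, not an actual proposal:

    #include <glib.h>

    /* Sketch: attach the AioContext GSource ahead of default-priority
     * sources, so that a nested iteration tends to dispatch block-layer
     * events first.  The priority value chosen here is arbitrary. */
    static void attach_aio_source(GSource *aio_source, GMainContext *ctx)
    {
        g_source_set_priority(aio_source, G_PRIORITY_HIGH);
        g_source_attach(aio_source, ctx);
    }

Whether that alone gives the "block events only" behaviour of a nested
qemu_aio_wait() loop is exactly the question discussed below.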
>>>>>>>> Stefan and I were considering how we could make use of his "drop
>>>>>>>> ioflush" patches to use a common mechanism to register fd events,
>>>>>>>> but still allow us to distinguish between AioContext and
>>>>>>>> non-AioContext for nested loops.  I was originally thinking of
>>>>>>>> using prepare() functions to filter out non-AioContext events, but
>>>>>>>> that requires we implement our GSources with that in mind, and not
>>>>>>>> make use of pre-baked ones like GIOChannel's, and it bakes block
>>>>>>>> stuff into every event source implementation.
>>>>>>>>
>>>>>>> Besides priority, g_source_set_can_recurse() can also help.
>>>>>>> Thinking about it more deeply, I found a harder problem:
>>>>>>> g_main_context_acquire() and g_main_context_release().  On release,
>>>>>>> pending BH/IO callbacks need to be cleared, but this action can't
>>>>>>> be triggered automatically when the user calls
>>>>>>> g_main_context_release().
>>>>>>
>>>>>> I don't understand why this is a requirement;
>>>>>> g_main_context_acquire/release ensure that only one thread attempts
>>>>>> to iterate the main loop at a time.  This isn't currently an issue
>>>>>> in QEMU, and if we re-implemented qemu_aio_wait() to use the same
>>>>>> glib interfaces, the tracking of in-flight requests would be moved
>>>>>> to the block layer via Stefan's 'drop io_flush' patches, which move
>>>>>> that block-specific logic out of the main loop/AioContext GSource by
>>>>>> design.  Are there other areas where you see this as a problem?
>>>>>
>>>>> I think I understand better what you're referring to: you mean that
>>>>> if qemu_aio_wait() were called, and were implemented to essentially
>>>>> call g_main_context_iteration(), then after, say, one iteration, the
>>>>> iothread/dataplane thread might acquire the main loop and dispatch
>>>>> block/non-block events after qemu_aio_wait() returned?  The simple
>>>>> approach would be to have qemu_aio_wait() call
>>>>> g_main_context_acquire/release at the start and end of the function,
>>>>> which would ensure that this never happens.
>>>>>
>>>> qemu_aio_wait() is relatively simple to resolve with
>>>> g_main_context_acquire(), but it also shows the additional code needed
>>>> for a thread switch (I guess the following is the plan to implement
>>>> qemu_aio_wait()):
>>>>
>>>> qemu_aio_wait(GMainContext *ctx)
>>>> {
>>>>     return g_main_context(ctx, PRI_BLOCK);
>>>> }
>>>>
>>>> at the caller:
>>>> {
>>>>     while (qemu_aio_wait(ctx)) {
>>>>         ;
>>>>     }
>>>>     g_main_context_release(ctx);
>>>> }
>>>>
>>>> The above code shows that, in a *ctx switch operation, more than
>>>> glib's own event loop API is involved: qemu_aio_wait().  So it raises
>>>> a question: what data structure should be used to represent the
>>>> context concept and control the thread-switching behavior?  It is
>>>> better to have a clear layer, GMainContext or GlibQContext, instead of
>>>> GMainContext plus a custom function.  The callers fall into at least
>>>> two groups: the nested user (block layer) and the flat user above the
>>>> block layer.
>>>> In my opinion, this problem is brought in by the GSource AioContext.
>>>> Take the call path of bdrv_aio_readv(*bdrv_cb) on a raw Linux file as
>>>> an example; the following async operations are involved for
>>>> AioContext:
>>>> 1 bdrv_cb() will be executed in bdrv_co_em_bh().
>>>> 2 bdrv_co_io_em_complete() will be executed in event_notifier_ready().
>>>> 3 aio_worker() will be executed in worker_thread().
>>>> Operations 2 and 3 are tracked by the block layer's queue after
>>>> Stefan's 'drop io_flush()' series.
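For comparison, the acquire/release wrapping mentioned above, written
against the real glib calls, might look roughly like this.  This is only a
sketch (the function name is made up), and note that
g_main_context_iteration() takes no priority argument, so it dispatches
whatever sources are ready, not just block-layer ones:

    #include <glib.h>

    /* Sketch: one nested wait guarded by ownership of the context.
     * g_main_context_acquire() returns FALSE if another thread already
     * owns ctx. */
    static gboolean qemu_aio_wait_sketch(GMainContext *ctx)
    {
        gboolean progress;

        if (!g_main_context_acquire(ctx)) {
            return FALSE;
        }
        /* one blocking iteration: poll, then dispatch ready sources */
        progress = g_main_context_iteration(ctx, TRUE);
        g_main_context_release(ctx);
        return progress;
    }

Even with this wrapping, the thread-switch problem described above remains:
releasing ctx says nothing about BHs or worker completions that were set up
while this thread owned it.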
>>>> Now, if we stick to GMainContext to represent the context concept,
>>>> then when thread B wants to acquire the GMainContext used by thread
>>>> A, all work set up by A should be finished before B acquires it;
>>>> otherwise B will execute some functions that were supposed to run in
>>>> A.  The work refers to the three kinds of async operations above.
>>>> We can solve this issue by different means:
>>>> 1 The event loop API doesn't guarantee that work set up by thread A
>>>> will always be finished in A.  This puts a limitation on the caller
>>>> to consider thread switching.  I talked with Stefan on IRC and think
>>>> it is possible for the nested user (block layer), but I still want to
>>>> avoid this for the flat user above the block layer.
>>>> 2 Ask the caller to finish all pending operations.
>>>> 2.1 The glib GMainContext API plus a custom API such as aio_wait().
>>>> This is bad in that detail under GMainContext is exposed to the
>>>> caller.  Since operation 1 mentioned before is not tracked yet, to
>>>> make sure bdrv_cb() is called in the thread that set it up, 1 needs
>>>> to be added to the tracking queue, or the call chain of aio_wait()
>>>> needs to check the queue plus check operation 1.  Perhaps add a
>>>> custom function that the caller is asked to call, such as
>>>> context_work_flush()?
>>>
>>> If a well-named API doing that flush work is present, using the glib
>>> API plus it seems OK too, and it avoids a wrapper.  I guess
>>> bdrv_drain_all(GMainContext *ctx, ...) could do it.
>>>
>> I haven't found a good answer in GStreamer, but I want to show some
>> ideas from my understanding.
>>
>> The following is a brief picture of the current event loop in QEMU;
>> Alex's timers for AioContext are also drawn here:
>>
>>    ========================
>>    || I/O thread in vl.c ||
>>    ========================
>>               |
>>               | run loop
>>               v
>>    ======================   qemu_set_fd_handler2()   ===============
>>    ||       other      ||--------------------------->|| main_loop ||
>>    || (vnc, migration) ||                            ===============
>>    ======================                                   |
>>                                                             | GLib
>>                                                             | event loop API
>>                                                             v
>>    ==========       qemu_set_fd_handler()            ====================
>>    ||  hw  ||--------------------------------------->||  GMainContext  ||
>>    ==========       (should it be removed?)          ====================
>>        |              |                                      |
>>        | qemu_bh_***()| main_loop_tlg                        | GSource
>>        |              |                                      | attach
>>    ====|==============|======================================|==========
>>    ||  v              v                                      v         ||
>>    ||  ======    ==================    ======                          ||
>>    ||  | BH |    | TimerListGroup |    | Fd |                          ||
>>    ||  ======    ==================    ======                          ||
>>    ||                                                                  ||
>>    ||                           AioContext                             ||
>>    ||               (block layer's event collection)                   ||
>>    ======================================================================
>>          ^
>>          | qemu_bh_***()
>>          | qemu_aio_wait()
>>          | qemu_aio_set_fd_handler()
>>    ===========
>>    || block ||
>>    ===========
>>
>> The main issue here is that the components are tightly bound together
>> and no clear layer represents the thread and event loop API.  Block and
>> hw code interact with AioContext, so both GMainContext and AioContext
>> are playing the role.  I hope to form a library for block, so I need to
>> pick one of them to provide the event loop; the choices seem to be:
>> 1 GMainContext.
>> 2 AioContext.
>> 3 An encapsulation, such as GlibQContext.
>>
>> 1) and 2) would not be perfect, since a non-standard glib event loop
>> would be exposed; 3) would show a unified interface similar to the glib
>> main loop, but needs more code adjustment.  After some thinking, I
>> guess AioContext is the easiest way: it represents the block layer's
>> own event loop, gives up using the glib event loop at this level, and
>> just adds a custom API such as block_iterate().
>> Briefly, bdrv_read would become:
>>   int bdrv_read(AioContext *ctx, ....);
>
> I don't understand why you want to add AioContext *ctx to bdrv_read().
> The synchronous APIs already work fine since no event loop integration
> is necessary at all (the event loop is hidden inside the synchronous
> function).
>
OK... I used a wrong example; it should be bdrv_aio_readv().  At this
level, do you think AioContext * should be used instead of GMainContext *?

> Since AioContext provides a GSource, integrating with an application's
> glib event loop should also be easy.  The only hard part is timers,
> since we use nanosecond timers - there we should just round up to
> millisecond granularity in libqblock.  The nanosecond timers aren't
> critical in libqblock, only for running guests.
>
> Stefan
>

-- 
Best Regards

Wenchao Xia