All of lore.kernel.org
 help / color / mirror / Atom feed
From: Wenchao Xia <xiawenc@linux.vnet.ibm.com>
To: Stefan Hajnoczi <stefanha@gmail.com>
Cc: Kevin Wolf <kwolf@redhat.com>,
	Ping Fan Liu <pingfank@linux.vnet.ibm.com>,
	Alex Bligh <alex@alex.org.uk>,
	Michael Roth <mdroth@linux.vnet.ibm.com>,
	qemu-devel <qemu-devel@nongnu.org>,
	Anthony Liguori <anthony@codemonkey.ws>,
	Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [Qemu-devel] [RFC] Convert AioContext to Gsource sub classes
Date: Wed, 21 Aug 2013 17:33:41 +0800	[thread overview]
Message-ID: <52148975.1060102@linux.vnet.ibm.com> (raw)
In-Reply-To: <20130821084553.GB10379@stefanha-thinkpad.redhat.com>

于 2013-8-21 16:45, Stefan Hajnoczi 写道:
> On Tue, Aug 20, 2013 at 05:59:58PM +0800, Wenchao Xia wrote:
>> 于 2013-8-16 16:12, Wenchao Xia 写道:
>>> 于 2013-8-16 15:15, Wenchao Xia 写道:
>>>> 于 2013-8-16 0:32, Michael Roth 写道:
>>>>> Quoting Michael Roth (2013-08-15 10:23:20)
>>>>>> Quoting Wenchao Xia (2013-08-13 03:44:39)
>>>>>>> 于 2013-8-13 1:01, Michael Roth 写道:
>>>>>>>> Quoting Paolo Bonzini (2013-08-12 02:30:28)
>>>>>>>>>> 1) rename AioContext to AioSource.
>>>>>>>>>>     This is my major purpose, which declare it is not a "context"
>>>>>>>>>> concept,
>>>>>>>>>> and GMainContext is the entity represent the thread's activity.
>>>>>>>>>
>>>>>>>>> Note that the nested event loops in QEMU are _very_ different from
>>>>>>>>> glib nested event loops.  In QEMU, nested event loops only run block
>>>>>>>>> layer events.  In glib, they run all events.  That's why you need
>>>>>>>>> AioContext.
>>>>>>>>
>>>>>>>> Would it be possible to use glib for our nested loops as well by just
>>>>>>>> setting a higher priority for the AioContext GSource?
>>>>>>>>
>>>>>>>> Stefan and I were considering how we could make use of his "drop
>>>>>>>> ioflush" patches to use a common mechanism to register fd events, but
>>>>>>>> still allow us to distinguish between AioContext and non-AioContext
>>>>>>>> for nested loops. I was originally thinking of using prepare()
>>>>>>>> functions
>>>>>>>> to filter out non-AioContext events, but that requires we implement
>>>>>>>> on GSource's with that in mind, and non make use of pre-baked ones
>>>>>>>> like GIOChannel's, and bakes block stuff into every event source
>>>>>>>> implementation.
>>>>>>>>
>>>>>>>     Besides priority, also g_source_set_can_recurse() can help.
>>>>>>>     With a deeper think, I found a harder problem:
>>>>>>> g_main_context_acquire() and g_main_context_release(). In release,
>>>>>>> pending BH/IO call back need to be cleared, but this action can't
>>>>>>> be triggered automatically when user call g_main_context_release().
>>>>>>
>>>>>> I don't understand why this is a requirement, gmctx_acquire/release
>>>>>> ensure
>>>>>> that only one thread attempts to iterate the main loop at a time. this
>>>>>> isn't currently an issue in qemu, and if we re-implemented
>>>>>> qemu_aio_wait()
>>>>>> to use the same glib interfaces, the tracking of in-flight requests
>>>>>> would
>>>>>> be moved to the block layer via Stefan's 'drop io_flush' patches, which
>>>>>> moves that block-specific logic out of the main loop/AioContext GSource
>>>>>> by design. Are there other areas where you see this as a problem?
>>>>>
>>>>> I think I understand better what you're referring to, you mean that
>>>>> if qemu_aio_wait was called, and was implementated to essentially call
>>>>> g_main_context_iterate(), that after, say, 1 iteration, the
>>>>> iothread/dataplane thread might acquire the main loop and dispatch
>>>>> block/non-block events between qemu_aio_wait() returned? The simple
>>>>> approach would be to have qemu_aio_wait() call
>>>>> g_main_context_acquire/release
>>>>> at the start end of the function, which would ensure that this never
>>>>> happens.
>>>>>
>>>>    qemu_aio_wait() is relative simple to resolve by
>>>> g_main_context_acquire(), but also shows additional code needed
>>>> for a thread switch:
>>>> (guess following is the plan to implement qemu_aio_wait())
>>>> qemu_aio_wait(GMainContext *ctx)
>>>> {
>>>>      return g_main_context(ctx, PRI_BLOCK);
>>>> }
>>>> at caller:
>>>> {
>>>>      while (qemu_aio_wait(ctx) {
>>>>          ;
>>>>      }
>>>>      g_main_context_release(ctx);
>>>> }
>>>>    The above code shows that, in *ctx switch operation, there is
>>>> more than glib's own event loop API envolved, qemu_aio_wait(). So
>>>> it referenced to a question: what data structure
>>>> should be used to represent context concept and control the thread
>>>> switching behavior?  It is better to have a clear layer, GMainContext or
>>>> GlibQContext, instead of GMainContext plus custom function. The caller
>>>> reference to at least two: nested user block layer, flat user above
>>>> block layer.
>>>>    In my opinion, this problem is brought by Gsource AioContext, Take
>>>> the call path of bdrv_aio_readv(*bdrv_cb) on raw linux file as
>>>> an example, there are aync following operations involved for AioContext:
>>>> 1 the bdrv_cb() will be executed in bdrv_co_em_bh().
>>>> 2 bdrv_co_io_em_complete() will be executed in event_notfier_ready().
>>>> 3 aio_worker() will be executed in worker_thread().
>>>>    Operation 2 and 3 are tracked by block layer's queue after Stefan's
>>>> dropping io_flush() series.
>>>>    Now if we stick to GMainContext to represent context concept,
>>>> then when thread B want to aquire GMainContext used by thread A,
>>>> all works setupped by A should be finished before B aquire it,
>>>> otherwise B will execute some function supposed to work in A. The
>>>> work refers to the three kind of aync functions above.
>>>>    For this issue, we can solve it by different means:
>>>> 1 the event loop API doesn't guarentee work setupped by thread A
>>>> will always be finished in A. This put a limitation to caller to
>>>> consider thread switching. I talked on IRC with Stefan, and thinks
>>>> it is possible for the nested user block layer, but I still want to
>>>> avoid this to the flat user above block layer.
>>>> 2 ask caller to finish all pending operations.
>>>>    2.1 glib GMainContext API plus custom API such as aio_wait(). This is
>>>> bad that detail under GMainContext is exposed to caller. Since
>>>> operation 1 mentioned before is not tracked yet, to make sure bdrv_cb()
>>>> is called in thread setupped it, 1 need to be added in the track
>>>> queue, or in the call chain of aio_wait(), check the queue plus check
>>>> operation 1. Perhaps add a custom function ask caller to call as
>>>> context_work_flush()?
>>>    If a well named API do the flush work present, using Glib API plus
>>> it seems also OK, and can avoid wrapper. I guess
>>> bdrv_drain_all(GMainContext *ctx, ...) can do it.
>>>
>>    I haven't found a good answer in gstream, but want to show
>> some idea from my understanding.
>>
>> Following is a brief picture of the current event loop in qemu,
>> Alex's timer for AioContext is also drawn here:
>>
>>                                                ========================
>>                                                || I/O thread in vl.c ||
>>                                                ========================
>>                                                            |
>>                                                   run loop |
>>                                                            |
>> ====================                                      |
>> ||    other       || qemu_set_fd_handler2()      =====================
>> ||                ||-----------------------------||   Main_loop     ||
>> ||(vnc, migration)||                   |         =====================
>> ====================                   |          GLib    |
>>                                         |    event loop API|
>>                    qemu_set_fd_handler()|                  |
>>                         -----------------          ====================
>>                         |                          ||  GMainContext  ||
>>                         |                          ====================
>> ==========             |   (should it be removed?)        |
>> ||  hw  ||--------------------------------------          |GSouce
>> ==========                      |               |         |Attach
>>                                  |  main_loop_tlg|         |
>>                     qemu_bh_***()|               |         |
>>                                  |               |         |
>>                            ======|===============|=======================
>>                            ||    |               |                     ||
>> ===========               ||  ======   ==================   =======   ||
>> || block ||---------------||  | BH |   | TimerListGroup |   | Fd  |   ||
>> ===========  qemu_bh_***()||  ======   ==================   =======   ||
>>             qemu_aio_wait()||                                          ||
>>   qemu_aio_set_fd_handler()||  AioContext                              ||
>>                            || (block layer's event collection)         ||
>>                             =============================================
>>
>>
>>    The main issue here is that components are tightly bind together and
>> no clear layer represent the thread and event loop API. Block and hw
>> code are inter acting with AioContext, so both GMainContext and
>> AioContext are playing the role. I hope to form a library for block,
>> So need to pick up one to provide event loop, the choice seems to be:
>> 1 GMainContext.
>> 2 AioContext.
>> 3 Encapsulation, such as GlibQContext.
>>
>>    1) and 2) would not be perfect since non standard glib event loop will
>> be exposed, 3) will shows a unified interface similar to glib main loop,
>> but more code adjust. After some thinking, I guess AioContext is the
>> easiest way, which represent the block's own event loop, and give up
>> using glib event loop at this level, just add custom API as
>> block_iterate(). Briefly, bdrv_read will becomes:
>> int bdrv_read(AioContext *ctx, ....);
>
> I don't understand why you want to add AioContext *ctx to bdrv_read().
> The synchronous APIs already work fine since no event loop integration
> is necessary at all (the event loop is hidden inside the synchronous
> function).
>
   OK... I used a wrong example. It should be bdrv_aio_readv(). At this
level, do you think AioContext * should be used, instead of
GMainContext *?

> Since AioContext provides a GSource, integrating with an application's
> glib event loop should also be easy.  The only hard part is timers,
> since we use nanosecond timers - there we should just round up to
> millisecond granularity to libqblock.  The nanosecond timers aren't
> critical in libqblock, only for running guests.
>
> Stefan
>


-- 
Best Regards

Wenchao Xia

  reply	other threads:[~2013-08-21  9:35 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-08-10  3:24 [Qemu-devel] [RFC] Convert AioContext to Gsource sub classes Wenchao Xia
2013-08-10  8:03 ` Paolo Bonzini
2013-08-12  6:46   ` Wenchao Xia
2013-08-12  7:30     ` Paolo Bonzini
2013-08-12 17:01       ` Michael Roth
2013-08-13  8:44         ` Wenchao Xia
2013-08-15 15:23           ` Michael Roth
2013-08-15 16:32             ` Michael Roth
2013-08-16  7:15               ` Wenchao Xia
2013-08-16  8:12                 ` Wenchao Xia
2013-08-20  9:59                   ` Wenchao Xia
2013-08-20 17:54                     ` Alex Bligh
2013-08-21  8:45                     ` Stefan Hajnoczi
2013-08-21  9:33                       ` Wenchao Xia [this message]
2013-08-22 11:40                         ` Stefan Hajnoczi
2013-08-21 10:06                       ` Alex Bligh
2013-08-10 10:15 ` Alex Bligh

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=52148975.1060102@linux.vnet.ibm.com \
    --to=xiawenc@linux.vnet.ibm.com \
    --cc=alex@alex.org.uk \
    --cc=anthony@codemonkey.ws \
    --cc=kwolf@redhat.com \
    --cc=mdroth@linux.vnet.ibm.com \
    --cc=pbonzini@redhat.com \
    --cc=pingfank@linux.vnet.ibm.com \
    --cc=qemu-devel@nongnu.org \
    --cc=stefanha@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.