qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Wenchao Xia <xiawenc@linux.vnet.ibm.com>
To: Michael Roth <mdroth@linux.vnet.ibm.com>
Cc: Kevin Wolf <kwolf@redhat.com>,
	Anthony Liguori <aliguori@us.ibm.com>,
	Ping Fan Liu <pingfank@linux.vnet.ibm.com>,
	Stefan Hajnoczi <stefanha@gmail.com>,
	qemu-devel <qemu-devel@nongnu.org>,
	Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [Qemu-devel] [RFC] Convert AioContext to Gsource sub classes
Date: Fri, 16 Aug 2013 15:15:36 +0800	[thread overview]
Message-ID: <520DD198.60708@linux.vnet.ibm.com> (raw)
In-Reply-To: <20130815163207.16563.45553@loki>

于 2013-8-16 0:32, Michael Roth 写道:
> Quoting Michael Roth (2013-08-15 10:23:20)
>> Quoting Wenchao Xia (2013-08-13 03:44:39)
>>> 于 2013-8-13 1:01, Michael Roth 写道:
>>>> Quoting Paolo Bonzini (2013-08-12 02:30:28)
>>>>>> 1) rename AioContext to AioSource.
>>>>>>     This is my major purpose, which declare it is not a "context" concept,
>>>>>> and GMainContext is the entity represent the thread's activity.
>>>>>
>>>>> Note that the nested event loops in QEMU are _very_ different from
>>>>> glib nested event loops.  In QEMU, nested event loops only run block
>>>>> layer events.  In glib, they run all events.  That's why you need
>>>>> AioContext.
>>>>
>>>> Would it be possible to use glib for our nested loops as well by just
>>>> setting a higher priority for the AioContext GSource?
>>>>
>>>> Stefan and I were considering how we could make use of his "drop
>>>> ioflush" patches to use a common mechanism to register fd events, but
>>>> still allow us to distinguish between AioContext and non-AioContext
>>>> for nested loops. I was originally thinking of using prepare() functions
>>>> to filter out non-AioContext events, but that requires we implement
>>>> on GSource's with that in mind, and non make use of pre-baked ones
>>>> like GIOChannel's, and bakes block stuff into every event source
>>>> implementation.
>>>>
>>>     Besides priority, also g_source_set_can_recurse() can help.
>>>     With a deeper think, I found a harder problem:
>>> g_main_context_acquire() and g_main_context_release(). In release,
>>> pending BH/IO call back need to be cleared, but this action can't
>>> be triggered automatically when user call g_main_context_release().
>>
>> I don't understand why this is a requirement, gmctx_acquire/release ensure
>> that only one thread attempts to iterate the main loop at a time. this
>> isn't currently an issue in qemu, and if we re-implemented qemu_aio_wait()
>> to use the same glib interfaces, the tracking of in-flight requests would
>> be moved to the block layer via Stefan's 'drop io_flush' patches, which
>> moves that block-specific logic out of the main loop/AioContext GSource
>> by design. Are there other areas where you see this as a problem?
>
> I think I understand better what you're referring to, you mean that
> if qemu_aio_wait was called, and was implementated to essentially call
> g_main_context_iterate(), that after, say, 1 iteration, the
> iothread/dataplane thread might acquire the main loop and dispatch
> block/non-block events between qemu_aio_wait() returned? The simple
> approach would be to have qemu_aio_wait() call g_main_context_acquire/release
> at the start end of the function, which would ensure that this never
> happens.
>
   qemu_aio_wait() is relative simple to resolve by 
g_main_context_acquire(), but also shows additional code needed
for a thread switch:
(guess following is the plan to implement qemu_aio_wait())
qemu_aio_wait(GMainContext *ctx)
{
     return g_main_context(ctx, PRI_BLOCK);
}
at caller:
{
     while (qemu_aio_wait(ctx) {
         ;
     }
     g_main_context_release(ctx);
}
   The above code shows that, in *ctx switch operation, there is
more than glib's own event loop API envolved, qemu_aio_wait(). So
it referenced to a question: what data structure
should be used to represent context concept and control the thread
switching behavior?  It is better to have a clear layer, GMainContext or
GlibQContext, instead of GMainContext plus custom function. The caller
reference to at least two: nested user block layer, flat user above
block layer.
   In my opinion, this problem is brought by Gsource AioContext, Take
the call path of bdrv_aio_readv(*bdrv_cb) on raw linux file as
an example, there are aync following operations involved for AioContext:
1 the bdrv_cb() will be executed in bdrv_co_em_bh().
2 bdrv_co_io_em_complete() will be executed in event_notfier_ready().
3 aio_worker() will be executed in worker_thread().
   Operation 2 and 3 are tracked by block layer's queue after Stefan's
dropping io_flush() series.
   Now if we stick to GMainContext to represent context concept,
then when thread B want to aquire GMainContext used by thread A,
all works setupped by A should be finished before B aquire it,
otherwise B will execute some function supposed to work in A. The
work refers to the three kind of aync functions above.
   For this issue, we can solve it by different means:
1 the event loop API doesn't guarentee work setupped by thread A
will always be finished in A. This put a limitation to caller to
consider thread switching. I talked on IRC with Stefan, and thinks
it is possible for the nested user block layer, but I still want to
avoid this to the flat user above block layer.
2 ask caller to finish all pending operations.
   2.1 glib GMainContext API plus custom API such as aio_wait(). This is
bad that detail under GMainContext is exposed to caller. Since
operation 1 mentioned before is not tracked yet, to make sure bdrv_cb()
is called in thread setupped it, 1 need to be added in the track
queue, or in the call chain of aio_wait(), check the queue plus check
operation 1. Perhaps add a custom function ask caller to call as
context_work_flush()?
   2.2 Use GlibQContext API. In this way all operation is hidden,
in whose release() the pending work will be finished. In this way
a clear layer represent event loop can be exposed.

   A clear layer represent event loop will make caller code clean,
especially helpful when try to expose block layer with AIO as a
public library.
   Personally I hope not to use AioContext as context concept, add
iterate APIs around it will be tricky, since actually it is a GSource,
it will bring troubles to integrate with glib's event loop from my
intuition. So I hope to rename AioContext to AioSource, or break it
as a common case in QContext. Then I got bdrv_read(GlibQContext ctx*).

   With a talk with Stefan, I plan to read gstream doc to see how it
deal with such problem.

>>
>>>     For the above reason, I tend to think, maybe we should t wrap all of
>>> Glib's mainloop into custom encapsulation, such as QContext, Add the
>>> aio poll logic in q_context_release(). Use QContext * in every caller
>>> to hide GMainContext *, so QContext layer play the role of clear
>>> event loop API.
>>>
>>>> Priorities didn't cross my mind though, but it seems pretty
>>>> straightfoward...
>>>>
>>>> AioContext could then just be a container of sorts for managing
>>>> bottom-halves and AioContext FDs and binding them to the proper
>>>> GMainContext/MainLoop, but the underlying GSources could
>>>> still be driven by a normal glib-based mainloop, just with a specific
>>>> priority in the nested case.
>>>>
>>>>>
>>>>>> 2) Break AioSource into FdSource and BhSource.
>>>>>>     This make custom code less and simpler, one Gsource for one kind of
>>>>>> job. It is not necessary but IMHO it will make things clear when add
>>>>>> more things into main loop: add a new Gsource sub class, avoid to
>>>>>> always have relationship with AioContext.
>>>>>
>>>>> But this is only complicating things work since users rely on both file-
>>>>> descriptor APIs and bottom half APIs.
>>>>
>>>> Taking things a step further, maybe AioContext can stop being a
>>>> block-specific construct, but actually be the "QContext" we've
>>>> discussed in the past for managing multiple event loops. All
>>>> the block stuff would be hidden away in the GSource priority.
>>>>
>>>> For instance,
>>>>
>>>> #ifndef _WIN32
>>>>
>>>> qemu_aio_set_fd_handler(fd, ...):
>>>>       aio_set_fd_handler(qemu_aio_context, fd, ..., QEMU_PRIORITY_BLOCK)
>>>>
>>>> qemu_set_fd_handler(fd, ...):
>>>>       aio_set_fd_handler(qemu_aio_context, fd, ..., G_PRIORITY_DEFAULT)
>>>>
>>>> #else
>>>>
>>>> qemu_add_wait_object(fd, ...):
>>>>       add_wait_object(qemu_aio_context, fd, ...)
>>>>
>>>> qemu_set_fd_handler(fd, ...):
>>>>       set_socket_handler(qemu_aio_context, fd, ..., G_PRIORITY_DEFAULT)
>>>>
>>>> #endif
>>>>
>>>> qemu_bh_schedule:
>>>>       bh_schedule(qemu_aio_context, ...)
>>>>
>>>> etc...
>>>>
>>>> I'll be sending patches this week for moving
>>>> add_wait_object/qemu_set_fd_handler to GSources, the non-global ones use
>>>> GMainContext * to specify a non-default thread/context, but can be easily
>>>> changed, or we can just do aioctx->g_main_context at the call sites.
>>>> There's some nice possibilities in using the former though: avoiding
>>>> O(n) lookups for stuff like finding the GSource for a particular
>>>> event/event type, for instance, by storing pointers to the GSource or
>>>> some kind of hashmap lookup. But probably better to discuss that aspect
>>>> with some context so I'll try to get those patches out soon.
>>>>
>>>>>
>>>>>>>>      More reasons:
>>>>>>>>      When I thinking how to bind library code to a thread context, it may
>>>>>>>> need to add Context's concept into API of block.c. If I use AioContext,
>>>>>>>> there will need a wrapper API to run the event loop. But If I got
>>>>>>>> glib's GmainContext, things become simple.
>>>>>
>>>>> You already have it because AioContext is a GSource.  You do not need
>>>>> to expose the AioContext, except as a GSource.
>>>>>
>>>     I think expose GmainContext * or QContext *, is better than
>>> GSource *.
>>>
>>> int bdrv_read(GMainContext *ctx,
>>>                 BlockDriverState *bs,
>>>                 int64_t sector_num,
>>>                 uint8_t *buf,
>>>                 int nb_sectors)
>>>
>>>>> Paolo
>>>>
>>>
>>>
>>> --
>>> Best Regards
>>>
>>> Wenchao Xia
>


-- 
Best Regards

Wenchao Xia

  reply	other threads:[~2013-08-16  7:16 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-08-10  3:24 [Qemu-devel] [RFC] Convert AioContext to Gsource sub classes Wenchao Xia
2013-08-10  8:03 ` Paolo Bonzini
2013-08-12  6:46   ` Wenchao Xia
2013-08-12  7:30     ` Paolo Bonzini
2013-08-12 17:01       ` Michael Roth
2013-08-13  8:44         ` Wenchao Xia
2013-08-15 15:23           ` Michael Roth
2013-08-15 16:32             ` Michael Roth
2013-08-16  7:15               ` Wenchao Xia [this message]
2013-08-16  8:12                 ` Wenchao Xia
2013-08-20  9:59                   ` Wenchao Xia
2013-08-20 17:54                     ` Alex Bligh
2013-08-21  8:45                     ` Stefan Hajnoczi
2013-08-21  9:33                       ` Wenchao Xia
2013-08-22 11:40                         ` Stefan Hajnoczi
2013-08-21 10:06                       ` Alex Bligh
2013-08-10 10:15 ` Alex Bligh

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=520DD198.60708@linux.vnet.ibm.com \
    --to=xiawenc@linux.vnet.ibm.com \
    --cc=aliguori@us.ibm.com \
    --cc=kwolf@redhat.com \
    --cc=mdroth@linux.vnet.ibm.com \
    --cc=pbonzini@redhat.com \
    --cc=pingfank@linux.vnet.ibm.com \
    --cc=qemu-devel@nongnu.org \
    --cc=stefanha@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).