Message-ID: <4DCDCA6F.9000503@linux.vnet.ibm.com>
Date: Fri, 13 May 2011 17:18:55 -0700
From: Venkateswararao Jujjuri
References: <1305233867-4367-1-git-send-email-jvrao@linux.vnet.ibm.com> <20110513085503.GA23158@stefanha-thinkpad.localdomain> <87sjsio0zi.fsf@linux.vnet.ibm.com>
In-Reply-To: <87sjsio0zi.fsf@linux.vnet.ibm.com>
Subject: Re: [Qemu-devel] [0/25] Async threading for VirtFS using glib threads & coroutines.
To: "Aneesh Kumar K.V"
Cc: Stefan Hajnoczi, aliguori@us.ibm.com, qemu-devel@nongnu.org, stefanha@linux.vnet.ibm.com

On 05/13/2011 12:26 PM, Aneesh Kumar K.V wrote:
> On Fri, 13 May 2011 09:55:03 +0100, Stefan Hajnoczi wrote:
>> On Thu, May 12, 2011 at 01:57:22PM -0700, Venkateswararao Jujjuri (JV) wrote:
>>> VirtFS (a fileserver based on 9P) performs many blocking system calls in
>>> the vCPU context. This effort moves the blocking calls out of the vCPU/IO
>>> thread context into asynchronous threads.
>>>
>>> Anthony's "Add hard build dependency on glib" patch and Kevin/Stefan's
>>> coroutine effort are prerequisites.
>>>
>>> This patch set contains:
>>> - Converting all 9pfs calls into coroutines.
>>> - Each 9P operation will be modified to:
>>>   - Remove the post* functions. These are our callback functions, which
>>>     make the code very hard to read. With coroutines we can achieve the
>>>     same state-machine model with a clean sequential code flow.
>>>   - Move errno access next to the local_syscall().
>>> - Introduce asynchronous threading.
>>>
>>> This series has the basic infrastructure and a few routines like mkdir,
>>> mknod, unlink, readdir, xattr, lstat, etc. converted. We are currently
>>> converting and testing the other 9P operations as well, and those patches
>>> will follow shortly.
>>>
>>> Removing the callback functions made some of the patches a little lengthy.
>> This long patch series adds temporary structs and marshalling code for
>> each file system operation - I think none of this is necessary. Instead
>> we can exploit coroutines more:
>>
>> The point of coroutines is that you can suspend a thread of control (a
>> call-stack, not an OS-level thread) and can re-enter it later.
>> We should make coroutines thread-safe (i.e. work outside of the global
>> mutex) and then allow switching a coroutine from a QEMU thread to a
>> worker thread and back again:
>>
>> int coroutine_fn v9fs_co_readdir(V9fsState *s, V9fsFidState *fidp,
>>                                  struct dirent **dent)
>> {
>>     int ret = 0;
>>
>>     v9fs_co_run_in_worker({
>>         errno = 0;
>>         *dent = s->ops->readdir(&s->ctx, fidp->fs.dir);
>>         if (!*dent && errno) {
>>             ret = -errno;
>>         }
>>     });
>>     return ret;
>> }
>>
>> v9fs_co_readdir() can be called from a QEMU thread. The block of code
>> inside v9fs_co_run_in_worker() will be executed in a worker thread.
>> Notice that no marshalling of variables is necessary at all; we can use
>> the function arguments and local variables because this is still the
>> same function!
>>
>> When control reaches the end of the v9fs_co_run_in_worker() block,
>> execution is resumed in a QEMU thread and the function then returns ret.
>> It would be incorrect to return inside the v9fs_co_run_in_worker() block
>> because at that point we're still inside the worker thread.
>>
>> Here is how v9fs_co_run_in_worker() does its magic:
>>
>> #define v9fs_co_run_in_worker(block) \
>> { \
>>     BH *co_bh; \
>> \
>>     co_bh = qemu_bh_new(co_run_in_worker_bh, qemu_coroutine_self()); \
>>     qemu_bh_schedule(co_bh); \
>>     qemu_coroutine_yield(); /* re-entered in worker thread */ \
>>     qemu_bh_delete(co_bh); \
>> \
>>     block; \
>> \
>>     qemu_coroutine_yield(); /* re-entered in QEMU thread */ \
>> }
>>
>> void co_run_in_worker_bh(void *opaque)
>> {
>>     Coroutine *co = opaque;
>>
>>     g_thread_pool_push(pool, co, NULL);
>> }
>>
>> void worker_thread_fn(gpointer data, gpointer user_data)
>> {
>>     Coroutine *co = user_data;
>>     char byte = 0;
>>     ssize_t len;
>>
>>     qemu_coroutine_enter(co, NULL);
>>
>>     g_async_queue_push(v9fs_pool.completed, co);
>>     do {
>>         len = write(v9fs_pool.wfd, &byte, sizeof(byte));
>>     } while (len == -1 && errno == EINTR);
>> }
>>
>> void process_req_done(void *arg)
>> {
>>     Coroutine *co;
>>     char byte;
>>     ssize_t len;
>>
>>     do {
>>         len = read(v9fs_pool.rfd, &byte, sizeof(byte));
>>     } while (len == -1 && errno == EINTR);
>>
>>     while ((co = g_async_queue_try_pop(v9fs_pool.completed)) != NULL) {
>>         qemu_coroutine_enter(co, NULL);
>>     }
>> }
>>
>> I typed this code out in the email; it has not been compiled or tested.
>>
>> If you decide to eliminate coroutines entirely in the future and use
>> worker threads exclusively to process requests, then there are clearly
>> marked sections in the code: anything inside v9fs_co_run_in_worker()
>> must be thread-safe already, and anything outside it needs to be audited
>> and made thread-safe. The changes required are smaller than if your
>> current patch series were applied. I wanted to mention this point to
>> show that this doesn't paint virtfs into a corner.
>>
>> So where does this leave virtfs? No marshalling is necessary and
>> blocking operations can be performed inline using
>> v9fs_co_run_in_worker() blocks. The codebase will be a lot smaller.
>>
>> Does this seem reasonable?
>>
> Do we really need bottom halves here? Can't we achieve the same with
> v9fs_qemu_submit_request() and by making the glib thread function
> callback (request.func()) do qemu_coroutine_enter()

I had the same question. :) Tested without BH, and touch testing worked fine.
#define v9fs_co_run_in_worker(block) \
{ \
    g_thread_pool_push(v9fs_pool.pool, qemu_coroutine_self(), NULL); \
    qemu_coroutine_yield(); /* re-entered in worker thread */ \
\
    block; \
\
    qemu_coroutine_yield(); /* re-entered in QEMU thread */ \
}

void v9fs_qemu_process_req_done(void *arg)
{
    Coroutine *co;
    char byte;
    ssize_t len;

    do {
        len = read(v9fs_pool.rfd, &byte, sizeof(byte));
    } while (len == -1 && errno == EINTR);

    while ((co = g_async_queue_try_pop(v9fs_pool.completed)) != NULL) {
        qemu_coroutine_enter(co, NULL);
    }
}

void v9fs_thread_routine(gpointer data, gpointer user_data)
{
    Coroutine *co = data;
    char byte = 0;
    ssize_t len;

    qemu_coroutine_enter(co, NULL);

    g_async_queue_push(v9fs_pool.completed, co);
    do {
        len = write(v9fs_pool.wfd, &byte, sizeof(byte));
    } while (len == -1 && errno == EINTR);
}

This model keeps the code simple, and in one shot we can convert all of
the v9fs_do_syscalls into asynchronous threads. But, as Aneesh raised,
will there be any additional overhead from the extra jumps? We can
quickly test that too.

For this to work, we first need to consolidate errno handling and then
convert the v9fs_do_syscalls with v9fs_co_run_in_worker(). After that we
can post patches folding in the Post* functions.

Just a thought.

Thanks,
JV

> like:
>
> int v9fs_co_readdir(V9fsState *s, V9fsFidState *fidp, struct dirent **dent)
> {
>     v9fs_wthread_enter();
>     s->ops->readdir(&s->ctx, fidp->fs.dir);
>     v9fs_wthread_exit();
> }
>
> v9fs_worker_thread_enter()
> {
>     v9fs_qemu_submit_request(v9fs_worker_request);
>     qemu_coroutine_yield();
> }
>
> v9fs_coroutine_worker_func()
> {
>     qemu_coroutine_enter(coroutine, NULL);
> }
>
> I also wonder whether the additional bottom halves and the additional
> setcontext/setjmp that we end up with will have a performance impact
> compared to what we have currently?
>
> -aneesh
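P.S. To make the conversion step above concrete, here is a rough,
untested sketch of how one v9fs_do_syscall could end up looking once its
Post* callback is folded in and errno is consolidated next to the
syscall. The v9fs_co_mkdir name and the ops->mkdir(ctx, path, mode)
signature are just placeholders for illustration, not the real
FileOperations prototype:

int coroutine_fn v9fs_co_mkdir(V9fsState *s, const char *path, mode_t mode)
{
    int err = 0;

    v9fs_co_run_in_worker({
        /* runs in the glib worker thread, so the blocking call is fine */
        err = s->ops->mkdir(&s->ctx, path, mode);
        if (err < 0) {
            /* errno is read right next to the syscall, in the same thread */
            err = -errno;
        }
    });
    /* back in the QEMU thread here; the caller's coroutine just sees err */
    return err;
}

The caller would then simply do err = v9fs_co_mkdir(s, path, mode) from
its coroutine instead of scheduling a Post* callback.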