From: Venkateswararao Jujjuri
Date: Fri, 13 May 2011 18:29:07 -0700
Message-ID: <4DCDDAE3.2030201@linux.vnet.ibm.com>
In-Reply-To: <4DCDCA6F.9000503@linux.vnet.ibm.com>
References: <1305233867-4367-1-git-send-email-jvrao@linux.vnet.ibm.com> <20110513085503.GA23158@stefanha-thinkpad.localdomain> <87sjsio0zi.fsf@linux.vnet.ibm.com> <4DCDCA6F.9000503@linux.vnet.ibm.com>
Subject: Re: [Qemu-devel] [0/25] Async threading for VirtFS using glib threads & coroutines.
To: qemu-devel@nongnu.org

On 05/13/2011 05:18 PM, Venkateswararao Jujjuri wrote:
> On 05/13/2011 12:26 PM, Aneesh Kumar K.V wrote:
>> On Fri, 13 May 2011 09:55:03 +0100, Stefan Hajnoczi wrote:
>>> On Thu, May 12, 2011 at 01:57:22PM -0700, Venkateswararao Jujjuri (JV) wrote:
>>>> VirtFS (a file server based on 9P) performs many blocking system calls
>>>> in the vCPU context. This effort moves the blocking calls out of the
>>>> vCPU/IO thread context and into asynchronous threads.
>>>>
>>>> Anthony's "Add hard build dependency on glib" patch and Kevin/Stefan's
>>>> coroutine effort are prerequisites.
>>>>
>>>> This patch set contains:
>>>> - Converting all 9pfs calls into coroutines.
>>>> - Each 9P operation will be modified to:
>>>>   - Remove the post* functions. These are our callback functions, which
>>>>     make the code very hard to read. With coroutines we can achieve the
>>>>     same state machine model with a nice sequential code flow.
>>>>   - Move errno access near to the local_syscall().
>>>>   - Introduce asynchronous threading.
>>>>
>>>> This series has the basic infrastructure and a few routines like
>>>> mkdir, mknod, unlink, readdir, xattr, lstat, etc. converted.
>>>> We are currently converting and testing the other 9P operations in this
>>>> model as well, and those patches will follow shortly.
>>>>
>>>> Removing the callback functions made some of the patches a little lengthy.
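(To make the post*-removal point above concrete: today each blocking call
completes in a separate post* callback, so one 9P request is scattered across
several functions, while with coroutines the same request becomes a single
sequential function. A rough before/after sketch -- the type, function names
and signatures below are simplified illustrations, not the exact identifiers
in hw/9pfs:)

/* Before: each blocking step finishes in yet another post* callback
 * (illustrative names; errno is read far away from the syscall). */
static void v9fs_mkdir_post_mkdir(V9fsState *s, V9fsMkState *vs, int err)
{
    if (err < 0) {
        err = -errno;
        goto out;
    }
    /* the next blocking step is issued here and completes in yet
     * another post* callback ... */
out:
    complete_pdu(s, vs->pdu, err);
    qemu_free(vs);
}

/* After: the same request is one coroutine; each v9fs_co_* helper yields
 * while an asynchronous thread performs the blocking syscall. */
static void coroutine_fn v9fs_mkdir(V9fsState *s, V9fsPDU *pdu,
                                    V9fsString *name, mode_t mode)
{
    struct stat stbuf;
    int err;

    err = v9fs_co_mkdir(s, name, mode);
    if (!err) {
        err = v9fs_co_lstat(s, name, &stbuf);
    }
    complete_pdu(s, pdu, err);
}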
>>> This long patch series adds temporary structs and marshalling code for
>>> each file system operation - I think none of this is necessary. Instead
>>> we can exploit coroutines more:
>>>
>>> The point of coroutines is that you can suspend a thread of control (a
>>> call-stack, not an OS-level thread) and can re-enter it later. We
>>> should make coroutines thread-safe (i.e. work outside of the global
>>> mutex) and then allow switching a coroutine from a QEMU thread to a
>>> worker thread and back again:
>>>
>>> int coroutine_fn v9fs_co_readdir(V9fsState *s, V9fsFidState *fidp,
>>>                                  struct dirent **dent)
>>> {
>>>     int ret = 0;
>>>
>>>     v9fs_co_run_in_worker({
>>>         errno = 0;
>>>         *dent = s->ops->readdir(&s->ctx, fidp->fs.dir);
>>>         if (!*dent && errno) {
>>>             ret = -errno;
>>>         }
>>>     });
>>>     return ret;
>>> }
>>>
>>> v9fs_co_readdir() can be called from a QEMU thread. The block of code
>>> inside v9fs_co_run_in_worker() will be executed in a worker thread.
>>> Notice that no marshalling of variables is necessary at all; we can use
>>> the function arguments and local variables because this is still the
>>> same function!
>>>
>>> When control reaches the end of the v9fs_co_run_in_worker() block,
>>> execution is resumed in a QEMU thread and the function then returns ret.
>>> It would be incorrect to return inside the v9fs_co_run_in_worker() block
>>> because at that point we're still inside the worker thread.
>>>
>>> Here is how v9fs_co_run_in_worker() does its magic:
>>>
>>> #define v9fs_co_run_in_worker(block) \
>>> { \
>>>     QEMUBH *co_bh; \
>>> \
>>>     co_bh = qemu_bh_new(co_run_in_worker_bh, qemu_coroutine_self()); \
>>>     qemu_bh_schedule(co_bh); \
>>>     qemu_coroutine_yield(); /* re-entered in worker thread */ \
>>>     qemu_bh_delete(co_bh); \
>>> \
>>>     block; \
>>> \
>>>     qemu_coroutine_yield(); /* re-entered in QEMU thread */ \
>>> }
>>>
>>> void co_run_in_worker_bh(void *opaque)
>>> {
>>>     Coroutine *co = opaque;
>>>
>>>     g_thread_pool_push(pool, co, NULL);
>>> }
>>>
>>> void worker_thread_fn(gpointer data, gpointer user_data)
>>> {
>>>     Coroutine *co = data;
>>>     char byte = 0;
>>>     ssize_t len;
>>>
>>>     qemu_coroutine_enter(co, NULL);
>>>
>>>     g_async_queue_push(v9fs_pool.completed, co);
>>>     do {
>>>         len = write(v9fs_pool.wfd, &byte, sizeof(byte));
>>>     } while (len == -1 && errno == EINTR);
>>> }
>>>
>>> void process_req_done(void *arg)
>>> {
>>>     Coroutine *co;
>>>     char byte;
>>>     ssize_t len;
>>>
>>>     do {
>>>         len = read(v9fs_pool.rfd, &byte, sizeof(byte));
>>>     } while (len == -1 && errno == EINTR);
>>>
>>>     while ((co = g_async_queue_try_pop(v9fs_pool.completed)) != NULL) {
>>>         qemu_coroutine_enter(co, NULL);
>>>     }
>>> }
>>>
>>> I typed this code out in the email; it has not been compiled or tested.
>>>
>>> If you decide to eliminate coroutines entirely in the future and use
>>> worker threads exclusively to process requests, then there are clearly
>>> marked sections in the code: anything inside v9fs_co_run_in_worker()
>>> must be thread-safe already, and anything outside it needs to be audited
>>> and made thread-safe. The changes required would be smaller than if
>>> your current patch series were applied. I wanted to mention this point
>>> to show that this doesn't paint virtfs into a corner.
>>>
>>> So where does this leave virtfs? No marshalling is necessary and
>>> blocking operations can be performed inline using
>>> v9fs_co_run_in_worker() blocks. The codebase will be a lot smaller.
>>>
>>> Does this seem reasonable?
>>>
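(Side note for anyone following the snippets above: they assume some plumbing
that is not shown -- a glib thread pool, a completion queue, and a pipe whose
read end is watched from the QEMU event loop. A rough sketch of what that
setup could look like follows; the names mirror the snippets, but the setup
itself is an assumption, not code from the series:)

void worker_thread_fn(gpointer data, gpointer user_data);  /* from the mail above */
void process_req_done(void *arg);                          /* from the mail above */

/* Assumed plumbing: a pipe to wake the QEMU thread, a glib thread pool
 * (the bare 'pool' above would be v9fs_pool.pool), and a queue of
 * finished coroutines. Needs <unistd.h>, <fcntl.h>, glib.h and the
 * QEMU event-loop helpers. */
typedef struct V9fsThPool {
    int rfd;                /* read end, watched in the QEMU event loop */
    int wfd;                /* write end, written to by worker threads */
    GThreadPool *pool;      /* runs worker_thread_fn() */
    GAsyncQueue *completed; /* coroutines ready to be re-entered */
} V9fsThPool;

static V9fsThPool v9fs_pool;

static int v9fs_init_worker_threads(void)
{
    int fds[2];

    if (pipe(fds) < 0) {
        return -1;
    }
    v9fs_pool.rfd = fds[0];
    v9fs_pool.wfd = fds[1];
    fcntl(v9fs_pool.rfd, F_SETFL, O_NONBLOCK);
    fcntl(v9fs_pool.wfd, F_SETFL, O_NONBLOCK);

    v9fs_pool.completed = g_async_queue_new();
    v9fs_pool.pool = g_thread_pool_new(worker_thread_fn, NULL, -1, FALSE, NULL);

    /* process_req_done() re-enters completed coroutines in the QEMU thread */
    qemu_set_fd_handler(v9fs_pool.rfd, process_req_done, NULL, NULL);
    return 0;
}

With that in place, a worker writes a byte to wfd after pushing the finished
coroutine onto the completed queue, and process_req_done() drains the pipe and
re-enters everything on the queue, so the coroutine always resumes on the QEMU
thread.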
>> Do we really need bottom halves here? Can't we achieve the same with
>> v9fs_qemu_submit_request() and making the glib thread function
>> callback (request.func()) do qemu_coroutine_enter()?
>
>
> I had the same question. :) Tested without the BH, and basic touch testing
> worked fine.
>
>
> #define v9fs_co_run_in_worker(block) \
> { \
>     g_thread_pool_push(v9fs_pool.pool, qemu_coroutine_self(), NULL); \
>     qemu_coroutine_yield(); /* re-entered in worker thread */ \
> \
>     block; \
> \
>     qemu_coroutine_yield(); /* re-entered in QEMU thread */ \
> }

I guess there is a need for the BH after all. :) Without it, a little stress
with smp=2 causes:

    Co-routine re-entered recursively
    Aborted

- JV

>
> void v9fs_qemu_process_req_done(void *arg)
> {
>     Coroutine *co;
>     char byte;
>     ssize_t len;
>
>     do {
>         len = read(v9fs_pool.rfd, &byte, sizeof(byte));
>     } while (len == -1 && errno == EINTR);
>
>     while ((co = g_async_queue_try_pop(v9fs_pool.completed)) != NULL) {
>         qemu_coroutine_enter(co, NULL);
>     }
> }
>
> void v9fs_thread_routine(gpointer data, gpointer user_data)
> {
>     Coroutine *co = data;
>     char byte = 0;
>     ssize_t len;
>
>     qemu_coroutine_enter(co, NULL);
>
>     g_async_queue_push(v9fs_pool.completed, co);
>     do {
>         len = write(v9fs_pool.wfd, &byte, sizeof(byte));
>     } while (len == -1 && errno == EINTR);
> }
>
> This model makes the code simple, and in one shot we can convert all
> v9fs_do_syscalls into asynchronous threads. But, as Aneesh raised, will
> there be any additional overhead for the additional jumps? We can quickly
> test that out too.
>
> For this to work, first we need to consolidate errno and then convert the
> v9fs_do_syscalls with v9fs_co_run_in_worker(). After that we can post
> patches folding in the post* functions.
> Just a thought.
>
> Thanks,
> JV
>
>> like:
>>
>> int v9fs_co_readdir(V9fsState *s, V9fsFidState *fidp, struct dirent **dent)
>> {
>>     v9fs_wthread_enter();
>>     s->ops->readdir(&s->ctx, fidp->fs.dir);
>>     v9fs_wthread_exit();
>> }
>>
>> v9fs_worker_thread_enter()
>> {
>>     v9fs_qemu_submit_request(v9fs_worker_request);
>>     qemu_coroutine_yield();
>> }
>>
>> v9fs_coroutine_worker_func()
>> {
>>     qemu_coroutine_enter(coroutine, NULL);
>> }
>>
>>
>> I also wonder whether the additional bottom halves and the additional
>> setcontext/setjmp that we end up with will have a performance impact
>> compared to what we have currently?
>>
>> -aneesh
>
>
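(For what it's worth, a rough sketch of what one converted v9fs_do_syscall
could look like once errno is consolidated, following the run_in_worker
pattern discussed above -- the blocking syscall and the errno read sit
together inside the worker block; the exact name and signature here are an
assumption, not code from the series:)

int coroutine_fn v9fs_co_lstat(V9fsState *s, V9fsString *path,
                               struct stat *stbuf)
{
    int err = 0;

    v9fs_co_run_in_worker({
        /* runs in a worker thread: keep the errno read right next to
         * the blocking call so nothing can clobber it in between */
        err = s->ops->lstat(&s->ctx, path->data, stbuf);
        if (err < 0) {
            err = -errno;
        }
    });
    /* back in the QEMU thread */
    return err;
}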