From: Stefan Hajnoczi
Date: Sat, 14 May 2011 07:33:24 +0100
Subject: Re: [Qemu-devel] [0/25] Async threading for VirtFS using glib threads & coroutines.
To: Venkateswararao Jujjuri
Cc: qemu-devel@nongnu.org

On Sat, May 14, 2011 at 2:29 AM, Venkateswararao Jujjuri wrote:
> On 05/13/2011 05:18 PM, Venkateswararao Jujjuri wrote:
>>
>> On 05/13/2011 12:26 PM, Aneesh Kumar K.V wrote:
>>>
>>> On Fri, 13 May 2011 09:55:03 +0100, Stefan Hajnoczi wrote:
>>>>
>>>> On Thu, May 12, 2011 at 01:57:22PM -0700, Venkateswararao Jujjuri (JV)
>>>> wrote:
>>>>>
>>>>> VirtFS (a file server based on 9P) performs many blocking system
>>>>> calls in the vCPU context. This effort moves the blocking calls out
>>>>> of the vCPU/IO thread context and into asynchronous threads.
>>>>>
>>>>> Anthony's "Add hard build dependency on glib" patch and
>>>>> Kevin/Stefan's coroutine effort are prerequisites.
>>>>>
>>>>> This patch set contains:
>>>>>  - Converting all 9pfs calls into coroutines.
>>>>>  - Each 9P operation will be modified to:
>>>>>    - Remove the post* functions. These are our callback functions,
>>>>>      which make the code very hard to read. Now with coroutines, we
>>>>>      can achieve the same state machine model with a nice sequential
>>>>>      code flow.
>>>>>    - Move the errno access next to the local_syscall().
>>>>>    - Introduce asynchronous threading.
>>>>>
>>>>> This series has the basic infrastructure and a few routines like
>>>>> mkdir, mknod, unlink, readdir, xattr, lstat, etc. converted.
>>>>> We are currently converting and testing the remaining 9P operations
>>>>> in this model; those patches will follow shortly.
>>>>>
>>>>> Removing the callback functions made some of the patches a little
>>>>> lengthy.
>>>>
>>>> This long patch series adds temporary structs and marshalling code for
>>>> each file system operation - I think none of this is necessary.
>>>> Instead we can exploit coroutines more:
>>>>
>>>> The point of coroutines is that you can suspend a thread of control (a
>>>> call stack, not an OS-level thread) and re-enter it later. We should
>>>> make coroutines thread-safe (i.e. work outside of the global mutex)
>>>> and then allow switching a coroutine from a QEMU thread to a worker
>>>> thread and back again:
>>>>
>>>> int coroutine_fn v9fs_co_readdir(V9fsState *s, V9fsFidState *fidp,
>>>>                                  struct dirent **dent)
>>>> {
>>>>     int ret = 0;
>>>>
>>>>     v9fs_co_run_in_worker({
>>>>         errno = 0;
>>>>         *dent = s->ops->readdir(&s->ctx, fidp->fs.dir);
>>>>         if (!*dent && errno) {
>>>>             ret = -errno;
>>>>         }
>>>>     });
>>>>     return ret;
>>>> }
>>>>
>>>> v9fs_co_readdir() can be called from a QEMU thread. The block of code
>>>> inside v9fs_co_run_in_worker() will be executed in a worker thread.
>>>> Notice that no marshalling of variables is necessary at all; we can
>>>> use the function arguments and local variables because this is still
>>>> the same function!
>>>>
>>>> When control reaches the end of the v9fs_co_run_in_worker() block,
>>>> execution resumes in a QEMU thread and the function then returns ret.
>>>> It would be incorrect to return inside the v9fs_co_run_in_worker()
>>>> block because at that point we're still inside the worker thread.
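>>>>
>>>> A 9P request handler that is already running in coroutine context can
>>>> then call it as plain sequential code. For example (a hypothetical
>>>> handler purely to illustrate the calling convention; like the rest of
>>>> this, it is sketch code and has not been compiled):
>>>>
>>>> static void coroutine_fn v9fs_readdir_handler(V9fsState *s,
>>>>                                               V9fsFidState *fidp)
>>>> {
>>>>     struct dirent *dent;
>>>>     int err;
>>>>
>>>>     /* blocks this coroutine, not the QEMU thread */
>>>>     err = v9fs_co_readdir(s, fidp, &dent);
>>>>     if (err < 0) {
>>>>         /* complete the 9P request with the error */
>>>>         return;
>>>>     }
>>>>     /* use dent to build the Rreaddir reply */
>>>> }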
>>>>
>>>> Here is how v9fs_co_run_in_worker() does its magic:
>>>>
>>>> #define v9fs_co_run_in_worker(block) \
>>>> { \
>>>>     QEMUBH *co_bh; \
>>>> \
>>>>     co_bh = qemu_bh_new(co_run_in_worker_bh, qemu_coroutine_self()); \
>>>>     qemu_bh_schedule(co_bh); \
>>>>     qemu_coroutine_yield(); /* re-entered in worker thread */ \
>>>>     qemu_bh_delete(co_bh); \
>>>> \
>>>>     block; \
>>>> \
>>>>     qemu_coroutine_yield(); /* re-entered in QEMU thread */ \
>>>> }
>>>>
>>>> void co_run_in_worker_bh(void *opaque)
>>>> {
>>>>     Coroutine *co = opaque;
>>>>
>>>>     g_thread_pool_push(v9fs_pool.pool, co, NULL);
>>>> }
>>>>
>>>> void worker_thread_fn(gpointer data, gpointer user_data)
>>>> {
>>>>     Coroutine *co = data;
>>>>     char byte = 0;
>>>>     ssize_t len;
>>>>
>>>>     qemu_coroutine_enter(co, NULL);
>>>>
>>>>     g_async_queue_push(v9fs_pool.completed, co);
>>>>     do {
>>>>         len = write(v9fs_pool.wfd, &byte, sizeof(byte));
>>>>     } while (len == -1 && errno == EINTR);
>>>> }
>>>>
>>>> void process_req_done(void *arg)
>>>> {
>>>>     Coroutine *co;
>>>>     char byte;
>>>>     ssize_t len;
>>>>
>>>>     do {
>>>>         len = read(v9fs_pool.rfd, &byte, sizeof(byte));
>>>>     } while (len == -1 && errno == EINTR);
>>>>
>>>>     while ((co = g_async_queue_try_pop(v9fs_pool.completed)) != NULL) {
>>>>         qemu_coroutine_enter(co, NULL);
>>>>     }
>>>> }
>>>>
>>>> I typed this code out in the email; it has not been compiled or
>>>> tested.
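>>>>
>>>> For completeness, the v9fs_pool state used above would need to be set
>>>> up once at initialization time. Roughly like this (the struct, the
>>>> init function name, and the thread count are all made up for the
>>>> sketch):
>>>>
>>>> typedef struct {
>>>>     GThreadPool *pool;      /* runs worker_thread_fn() */
>>>>     GAsyncQueue *completed; /* coroutines ready to be re-entered */
>>>>     int rfd;                /* read end of the wakeup pipe */
>>>>     int wfd;                /* write end of the wakeup pipe */
>>>> } V9fsPool;
>>>>
>>>> static V9fsPool v9fs_pool;
>>>>
>>>> static void v9fs_pool_init(void)
>>>> {
>>>>     int fds[2];
>>>>
>>>>     if (pipe(fds) < 0) {
>>>>         abort();
>>>>     }
>>>>     v9fs_pool.rfd = fds[0];
>>>>     v9fs_pool.wfd = fds[1];
>>>>     v9fs_pool.completed = g_async_queue_new();
>>>>     v9fs_pool.pool = g_thread_pool_new(worker_thread_fn, NULL, 4,
>>>>                                        FALSE, NULL);
>>>>     /* run process_req_done() in the QEMU thread whenever a worker
>>>>      * writes to the pipe */
>>>>     qemu_set_fd_handler(v9fs_pool.rfd, process_req_done, NULL, NULL);
>>>> }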
>>>>
>>>> If you decide to eliminate coroutines entirely in the future and use
>>>> worker threads exclusively to process requests, then there are clearly
>>>> marked sections in the code: anything inside v9fs_co_run_in_worker()
>>>> must be thread-safe already, and anything outside it needs to be
>>>> audited and made thread-safe. The changes required would be smaller
>>>> than if your current patch series were applied. I wanted to mention
>>>> this point to show that this doesn't paint virtfs into a corner.
>>>>
>>>> So where does this leave virtfs? No marshalling is necessary, and
>>>> blocking operations can be performed inline using
>>>> v9fs_co_run_in_worker() blocks. The codebase will be a lot smaller.
>>>>
>>>> Does this seem reasonable?
>>>>
>>> Do we really need bottom halves here? Can't we achieve the same thing
>>> with v9fs_qemu_submit_request() and making the glib thread function
>>> callback (request.func()) do qemu_coroutine_enter()?
>>
>> I had the same question. :) Tested without the BH, and touch testing
>> worked fine.
>>
>> #define v9fs_co_run_in_worker(block) \
>> { \
>>     g_thread_pool_push(v9fs_pool.pool, qemu_coroutine_self(), NULL); \
>>     qemu_coroutine_yield(); /* re-entered in worker thread */ \
>> \
>>     block; \
>> \
>>     qemu_coroutine_yield(); /* re-entered in QEMU thread */ \
>> }
>
> I guess there is a need for the BH after all. :)
> Without it, a little stress with smp=2 causes:
>
> Co-routine re-entered recursively
> Aborted

Right. It's for synchronization:

1. Yield the coroutine in the QEMU thread.
2. Submit the coroutine to a worker thread.
3. Enter the coroutine in the worker thread.

It's not safe to swap steps 1 and 2: if the coroutine is submitted before
it has yielded, the worker thread can enter it while it is still running
in the QEMU thread.

Stefan
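P.S. To make that race concrete, here is the BH-less macro again with the
failure mode annotated (illustration only, same untested caveat as before):

#define v9fs_co_run_in_worker(block) \
{ \
    /* UNSAFE: a worker may pop this coroutine and qemu_coroutine_enter() \
     * it as soon as it is pushed, i.e. possibly before the yield below \
     * has suspended it.  Entering a coroutine that is still running is \
     * exactly the "Co-routine re-entered recursively" abort above. */ \
    g_thread_pool_push(v9fs_pool.pool, qemu_coroutine_self(), NULL); \
    qemu_coroutine_yield(); /* may lose the race against the worker */ \
\
    block; \
\
    qemu_coroutine_yield(); /* re-entered in QEMU thread */ \
}

The bottom half closes this window: co_run_in_worker_bh() only runs once
the main loop regains control, and that cannot happen until the coroutine
has yielded, so step 2 is guaranteed to come after step 1.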