From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from [140.186.70.92] (port=39264 helo=eggs.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1PFo20-0003z1-L7 for qemu-devel@nongnu.org; Tue, 09 Nov 2010 08:14:40 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1PFo1x-000484-Rf for qemu-devel@nongnu.org; Tue, 09 Nov 2010 08:14:36 -0500 Received: from mail-wy0-f173.google.com ([74.125.82.173]:53654) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1PFo1x-00047q-Kv for qemu-devel@nongnu.org; Tue, 09 Nov 2010 08:14:33 -0500 Received: by wyj26 with SMTP id 26so7449211wyj.4 for ; Tue, 09 Nov 2010 05:14:32 -0800 (PST) MIME-Version: 1.0 In-Reply-To: <20101109120604.GA3395@linux.vnet.ibm.com> References: <20101108104542.6769.22583.stgit@localhost6.localdomain6> <20101108143322.GA10435@linux.vnet.ibm.com> <20101109120604.GA3395@linux.vnet.ibm.com> Date: Tue, 9 Nov 2010 13:14:31 +0000 Message-ID: Subject: Re: [Qemu-devel] [PATCH 1/3] Make paio subsystem use threadlets infrastructure From: Stefan Hajnoczi Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable List-Id: qemu-devel.nongnu.org List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: arun@linux.vnet.ibm.com Cc: qemu-devel@nongnu.org On Tue, Nov 9, 2010 at 12:06 PM, Arun R Bharadwaj wrote: > * Stefan Hajnoczi [2010-11-08 21:29:12]: > >> On Mon, Nov 8, 2010 at 2:33 PM, Arun R Bharadwaj >> wrote: >> > diff --git a/Makefile.objs b/Makefile.objs >> > index cd5a24b..3b7ec27 100644 >> > --- a/Makefile.objs >> > +++ b/Makefile.objs >> > @@ -9,6 +9,7 @@ qobject-obj-y +=3D qerror.o >> > >> > =A0block-obj-y =3D cutils.o cache-utils.o qemu-malloc.o qemu-option.o = module.o >> > =A0block-obj-y +=3D nbd.o block.o aio.o aes.o osdep.o qemu-config.o >> > +block-obj-$(CONFIG_POSIX) +=3D qemu-thread.o >> > =A0block-obj-$(CONFIG_POSIX) +=3D posix-aio-compat.o >> > =A0block-obj-$(CONFIG_LINUX_AIO) +=3D linux-aio.o >> > >> > @@ -124,7 +125,6 @@ endif >> > =A0common-obj-y +=3D $(addprefix ui/, $(ui-obj-y)) >> > >> > =A0common-obj-y +=3D iov.o acl.o >> > -common-obj-$(CONFIG_THREAD) +=3D qemu-thread.o >> > =A0common-obj-y +=3D notify.o event_notifier.o >> > =A0common-obj-y +=3D qemu-timer.o >> > > Hi Stefan, > > Thanks for the quick review. > >> This change makes CONFIG_THREAD unused. =A0The ./configure code that >> sets CONFIG_THREAD=3Dy should be removed. >> > > I'll remove this. > >> > diff --git a/posix-aio-compat.c b/posix-aio-compat.c >> > index 7b862b5..00b2a4e 100644 >> > --- a/posix-aio-compat.c >> > +++ b/posix-aio-compat.c >> > @@ -29,7 +29,32 @@ >> > =A0#include "block_int.h" >> > >> > =A0#include "block/raw-posix-aio.h" >> > +#include "qemu-thread.h" >> > >> > +#define MAX_GLOBAL_THREADS =A064 >> > +#define MIN_GLOBAL_THREADS =A0 8 >> > + >> > +QemuMutex aiocb_mutex; >> >> This variable should be static since it isn't used externally. >> >> > + >> > +static void aio_thread(ThreadletWork *work) >> > =A0{ >> > =A0 =A0 pid_t pid; >> > + =A0 =A0struct qemu_paiocb *aiocb =3D container_of(work, struct qemu_= paiocb, work); >> > + =A0 =A0ssize_t ret =3D 0; >> > >> > =A0 =A0 pid =3D getpid(); >> > + =A0 =A0aiocb->active =3D 1; >> >> aiocb->active needs to be assigned with aiocb_mutex held and then releas= ed in >> order for this memory write to be visible to other threads after this >> line of code. >> > > Yes. That makes sense. We definitely need to hold the mutex here. > >> > >> > =A0static ssize_t qemu_paio_return(struct qemu_paiocb *aiocb) >> > =A0{ >> > =A0 =A0 ssize_t ret; >> > >> > - =A0 =A0mutex_lock(&lock); >> > + =A0 =A0qemu_mutex_lock(&aiocb_mutex); >> > =A0 =A0 ret =3D aiocb->ret; >> > - =A0 =A0mutex_unlock(&lock); >> > - >> > + =A0 =A0qemu_mutex_unlock(&aiocb_mutex); >> > =A0 =A0 return ret; >> > =A0} >> > >> > @@ -536,20 +619,20 @@ static void paio_cancel(BlockDriverAIOCB *blocka= cb) >> > =A0 =A0 struct qemu_paiocb *acb =3D (struct qemu_paiocb *)blockacb; >> > =A0 =A0 int active =3D 0; >> > >> > - =A0 =A0mutex_lock(&lock); >> > =A0 =A0 if (!acb->active) { >> > - =A0 =A0 =A0 =A0QTAILQ_REMOVE(&request_list, acb, node); >> > - =A0 =A0 =A0 =A0acb->ret =3D -ECANCELED; >> > + =A0 =A0 =A0 =A0if (!dequeue_work(&acb->work)) { >> > + =A0 =A0 =A0 =A0 =A0 =A0acb->ret =3D -ECANCELED; >> > + =A0 =A0 =A0 =A0 } else { >> > + =A0 =A0 =A0 =A0 =A0 =A0active =3D 1; >> > + =A0 =A0 =A0 =A0 } >> > =A0 =A0 } else if (acb->ret =3D=3D -EINPROGRESS) { >> > =A0 =A0 =A0 =A0 active =3D 1; >> > =A0 =A0 } >> > - =A0 =A0mutex_unlock(&lock); >> > >> > =A0 =A0 if (active) { >> > =A0 =A0 =A0 =A0 /* fail safe: if the aio could not be canceled, we wai= t for >> > =A0 =A0 =A0 =A0 =A0 =A0it */ >> > - =A0 =A0 =A0 =A0while (qemu_paio_error(acb) =3D=3D EINPROGRESS) >> > - =A0 =A0 =A0 =A0 =A0 =A0; >> > + =A0 =A0 =A0 =A0active =3D qemu_paio_error(acb); >> > =A0 =A0 } >> > >> > =A0 =A0 paio_remove(acb); >> >> acb->ret is not being consistently accessed with aiocb_mutex held. >> >> We don't wait for the work item to complete if it is active. =A0This cha= nges the >> semantics of paio_cancel() and will break callers who expect the request= to be >> cancelled/completed when paio_cancel() returns. =A0Also, we go ahead and= free the >> acb for a running request which is dangerous because it may be reused an= d >> corrupted. >> >> I think both the active variable and field in qemu_paiocb are unnecessar= y >> because dequeue_work() already deals with inactive work items. =A0If >> dequeue_work() was unsuccessful you need to wait until ret !=3D -EINPROG= RESS. >> > > So this would mean that we can use the earlier infinite while loop > right? > > while (qemu_paio_error(acb) =3D=3D EINPROGRESS) > =A0 =A0; > > We can just take this outside the if (active) condition check. I would go for something like this as paio_cancel(): if (dequeue_work(&acb->work) !=3D 0) { /* Wait for running work item to complete */ mutex_lock(&aiocb_mutex); while (acb->ret =3D=3D -EINPROGRESS) { cond_wait(&aiocb_completion, &aiocb_mutex); } mutex_unlock(&aiocb_mutex); } paio_remove(acb); If the work item has not yet been run, then dequeue_work() returns 0 and we can just remove the acb. If the work item is just running we sleep on the completion condition variable until there is a return value. If the work item completes just as we're doing the check, then the aiocb_mutex will synchronize things and we will not go to sleep without detecting that the request has completed. This approach requires introducing a condition variable. We only need a global aiocb_completion condition variable since there can only be one cancellation happening at a time. Stefan