Date: Tue, 30 Oct 2012 20:13:03 +0100
From: Stefan Hajnoczi
Message-ID: <20121030191303.GD16674@stefanha-thinkpad.redhat.com>
References: <1351260355-19802-1-git-send-email-pbonzini@redhat.com>
 <1351260355-19802-20-git-send-email-pbonzini@redhat.com>
In-Reply-To: <1351260355-19802-20-git-send-email-pbonzini@redhat.com>
Subject: Re: [Qemu-devel] [PATCH 19/25] aio: add generic thread-pool facility
To: Paolo Bonzini
Cc: aliguori@us.ibm.com, qemu-devel@nongnu.org

On Fri, Oct 26, 2012 at 04:05:49PM +0200, Paolo Bonzini wrote:
> +static void event_notifier_ready(EventNotifier *notifier)
> +{
> +    ThreadPoolElement *elem, *next;
> +
> +    event_notifier_test_and_clear(notifier);
> +restart:
> +    QLIST_FOREACH_SAFE(elem, &head, all, next) {
> +        if (elem->state != THREAD_CANCELED && elem->state != THREAD_DONE) {
> +            continue;
> +        }
> +        if (elem->state == THREAD_DONE) {
> +            trace_thread_pool_complete(elem, elem->common.opaque, elem->ret);
> +        }
> +        if (elem->state == THREAD_DONE && elem->common.cb) {
> +            QLIST_REMOVE(elem, all);
> +            elem->common.cb(elem->common.opaque, elem->ret);

This function doesn't take the lock.  It first accesses elem->state and
then reads elem->ret.  We need to take the lock to ensure both
elem->state and elem->ret have been set before we use them - otherwise
we could see elem->state == THREAD_DONE but read elem->ret before the
worker thread stored the return value.  (A sketch of what I mean is at
the bottom of this mail.)

> +typedef struct ThreadPoolCo {
> +    Coroutine *co;
> +    int ret;
> +} ThreadPoolCo;
> +
> +static void thread_pool_co_cb(void *opaque, int ret)
> +{
> +    ThreadPoolCo *co = opaque;
> +
> +    co->ret = ret;
> +    qemu_coroutine_enter(co->co, NULL);
> +}
> +
> +int coroutine_fn thread_pool_submit_co(ThreadPoolFunc *func, void *arg)
> +{
> +    ThreadPoolCo tpc = { .co = qemu_coroutine_self(), .ret = -EINPROGRESS };
> +    assert(qemu_in_coroutine());
> +    thread_pool_submit_aio(func, arg, thread_pool_co_cb, &tpc);
> +    qemu_coroutine_yield();
> +    return tpc.ret;

It's important to understand that the submit_aio -> yield -> return ret
pattern only works because we assume this function is called as part of
the main loop.  If thread_pool_submit_co() were called outside the
event loop and the global mutex, there would be a race between the
submit_aio and yield steps: thread_pool_co_cb() could be called before
this coroutine yields!  (The interleaving is spelled out at the bottom
of this mail.)

I see no reason why this is a problem today, but I had to think through
this case when reading the code.

Stefan
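
P.S. A rough sketch of the locking I have in mind for the completion
path.  This is only an illustration - I am assuming the worker threads
store elem->ret and then set elem->state to THREAD_DONE while holding
the pool's global QemuMutex 'lock', as elsewhere in this patch:

    qemu_mutex_lock(&lock);
    if (elem->state == THREAD_DONE && elem->common.cb) {
        /* The worker stored elem->ret before setting elem->state,
         * both under 'lock', so reading them under the same lock
         * guarantees that a THREAD_DONE element has a valid ret. */
        int ret = elem->ret;
        QLIST_REMOVE(elem, all);
        /* Drop the lock before invoking the callback: it may
         * re-enter thread_pool_submit_aio() and take 'lock' again. */
        qemu_mutex_unlock(&lock);
        elem->common.cb(elem->common.opaque, ret);
    } else {
        qemu_mutex_unlock(&lock);
    }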
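
P.P.S. The interleaving I had to think through, for a hypothetical
caller running outside the event loop and without the global mutex:

    caller coroutine                    worker + event loop
    ------------------------------      ------------------------------
    thread_pool_submit_co()
      thread_pool_submit_aio()
                                        worker: func() runs and the
                                        event notifier is signalled
                                        event loop: event_notifier_ready()
                                          thread_pool_co_cb()
                                            qemu_coroutine_enter(co)
                                            ... but co has not yielded!
      qemu_coroutine_yield()

Today every caller holds the global mutex, so thread_pool_co_cb()
cannot run until this coroutine has yielded back to the main loop.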