From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:56023)
	by lists.gnu.org with esmtp (Exim 4.71)
	id 1XuNbL-0007Xy-At
	for qemu-devel@nongnu.org; Fri, 28 Nov 2014 10:37:01 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	id 1XuNbF-0002QU-4Z
	for qemu-devel@nongnu.org; Fri, 28 Nov 2014 10:36:55 -0500
Received: from mx1.redhat.com ([209.132.183.28]:48503)
	by eggs.gnu.org with esmtp (Exim 4.71)
	id 1XuNbE-0002Q2-Se
	for qemu-devel@nongnu.org; Fri, 28 Nov 2014 10:36:49 -0500
Date: Fri, 28 Nov 2014 16:36:34 +0100
From: Kevin Wolf
Message-ID: <20141128153634.GG4035@noname.redhat.com>
References: <1417183941-26329-1-git-send-email-pbonzini@redhat.com>
 <1417183941-26329-2-git-send-email-pbonzini@redhat.com>
 <87h9xj2xl7.fsf@blackfin.pond.sub.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <87h9xj2xl7.fsf@blackfin.pond.sub.org>
Subject: Re: [Qemu-devel] [PATCH 1/7] coroutine-ucontext: use __thread
To: Markus Armbruster
Cc: Paolo Bonzini, ming.lei@canonical.com, pl@kamp.de,
 qemu-devel@nongnu.org, stefanha@redhat.com

On 28.11.2014 at 15:45, Markus Armbruster wrote:
> Paolo Bonzini writes:
>
> > ELF thread local storage is about 10% faster on tests/test-coroutine's
> > perf/cost test.  The timing on my machine is 160ns per iteration with
> > pthread TLS, 145 with ELF TLS.
> >
> > Based on a patch by Kevin Wolf and Peter Lieven, but redone to follow
> > the model of coroutine-win32.c (including the important "noinline"
> > attribute!!!).
> >
> > Platforms without thread-local storage (OpenBSD probably?) will need
> > a new-enough GCC for this to compile, in order to use the same emutls
> > support that Windows already relies on.
> [...]
> > @@ -193,15 +155,22 @@ void qemu_coroutine_delete(Coroutine *co_)
> >      g_free(co);
> >  }
> >
> > +/* This function is marked noinline to prevent GCC from inlining it
> > + * into coroutine_trampoline(). If we allow it to do that then it
> > + * hoists the code to get the address of the TLS variable "current"
> > + * out of the while() loop. This is an invalid transformation because
> > + * the SwitchToFiber() call may be called when running thread A but
> > + * return in thread B, and so we might be in a different thread
> > + * context each time round the loop.
> > + */
> >  CoroutineAction qemu_coroutine_switch(Coroutine *from_, Coroutine *to_,
> >                                        CoroutineAction action)
>
> Err, did you forget the actual __attribute__((noinline))?

The comment needs updating, too. There's no SwitchToFiber() in the
ucontext implementation.

Kevin
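
For illustration only, here is a minimal sketch of where the attribute
asked about above would sit relative to a __thread variable. It is not
the QEMU code: current_id, get_current() and run_demo() are
hypothetical names, and the sketch assumes GCC or Clang on a platform
with pthreads. It does not reproduce the hoisting bug itself (that
needs a context switch that resumes on a different thread); it only
shows a noinline accessor returning the calling thread's own copy of
the variable.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-thread "current" variable. */
static __thread int current_id;

/* noinline keeps the TLS address computation inside this function, so
 * a caller cannot cache the address across a context switch. */
__attribute__((noinline))
static int *get_current(void)
{
    return &current_id;
}

/* Each worker stores its id and reads it back through the accessor. */
static void *worker(void *arg)
{
    int id = *(int *)arg;

    *get_current() = id;
    return (void *)(intptr_t)(*get_current() == id);
}

/* Returns 0 if every thread saw only its own copy of current_id. */
int run_demo(void)
{
    pthread_t threads[2];
    int ids[2] = { 1, 2 };
    int ok = 1;

    *get_current() = 42;    /* the calling thread's copy */

    for (int i = 0; i < 2; i++) {
        int rc = pthread_create(&threads[i], NULL, worker, &ids[i]);
        assert(rc == 0);
    }
    for (int i = 0; i < 2; i++) {
        void *res = NULL;

        pthread_join(threads[i], &res);
        ok = ok && (intptr_t)res;
    }

    /* The workers' stores must not have touched this thread's copy. */
    return (ok && *get_current() == 42) ? 0 : 1;
}
```

Calling run_demo() from a main() and checking for a zero return is
enough to exercise it; older toolchains may need -pthread when linking.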