From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <547868BE.1030206@redhat.com>
Date: Fri, 28 Nov 2014 13:21:18 +0100
From: Paolo Bonzini
References: <1417084026-12307-1-git-send-email-pl@kamp.de> <1417084026-12307-4-git-send-email-pl@kamp.de> <547753F7.2030709@redhat.com> <54782EC3.10005@kamp.de> <54784E55.6060405@redhat.com> <54785067.60905@kamp.de> <547858FF.5070602@redhat.com> <54785AA5.9070409@kamp.de> <54785B2E.9070203@redhat.com> <54785D60.1070306@kamp.de>
In-Reply-To: <54785D60.1070306@kamp.de>
Subject: Re: [Qemu-devel] [RFC PATCH 3/3] qemu-coroutine: use a ring per thread for the pool
To: Peter Lieven , ming.lei@canonical.com, Kevin Wolf , Stefan Hajnoczi , "qemu-devel@nongnu.org" , Markus Armbruster

On 28/11/2014 12:32, Peter Lieven wrote:
> On 28.11.2014 at 12:23, Paolo Bonzini wrote:
>>
>> On 28/11/2014 12:21, Peter Lieven wrote:
>>> On 28.11.2014 at 12:14, Paolo Bonzini wrote:
>>>>> master:
>>>>> Run operation 40000000 iterations 12.851414 s, 3112K operations/s, 321ns per coroutine
>>>>>
>>>>> paolo:
>>>>> Run operation 40000000 iterations 11.951720 s, 3346K operations/s, 298ns per coroutine
>>>> Nice. :)
>>>>
>>>> Can you please try "coroutine: Use __thread ... " together, too?  I still
>>>> see 11% time spent in pthread_getspecific, and I get ~10% more indeed if
>>>> I apply it here (my times are 191/160/145).
>>> indeed:
>>>
>>> Run operation 40000000 iterations 10.138684 s, 3945K operations/s, 253ns per coroutine
>> Your perf_master2 uses the ring buffer unconditionally, right?  I wonder
>> if we can use a similar algorithm but with arrays instead of lists...
>
> Why do you set pool_size = 0 in the create path?
>
> When I do the following:
> diff --git a/qemu-coroutine.c b/qemu-coroutine.c
> index 6bee354..c79ee78 100644
> --- a/qemu-coroutine.c
> +++ b/qemu-coroutine.c
> @@ -44,7 +44,7 @@ Coroutine *qemu_coroutine_create(CoroutineEntry *entry)
>           * and the actual size of alloc_pool.  But it is just a heuristic,
>           * it does not need to be perfect.
>           */
> -        pool_size = 0;
> +        atomic_dec(&pool_size);
>          QSLIST_MOVE_ATOMIC(&alloc_pool, &release_pool);
>          co = QSLIST_FIRST(&alloc_pool);
>
>
> I get:
> Run operation 40000000 iterations 9.883958 s, 4046K operations/s, 247ns per coroutine

Because pool_size is the (approximate) number of coroutines in the pool.
It is zero after QSLIST_MOVE_ATOMIC has NULL-ed out release_pool.slh_first.

Paolo
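
The reason for resetting the counter rather than decrementing it can be sketched in a few lines of C11. This is a simplified stand-in, not QEMU's code: `Node`, `release_head`, `release_push`, `take_all` and `list_len` are hypothetical names, and plain C11 atomics are assumed in place of `QSLIST_MOVE_ATOMIC` and QEMU's own atomic helpers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical list node; stands in for a pooled Coroutine. */
typedef struct Node {
    struct Node *next;
} Node;

/* Shared list head, playing the role of release_pool.slh_first. */
static _Atomic(Node *) release_head;

/* Approximate number of nodes reachable from release_head. */
static atomic_uint pool_size;

/* Push one node onto the shared list and bump the approximate count. */
static void release_push(Node *n)
{
    Node *old = atomic_load(&release_head);
    do {
        n->next = old;
    } while (!atomic_compare_exchange_weak(&release_head, &old, n));
    atomic_fetch_add(&pool_size, 1);
}

/*
 * Take the whole shared list in one step. The exchange leaves the shared
 * head NULL, so no nodes remain on it, and the counter is reset to 0 to
 * match. A decrement would account for only one of the removed nodes.
 */
static Node *take_all(void)
{
    atomic_store(&pool_size, 0);
    return atomic_exchange(&release_head, NULL);
}

/* Count the nodes in a privately owned chain. */
static size_t list_len(const Node *n)
{
    size_t len = 0;
    for (; n != NULL; n = n->next) {
        len++;
    }
    return len;
}
```

With four nodes pushed, `take_all()` returns a chain of four and leaves both the shared head and `pool_size` empty. Decrementing instead would leave `pool_size` at 3 while the shared list holds nothing.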