Date: Sun, 29 Nov 2009 22:29:24 +0000
From: Jamie Lokier
Subject: Re: [Qemu-devel] [PATCH 2/7] store thread-specific env information
To: Avi Kivity
Cc: qemu-devel@nongnu.org, Andreas Färber, Glauber Costa, aliguori@us.ibm.com

> On 11/29/2009 05:38 PM, Andreas Färber wrote:
>> On 29.11.2009 at 16:29, Avi Kivity wrote:
>>> Where is __thread not supported?
>> Apple, Sun.

Some flavours of uClinux :-)

Avi Kivity wrote:
> Well, pthread_getspecific is around 130 bytes of code, whereas __thread
> is just one instruction.

Maybe we should support both.  It's easy enough, they are quite similar.
Except that pthread_key_create lets you provide a destructor which is
called as each thread is destroyed (unfortunately there is no constructor
for new threads; and you can use both methods if you need a destructor
and speed together).

It's not always one instruction - it's more complicated in shared
libraries, but it's always close to that.

Anyway, I decided to measure them both as I wondered about this for
another program.  On my 2.0GHz Core Duo (32-bit), tight unrolled loop,
everything in cache:

    Read void *__thread variable        ~ 0.6 ns
    Call pthread_getspecific(key)       ~ 8.8 ns

__thread is preferable but it's not much overhead to call
pthread_getspecific().

Imho, it's not worth making code less portable or more complicated to
handle both, but it's a nice touch.

However, I did notice that the compiler optimises away references to
__thread variables much better, such as hoisting from inside loops.

In my programs I have taken to wrapping everything inside a
thread_specific(var) macro, similar to the one in the kernel, which
expands to call pthread_getspecific() or use __thread[*].  That keeps
the complexity in one place, which is where the macro is defined.

( [*] - Windows has __thread, but it sometimes crashes when used in a
  DLL, so I use the Windows equivalent of pthread_getspecific() in the
  same wrapper macro, which is fine. )

-- Jamie
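
For illustration, here is a minimal sketch of what such a wrapper could
look like, assuming a GCC-style __thread keyword and POSIX threads.  Only
the name thread_specific(var) comes from the text above; USE_PTHREAD_KEYS,
THREAD_SPECIFIC_DEFINE, thread_specific_set and current_env are
hypothetical names made up for this example, and a real wrapper may well
be structured differently.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical switch: define USE_PTHREAD_KEYS where __thread is missing. */
#ifndef USE_PTHREAD_KEYS

/* Fast path: compiler-supported TLS, one pointer variable per thread. */
#define THREAD_SPECIFIC_DEFINE(type, name) static __thread type *name##_tls
#define thread_specific(name)              (name##_tls)
#define thread_specific_set(name, value)   ((void)(name##_tls = (value)))

#else

/*
 * Portable path: one pthread key per variable, created on first use.
 * The pthread_once() call on every access adds overhead; a real wrapper
 * would probably create the key once at startup instead.
 */
#define THREAD_SPECIFIC_DEFINE(type, name)                        \
    static pthread_key_t name##_key;                              \
    static pthread_once_t name##_once = PTHREAD_ONCE_INIT;        \
    static void name##_key_init(void)                             \
    {                                                             \
        /* NULL destructor; pass a function to free per-thread */ \
        /* data when each thread exits. */                        \
        int rc = pthread_key_create(&name##_key, NULL);           \
        assert(rc == 0);                                          \
        (void)rc;                                                 \
    }                                                             \
    static pthread_key_t name##_get_key(void)                     \
    {                                                             \
        pthread_once(&name##_once, name##_key_init);              \
        return name##_key;                                        \
    }                                                             \
    typedef type name##_type
#define thread_specific(name) \
    ((name##_type *)pthread_getspecific(name##_get_key()))
#define thread_specific_set(name, value) \
    ((void)pthread_setspecific(name##_get_key(), (value)))

#endif

/* Example variable: a per-thread pointer to some int-sized state. */
THREAD_SPECIFIC_DEFINE(int, current_env);

/* Worker stores its own pointer and returns what it reads back. */
static void *worker(void *arg)
{
    thread_specific_set(current_env, (int *)arg);
    return thread_specific(current_env);
}

/* Returns 1 if the main thread and the worker each see their own pointer. */
static int thread_specific_demo(void)
{
    int main_value = 1, worker_value = 2;
    pthread_t tid;
    void *seen = NULL;

    thread_specific_set(current_env, &main_value);
    if (pthread_create(&tid, NULL, worker, &worker_value) != 0)
        return 0;
    if (pthread_join(tid, &seen) != 0)
        return 0;
    return seen == (void *)&worker_value &&
           thread_specific(current_env) == &main_value;
}
```

Callers only ever touch thread_specific() and thread_specific_set(), so
switching a platform between the two mechanisms is a matter of building
with or without -DUSE_PTHREAD_KEYS.  Linking may need -pthread.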