From mboxrd@z Thu Jan 1 00:00:00 1970
From: Mathieu Desnoyers
Subject: Re: [RFC PATCH 0/3] Implement getcpu_cache system call
Date: Wed, 13 Jan 2016 00:22:29 +0000 (UTC)
Message-ID: <484967406.344576.1452644549992.JavaMail.zimbra@efficios.com>
References: <1451977320-4886-1-git-send-email-mathieu.desnoyers@efficios.com>
 <20160111230306.GC28717@cloud>
 <137700396.343696.1452559758752.JavaMail.zimbra@efficios.com>
 <20160112024549.GA6488@x>
 <9F8D25C2-B5EE-479D-BD61-0FE466962B9E@fb.com>
 <467525713.343916.1452604549209.JavaMail.zimbra@efficios.com>
 <5CDDBDF2D36D9F43B9F5E99003F6A0D49A1426A5@PRN-MBX02-1.TheFacebook.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
In-Reply-To: <5CDDBDF2D36D9F43B9F5E99003F6A0D49A1426A5-f8hGUhss0nh9TZdEUguypQ2O0Ztt9esIQQ4Iyu8u01E@public.gmane.org>
Sender: linux-api-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
To: Ben Maurer
Cc: Josh Triplett, Shane M Seymour, Thomas Gleixner, Paul Turner,
 Andrew Hunter, Peter Zijlstra,
 linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-api,
 Andy Lutomirski, Andi Kleen, Dave Watson, Chris Lameter, Ingo Molnar,
 rostedt, "Paul E. McKenney", Linus Torvalds, Andrew Morton,
 Russell King, Catalin Marinas, Will Deacon, Michael Kerrisk
List-Id: linux-api@vger.kernel.org

----- On Jan 12, 2016, at 4:02 PM, Ben Maurer bmaurer-b10kYP2dOMg@public.gmane.org wrote:

>> One idea I have would be to let the kernel reserve some space either after the
>> first stack address (for a stack growing down) or at the beginning of the
>> allocated TLS area for each thread in copy_thread_tls() by fiddling with
>> sp or the tls base address when creating a thread.
>
> Could this be implemented by having glibc use a well known symbol name to define
> the per-thread TLS area? If a high performance application wants to avoid any
> relocations in accessing this variable it would define it and that definition
> would override glibc's. This is how things work with malloc. glibc has a
> default malloc implementation but we link jemalloc directly into our binaries.
> In addition to changing the malloc implementation this means that calls to
> malloc don't go through the PLT.

Just to make sure I understand your proposal: defining a well known symbol
with a weak attribute in glibc (or bionic...), e.g.:

    int32_t __thread __attribute__((weak)) __getcpu_cache;

so that applications which care about bypassing the PLT can override it with:

    int32_t __thread __getcpu_cache;

glibc/bionic would be responsible for calling the getcpu_cache() system call
to register/unregister this TLS variable for each thread.

One thing I would like to figure out is whether we can use this in a way that
would allow introducing getcpu_cache() into applications and libraries
(e.g. lttng-ust tracer) before it gets implemented in glibc, in a way that
would keep forward compatibility for whenever it gets introduced in glibc.

We can declare __getcpu_cache as a weak symbol in arbitrary libraries, and
make them register/unregister the cache through the getcpu_cache syscall.
The main thing that I would need to tweak at the kernel level within the
system call would be to keep a refcount of the number of times
__getcpu_cache is registered per thread. This would allow multiple
registrations, one per library (e.g. lttng-ust) and one for glibc, but we
would validate that they all register the exact same address for a given
thread.

The reference counting trick should also work for cases where applications
define a non-weak __getcpu_cache, and want to call the getcpu_cache system
call to register it themselves (before glibc adds support for it).

Thoughts?

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
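
To make the refcount idea above concrete, here is a minimal user-space sketch
of the per-thread bookkeeping it would imply. This is only a model under
stated assumptions, not the kernel implementation: struct cpu_cache_reg,
cpu_cache_register(), cpu_cache_unregister() and the choice of -EBUSY and
-EINVAL as error codes are all hypothetical names picked for illustration.
Only the weak __getcpu_cache TLS symbol comes from the discussion itself.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Weak TLS definition, as a library or libc might provide it. An
 * application can override it with a non-weak definition of the same name. */
__thread int32_t __getcpu_cache __attribute__((weak));

/* Hypothetical per-thread state (one instance per thread is assumed). */
struct cpu_cache_reg {
	int32_t *addr;		/* registered address, NULL when unregistered */
	unsigned int refcount;	/* number of active registrations */
};

/*
 * Register addr for the current thread. The first registration records the
 * address; later ones only succeed if they pass the exact same address.
 * Error codes are assumptions made for this sketch.
 */
static int cpu_cache_register(struct cpu_cache_reg *reg, int32_t *addr)
{
	if (addr == NULL)
		return -EINVAL;
	if (reg->refcount > 0 && reg->addr != addr)
		return -EBUSY;	/* someone registered a different address */
	reg->addr = addr;
	reg->refcount++;
	return 0;
}

/*
 * Drop one registration. The address stays registered until the last
 * user (library, libc or application) has unregistered.
 */
static int cpu_cache_unregister(struct cpu_cache_reg *reg, int32_t *addr)
{
	if (reg->refcount == 0 || reg->addr != addr)
		return -EINVAL;
	if (--reg->refcount == 0)
		reg->addr = NULL;
	return 0;
}
```

With this model, a library such as lttng-ust and glibc could each register
&__getcpu_cache for the same thread, and the cache would only stop being
updated once both have unregistered.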