From mboxrd@z Thu Jan 1 00:00:00 1970
From: Peter Zijlstra
Subject: Re: [PATCH v4 1/5] getcpu_cache system call: cache CPU number of running thread
Date: Fri, 26 Feb 2016 12:33:04 +0100
Message-ID: <20160226113304.GA6356@twins.programming.kicks-ass.net>
References: <1456270120-7560-1-git-send-email-mathieu.desnoyers@efficios.com>
 <1456270120-7560-2-git-send-email-mathieu.desnoyers@efficios.com>
 <20160225095635.GO6356@twins.programming.kicks-ass.net>
 <390571988.7745.1456419326288.JavaMail.zimbra@efficios.com>
 <20160225170416.GV6356@twins.programming.kicks-ass.net>
 <2135602720.7810.1456420671941.JavaMail.zimbra@efficios.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Return-path:
Content-Disposition: inline
In-Reply-To: <2135602720.7810.1456420671941.JavaMail.zimbra-vg+e7yoeK/dWk0Htik3J/w@public.gmane.org>
Sender: linux-api-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
To: Mathieu Desnoyers
Cc: Andrew Morton, Russell King, Thomas Gleixner, Ingo Molnar,
 "H. Peter Anvin", linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
 linux-api, Paul Turner, Andrew Hunter, Andy Lutomirski, Andi Kleen,
 Dave Watson, Chris Lameter, Ben Maurer, rostedt, "Paul E. McKenney",
 Josh Triplett, Linus Torvalds, Catalin Marinas, Will Deacon,
 Michael Kerrisk
List-Id: linux-api@vger.kernel.org

On Thu, Feb 25, 2016 at 05:17:51PM +0000, Mathieu Desnoyers wrote:
> ----- On Feb 25, 2016, at 12:04 PM, Peter Zijlstra peterz-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org wrote:
>
> > On Thu, Feb 25, 2016 at 04:55:26PM +0000, Mathieu Desnoyers wrote:
> >> ----- On Feb 25, 2016, at 4:56 AM, Peter Zijlstra peterz-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org wrote:
> >> The restartable sequences are intrinsically designed to work
> >> on per-cpu data, so they need to fetch the current CPU number
> >> within the rseq critical section. This is where the getcpu_cache
> >> system call becomes very useful when combined with rseq:
> >> getcpu_cache allows reading the current CPU number in a
> >> fraction of cycle.
> >
> > Yes yes, I know how restartable sequences work.
> >
> > But what I worry about is that they want a cpu number and a sequence
> > number, and for performance it would be very good if those live in the
> > same cacheline.
> >
> > That means either getcpu needs to grow a seq number, or restartable
> > sequences need to _also_ provide the cpu number.
>
> If we plan things well, we could have both the cpu number and the
> seqnum in the same cache line, registered by two different system
> calls. It's up to user-space to organize those two variables
> to fit within the same cache-line.

I feel this is more fragile than needed. Why not do a single system call
that does both?

> getcpu_cache GETCPU_CACHE_SET operation takes the address where
> the CPU number should live as input.
>
> rseq system call could do the same for the seqnum address.

So I really don't like that; it means we have to track more kernel
state -- we have to carry two pointers instead of one, we have to have
more update functions, etc. That just increases the total overhead of
all of this.

> The question becomes: how do we introduce this to user-space,
> considering that only a single address per thread is allowed
> for each of getcpu_cache and rseq ?
>
> If both CPU number and seqnum are centralized in a TLS within
> e.g. glibc, that would be OK, but if we intend to allow libraries
> or applications to directly register their own getcpu_cache
> address and/or rseq, we may end up in situations where we have
> to fallback on using two different cache-lines. But how much
> should we care about performance in cases where non-generic
> libraries directly use those system calls ?
>
> Thoughts ?

Yeah, not sure, but that is a separate problem. Both your proposed code
and the rseq code have this. Having them as separate system calls just
increases the number of ways you can do it wrong.
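
For illustration only (this is not code from this thread or from either
patch set): a minimal user-space sketch of the layout being argued for,
with the CPU number and the sequence number in one per-thread structure
that a single registration could point the kernel at. The struct name,
field names, helper and the 64-byte cache-line size are all assumptions
made for the example; no real system call is invoked.

```c
#include <assert.h>   /* static_assert */
#include <stdalign.h> /* alignas */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

/* Assumption: 64-byte cache lines; the real size depends on the CPU. */
#define CACHE_LINE_SIZE 64

/*
 * Hypothetical per-thread area. With one structure there is one address
 * to register and one pointer for the kernel to track, and the two
 * fields cannot end up in different cache lines by accident.
 */
struct thread_percpu_area {
	alignas(CACHE_LINE_SIZE) int32_t cpu_id; /* would be written by the kernel */
	uint32_t seqnum;                         /* would be written by the kernel */
};

/* Layout checks done at compile time rather than left to each library. */
static_assert(offsetof(struct thread_percpu_area, seqnum) + sizeof(uint32_t)
	      <= CACHE_LINE_SIZE, "cpu_id and seqnum must share a cache line");

/* One instance per thread, as a TLS variable. */
static _Thread_local struct thread_percpu_area percpu_area;

/* Returns nonzero when both fields of *a fall within one cache line. */
static int same_cache_line(const struct thread_percpu_area *a)
{
	uintptr_t first = (uintptr_t)&a->cpu_id / CACHE_LINE_SIZE;
	uintptr_t last = ((uintptr_t)&a->seqnum + sizeof(a->seqnum) - 1)
			 / CACHE_LINE_SIZE;

	return first == last;
}
```

With two separately registered addresses, the same guarantee depends on
every user placing the two variables correctly by hand.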