From mboxrd@z Thu Jan 1 00:00:00 1970
From: =?utf-8?B?T25kxZllaiBCw61sa2E=?=
Subject: Re: [RFC PATCH] getcpu_cache system call: caching current CPU number (x86)
Date: Tue, 21 Jul 2015 09:30:53 +0200
Message-ID: <20150721073053.GA14716@domone>
References: <1436724386-30909-1-git-send-email-mathieu.desnoyers@efficios.com>
 <55ACB2DC.5010503@redhat.com>
 <55AD14A4.6030101@redhat.com>
 <2010227315.699.1437438300542.JavaMail.zimbra@efficios.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Return-path:
Content-Disposition: inline
In-Reply-To: <2010227315.699.1437438300542.JavaMail.zimbra-vg+e7yoeK/dWk0Htik3J/w@public.gmane.org>
Sender: linux-api-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
To: Mathieu Desnoyers
Cc: Linus Torvalds, Andy Lutomirski, Ben Maurer, Ingo Molnar,
 libc-alpha, Andrew Morton, linux-api, rostedt, "Paul E. McKenney",
 Florian Weimer, Josh Triplett, Lai Jiangshan, Paul Turner,
 Andrew Hunter, Peter Zijlstra
List-Id: linux-api@vger.kernel.org

On Tue, Jul 21, 2015 at 12:25:00AM +0000, Mathieu Desnoyers wrote:
> >> Does it solve the Wine problem? If Wine uses gs for something and
> >> calls a function that does this, Wine still goes boom, right?
> >
> > So the advantage of just making a global segment descriptor available
> > is that it's not *that* expensive to just save/restore segments. So
> > either wine could do it, or any library users would do it.
> >
> > But anyway, I'm not sure this is a good idea. The advantage of it is
> > that the kernel support really is _very_ minimal.
>
> Considering that we'd at least also want this feature on ARM and
> PowerPC 32/64, and that the gs segment selector approach clashes with
> existing apps (wine), I'm not sure that implementing a gs segment
> selector based approach to cpu number caching would lead to an overall
> decrease in complexity if it leads to performance similar to those of
> portable approaches.
>
> I'm perfectly fine with architecture-specific tweaks that lead to
> fast-path speedups, but if we have to bite the bullet and implement
> an approach based on TLS and registering a memory area at thread start
> through a system call on other architectures anyway, it might end up
> being less complex to add a new system call on x86 too, especially if
> fast path overhead is similar.
>
> But I'm inclined to think that some aspect of the question eludes me,
> especially given the amount of interest generated by the gs-segment
> selector approach. What am I missing ?
>
As I wrote before, you don't have to bite the bullet. It suffices to
create a 128k-element array holding the cpu for each tid and expose it
as an mmapable file; userspace could then get the cpu with nearly the
same performance, without hacks.
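
To make the idea concrete, here is a rough userspace sketch of the
lookup side. Everything named here is an assumption for illustration:
no such kernel-provided file exists today, and the path, the int32_t
entry type and the helpers cpu_table_map(), cpu_table_lookup() and
cpu_table_fake() are hypothetical. cpu_table_fake() only stands in for
the kernel so the lookup can be tried against an ordinary file.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* 128k entries, one per tid, as proposed above.  The int32_t entry
 * type and flat layout are assumptions made for this sketch. */
#define CPU_TABLE_ENTRIES (128 * 1024)
#define CPU_TABLE_BYTES   (CPU_TABLE_ENTRIES * sizeof(int32_t))

/* Map the table read-only.  `path` is a hypothetical file that the
 * kernel would export and keep up to date. */
static const volatile int32_t *cpu_table_map(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;
    void *p = mmap(NULL, CPU_TABLE_BYTES, PROT_READ, MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after close */
    return p == MAP_FAILED ? NULL : (const volatile int32_t *)p;
}

/* Fast path: a bounds check and a single load indexed by tid.
 * Returns -1 for a tid that does not fit in the table. */
static int cpu_table_lookup(const volatile int32_t *table, long tid)
{
    if (tid < 0 || tid >= CPU_TABLE_ENTRIES)
        return -1;
    return table[tid];
}

/* Stand-in for the kernel side, for experimentation only: create a
 * zero-filled file of the right size and record `cpu` for `tid`. */
static int cpu_table_fake(const char *path, long tid, int32_t cpu)
{
    assert(tid >= 0 && tid < CPU_TABLE_ENTRIES);
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)CPU_TABLE_BYTES) < 0) {
        close(fd);
        return -1;
    }
    ssize_t n = pwrite(fd, &cpu, sizeof cpu, (off_t)tid * sizeof cpu);
    close(fd);
    return n == (ssize_t)sizeof cpu ? 0 : -1;
}
```

A thread would map the table once and then call cpu_table_lookup()
with its own tid (for example from syscall(SYS_gettid)). Note that
the sketch returns -1 for tids at or above 128k, which can occur when
pid_max is raised; how the table would be sized for that case is left
open here.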