From mboxrd@z Thu Jan  1 00:00:00 1970
From: Mathieu Desnoyers <mathieu.desnoyers-vg+e7yoeK/dWk0Htik3J/w@public.gmane.org>
Subject: Re: [RFC PATCH] getcpu_cache system call: caching current CPU
 number (x86)
Date: Tue, 21 Jul 2015 12:58:13 +0000 (UTC)
Message-ID: <894137397.137.1437483493715.JavaMail.zimbra@efficios.com>
References: <1436724386-30909-1-git-send-email-mathieu.desnoyers@efficios.com> <55AD14A4.6030101@redhat.com> <CALCETrUx6wFxmz+9TyW5bNgaMN0q180G8y9YOyq_D41sdhFaRQ@mail.gmail.com> <CA+55aFzMJkzydXb7uVv1iSUnp=539d43ghQaonGdzMoF7QLZBA@mail.gmail.com> <CALCETrUZ8vB30rdmeoV4JKPUsRnVPvoxXRJ47CEFud2aSF2=Ew@mail.gmail.com> <CA+55aFwLZLeeN7UN82dyt=emQcNBc8qZPJAw5iqtAbBwFA7FPQ@mail.gmail.com> <2010227315.699.1437438300542.JavaMail.zimbra@efficios.com> <20150721073053.GA14716@domone>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: QUOTED-PRINTABLE
Return-path: <linux-api-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
In-Reply-To: <20150721073053.GA14716@domone>
Sender: linux-api-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
To: =?utf-8?Q?Ond=C5=99ej_B=C3=ADlka?= <neleai-9Vj9tDbzfuSlVyrhU4qvOw@public.gmane.org>
Cc: Linus Torvalds <torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>, Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org>, Ben Maurer <bmaurer-b10kYP2dOMg@public.gmane.org>, Ingo Molnar <mingo-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>, libc-alpha <libc-alpha-9JcytcrH/bA+uJoB2kUjGw@public.gmane.org>, Andrew Morton <akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>, linux-api <linux-api-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>, rostedt <rostedt-nx8X9YLhiw1AfugRpC6u6w@public.gmane.org>, "Paul E. McKenney" <paulmck-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>, Florian Weimer <fweimer-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>, Josh Triplett <josh-iaAMLnmF4UmaiuxdJuQwMA@public.gmane.org>, Lai Jiangshan <laijs-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>, Paul Turner <pjt-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>, Andrew Hunter <ahh-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>, Peter Zijlstra <peterz-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
List-Id: linux-api@vger.kernel.org

----- On Jul 21, 2015, at 3:30 AM, Ondřej Bílka neleai@seznam.cz wrote:

> On Tue, Jul 21, 2015 at 12:25:00AM +0000, Mathieu Desnoyers wrote:
>> >> Does it solve the Wine problem?  If Wine uses gs for something and
>> >> calls a function that does this, Wine still goes boom, right?
>> >
>> > So the advantage of just making a global segment descriptor available
>> > is that it's not *that* expensive to just save/restore segments. So
>> > either wine could do it, or any library users would do it.
>> >
>> > But anyway, I'm not sure this is a good idea. The advantage of it is
>> > that the kernel support really is _very_ minimal.
>>
>> Considering that we'd at least also want this feature on ARM and
>> PowerPC 32/64, and that the gs segment selector approach clashes with
>> existing apps (wine), I'm not sure that implementing a gs segment
>> selector based approach to cpu number caching would lead to an overall
>> decrease in complexity if it leads to performance similar to those of
>> portable approaches.
>>
>> I'm perfectly fine with architecture-specific tweaks that lead to
>> fast-path speedups, but if we have to bite the bullet and implement
>> an approach based on TLS and registering a memory area at thread start
>> through a system call on other architectures anyway, it might end up
>> being less complex to add a new system call on x86 too, especially if
>> fast path overhead is similar.
>>
>> But I'm inclined to think that some aspect of the question eludes me,
>> especially given the amount of interest generated by the gs-segment
>> selector approach. What am I missing ?
>>
> As I wrote before you don't have to bite bullet as I said before. It
> suffices to create 128k element array with cpu for each tid, make that
> mmapable file and userspace could get cpu with nearly same performance
> without hacks.

I don't see how this would be acceptable on memory-constrained embedded
systems. They have multiple cores, and performance requirements, so
having a fast getcpu would be useful there (e.g. telecom industry),
but they clearly cannot afford a 512kB table per process just for that.
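
For reference, the 512kB figure can be checked with a quick sketch. This
is only illustrative: it assumes 4-byte CPU numbers and 128k tid slots,
and the names TID_SLOTS, cpu_table_bytes() and cpu_of_tid() are made up
here, not part of any proposed ABI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 128k (131072) tid slots, one 32-bit CPU number per slot. */
#define TID_SLOTS ((size_t)128 * 1024)

/* Hypothetical helper: bytes needed for a table with `slots` entries. */
static size_t cpu_table_bytes(size_t slots)
{
	return slots * sizeof(int32_t);
}

/*
 * Hypothetical lookup into such a tid-indexed table (e.g. one that was
 * mmap'ed read-only). Returns -1 if the tid falls outside the table.
 */
static int32_t cpu_of_tid(const int32_t *table, size_t slots, uint32_t tid)
{
	assert(table != NULL);
	if ((size_t)tid >= slots)
		return -1;
	return table[tid];
}
```

cpu_table_bytes(TID_SLOTS) comes out to 524288 bytes, i.e. 512kB, which
is the memory cost at stake even though the lookup itself is a single
array read.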

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com