ppc/ppc64 and x86 vsyscalls

public inbox for linux-kernel@vger.kernel.org

* ppc/ppc64 and x86 vsyscalls
@ 2004-03-08  1:17 Benjamin Herrenschmidt
  2004-03-09  8:05 ` Ulrich Drepper
  0 siblings, 1 reply; 7+ messages in thread
From: Benjamin Herrenschmidt @ 2004-03-08  1:17 UTC (permalink / raw)
  To: Ulrich Drepper; +Cc: Linux Kernel list

Hi Ulrich !

I've seen the various debates regarding the x86 vsyscalls.

I want to implement something similar for ppc, though the actual
syscalls covered will be different.

The idea is that the kernel already has the necessary infrastructure
for precisely identifying each CPU model and doing various optimisations
based on which CPU we are running on.

We are planning on adding a set of per-CPU (model/family)
implementations of some critical functions that I would like to "share"
between the kernel and userland.

That includes the atomic ops (some CPUs, like the embedded 4xx, need an
additional sync in there, while a sync would be a huge overhead on others)
and spinlocks, but also memory copy routines, and eventually cache
flush/invalidate functions (especially since some CPUs don't need them
at all) and possibly a userland implementation of gettimeofday.

The basic idea is to have a set of pages containing those functions
mapped at the top of the address space. The benefit is that this
would possibly allow using the short absolute branch instructions
to get there.

However, the actual "form" of those is a bit difficult to decide on. The
problem is that just exposing a .so like x86 does is difficult. There
will be many more functions exposed (maybe around 20) and for each of
them, about 1 to 4 or 5 variations depending on the CPU we are running
on, and also on whether the current process is 32-bit or 64-bit.

They are all very simple leaf functions though. The "easy" way would
be to just have some kind of branch table that code can "bal" to directly,
or else an exception-like design, with every function at a 0xn00
offset (with a possible branch to some scratch space in the rare case
where a given implementation may overflow). The kernel could easily
"build" the pages based on which implementation is to be used on a
given CPU & process context.
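
A minimal sketch of the fixed-offset layout described above, assuming
each function gets one 0x100-byte slot in the shared page. The names
VSLOT_SIZE, vslot_addr and vslot_fits are hypothetical and only
illustrate the arithmetic; they are not part of any existing kernel
interface.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed slot size: function n starts at page_base + n * 0x100. */
#define VSLOT_SIZE 0x100u

/* Address of the entry point for slot number `slot`. */
static inline uintptr_t vslot_addr(uintptr_t page_base, unsigned int slot)
{
    return page_base + (uintptr_t)slot * VSLOT_SIZE;
}

/*
 * Returns 1 if an implementation of `code_len` bytes fits in its slot,
 * 0 if it would overflow and need a branch to scratch space.
 */
static inline int vslot_fits(size_t code_len)
{
    return code_len <= VSLOT_SIZE;
}
```

With this layout a caller only needs the page base and a slot number,
which is what makes the scheme simple for the kernel but leaves userland
without symbol or unwind information.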

However, the above makes things more difficult for userland. The big
problem, as I was told by Alan Modra, will be the lack of CFI information
for stack unwinding on exceptions. But then, adding that for each
implementation makes the complexity of building those completely out
of control.

What do you think? Any "smart" idea on how we could implement that
while keeping the complexity of the kernel side reasonable without making
the userland side a nightmare?

Regards,
Ben.


* Re: ppc/ppc64 and x86 vsyscalls
  2004-03-08  1:17 ppc/ppc64 and x86 vsyscalls Benjamin Herrenschmidt
@ 2004-03-09  8:05 ` Ulrich Drepper
  2004-03-09 11:05   ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 7+ messages in thread
From: Ulrich Drepper @ 2004-03-09  8:05 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: Linux Kernel list

Benjamin Herrenschmidt wrote:

> However, the actual "form" of those is a bit difficult to decide on. The
> problem is that just exposing a .so like x86 does is difficult. There
> will be many more functions exposed (maybe around 20) and for each of
> them, about 1 to 4 or 5 variations depending on the CPU we are running
> on, and also on whether the current process is 32-bit or 64-bit.

This is no reason for not using the DSO form.  The userlevel code finds
the vDSO via the auxiliary vector.  By passing up different values for
32 and 64 bit processes you easily handle the last problem.
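
A minimal sketch of the auxiliary-vector lookup on a current Linux/glibc
system. It assumes the AT_SYSINFO_EHDR tag and the getauxval() helper
(glibc 2.16 or later); the dynamic linker of the time walked the vector
itself. The function names vdso_base and vdso_looks_like_elf are
hypothetical.

```c
#include <elf.h>       /* AT_SYSINFO_EHDR, ELFMAG, SELFMAG */
#include <string.h>    /* memcmp */
#include <sys/auxv.h>  /* getauxval(), glibc >= 2.16 */

/* Returns the vDSO base address, or NULL if the kernel passed none. */
static const void *vdso_base(void)
{
    return (const void *)getauxval(AT_SYSINFO_EHDR);
}

/* Returns 1 if the mapping starts with the ELF magic bytes, else 0. */
static int vdso_looks_like_elf(const void *base)
{
    return base != NULL && memcmp(base, ELFMAG, SELFMAG) == 0;
}
```

Because the kernel fills in the vector per process, a 32-bit and a
64-bit process can each be handed the base of a different image without
userland having to know about the distinction.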

Many functions are no real issue either.  It is not inefficient to use the
symbol table.

The only issue is that the vDSO should (IMO must) be position
independent.  You certainly want to map the same copy in each address
space.  This means the symbol table cannot contain addresses, only offsets.

> They are all very simple leaf functions though. The "easy" way would
> be to just have some kind of branch table that code can "bal" to directly,
> or else an exception-like design, with every function at a 0xn00
> offset (with a possible branch to some scratch space in the rare case

This means adding a multiplexer in the vDSO while the caller knows
exactly which function is wanted.  Sure, it's possible and quite easy to
implement.

If you want it more complicated you could do it as I suggested for x86.
 Add absolute symbols to the symbol table, have the libc use the dlsym()
equivalent to determine whether the symbol is defined.  If it is, it'll
return an offset which then can be used for the jump.  ppc64 function
descriptors can certainly be handled.

The results of the dlsym() calls would be cached so that it only happens
once for each symbol.  And since the soname of the vdso is known, one
doesn't even have to look in the global scope but instead directly in
the vdso's symbol table.
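
A minimal sketch of the "look up once, then cache" idea. The resolver is
passed in as a function pointer because the real lookup would live in
ld.so; cached_sym, cached_lookup and demo_lookup are hypothetical names,
and demo_lookup is only a stand-in that counts how often it is called.
The sketch is not thread-safe.

```c
#include <stddef.h>

/* Signature of a dlsym()-like resolver restricted to the vDSO. */
typedef void *(*lookup_fn)(const char *name);

struct cached_sym {
    const char *name;  /* symbol to resolve */
    void *addr;        /* cached result, NULL if the symbol is undefined */
    int resolved;      /* set once the lookup has been attempted */
};

/* Resolve on first use only; a missing symbol is cached as NULL too. */
static void *cached_lookup(struct cached_sym *c, lookup_fn lookup)
{
    if (!c->resolved) {
        c->addr = lookup(c->name);
        c->resolved = 1;
    }
    return c->addr;
}

/* Stand-in resolver for demonstration: counts its invocations. */
static int lookup_calls;
static int demo_target;
static void *demo_lookup(const char *name)
{
    (void)name;
    lookup_calls++;
    return &demo_target;
}
```

Caching the "not found" result as well means libc falls back to the
ordinary implementation without retrying the lookup on every call.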

This all would definitely need changes in libc and ld.so.  But I guess
it'll work.

> where a given implementation may overflow). The kernel could easily
> "build" the pages based on which implementation is to be used on a
> given CPU & process context.

That's easily doable with a real DSO.

> However, the above makes things more difficult for userland. The big
> problem, as I was told by Alan Modra, will be the lack of CFI information
> for stack unwinding on exceptions.

Why lack?  As a real DSO the vDSO can use the normal unwind info
handling userlevel DSOs use, too.  I do not see a reason why this
cannot work.  The unwind info for DSOs is position-independent so it
should work just fine.

-- 
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖


* Re: ppc/ppc64 and x86 vsyscalls
  2004-03-09  8:05 ` Ulrich Drepper
@ 2004-03-09 11:05   ` Benjamin Herrenschmidt
  2004-03-09 21:14     ` Ulrich Drepper
  0 siblings, 1 reply; 7+ messages in thread
From: Benjamin Herrenschmidt @ 2004-03-09 11:05 UTC (permalink / raw)
  To: Ulrich Drepper; +Cc: Linux Kernel list


> This is no reason for not using the DSO form.  The userlevel code finds
> the vDSO via the auxiliary vector.  By passing up different values for
> 32 and 64 bit processes you easily handle the last problem.
> 
> Many functions are no real issue either.  It is not inefficient to use the
> symbol table.

Thanks for the clue.

> The only issue is that the vDSO should (IMO must) be position
> independent.  You certainly want to map the same copy in each address
> space.  This means the symbol table cannot contain addresses, only offsets.

Ok. The problem is building the DSO in the kernel from the various
individual functions depending on the CPU & app mode.

> .../...
>
> Why lack?  As a real DSO the vDSO can use the normal unwind info
> handling userlevel DSOs use, too.  I do not see a reason why this
> cannot work.  The unwind info for DSOs is position-independent so it
> should work just fine.

Ok. So the challenge is to write the necessary code in the kernel
to build that DSO based on the various functions after detection
of the CPU type.

Another option would be to pre-build a bunch of them at kernel compile
time. I have to investigate. The risk is that we end up with too
many combinations, thus bloating the kernel image.

Ben.




* Re: ppc/ppc64 and x86 vsyscalls
  2004-03-09 11:05   ` Benjamin Herrenschmidt
@ 2004-03-09 21:14     ` Ulrich Drepper
  2004-03-09 21:33       ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 7+ messages in thread
From: Ulrich Drepper @ 2004-03-09 21:14 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: Linux Kernel list

Benjamin Herrenschmidt wrote:

> Ok. So the challenge is to write the necessary code in the kernel
> to build that DSO based on the various functions after detection
> of the CPU type.
> 
> Another option would be to pre-build a bunch of them at kernel compile
> time. I have to investigate. The risk is that we end up with too
> many combinations, thus bloating the kernel image.

You can create one "big" DSO which covers all the configured processors.
 Then at kernel start time, you determine the actual processor and
adjust the symbol table offsets to point to the correct version.  There
is no requirement that the table used is identical to the one on disk.
It's loaded into ordinary memory which can be modified.

The tricky part of this would be to determine the symbol table slots.
But even this is quite simple.  Just locate the symbol table in the
ELF-way, then iterate over the entries and use strcmp() for the names
and act upon a match.  What you shouldn't do is generate pointers to the
symbol table entries somewhere.  This is probably fragile and not worth
the few cycles you'll save.
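
A minimal sketch of the patching loop, assuming the caller has already
located the symbol table and its string table through the ELF headers
(that step is left out here). patch_symbol is a hypothetical name; the
Elf64_Sym layout comes from <elf.h>, and a 32-bit image would use
Elf32_Sym in the same way.

```c
#include <elf.h>     /* Elf64_Sym, Elf64_Addr */
#include <stddef.h>
#include <string.h>  /* strcmp */

/*
 * Point the symbol called `name` at `new_value`, an offset from the
 * start of the DSO image. Returns 0 on success, -1 if no entry matches.
 */
static int patch_symbol(Elf64_Sym *symtab, size_t nsyms, const char *strtab,
                        const char *name, Elf64_Addr new_value)
{
    for (size_t i = 0; i < nsyms; i++) {
        /* st_name is an index into the string table, not a pointer. */
        if (strcmp(strtab + symtab[i].st_name, name) == 0) {
            symtab[i].st_value = new_value;
            return 0;
        }
    }
    return -1;
}
```

Searching by name on every patch costs a linear scan per symbol at boot,
but it avoids keeping a side table of slot pointers that would have to
stay in sync with the linked image.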

-- 
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖


* Re: ppc/ppc64 and x86 vsyscalls
  2004-03-09 21:14     ` Ulrich Drepper
@ 2004-03-09 21:33       ` Benjamin Herrenschmidt
  2004-03-10  0:35         ` Ulrich Drepper
  0 siblings, 1 reply; 7+ messages in thread
From: Benjamin Herrenschmidt @ 2004-03-09 21:33 UTC (permalink / raw)
  To: Ulrich Drepper; +Cc: Linux Kernel list

On Wed, 2004-03-10 at 08:14, Ulrich Drepper wrote:

> You can create one "big" DSO which covers all the configured processors.
>  Then at kernel start time, you determine the actual processor and
> adjust the symbol table offsets to point to the correct version.  There
> is no requirement that the table used is identical to the one on disk.
> It's loaded into ordinary memory which can be modified.

Ah, interesting.

> The tricky part of this would be to determine the symbol table slots.
> But even this is quite simple.  Just locate the symbol table in the
> ELF-way, then iterate over the entries and use strcmp() for the names
> and act upon a match.  What you shouldn't do is generate pointers to the
> symbol table entries somewhere.  This is probably fragile and not worth
> the few cycles you'll save.

Yes, makes sense. I can prepare both the 32-bit and 64-bit DSOs at kernel
start time, then I just have to map them. I suppose I should lay out my
DSO this way:

/* In one place the actual function implementation */
 function_A_vers_1()
 function_A_vers_2()
 function_A_vers_3()
 function_B_vers_1()
 function_B_vers_2()
   etc .../...

/* Then, some empty "stubs" for the symbol table that gets really
 * linked into user binaries. Those are the symbol table entries
 * that get patched
 */
 function_A() {}
 function_B() {}

Sounds right? It's not completely obvious how I'll also map the
64-bit versions of these in the kernel address space, but I can find
a trick.

Ben.


* Re: ppc/ppc64 and x86 vsyscalls
  2004-03-09 21:33       ` Benjamin Herrenschmidt
@ 2004-03-10  0:35         ` Ulrich Drepper
  2004-03-10  2:08           ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 7+ messages in thread
From: Ulrich Drepper @ 2004-03-10  0:35 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: Linux Kernel list

Benjamin Herrenschmidt wrote:

> /* In one place the actual function implementation */
>  function_A_vers_1()
>  function_A_vers_2()
>  function_A_vers_3()
>  function_B_vers_1()
>  function_B_vers_2()
>    etc .../...
> 
> /* Then, some empty "stubs" for the symbol table that gets really
>  * linked into user binaries. Those are the symbol table entries
>  * that get patched
>  */
>  function_A() {}
>  function_B() {}
> 
> Sounds right ?

Basically yes.  But you don't actually need the stub functions.  You
just need a symbol table entry which can be arranged via an alias to any
one of the real functions.
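
A minimal sketch of such an alias, assuming GCC or Clang on an ELF
target and reusing the placeholder names from the quoted layout. The
function bodies are hypothetical; real variants would differ in their
instruction sequences, not in their results.

```c
/* Two per-CPU implementations of the same operation (placeholders). */
int function_A_vers_1(int x) { return x + 1; }
int function_A_vers_2(int x) { return x + 1; }

/*
 * No stub body: function_A is just a second symbol table entry with the
 * same value as function_A_vers_1. This is the entry that would be
 * re-pointed at boot.
 */
int function_A(int x) __attribute__((alias("function_A_vers_1")));
```

The alias has to name a function defined in the same translation unit,
so the default target and the exported name live in one source file.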

-- 
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖


* Re: ppc/ppc64 and x86 vsyscalls
  2004-03-10  0:35         ` Ulrich Drepper
@ 2004-03-10  2:08           ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 7+ messages in thread
From: Benjamin Herrenschmidt @ 2004-03-10  2:08 UTC (permalink / raw)
  To: Ulrich Drepper; +Cc: Linux Kernel list


> 
> Basically yes.  But you don't actually need the stub functions.  You
> just need a symbol table entry which can be arranged via an alias to any
> one of the real functions.


Ok, thanks.

Ben.



