public inbox for linux-parisc@vger.kernel.org
* Re: [PATCH] man2 : syscall.2 : document syscall calling conventions
       [not found]     ` <201304021917.17659.vapier@gentoo.org>
@ 2013-04-07 10:00       ` Michael Kerrisk (man-pages)
  2013-04-07 13:55         ` Kyle McMartin
  0 siblings, 1 reply; 19+ messages in thread
From: Michael Kerrisk (man-pages) @ 2013-04-07 10:00 UTC (permalink / raw)
  To: Mike Frysinger
  Cc: linux-man, Kyle McMartin, Helge Deller, James E.J. Bottomley,
	linux-parisc

[Adding a few people to CC who may be able to help with Mike's doubts
on PA-RISC; folks, if any of you could have a quick look at the parisc
piece below, that would be helpful]

Mike,

On Wed, Apr 3, 2013 at 1:17 AM, Mike Frysinger <vapier@gentoo.org> wrote:
> On Tuesday 02 April 2013 02:54:39 Michael Kerrisk (man-pages) wrote:
>> On Mon, Apr 1, 2013 at 12:32 PM, Mike Frysinger wrote:
>> > On Monday 01 April 2013 05:29:11 Michael Kerrisk (man-pages) wrote:
>> >> On Mon, Apr 1, 2013 at 10:29 AM, Mike Frysinger wrote:
>> >> > on a related topic, would it be useful to document the exact calling
>> >> > convention for architecture system calls ?  from time to time, i need
>> >> > to reference this, and i inevitably turn to a variety of sources to
>> >> > dig up the answer (the kernel itself, or strace, or qemu, or glibc,
>> >> > or uClibc, or lss, or other random places).  i would find it handy to
>> >> > have all of these in a single location.
>> >>
>> >> Sounds like it would be useful to have that documented. Would you have
>> >> a chance to write patches for that?
>> >
>> > should we do it in syscall(2) ?  or a dedicated man page ?
>>
>> It's a little hard to say until I see the shape of what comes. Can you
>> provide a rough per-syscall example or two of what you expect to
>> document? (Don't write too concrete a patch yet, until I can get a
>> handle on what you intend.)
>
> this renders nicely i think.  it shows most of the stuff i'm interested in.
> might be useful to add a dedicated section covering the clobbers in the
> future.

Thanks for that. It looks good to me, and I have applied. But it
renders too wide (wherever possible, I try to ensure that everything
renders inside 80 columns), so I have split into tables, one with
"instruction, NR, ret" and another with the arguments (arg1 to arg7).

Now, just to make 100% sure of your intention, the NR column would be
better named "syscall #" (or similar), right? (I've made that change.)

> --- a/man2/syscall.2
> +++ b/man2/syscall.2
> @@ -79,6 +79,35 @@ and an error code is stored in
>  .BR syscall ()
>  first appeared in
>  4BSD.
> +.SS Architecture calling conventions
> +Every architecture has its own way of invoking & passing arguments to the
> +kernel.
> +Note that the instruction listed below might not be the fastest or best way to
> +transition to the kernel, so you might have to refer to the VDSO.

Mike, any chance that I could interest you in writing a vdso(7) man
page? I've felt the lack of such a page for a while (it need not be
too long), but am not deep enough into the details to write it easily
(I am not sure if you are).

> +Also note that this doesn't cover the entire calling convention -- some
> +architectures may indiscriminately clobber other registers not listed here.
> +.if t \{\
> +.ft CW
> +\}
> +.TS
> +l l l l l l l l l l l.
> +arch/ABI       insn    NR      ret     arg1    arg2    arg3    arg4    arg5    arg6    arg7
> +_
> +arm/OABI       swi NR; -       a1      a1      a2      a3      a4      v1      v2      v3
> +arm/EABI       swi 0x0;        r7      r1      r1      r2      r3      r4      r5      r6      r7
> +bfin   excpt 0x0;      P0      R0      R0      R1      R2      R3      R4      R5      -
> +i386   int $0x80;      eax     eax     ebx     ecx     edx     esi     edi     ebp     -
> +ia64   break 0x100000; r15     r10/r8  r11     r9      r10     r14     r15     r13     -
> +.\" not sure about insn or NR
> +.\" parisc     ble 0x100(%%sr2, %%r0); -       r28     r26     r25     r24     r23     r22     r21     -

PA-RISC folks, are you able to confirm/correct the above?

> +sparc/32       t 0x10; g1      o0      o0      o1      o2      o3      o4      o5      -
> +sparc/64       t 0x6d; g1      o0      o0      o1      o2      o3      o4      o5      -
> +x86_64 syscall;        rax     rax     rdi     rsi     rdx     r10     r8      r9      -
> +.TE
> +.if t \{\
> +.in
> +.ft P
> +\}
>  .SS Architecture-specific requirements
>  Each architecture ABI has its own requirements on how
>  system call arguments are passed to the kernel.

Cheers,

Michael

--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Author of "The Linux Programming Interface"; http://man7.org/tlpi/


* Re: [PATCH] man2 : syscall.2 : document syscall calling conventions
  2013-04-07 10:00       ` [PATCH] man2 : syscall.2 : document syscall calling conventions Michael Kerrisk (man-pages)
@ 2013-04-07 13:55         ` Kyle McMartin
  2013-04-07 14:56           ` James Bottomley
       [not found]           ` <20130407135514.GW12938-PfSpb0PWhxZc2C7mugBRk2EX/6BAtgUQ@public.gmane.org>
  0 siblings, 2 replies; 19+ messages in thread
From: Kyle McMartin @ 2013-04-07 13:55 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages)
  Cc: Mike Frysinger, linux-man, Kyle McMartin, Helge Deller,
	James E.J. Bottomley, linux-parisc

On Sun, Apr 07, 2013 at 12:00:50PM +0200, Michael Kerrisk (man-pages) wrote:
> [Adding a few people to CC who may be able to help with Mike's doubts
> on PA-RISC; folks, if any of you could have a quick look at the parisc
> piece below, that would be helpful]
> 

The syscall number is in %r20, everything else looks correct. The
returned value is in %r28 and the args are %r26 through %r21.

--Kyle


* Re: [PATCH] man2 : syscall.2 : document syscall calling conventions
  2013-04-07 13:55         ` Kyle McMartin
@ 2013-04-07 14:56           ` James Bottomley
  2013-04-07 15:11             ` Kyle McMartin
       [not found]           ` <20130407135514.GW12938-PfSpb0PWhxZc2C7mugBRk2EX/6BAtgUQ@public.gmane.org>
  1 sibling, 1 reply; 19+ messages in thread
From: James Bottomley @ 2013-04-07 14:56 UTC (permalink / raw)
  To: Kyle McMartin
  Cc: Michael Kerrisk (man-pages), Mike Frysinger, linux-man,
	Kyle McMartin, Helge Deller, James E.J. Bottomley, linux-parisc

On Sun, 2013-04-07 at 09:55 -0400, Kyle McMartin wrote:
> On Sun, Apr 07, 2013 at 12:00:50PM +0200, Michael Kerrisk (man-pages) wrote:
> > [Adding a few people to CC who may be able to help with Mike's doubts
> > on PA-RISC; folks, if any of you could have a quick look at the parisc
> > piece below, that would be helpful]
> > 
> 
> The syscall number is in %r20, everything else looks correct. The
> returned value is in %r28 and the args are %r26 through %r21.

Actually, that's not quite correct.  On 64 bits, arg1-arg8 are %r26-%r19,
but on 32 bits the convention is that arg1-arg4 are %r26-%r23 and the
rest go on the stack.  We can also do register pair combining on 32 bits
for a 64 bit argument.

Our register use is documented in 

Documentation/parisc/registers

James




* Re: [PATCH] man2 : syscall.2 : document syscall calling conventions
  2013-04-07 14:56           ` James Bottomley
@ 2013-04-07 15:11             ` Kyle McMartin
       [not found]               ` <20130407151134.GX12938-PfSpb0PWhxZc2C7mugBRk2EX/6BAtgUQ@public.gmane.org>
  0 siblings, 1 reply; 19+ messages in thread
From: Kyle McMartin @ 2013-04-07 15:11 UTC (permalink / raw)
  To: James Bottomley
  Cc: Michael Kerrisk (man-pages), Mike Frysinger, linux-man,
	Kyle McMartin, Helge Deller, James E.J. Bottomley, linux-parisc

On Sun, Apr 07, 2013 at 07:56:49AM -0700, James Bottomley wrote:
> Actually, that's not quite correct.  on 64 bits it's arg1-8 are %r26-%
> r19 but on 32 the convention is that arg1-arg4 are %r26-%r23 and the
> rest on stack.  We can also do register pair combining on 32 bits for a
> 64 bit argument.

I guess the confusion is whether you're writing this from the kernel
side or the userspace side. The syscall instruction is called with six
arg registers, but we fix it on entry to the kernel when we call into C.


* Re: [PATCH] man2 : syscall.2 : document syscall calling conventions
       [not found]               ` <20130407151134.GX12938-PfSpb0PWhxZc2C7mugBRk2EX/6BAtgUQ@public.gmane.org>
@ 2013-04-07 15:38                 ` James Bottomley
  2013-04-08  9:18                 ` Michael Kerrisk (man-pages)
  1 sibling, 0 replies; 19+ messages in thread
From: James Bottomley @ 2013-04-07 15:38 UTC (permalink / raw)
  To: Kyle McMartin
  Cc: Michael Kerrisk (man-pages), Mike Frysinger, linux-man,
	Kyle McMartin, Helge Deller, James E.J. Bottomley,
	linux-parisc-u79uwXL29TY76Z2rM5mHXA

On Sun, 2013-04-07 at 11:11 -0400, Kyle McMartin wrote:
> On Sun, Apr 07, 2013 at 07:56:49AM -0700, James Bottomley wrote:
> > Actually, that's not quite correct.  on 64 bits it's arg1-8 are %r26-%
> > r19 but on 32 the convention is that arg1-arg4 are %r26-%r23 and the
> > rest on stack.  We can also do register pair combining on 32 bits for a
> > 64 bit argument.
> 
> I guess the confusion is whether you're writing this from the kernel
> side or the userspace side. The syscall instruction is called with six
> arg registers, but we fix it on entry to the kernel when we call into C.

Oh, right, syscall arguments, sorry didn't manage to extract the content
from all the quotes.  I was just thinking general ABI.

The syscall arguments are all in

arch/parisc/include/asm/unistd.h

As Kyle says, we override the calling convention and define in-register
arguments even on 32 bit (so %r26-%r21).  We actually don't define
_syscall6() yet, but we're ready for it.

James
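
The parisc convention that Kyle and James settle on (syscall number in
%r20, result in %r28, arguments in %r26 down to %r21) can be written down
as a small C helper.  This is only an illustrative sketch: the names
parisc_syscall_arg_reg, PARISC_SYSCALL_NR_REG and PARISC_SYSCALL_RET_REG
are hypothetical and do not come from the kernel or glibc.

```c
/* Hypothetical constants for the registers named in this thread. */
enum {
    PARISC_SYSCALL_NR_REG  = 20,   /* %r20 carries the syscall number */
    PARISC_SYSCALL_RET_REG = 28    /* %r28 carries the return value   */
};

/*
 * Hypothetical helper: general register number that carries syscall
 * argument `arg` (1-based) on parisc, or -1 when the argument is not
 * passed in a register under this convention.
 */
static int parisc_syscall_arg_reg(int arg)
{
    if (arg < 1 || arg > 6)
        return -1;
    return 27 - arg;   /* arg1 -> %r26, arg2 -> %r25, ..., arg6 -> %r21 */
}
```

For example, parisc_syscall_arg_reg(4) gives 23, i.e. the fourth argument
travels in %r23.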


--
To unsubscribe from this list: send the line "unsubscribe linux-man" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [PATCH] man2 : syscall.2 : document syscall calling conventions
       [not found]           ` <20130407135514.GW12938-PfSpb0PWhxZc2C7mugBRk2EX/6BAtgUQ@public.gmane.org>
@ 2013-04-07 18:39             ` Mike Frysinger
  2013-04-07 18:48               ` John David Anglin
  0 siblings, 1 reply; 19+ messages in thread
From: Mike Frysinger @ 2013-04-07 18:39 UTC (permalink / raw)
  To: Kyle McMartin
  Cc: Michael Kerrisk (man-pages), linux-man, Kyle McMartin,
	Helge Deller, James E.J. Bottomley,
	linux-parisc-u79uwXL29TY76Z2rM5mHXA

On Sunday 07 April 2013 09:55:14 Kyle McMartin wrote:
> On Sun, Apr 07, 2013 at 12:00:50PM +0200, Michael Kerrisk (man-pages) wrote:
> > [Adding a few people to CC who may be able to help with Mike's doubts
> > on PA-RISC; folks, if any of you could have a quick look at the parisc
> > piece below, that would be helpful]
> 
> The syscall number is in %r20, everything else looks correct. The
> returned value is in %r28 and the args are %r26 through %r21.

just to be clear, the only insn you need is:
	ble 0x100(%sr2, %r0);

the kernel docs say sr2 holds the kernel gateway page (so i guess 0x100 is a 
known offset into that).  the docs don't mention r0 that i can see, so i'm 
guessing it's one of those "always 0" registers ?

the sysdep code has an ldi call in the branch delay slot (i think), but all 
that seems to do is load r20 with the syscall nr.
-mike


* Re: [PATCH] man2 : syscall.2 : document syscall calling conventions
  2013-04-07 18:39             ` Mike Frysinger
@ 2013-04-07 18:48               ` John David Anglin
       [not found]                 ` <BLU0-SMTP986B123D17DB8B88214F797C40-MsuGFMq8XAE@public.gmane.org>
  2013-04-12  1:55                 ` Mike Frysinger
  0 siblings, 2 replies; 19+ messages in thread
From: John David Anglin @ 2013-04-07 18:48 UTC (permalink / raw)
  To: Mike Frysinger
  Cc: Kyle McMartin, Michael Kerrisk (man-pages), linux-man,
	Kyle McMartin, Helge Deller, James E.J. Bottomley, linux-parisc

On 7-Apr-13, at 2:39 PM, Mike Frysinger wrote:

> just to be clear, the only insn you need is:
> 	ble 0x100(%sr2, %r0);
>
> the kernel docs say sr2 holds the kernel gateway page (so i guess  
> 0x100 is a
> known offset into that).  the docs don't mention r0 that i can see,  
> so i'm
> guessing it's one of those "always 0" registers ?

Yes.  There is also an entry at offset 0xb0 for light-weight- 
syscalls.  Currently,
this implements an atomic CAS operation used for pthread support.

Dave
--
John David Anglin	dave.anglin@bell.net





* Re: [PATCH] man2 : syscall.2 : document syscall calling conventions
       [not found]               ` <20130407151134.GX12938-PfSpb0PWhxZc2C7mugBRk2EX/6BAtgUQ@public.gmane.org>
  2013-04-07 15:38                 ` James Bottomley
@ 2013-04-08  9:18                 ` Michael Kerrisk (man-pages)
  1 sibling, 0 replies; 19+ messages in thread
From: Michael Kerrisk (man-pages) @ 2013-04-08  9:18 UTC (permalink / raw)
  To: Kyle McMartin
  Cc: James Bottomley, Mike Frysinger, linux-man, Kyle McMartin,
	Helge Deller, James E.J. Bottomley,
	linux-parisc-u79uwXL29TY76Z2rM5mHXA

On Sun, Apr 7, 2013 at 5:11 PM, Kyle McMartin <kyle-pfcGkIkfWfAsA/PxXw9srA@public.gmane.org> wrote:
> On Sun, Apr 07, 2013 at 07:56:49AM -0700, James Bottomley wrote:
>> Actually, that's not quite correct.  on 64 bits it's arg1-8 are %r26-%
>> r19 but on 32 the convention is that arg1-arg4 are %r26-%r23 and the
>> rest on stack.  We can also do register pair combining on 32 bits for a
>> 64 bit argument.
>
> I guess the confusion is whether you're writing this from the kernel
> side or the userspace side. The syscall instruction is called with six
> arg registers, but we fix it on entry to the kernel when we call into C.

Thanks, Kyle.

* Re: [PATCH] man2 : syscall.2 : document syscall calling conventions
       [not found]                 ` <BLU0-SMTP986B123D17DB8B88214F797C40-MsuGFMq8XAE@public.gmane.org>
@ 2013-04-08  9:20                   ` Michael Kerrisk (man-pages)
  0 siblings, 0 replies; 19+ messages in thread
From: Michael Kerrisk (man-pages) @ 2013-04-08  9:20 UTC (permalink / raw)
  To: Mike Frysinger, Kyle McMartin
  Cc: John David Anglin, linux-man, Helge Deller, James E.J. Bottomley,
	linux-parisc-u79uwXL29TY76Z2rM5mHXA

On Sun, Apr 7, 2013 at 8:48 PM, John David Anglin <dave.anglin-CzeTG9NwML0@public.gmane.org> wrote:
> On 7-Apr-13, at 2:39 PM, Mike Frysinger wrote:
>
>> just to be clear, the only insn you need is:
>>         ble 0x100(%sr2, %r0);
>>
>> the kernel docs say sr2 holds the kernel gateway page (so i guess 0x100 is
>> a known offset into that).  the docs don't mention r0 that i can see, so i'm
>> guessing it's one of those "always 0" registers ?
>
> Yes.  There is also an entry at offset 0xb0 for light-weight-syscalls.
> Currently, this implements an atomic CAS operation used for pthread support.

Mike (and Kyle),

For review, here are the tables as they now stand:

=====
   Architecture calling conventions
       Every architecture has its own way of invoking and passing arguments to
       the kernel.  The details for various architectures are listed in the
       two tables below.

       The first table lists the instruction used to transition to kernel
       mode, (which might not be the fastest or best way to transition to the
       kernel, so you might have to refer to the VDSO), the register used to
       indicate the system call number, and the register used to return the
       system call result.

       arch/ABI   instruction          syscall #   retval  Notes
       ────────────────────────────────────────────────────────────────────
       arm/OABI   swi NR               -           a1      NR is syscall #
       arm/EABI   swi 0x0              r7          r1
       blackfin   excpt 0x0            P0          R0
       i386       int $0x80            eax         eax
       ia64       break 0x100000       r15         r10/r8
       parisc     ble 0x100(%sr2, %r0) r20         r28
       sparc/32   t 0x10               g1          o0
       sparc/64   t 0x6d               g1          o0
       x86_64     syscall              rax         rax

       The second table shows the registers used to pass the system call
       arguments.

       arch/ABI   arg1   arg2   arg3   arg4   arg5   arg6   arg7
       ──────────────────────────────────────────────────────────
       arm/OABI   a1     a2     a3     a4     v1     v2     v3
       arm/EABI   r1     r2     r3     r4     r5     r6     r7
       blackfin   R0     R1     R2     R3     R4     R5     -
       i386       ebx    ecx    edx    esi    edi    ebp    -
       ia64       r11    r9     r10    r14    r15    r13    -
       parisc     r26    r25    r24    r23    r22    r21    -
       sparc/32   o0     o1     o2     o3     o4     o5     -
       sparc/64   o0     o1     o2     o3     o4     o5     -
       x86_64     rdi    rsi    rdx    r10    r8     r9     -

       Note that these tables don't cover the entire calling convention -- some
       architectures may indiscriminately clobber other registers not listed
       here.
=====

Cheers,

Michael
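
To make the tables concrete, here is a minimal sketch of the x86_64 row
(instruction "syscall", number in rax, arguments in rdi/rsi/rdx, result in
rax) written as GCC-style inline assembly.  The helper name raw_syscall3
is hypothetical.  The rcx and r11 clobbers are not in the tables; they are
listed because the syscall instruction itself overwrites both registers.
On any other architecture the sketch simply falls back to the libc
syscall() wrapper.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>   /* SYS_getpid and friends */
#include <unistd.h>        /* syscall(), getpid() */

/* Hypothetical helper: issue a system call with up to three arguments. */
static long raw_syscall3(long nr, long a1, long a2, long a3)
{
#if defined(__x86_64__)
    long ret;
    /*
     * x86_64 row of the tables: syscall number in rax, arguments in
     * rdi, rsi and rdx, result returned in rax.  The instruction
     * clobbers rcx and r11, so the compiler is told about both.
     */
    __asm__ volatile ("syscall"
                      : "=a" (ret)
                      : "a" (nr), "D" (a1), "S" (a2), "d" (a3)
                      : "rcx", "r11", "memory");
    return ret;
#else
    /* Not x86_64: defer to the libc wrapper instead of guessing. */
    return syscall(nr, a1, a2, a3);
#endif
}
```

Calling raw_syscall3(SYS_getpid, 0, 0, 0) should give the same value as
getpid().  Note that the raw path reports failure as a negative errno
value in the result, whereas the syscall() wrapper returns -1 and sets
errno.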

* Re: [PATCH] man2 : syscall.2 : document syscall calling conventions
  2013-04-07 18:48               ` John David Anglin
       [not found]                 ` <BLU0-SMTP986B123D17DB8B88214F797C40-MsuGFMq8XAE@public.gmane.org>
@ 2013-04-12  1:55                 ` Mike Frysinger
       [not found]                   ` <201304112155.46349.vapier-aBrp7R+bbdUdnm+yROfE0A@public.gmane.org>
  2013-04-12 14:01                   ` Kyle McMartin
  1 sibling, 2 replies; 19+ messages in thread
From: Mike Frysinger @ 2013-04-12  1:55 UTC (permalink / raw)
  To: John David Anglin
  Cc: Kyle McMartin, Michael Kerrisk (man-pages), linux-man,
	Kyle McMartin, Helge Deller, James E.J. Bottomley, linux-parisc

On Sunday 07 April 2013 14:48:42 John David Anglin wrote:
> On 7-Apr-13, at 2:39 PM, Mike Frysinger wrote:
> > just to be clear, the only insn you need is:
> > 	ble 0x100(%sr2, %r0);
> > 
> > the kernel docs say sr2 holds the kernel gateway page (so i guess
> > 0x100 is a
> > known offset into that).  the docs don't mention r0 that i can see,
> > so i'm
> > guessing it's one of those "always 0" registers ?
> 
> Yes.  There is also an entry at offset 0xb0 for light-weight-
> syscalls.  Currently,
> this implements an atomic CAS operation used for pthread support.

interesting.  sounds like a poor man's vDSO.  i'll document this in the new 
vdso(7) man page.
-mike


* Re: [PATCH] man2 : syscall.2 : document syscall calling conventions
       [not found]                   ` <201304112155.46349.vapier-aBrp7R+bbdUdnm+yROfE0A@public.gmane.org>
@ 2013-04-12  2:34                     ` John David Anglin
  2013-04-12  3:38                       ` Mike Frysinger
  0 siblings, 1 reply; 19+ messages in thread
From: John David Anglin @ 2013-04-12  2:34 UTC (permalink / raw)
  To: Mike Frysinger
  Cc: Kyle McMartin, Michael Kerrisk (man-pages), linux-man,
	Kyle McMartin, Helge Deller, James E.J. Bottomley,
	linux-parisc-u79uwXL29TY76Z2rM5mHXA

On 11-Apr-13, at 9:55 PM, Mike Frysinger wrote:

> On Sunday 07 April 2013 14:48:42 John David Anglin wrote:
>> On 7-Apr-13, at 2:39 PM, Mike Frysinger wrote:
>>> just to be clear, the only insn you need is:
>>> 	ble 0x100(%sr2, %r0);
>>>
>>> the kernel docs say sr2 holds the kernel gateway page (so i guess
>>> 0x100 is a
>>> known offset into that).  the docs don't mention r0 that i can see,
>>> so i'm
>>> guessing it's one of those "always 0" registers ?
>>
>> Yes.  There is also an entry at offset 0xb0 for light-weight-
>> syscalls.  Currently,
>> this implements an atomic CAS operation used for pthread support.
>
> interesting.  sounds like a poor man's vDSO.  i'll document this the  
> new
> vdso(7) man page.

Not exactly, the code runs on the gateway page which is in kernel space.
The main reason for doing the operation in kernel space is to prevent
processes from being preempted while executing in the lock region.  In  
general,
parisc processes are not preempted on the gateway page.  There are
some subtleties regarding fault handling.

There is support in glibc and libgcc for these calls.  The libgcc  
implementation
in linux-atomic.c is very similar to that on arm.

Dave
--
John David Anglin	dave.anglin-CzeTG9NwML0@public.gmane.org
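
For readers unfamiliar with the primitive being discussed, the following
sketch shows compare-and-swap semantics using standard C11 atomics.  It
does not call the parisc gateway-page entry directly; the names try_lock
and unlock and the 0/1 encoding of the lock word are assumptions made
purely for illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock word: 0 means unlocked, 1 means locked. */
static bool try_lock(atomic_int *lock)
{
    int expected = 0;
    /*
     * Compare-and-swap: store 1 only if the word still holds 0.
     * When several threads race, exactly one of them sees success.
     */
    return atomic_compare_exchange_strong(lock, &expected, 1);
}

static void unlock(atomic_int *lock)
{
    atomic_store(lock, 0);
}
```

A second try_lock() on a held lock returns false without changing the
word, which is the property pthread-style locking builds on.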




* Re: [PATCH] man2 : syscall.2 : document syscall calling conventions
  2013-04-12  2:34                     ` John David Anglin
@ 2013-04-12  3:38                       ` Mike Frysinger
  2013-04-12  4:45                         ` James Bottomley
  0 siblings, 1 reply; 19+ messages in thread
From: Mike Frysinger @ 2013-04-12  3:38 UTC (permalink / raw)
  To: John David Anglin
  Cc: Kyle McMartin, Michael Kerrisk (man-pages), linux-man,
	Kyle McMartin, Helge Deller, James E.J. Bottomley, linux-parisc

On Thursday 11 April 2013 22:34:43 John David Anglin wrote:
> On 11-Apr-13, at 9:55 PM, Mike Frysinger wrote:
> > On Sunday 07 April 2013 14:48:42 John David Anglin wrote:
> >> On 7-Apr-13, at 2:39 PM, Mike Frysinger wrote:
> >>> just to be clear, the only insn you need is:
> >>> 	ble 0x100(%sr2, %r0);
> >>> 
> >>> the kernel docs say sr2 holds the kernel gateway page (so i guess
> >>> 0x100 is a
> >>> known offset into that).  the docs don't mention r0 that i can see,
> >>> so i'm
> >>> guessing it's one of those "always 0" registers ?
> >> 
> >> Yes.  There is also an entry at offset 0xb0 for light-weight-
> >> syscalls.  Currently,
> >> this implements an atomic CAS operation used for pthread support.
> > 
> > interesting.  sounds like a poor man's vDSO.  i'll document this the
> > new
> > vdso(7) man page.
> 
> Not exactly, the code runs on the gateway page which is in kernel space.
> The main reason for doing the operation in kernel space is to prevent
> processes from being preempted while executing in the lock region.  In
> general,
> parisc processes are not preempted on the gateway page.  There are
> some subtleties regarding fault handling.

sure ... the Blackfin arch does a similar thing for providing fast atomic 
primitives to userspace since the ISA can't.

what do you think of this section for vdso(7) ?  i might have to split the 
"real" vdso arches from these others since there's a couple now (arm, bfin, 
parisc), and i think there might be more down the line (microblaze).

.SS parisc (hppa) functions
.\" See linux/arch/parisc/kernel/syscall.S
.\" See linux/Documentation/parisc/registers
The parisc port has a code page full of utility functions.
Rather than use the normal ELF aux vector approach, it passes the address of
the page to the process via the SR2 register.
This is done to match the way HP-UX works.

Since it's just a raw page of code, there is no ELF information for doing
symbol lookups or versioning.
Simply call into the appropriate offset via the branch instruction, e.g.:
.br
ble <offset>(%sr2, %r0)
.if t \{\
.ft CW
\}
.TS
l l.
offset	function
_
00b0	lws_entry
00e0	set_thread_pointer
0100	linux_gateway_entry (syscall)
0268	syscall_nosys
0274	tracesys
0324	tracesys_next
0368	tracesys_exit
03a0	tracesys_sigexit
03b8	lws_start
03dc	lws_exit_nosys
03e0	lws_exit
03e4	lws_compare_and_swap64
03e8	lws_compare_and_swap
0404	cas_wouldblock
0410	cas_action
.TE
.if t \{\
.in
.ft P
\}

> There is support in glibc and libgcc for these calls.  The libgcc
> implementation
> in linux-atomic.c is very similar to that on arm.

interesting.  another arch to add :).
-mike


* Re: [PATCH] man2 : syscall.2 : document syscall calling conventions
  2013-04-12  3:38                       ` Mike Frysinger
@ 2013-04-12  4:45                         ` James Bottomley
  2013-04-12 12:17                           ` John David Anglin
  2013-04-12 18:45                           ` Mike Frysinger
  0 siblings, 2 replies; 19+ messages in thread
From: James Bottomley @ 2013-04-12  4:45 UTC (permalink / raw)
  To: Mike Frysinger
  Cc: John David Anglin, Kyle McMartin, Michael Kerrisk (man-pages),
	linux-man, Kyle McMartin, Helge Deller, James E.J. Bottomley,
	linux-parisc

On Thu, 2013-04-11 at 23:38 -0400, Mike Frysinger wrote:
> On Thursday 11 April 2013 22:34:43 John David Anglin wrote:
> > On 11-Apr-13, at 9:55 PM, Mike Frysinger wrote:
> > > On Sunday 07 April 2013 14:48:42 John David Anglin wrote:
> > >> On 7-Apr-13, at 2:39 PM, Mike Frysinger wrote:
> > >>> just to be clear, the only insn you need is:
> > >>>   ble 0x100(%sr2, %r0);
> > >>> 
> > >>> the kernel docs say sr2 holds the kernel gateway page (so i guess
> > >>> 0x100 is a
> > >>> known offset into that).  the docs don't mention r0 that i can see,
> > >>> so i'm
> > >>> guessing it's one of those "always 0" registers ?
> > >> 
> > >> Yes.  There is also an entry at offset 0xb0 for light-weight-
> > >> syscalls.  Currently,
> > >> this implements an atomic CAS operation used for pthread support.
> > > 
> > > > interesting.  sounds like a poor man's vDSO.  i'll document this in
> > > > the new vdso(7) man page.
> > 
> > Not exactly, the code runs on the gateway page which is in kernel space.
> > The main reason for doing the operation in kernel space is to prevent
> > processes from being preempted while executing in the lock region.  In
> > general,
> > parisc processes are not preempted on the gateway page.  There are
> > some subtleties regarding fault handling.
> 
> sure ... the Blackfin arch does a similar thing for providing fast atomic 
> primitives to userspace since the ISA can't.
> 
> what do you think of this section for vdso(7) ?  i might have to split the 
> "real" vdso arches from these others since there's a couple now (arm, bfin, 
> parisc), and i think there might be more down the line (microblaze).

I've got to say, I really don't think this can be classified as a vdso.
For a vdso, the kernel exports an ELF object that can be linked
dynamically into any elf binary requiring it.  The ELF section
information provides full details and so vdso entries can be called by
symbol.

In the parisc gateway page implementation, we have a set of "hidden"
primitives which the executable must know how to call (no self
description like a vdso).  This mechanism is identical to the original
intent of the x86 int <n> instruction (an instruction that traps into
the kernel and performs some primitive action but to use it, you have to
know which function corresponds to which value of <n>).
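
As a rough illustration of that "must already know the offset" model (a
sketch only, not code from the kernel or glibc), a caller effectively
carries its own copy of the table.  The offsets are copied from the draft
vdso(7) section in this thread; the names gateway_entry, gateway_table and
gateway_lookup are made up for the example:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical caller-side table: the gateway page does not describe
 * itself, so userspace has to hard-code what lives at which offset. */
struct gateway_entry {
    unsigned int offset;   /* byte offset into the gateway page */
    const char *name;      /* label taken from the draft man page table */
};

static const struct gateway_entry gateway_table[] = {
    { 0x00b0, "lws_entry" },
    { 0x00e0, "set_thread_pointer" },
    { 0x0100, "linux_gateway_entry" },
};

/* Return the offset for a name, or -1 if this caller does not know it. */
long gateway_lookup(const char *name)
{
    size_t i;

    assert(name != NULL);
    for (i = 0; i < sizeof(gateway_table) / sizeof(gateway_table[0]); i++) {
        if (strcmp(gateway_table[i].name, name) == 0)
            return (long)gateway_table[i].offset;
    }
    return -1;
}
```

On parisc the value found this way would end up as the displacement in
"ble <offset>(%sr2, %r0)".  With a real vdso the same lookup would go
through the ELF symbol table that the kernel supplies.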

James



* Re: [PATCH] man2 : syscall.2 : document syscall calling conventions
  2013-04-12  4:45                         ` James Bottomley
@ 2013-04-12 12:17                           ` John David Anglin
  2013-04-12 18:45                           ` Mike Frysinger
  1 sibling, 0 replies; 19+ messages in thread
From: John David Anglin @ 2013-04-12 12:17 UTC (permalink / raw)
  To: James Bottomley
  Cc: Mike Frysinger, Kyle McMartin, Michael Kerrisk (man-pages),
	linux-man, Kyle McMartin, Helge Deller, James E.J. Bottomley,
	linux-parisc

On 12-Apr-13, at 12:45 AM, James Bottomley wrote:

> On Thu, 2013-04-11 at 23:38 -0400, Mike Frysinger wrote:
>> On Thursday 11 April 2013 22:34:43 John David Anglin wrote:
>>> On 11-Apr-13, at 9:55 PM, Mike Frysinger wrote:
>>>> On Sunday 07 April 2013 14:48:42 John David Anglin wrote:
>>>>> On 7-Apr-13, at 2:39 PM, Mike Frysinger wrote:
>>>>>> just to be clear, the only insn you need is:
>>>>>>  ble 0x100(%sr2, %r0);
>>>>>>
>>>>>> the kernel docs say sr2 holds the kernel gateway page (so i guess
>>>>>> 0x100 is a
>>>>>> known offset into that).  the docs don't mention r0 that i can  
>>>>>> see,
>>>>>> so i'm
>>>>>> guessing it's one of those "always 0" registers ?
>>>>>
>>>>> Yes.  There is also an entry at offset 0xb0 for light-weight-
>>>>> syscalls.  Currently,
>>>>> this implements an atomic CAS operation used for pthread support.
>>>>
>>>> interesting.  sounds like a poor man's vDSO.  i'll document this in
>>>> the new vdso(7) man page.
>>>
>>> Not exactly, the code runs on the gateway page which is in kernel  
>>> space.
>>> The main reason for doing the operation in kernel space is to  
>>> prevent
>>> processes from being preempted while executing in the lock  
>>> region.  In
>>> general,
>>> parisc processes are not preempted on the gateway page.  There are
>>> some subtleties regarding fault handling.
>>
>> sure ... the Blackfin arch does a similar thing for providing fast  
>> atomic
>> primitives to userspace since the ISA can't.
>>
>> what do you think of this section for vdso(7) ?  i might have to  
>> split the
>> "real" vdso arches from these others since there's a couple now  
>> (arm, bfin,
>> parisc), and i think there might be more down the line (microblaze).
>
> I've got to say, I really don't think this can be classified as a  
> vdso.
> For a vdso, the kernel exports an ELF object that can be linked
> dynamically into any elf binary requiring it.  The ELF section
> information provides full details and so vdso entries can be called by
> symbol.
>
> In the parisc gateway page implementation, we have a set of "hidden"
> primitives which the executable must know how to call (no self
> description like a vdso).  This mechanism is identical to the original
> intent of the x86 int <n> instruction (an instruction that traps into
> the kernel and performs some primitive action but to use it, you  
> have to
> know which function corresponds to which value of <n>).

I agree with James.  There is no ELF object exported to userspace.  The
content of the gateway page is hidden.  The data structures used for
the locks are in the kernel itself.  Access is via a special branch  
instruction
rather than a break/trap instruction.


--
John David Anglin	dave.anglin@bell.net





* Re: [PATCH] man2 : syscall.2 : document syscall calling conventions
  2013-04-12  1:55                 ` Mike Frysinger
       [not found]                   ` <201304112155.46349.vapier-aBrp7R+bbdUdnm+yROfE0A@public.gmane.org>
@ 2013-04-12 14:01                   ` Kyle McMartin
  1 sibling, 0 replies; 19+ messages in thread
From: Kyle McMartin @ 2013-04-12 14:01 UTC (permalink / raw)
  To: Mike Frysinger
  Cc: John David Anglin, Michael Kerrisk (man-pages), linux-man,
	Kyle McMartin, Helge Deller, James E.J. Bottomley, linux-parisc

On Thu, Apr 11, 2013 at 09:55:43PM -0400, Mike Frysinger wrote:
> interesting.  sounds like a poor man's vDSO.  i'll document this in the new
> vdso(7) man page.
> -mike

fwiw ia64 does basically the same thing for a subset of syscalls
(fsys.S)

--Kyle


* Re: [PATCH] man2 : syscall.2 : document syscall calling conventions
  2013-04-12  4:45                         ` James Bottomley
  2013-04-12 12:17                           ` John David Anglin
@ 2013-04-12 18:45                           ` Mike Frysinger
  2013-04-12 19:14                             ` James Bottomley
  1 sibling, 1 reply; 19+ messages in thread
From: Mike Frysinger @ 2013-04-12 18:45 UTC (permalink / raw)
  To: James Bottomley
  Cc: John David Anglin, Kyle McMartin, Michael Kerrisk (man-pages),
	linux-man, Kyle McMartin, Helge Deller, James E.J. Bottomley,
	linux-parisc

On Friday 12 April 2013 00:45:12 James Bottomley wrote:
> On Thu, 2013-04-11 at 23:38 -0400, Mike Frysinger wrote:
> > On Thursday 11 April 2013 22:34:43 John David Anglin wrote:
> > > On 11-Apr-13, at 9:55 PM, Mike Frysinger wrote:
> > > > On Sunday 07 April 2013 14:48:42 John David Anglin wrote:
> > > >> On 7-Apr-13, at 2:39 PM, Mike Frysinger wrote:
> > > >>> just to be clear, the only insn you need is:
> > > >>>   ble 0x100(%sr2, %r0);
> > > >>> 
> > > >>> the kernel docs say sr2 holds the kernel gateway page (so i guess
> > > >>> 0x100 is a
> > > >>> known offset into that).  the docs don't mention r0 that i can see,
> > > >>> so i'm
> > > >>> guessing it's one of those "always 0" registers ?
> > > >> 
> > > >> Yes.  There is also an entry at offset 0xb0 for light-weight-
> > > >> syscalls.  Currently,
> > > >> this implements an atomic CAS operation used for pthread support.
> > > > 
> > > > > interesting.  sounds like a poor man's vDSO.  i'll document this
> > > > > in the new vdso(7) man page.
> > > 
> > > Not exactly, the code runs on the gateway page which is in kernel
> > > space. The main reason for doing the operation in kernel space is to
> > > prevent processes from being preempted while executing in the lock
> > > region.  In general,
> > > parisc processes are not preempted on the gateway page.  There are
> > > some subtleties regarding fault handling.
> > 
> > sure ... the Blackfin arch does a similar thing for providing fast atomic
> > primitives to userspace since the ISA can't.
> > 
> > what do you think of this section for vdso(7) ?  i might have to split
> > the "real" vdso arches from these others since there's a couple now
> > (arm, bfin, parisc), and i think there might be more down the line
> > (microblaze).
> 
> I've got to say, I really don't think this can be classified as a vdso.
> For a vdso, the kernel exports an ELF object that can be linked
> dynamically into any elf binary requiring it.  The ELF section
> information provides full details and so vdso entries can be called by
> symbol.

strictly speaking, sure, a vDSO is only a vDSO if it's an ELF (since the 
acronym is literally "virtual dynamic shared object").  however, i see the 
vdso as being a bit more of a flexible concept -- it's a place of shared code 
that the kernel manages and exports for all userspace processes.  
fundamentally, the point of the vDSO is to provide services to greatly speed 
up userspace.  in that regard, these mapped pages are exactly like vDSOs.

thus i think it's appropriate to document these "fixed code" regions that many 
arches export (ARM, Blackfin, Itanium, Microblaze, PA-RISC) in the same man 
page as the vdso.  especially since (currently) arches do one or the other, 
but not both.

> In the parisc gateway page implementation, we have a set of "hidden"
> primitives which the executable must know how to call (no self
> description like a vdso).  This mechanism is identical to the original
> intent of the x86 int <n> instruction (an instruction that traps into
> the kernel and performs some primitive action but to use it, you have to
> know which function corresponds to which value of <n>).

would it be useful to document all of them ?  or just the ones that userspace 
actively uses (like syscall/cas) ?  or should all of this be recorded in the 
kernel's Documentation/parisc/ subdir and just have the man page refer people 
there (like it does for ARM & Blackfin currently) ?
-mike


* Re: [PATCH] man2 : syscall.2 : document syscall calling conventions
  2013-04-12 18:45                           ` Mike Frysinger
@ 2013-04-12 19:14                             ` James Bottomley
  2013-04-12 19:46                               ` Mike Frysinger
  0 siblings, 1 reply; 19+ messages in thread
From: James Bottomley @ 2013-04-12 19:14 UTC (permalink / raw)
  To: Mike Frysinger
  Cc: John David Anglin, Kyle McMartin, Michael Kerrisk (man-pages),
	linux-man, Kyle McMartin, Helge Deller, James E.J. Bottomley,
	linux-parisc

On Fri, 2013-04-12 at 14:45 -0400, Mike Frysinger wrote:
> On Friday 12 April 2013 00:45:12 James Bottomley wrote:
> > On Thu, 2013-04-11 at 23:38 -0400, Mike Frysinger wrote:
> > > what do you think of this section for vdso(7) ?  i might have to split
> > > the "real" vdso arches from these others since there's a couple now
> > > (arm, bfin, parisc), and i think there might be more down the line
> > > (microblaze).
> > 
> > I've got to say, I really don't think this can be classified as a vdso.
> > For a vdso, the kernel exports an ELF object that can be linked
> > dynamically into any elf binary requiring it.  The ELF section
> > information provides full details and so vdso entries can be called by
> > symbol.
> 
> strictly speaking, sure, a vDSO is only a vDSO if it's an ELF (since the 
> acronym is literally "virtual dynamic shared object").  however, i see the 
> vdso as being a bit more of a flexible concept -- it's a place of shared code 
> that the kernel manages and exports for all userspace processes.  
> fundamentally, the point of the vDSO is to provide services to greatly speed 
> up userspace.  in that regard, these mapped pages are exactly like vDSOs.

I don't entirely understand this classification.  If the kernel<->user
gateway becomes classified as a vdso, that covers our syscall interface
on every architecture.  There's now no distinction between a vdso (which
may not even move to kernel mode) and a syscall.

I think the difference is that a syscall is a specific call to a known
kernel routine by number and it involves a transition to kernel mode.  A
vdso is an exported link object containing certain functions which may
or may not cause a trap to kernel mode when executed.  The distinction
is how you do the call.  For syscalls, you have to know the number and
the arguments.  For vdso you just have to know the symbol (and
obviously, the prototype for C code) and the kernel supplies the
implementation direct to the userspace binary.
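
To make the distinction concrete, here is a small Linux-only sketch in C
(an illustration with invented helper names, not anyone's real
implementation).  The first function makes a call purely by number.  The
second asks the aux vector for the vdso via getauxval(AT_SYSINFO_EHDR) and
checks that what the kernel mapped really starts with an ELF header.  On an
arch that advertises no vdso, the second helper simply reports 0.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <elf.h>
#include <string.h>
#include <sys/auxv.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Syscall model: the caller must know the number (SYS_getpid) itself. */
long getpid_by_number(void)
{
    long pid = syscall(SYS_getpid);

    assert(pid > 0);   /* getpid cannot fail */
    return pid;
}

/* vdso model: the kernel hands over a self-describing ELF image.
 * Returns 1 if an ELF image is mapped, 0 if no vdso was advertised
 * or the mapping does not start with the ELF magic. */
int vdso_is_elf(void)
{
    const void *base = (const void *)getauxval(AT_SYSINFO_EHDR);

    if (base == NULL)
        return 0;
    return memcmp(base, ELFMAG, SELFMAG) == 0;
}
```

Once the ELF header is found, a caller can walk the dynamic symbol table
and resolve entry points by name, which is exactly what a raw code page
cannot offer.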

> thus i think it's appropriate to document these "fixed code" regions that many 
> arches export (ARM, Blackfin, Itanium, Microblaze, PA-RISC) in the same man 
> page as the vdso.  especially since (currently) arches do one or the other, 
> but not both.

I really see these as a type of lightweight syscall.  You use the
syscall prototype (call by number with known arguments) but the call may
not necessarily transition to kernel mode proper to handle the function.

> > In the parisc gateway page implementation, we have a set of "hidden"
> > primitives which the executable must know how to call (no self
> > description like a vdso).  This mechanism is identical to the original
> > intent of the x86 int <n> instruction (an instruction that traps into
> > the kernel and performs some primitive action but to use it, you have to
> > know which function corresponds to which value of <n>).
> 
> would it be useful to document all of them ?  or just the ones that userspace 
> actively uses (like syscall/cas) ?  or should all of this be recorded in the 
> kernel's Documentation/parisc/ subdir and just have the man page refer people 
> there (like it does for ARM & Blackfin currently) ?

I'm not sure.  For x86 they're in include/asm/traps.h.  I think the only
ones we really use are int 3 for breakpoint, int 4 for overflow and
int 0x80 for legacy syscall.

James





* Re: [PATCH] man2 : syscall.2 : document syscall calling conventions
  2013-04-12 19:14                             ` James Bottomley
@ 2013-04-12 19:46                               ` Mike Frysinger
  2013-04-12 20:25                                 ` James Bottomley
  0 siblings, 1 reply; 19+ messages in thread
From: Mike Frysinger @ 2013-04-12 19:46 UTC (permalink / raw)
  To: James Bottomley
  Cc: John David Anglin, Kyle McMartin, Michael Kerrisk (man-pages),
	linux-man, Kyle McMartin, Helge Deller, James E.J. Bottomley,
	linux-parisc

On Friday 12 April 2013 15:14:47 James Bottomley wrote:
> On Fri, 2013-04-12 at 14:45 -0400, Mike Frysinger wrote:
> > On Friday 12 April 2013 00:45:12 James Bottomley wrote:
> > > On Thu, 2013-04-11 at 23:38 -0400, Mike Frysinger wrote:
> > > > what do you think of this section for vdso(7) ?  i might have to
> > > > split the "real" vdso arches from these others since there's a
> > > > couple now (arm, bfin, parisc), and i think there might be more down
> > > > the line (microblaze).
> > > 
> > > I've got to say, I really don't think this can be classified as a vdso.
> > > For a vdso, the kernel exports an ELF object that can be linked
> > > dynamically into any elf binary requiring it.  The ELF section
> > > information provides full details and so vdso entries can be called by
> > > symbol.
> > 
> > strictly speaking, sure, a vDSO is only a vDSO if it's an ELF (since the
> > acronym is literally "virtual dynamic shared object").  however, i see
> > the vdso as being a bit more of a flexible concept -- it's a place of
> > shared code that the kernel manages and exports for all userspace
> > processes. fundamentally, the point of the vDSO is to provide services
> > to greatly speed up userspace.  in that regard, these mapped pages are
> > exactly like vDSOs.
> 
> I don't entirely understand this classification.  If the kernel<->user
> gateway becomes classified as a vdso, that covers our syscall interface
> on every architecture.  There's now no distinction between a vdso (which
> may not even move to kernel mode) and a syscall.
> 
> I think the difference is that a syscall is a specific call to a known
> kernel routine by number and it involves a transition to kernel mode.  A
> vdso is an exported link object containing certain functions which may
> or may not cause a trap to kernel mode when executed.  The distinction
> is how you do the call.  For syscalls, you have to know the number and
> the arguments.  For vdso you just have to know the symbol (and
> obviously, the prototype for C code) and the kernel supplies the
> implementation direct to the userspace binary.

i'm not fully versed in the parisc linux gateway page or how the architecture 
is handling things, so i could be completely off here.  from reading the source 
code, it *looked* like it was just a page of utility funcs that userspace 
branches to without changing privilege modes or going through the full syscall 
routines.

so i'm saying the gateway page itself can be thought of in the same vein as a 
vDSO.  it's a black box with entry points that provide light weight services 
to userspace.  sometimes it ends up triggering a full syscall, sometimes it 
doesn't (just like a vDSO).

> > thus i think it's appropriate to document these "fixed code" regions that
> > many arches export (ARM, Blackfin, Itanium, Microblaze, PA-RISC) in the
> > same man page as the vdso.  especially since (currently) arches do one
> > or the other, but not both.
> 
> I really see these as a type of lightweight syscall.  You use the
> syscall prototype (call by number with known arguments) but the call may
> not necessarily transition to kernel mode proper to handle the function.

if you think of the vdso in a very strict light (it's exactly an ELF that the 
kernel automatically maps into every process's address space), then i guess 
you can only classify these as lightweight syscalls (where the address/offset 
is the "syscall #").

i see vdso as being a more flexible concept than that -- if it's code mapped 
into a process's address space and provides useful lightweight services that 
are meant to be used specifically in lieu of syscall(), then it's vdso-like and 
should be in the vdso(7) man page.  it has a lot more in common imo with a 
vdso than it does with an actual syscall.  i certainly think vdso(7) is more 
appropriate for these regions than syscall(2) or syscalls(2).

> > > In the parisc gateway page implementation, we have a set of "hidden"
> > > primitives which the executable must know how to call (no self
> > > description like a vdso).  This mechanism is identical to the original
> > > intent of the x86 int <n> instruction (an instruction that traps into
> > > the kernel and performs some primitive action but to use it, you have
> > > to know which function corresponds to which value of <n>).
> > 
> > would it be useful to document all of them ?  or just the ones that
> > userspace actively uses (like syscall/cas) ?  or should all of this be
> > recorded in the kernel's Documentation/parisc/ subdir and just have the
> > man page refer people there (like it does for ARM & Blackfin currently)
> > ?
> 
> I'm not sure.  For x86 they're in include/asm/traps.h.  I think the only
> ones we really use are int 3 for breakpoint, int 4 for overflow and
> int 0x80 for legacy syscall.

hmm, i wasn't even considering the other arch-specific services offered by e.g. 
software interrupts.  i don't think those belong in vdso(7) as they don't 
confer any of the lightweight advantages the vdso is designed to bring, but it 
might be useful to document these somewhere.  they're also not as common for 
people to encounter as a vdso ...
-mike


* Re: [PATCH] man2 : syscall.2 : document syscall calling conventions
  2013-04-12 19:46                               ` Mike Frysinger
@ 2013-04-12 20:25                                 ` James Bottomley
  0 siblings, 0 replies; 19+ messages in thread
From: James Bottomley @ 2013-04-12 20:25 UTC (permalink / raw)
  To: Mike Frysinger
  Cc: John David Anglin, Kyle McMartin, Michael Kerrisk (man-pages),
	linux-man, Kyle McMartin, Helge Deller, James E.J. Bottomley,
	linux-parisc

On Fri, 2013-04-12 at 15:46 -0400, Mike Frysinger wrote:
> On Friday 12 April 2013 15:14:47 James Bottomley wrote:
> > On Fri, 2013-04-12 at 14:45 -0400, Mike Frysinger wrote:
> > > On Friday 12 April 2013 00:45:12 James Bottomley wrote:
> > > > On Thu, 2013-04-11 at 23:38 -0400, Mike Frysinger wrote:
> > > > > what do you think of this section for vdso(7) ?  i might have to
> > > > > split the "real" vdso arches from these others since there's a
> > > > > couple now (arm, bfin, parisc), and i think there might be more down
> > > > > the line (microblaze).
> > > > 
> > > > I've got to say, I really don't think this can be classified as a vdso.
> > > > For a vdso, the kernel exports an ELF object that can be linked
> > > > dynamically into any elf binary requiring it.  The ELF section
> > > > information provides full details and so vdso entries can be called by
> > > > symbol.
> > > 
> > > strictly speaking, sure, a vDSO is only a vDSO if it's an ELF (since the
> > > acronym is literally "virtual dynamic shared object").  however, i see
> > > the vdso as being a bit more of a flexible concept -- it's a place of
> > > shared code that the kernel manages and exports for all userspace
> > > processes. fundamentally, the point of the vDSO is to provide services
> > > to greatly speed up userspace.  in that regard, these mapped pages are
> > > exactly like vDSOs.
> > 
> > I don't entirely understand this classification.  If the kernel<->user
> > gateway becomes classified as a vdso, that covers our syscall interface
> > on every architecture.  There's now no distinction between a vdso (which
> > may not even move to kernel mode) and a syscall.
> > 
> > I think the difference is that a syscall is a specific call to a known
> > kernel routine by number and it involves a transition to kernel mode.  A
> > vdso is an exported link object containing certain functions which may
> > or may not cause a trap to kernel mode when executed.  The distinction
> > is how you do the call.  For syscalls, you have to know the number and
> > the arguments.  For vdso you just have to know the symbol (and
> > obviously, the prototype for C code) and the kernel supplies the
> > implementation direct to the userspace binary.
> 
> i'm not fully versed in the parisc linux gateway page or how the architecture 
> is handling things, so i could be completely off here.  from reading the source 
> code, it *looked* like it was just a page of utility funcs that userspace 
> branches to without changing privilege modes or going through the full syscall 
> routines.

Oh, if that's the misunderstanding, then the gateway page is "special".
It actually has PAGE_GATEWAY bits set (this is linux terminology; in
parisc terminology it's Execute, promote to PL0) in the page map.  So
anything executing on this page executes with kernel level privilege
(there's more to it than that: to have this happen, you also have to use
a branch with a ,gate completer to activate the privilege promotion).
The upshot is that everything that runs on the gateway page runs at
kernel privilege but with the current user process address space
(although you have access to kernel space via %sr2).  For the 0x100
syscall entry, we redo the space registers to point to the kernel
address space (preserving the user address space in %sr3), move to wide
mode if required, save the user registers and branch into the kernel
syscall entry point.  For all the other functions, we execute at kernel
privilege but don't flip address spaces.  The basic upshot of this is
that these code snippets are executed atomically (because the kernel
can't be pre-empted) and they may perform architecturally forbidden (to
PL3) operations (like setting control registers).
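
For anyone without parisc hardware, the effect of "executed atomically"
for a compare-and-swap helper can be modelled portably with C11 atomics.
This is only a userspace model under stated assumptions, not the gateway
page code and not its calling convention; model_cas, model_trylock and
model_unlock are invented names:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Model of a compare-and-swap: store new_val only if *addr still holds
 * expected.  Returns true when the swap happened. */
bool model_cas(atomic_int *addr, int expected, int new_val)
{
    return atomic_compare_exchange_strong(addr, &expected, new_val);
}

/* A trivial try-lock built on the model: 0 = unlocked, 1 = locked. */
bool model_trylock(atomic_int *lock)
{
    return model_cas(lock, 0, 1);
}

/* Release the lock; it must currently be held. */
void model_unlock(atomic_int *lock)
{
    bool was_locked = model_cas(lock, 1, 0);

    assert(was_locked);   /* unlocking a free lock is a caller bug */
}
```

The model gets its atomicity from the compiler and the hardware, whereas
the gateway page helpers get it from running where they cannot be
pre-empted, as described above.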

James





Thread overview: 19+ messages
     [not found] <1364361092-5948-1-git-send-email-ch0.han@lge.com>
     [not found] ` <201304010632.41520.vapier@gentoo.org>
     [not found]   ` <CAKgNAkgG2kdCC1tyZQkYU7O_nP7RB8VoCmx6eb8FcudU1s6RgA@mail.gmail.com>
     [not found]     ` <201304021917.17659.vapier@gentoo.org>
2013-04-07 10:00       ` [PATCH] man2 : syscall.2 : document syscall calling conventions Michael Kerrisk (man-pages)
2013-04-07 13:55         ` Kyle McMartin
2013-04-07 14:56           ` James Bottomley
2013-04-07 15:11             ` Kyle McMartin
     [not found]               ` <20130407151134.GX12938-PfSpb0PWhxZc2C7mugBRk2EX/6BAtgUQ@public.gmane.org>
2013-04-07 15:38                 ` James Bottomley
2013-04-08  9:18                 ` Michael Kerrisk (man-pages)
     [not found]           ` <20130407135514.GW12938-PfSpb0PWhxZc2C7mugBRk2EX/6BAtgUQ@public.gmane.org>
2013-04-07 18:39             ` Mike Frysinger
2013-04-07 18:48               ` John David Anglin
     [not found]                 ` <BLU0-SMTP986B123D17DB8B88214F797C40-MsuGFMq8XAE@public.gmane.org>
2013-04-08  9:20                   ` Michael Kerrisk (man-pages)
2013-04-12  1:55                 ` Mike Frysinger
     [not found]                   ` <201304112155.46349.vapier-aBrp7R+bbdUdnm+yROfE0A@public.gmane.org>
2013-04-12  2:34                     ` John David Anglin
2013-04-12  3:38                       ` Mike Frysinger
2013-04-12  4:45                         ` James Bottomley
2013-04-12 12:17                           ` John David Anglin
2013-04-12 18:45                           ` Mike Frysinger
2013-04-12 19:14                             ` James Bottomley
2013-04-12 19:46                               ` Mike Frysinger
2013-04-12 20:25                                 ` James Bottomley
2013-04-12 14:01                   ` Kyle McMartin
