linux-man.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* vdso(7): new man page
@ 2013-04-10  3:17 Mike Frysinger
       [not found] ` <201304092317.01590.vapier-aBrp7R+bbdUdnm+yROfE0A@public.gmane.org>
  0 siblings, 1 reply; 15+ messages in thread
From: Mike Frysinger @ 2013-04-10  3:17 UTC (permalink / raw)
  To: linux-man-u79uwXL29TY76Z2rM5mHXA; +Cc: Andy Lutomirski

[-- Attachment #1: Type: text/plain, Size: 10789 bytes --]

so i've slapped this together.  i think the HISTORY section could use more
filling out, but that'd probably be more for funsies rather than usefulness.

might also be helpful if someone could do fact checking on what i've written
for the HISTORY.  this predates my involvement in the linux world, so i'm just
making educated guesses.
-mike

.\" Written by Mike Frysinger <vapier-aBrp7R+bbdUdnm+yROfE0A@public.gmane.org>
.\"
.\" %%%LICENSE_START(PUBLIC_DOMAIN)
.\" This page is in the public domain.  Suck it.
.\" %%%LICENSE_END
.\"
.TH VDSO 7 2013-04-09 "Linux" "Linux Programmer's Manual"
.SH NAME
vDSO \- overview of the virtual ELF dynamic shared object
.SH SYNOPSIS
.B #include <sys/auxv.h>

.B void *vdso = (uintptr_t)getauxval(AT_SYSINFO_EHDR);
.SH DESCRIPTION
The "vDSO" is a small shared library that the kernel automatically maps into the
address space of all userspace applications.
Applications themselves usually need not concern themselves with this as it is
most commonly called by the C library.
This way you can write using standard functions and the C library will take care
of using any available functionality.

Why does this object exist at all?
There are some facilities the kernel provides that userspace ends up using
frequently to the point that such calls can dominate overall performance.
This is due both to the frequency of the call as well as the context overhead
from exiting userspace and entering the kernel.

The rest of this documentation is geared towards the curious and/or C library
writers rather than general developers.
If you're trying to call the vDSO in your own application rather than using
the C library, you're most likely doing it wrong.
.SS Example Background
Making syscalls themselves can incur significant penalty.
In x86 systems, you can trigger a software interrupt (int $0x80) to tell the
kernel you wish to make a syscall.
Internally, this means the call has to go through the normal interrupt layers
to save/restore context before it even gets a chance to start processing the
system call.
Wouldn't it be nicer if you could start processing the system call immediately?
With newer revisions of the x86 architecture, there is now a syscall/sysenter
instruction that does exactly that (jumps directly to the system call entry).
However, rather than require the C library to figure out if this functionality
is available itself, the vDSO includes symbols that can be used.
This is typically referred to using the term "vsyscall".

Another frequent system call is gettimeofday().
This is called both directly by userspace applications as well as indirectly by
the C library.
Think timestamps or timing loops or polling -- all of these frequently need to
know what time it is right now.
This information is also not secret -- any application in any privilege mode
(root or any user) will get the same answer.
Thus the kernel arranges for the information required to answer this question
to be placed in memory the process can access.
Now a call to gettimeofday() changes from a syscall to a normal function call
and a few memory accesses.
.SS Finding The vDSO
The base address of the vDSO (if one exists) is passed by the kernel to each
program in the initial auxiliary vector.
Specifically, via the
.B AT_SYSINFO_EHDR
tag.

For some architectures, there is also a
.B AT_SYSINFO
tag.
This is used only for locating the vsyscall entry point and is frequently
disabled or set to 0 (meaning it's not available).
It is a throw back to the initial vDSO work (see
.IR HISTORY
below for more details) and should be avoided.

See
.BR getauxval (3)
for more details on accessing these fields.
.SS File Format
Since the vDSO is a fully formed ELF, you can do symbol lookups on it.
This allows new symbols to be added with newer kernel releases, and for the
C library to detect available functionality at runtime when running under
different kernel versions.
Often times the C library will do detection with the first call and then
cache the result for subsequent calls.

All symbols are also versioned (using the GNU version format).
This allows the kernel (in the very unlikely situation) to update the function
signature without breaking backwards compatibility.
This means changing the arguments that it accepts as well as the return value.
When looking up a symbol in the vDSO, you must always include the version you
are writing against.

Typically the vDSO follows the naming convention of prefixing all symbols with
"__vdso_" or "__kernel_" so as to distinguish from other standard symbols.
e.g. The "gettimeofday" function is named "__vdso_gettimeofday".
.SH NOTES
.SS Source
When you compile the kernel, it will automatically compile and link the vDSO
code for you.
You will frequently find it under the arch specific dir:
find arch/$ARCH/ -name '*vdso*.so'

Note that the vDSO that is used is based on the ABI of your userspace code
and not the ABI of the kernel.
i.e. If you run an i386 32bit ELF under an i386 32bit kernel or under an
x86_64 64bit kernel, you'll get the same vDSO.
So when referring to sections below, use the userspace ABI.
.SS vDSO Names
The name of this shared object varies across architectures.
It will often show up in things like glibc's `ldd` output.
The exact name should not matter to any code, so please do not hardcode it.
.if t \{\
.ft CW
\}
.TS
l l.
arch	vdso name
_
aarch64	linux-vdso.so.1
ia64	linux-gate.so.1
ppc/32	linux-vdso32.so.1
ppc/64	linux-vdso64.so.1
s390	linux-vdso32.so.1
s390x	linux-vdso64.so.1
sh	linux-gate.so.1
i386	linux-gate.so.1
x86_64	linux-vdso.so.1
x86/x32	linux-vdso.so.1
.TE
.if t \{\
.in
.ft P
\}
.SS aarch64 functions
.\" See linux/arch/arm64/kernel/vdso/vdso.lds.S
.if t \{\
.ft CW
\}
.TS
l l.
symbol	version
_
__kernel_rt_sigreturn	LINUX_2.6.39
__kernel_gettimeofday	LINUX_2.6.39
__kernel_clock_gettime	LINUX_2.6.39
__kernel_clock_getres	LINUX_2.6.39
.TE
.if t \{\
.in
.ft P
\}
.SS bfin (Blackfin) functions
.\" See linux/arch/blackfin/kernel/fixed_code.S
.\" See http://docs.blackfin.uclinux.org/doku.php?id=linux-kernel:fixed-code
As this cpu lacks a MMU, it doesn't setup a vDSO in the normal sense.
Instead, it maps at boot time a few raw functions into a fixed location in
memory.
Userspace apps then call directly into that.
There is no provision for backwards compatibility beyond sniffing raw opcodes,
but as this is an embedded CPU, it can get away with things -- some of the
object formats it runs aren't even ELF based (they're bFLT/FLAT).

For documentation on this format, it's better you refer to the public docs:
.br
http://docs.blackfin.uclinux.org/doku.php?id=linux-kernel:fixed-code
.SS ia64 (Itanium) functions
.\" See linux/arch/ia64/kernel/gate.lds.S
.\" Also linux/arch/ia64/kernel/fsys.S and linux/Documentation/ia64/fsys.txt
.if t \{\
.ft CW
\}
.TS
l l.
symbol	version
_
__kernel_sigtramp	LINUX_2.5
__kernel_syscall_via_break	LINUX_2.5
__kernel_syscall_via_epc	LINUX_2.5
.TE
.if t \{\
.in
.ft P
\}

The Itanium port actually likes to get tricky.
In addition to the vDSO above, it also has "light-weight system calls" aka
"fast syscalls" aka "fsys".
You can invoke these via the __kernel_syscall_via_epc vDSO helper.
The system calls listed here have the same semantics as if you called them
directly via
.BR syscall (3),
so refer to the relevant
documentation for each.
The table below lists the functions available via this mechanism.
.if t \{\
.ft CW
\}
.TS
l.
function
_
clock_gettime
getcpu
getpid
getppid
gettimeofday
set_tid_address
.TE
.if t \{\
.in
.ft P
\}
.SS ppc/32 functions
.\" See linux/arch/powerpc/kernel/vdso32/vdso32.lds.S
The functions marked with a
.I *
below are only available when the kernel is
a powerpc64 (64bit) kernel.
.if t \{\
.ft CW
\}
.TS
l l.
symbol	version
_
__kernel_clock_getres	LINUX_2.6.15
__kernel_clock_gettime	LINUX_2.6.15
__kernel_datapage_offset	LINUX_2.6.15
__kernel_get_syscall_map	LINUX_2.6.15
__kernel_get_tbfreq	LINUX_2.6.15
__kernel_getcpu \fI*\fR	LINUX_2.6.15
__kernel_gettimeofday	LINUX_2.6.15
__kernel_sigtramp_rt32	LINUX_2.6.15
__kernel_sigtramp32	LINUX_2.6.15
__kernel_sync_dicache	LINUX_2.6.15
__kernel_sync_dicache_p5	LINUX_2.6.15
.TE
.if t \{\
.in
.ft P
\}
.SS ppc/64 functions
.\" See linux/arch/powerpc/kernel/vdso64/vdso64.lds.S
.if t \{\
.ft CW
\}
.TS
l l.
symbol	version
_
__kernel_clock_getres	LINUX_2.6.15
__kernel_clock_gettime	LINUX_2.6.15
__kernel_datapage_offset	LINUX_2.6.15
__kernel_get_syscall_map	LINUX_2.6.15
__kernel_get_tbfreq	LINUX_2.6.15
__kernel_getcpu	LINUX_2.6.15
__kernel_gettimeofday	LINUX_2.6.15
__kernel_sigtramp_rt64	LINUX_2.6.15
__kernel_sync_dicache	LINUX_2.6.15
__kernel_sync_dicache_p5	LINUX_2.6.15
.TE
.if t \{\
.in
.ft P
\}
.SS s390 functions
.\" See linux/arch/s390/kernel/vdso32/vdso32.lds.S
.if t \{\
.ft CW
\}
.TS
l l.
symbol	version
_
__kernel_clock_getres	LINUX_2.6.29
__kernel_clock_gettime	LINUX_2.6.29
__kernel_gettimeofday	LINUX_2.6.29
.TE
.if t \{\
.in
.ft P
\}
.SS s390x functions
.\" See linux/arch/s390/kernel/vdso64/vdso64.lds.S
.if t \{\
.ft CW
\}
.TS
l l.
symbol	version
_
__kernel_clock_getres	LINUX_2.6.29
__kernel_clock_gettime	LINUX_2.6.29
__kernel_gettimeofday	LINUX_2.6.29
.TE
.if t \{\
.in
.ft P
\}
.SS sh (SuperH) functions
.\" See linux/arch/sh/kernel/vsyscall/vsyscall.lds.S
.if t \{\
.ft CW
\}
.TS
l l.
symbol	version
_
__kernel_rt_sigreturn	LINUX_2.6
__kernel_sigreturn	LINUX_2.6
__kernel_vsyscall	LINUX_2.6
.TE
.if t \{\
.in
.ft P
\}
.SS i386 functions
.\" See linux/arch/x86/vdso/vdso32/vdso32.lds.S
.if t \{\
.ft CW
\}
.TS
l l.
symbol	version
_
__kernel_sigreturn	LINUX_2.5
__kernel_rt_sigreturn	LINUX_2.5
__kernel_vsyscall	LINUX_2.5
.TE
.if t \{\
.in
.ft P
\}
.SS x86_64 functions
.\" See linux/arch/x86/vdso/vdso.lds.S
Each of these symbols are also available without the "__vdso_" prefix, but
you should ignore those and stick to the names below.
.if t \{\
.ft CW
\}
.TS
l l.
symbol	version
_
__vdso_clock_gettime	LINUX_2.6
__vdso_getcpu	LINUX_2.6
__vdso_gettimeofday	LINUX_2.6
__vdso_time	LINUX_2.6
.TE
.if t \{\
.in
.ft P
\}
.SS x86/x32 functions
.\" See linux/arch/x86/vdso/vdso32.lds.S
.if t \{\
.ft CW
\}
.TS
l l.
symbol	version
_
__vdso_clock_gettime	LINUX_2.6
__vdso_getcpu	LINUX_2.6
__vdso_gettimeofday	LINUX_2.6
__vdso_time	LINUX_2.6
.TE
.if t \{\
.in
.ft P
\}
.SH HISTORY
The vDSO was originally just a single function -- the vsyscall.
In older kernels, you might see that in a process's memory map rather than vdso.
Overtime, people realized that this was a great way to pass more functionality
to userspace, so it was reconceived as a vDSO in the current format.
.SH SEE ALSO
.BR syscalls (2),
.BR getauxval (3),
.BR proc (5)

The docs/examples/sources in the Linux sources:
.nf
Documentation/ABI/stable/vdso
linux/Documentation/ia64/fsys.txt
Documentation/vDSO/*
find arch/ -iname '*vdso*' -o -iname '*gate*'
.fi

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: vdso(7): new man page
       [not found] ` <201304092317.01590.vapier-aBrp7R+bbdUdnm+yROfE0A@public.gmane.org>
@ 2013-04-11 18:31   ` Andy Lutomirski
       [not found]     ` <CALCETrXwfpH=dRZ82MqjWWL0oFohigcUHgLPnRPpnisOHYxKQQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2013-04-12  1:28   ` Mike Frysinger
  2013-12-31  7:41   ` [PATCH v3] " Mike Frysinger
  2 siblings, 1 reply; 15+ messages in thread
From: Andy Lutomirski @ 2013-04-11 18:31 UTC (permalink / raw)
  To: Mike Frysinger; +Cc: linux-man-u79uwXL29TY76Z2rM5mHXA

On Tue, Apr 9, 2013 at 8:17 PM, Mike Frysinger <vapier-aBrp7R+bbdUdnm+yROfE0A@public.gmane.org> wrote:
> so i've slapped this together.  i think the HISTORY section could use more
> filling out, but that'd probably be more for funsies rather than usefulness.
>

This looks good to me.  It may be worth mentioning that programs must
not assume anything about where the vDSO is mapped -- that may change
from kernel to kernel or exec to exec, unless the particular
architecture makes some guarantee.


> .SS Example Background
> Making syscalls themselves can incur significant penalty.
> In x86 systems, you can trigger a software interrupt (int $0x80) to tell the
> kernel you wish to make a syscall.
> Internally, this means the call has to go through the normal interrupt layers
> to save/restore context before it even gets a chance to start processing the
> system call.
> Wouldn't it be nicer if you could start processing the system call immediately?
> With newer revisions of the x86 architecture, there is now a syscall/sysenter
> instruction that does exactly that (jumps directly to the system call entry).
> However, rather than require the C library to figure out if this functionality
> is available itself, the vDSO includes symbols that can be used.
> This is typically referred to using the term "vsyscall".

I find this a bit confusing.  The term "vsyscall" means something
different on x86_64, and syscall and sysenter are different
instructions.  How about this:

Making syscalls can be slow.
In x86 32-bit systems, you can trigger a software interrupt (int
$0x80) to tell the
kernel you wish to make a syscall.  This instruction is expensive: it
goes through the full interrupt handling paths in microcode and in the
kernel.  Newer processors have faster, incompatible ways to issue
system calls, and the kernel abstracts these through an entry point in
the vdso.  (The terminology can be confusing.  On x86_32, this
function is called __kernel_vsyscall, but on x86_64, the term
"vsyscall" refers to an obsolete way to ask the kernel what time it is
or what cpu the caller is on.)

--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-man" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: vdso(7): new man page
       [not found]     ` <CALCETrXwfpH=dRZ82MqjWWL0oFohigcUHgLPnRPpnisOHYxKQQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2013-04-12  1:28       ` Mike Frysinger
  0 siblings, 0 replies; 15+ messages in thread
From: Mike Frysinger @ 2013-04-12  1:28 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: linux-man-u79uwXL29TY76Z2rM5mHXA

[-- Attachment #1: Type: Text/Plain, Size: 2436 bytes --]

On Thursday 11 April 2013 14:31:55 Andy Lutomirski wrote:
> On Tue, Apr 9, 2013 at 8:17 PM, Mike Frysinger <vapier-aBrp7R+bbdUdnm+yROfE0A@public.gmane.org> wrote:
> > so i've slapped this together.  i think the HISTORY section could use
> > more filling out, but that'd probably be more for funsies rather than
> > usefulness.
> 
> This looks good to me.  It may be worth mentioning that programs must
> not assume anything about where the vDSO is mapped -- that may change
> from kernel to kernel or exec to exec, unless the particular
> architecture makes some guarantee.

good idea

> > .SS Example Background
> > Making syscalls themselves can incur significant penalty.
> > In x86 systems, you can trigger a software interrupt (int $0x80) to tell
> > the kernel you wish to make a syscall.
> > Internally, this means the call has to go through the normal interrupt
> > layers to save/restore context before it even gets a chance to start
> > processing the system call.
> > Wouldn't it be nicer if you could start processing the system call
> > immediately? With newer revisions of the x86 architecture, there is now
> > a syscall/sysenter instruction that does exactly that (jumps directly to
> > the system call entry). However, rather than require the C library to
> > figure out if this functionality is available itself, the vDSO includes
> > symbols that can be used. This is typically referred to using the term
> > "vsyscall".
> 
> I find this a bit confusing.  The term "vsyscall" means something
> different on x86_64, and syscall and sysenter are different
> instructions.  How about this:
> 
> Making syscalls can be slow.
> In x86 32-bit systems, you can trigger a software interrupt (int
> $0x80) to tell the
> kernel you wish to make a syscall.  This instruction is expensive: it
> goes through the full interrupt handling paths in microcode and in the
> kernel.  Newer processors have faster, incompatible ways to issue
> system calls, and the kernel abstracts these through an entry point in
> the vdso.  (The terminology can be confusing.  On x86_32, this
> function is called __kernel_vsyscall, but on x86_64, the term
> "vsyscall" refers to an obsolete way to ask the kernel what time it is
> or what cpu the caller is on.)

i've merged in your suggestions.  is the current (small) HISTORY section 
correct then ?  you can find it at the bottom of the page.
-mike

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: vdso(7): new man page
       [not found] ` <201304092317.01590.vapier-aBrp7R+bbdUdnm+yROfE0A@public.gmane.org>
  2013-04-11 18:31   ` Andy Lutomirski
@ 2013-04-12  1:28   ` Mike Frysinger
       [not found]     ` <201304112128.47633.vapier-aBrp7R+bbdUdnm+yROfE0A@public.gmane.org>
  2013-12-31  7:41   ` [PATCH v3] " Mike Frysinger
  2 siblings, 1 reply; 15+ messages in thread
From: Mike Frysinger @ 2013-04-12  1:28 UTC (permalink / raw)
  To: linux-man-u79uwXL29TY76Z2rM5mHXA; +Cc: Andy Lutomirski

[-- Attachment #1: Type: Text/Plain, Size: 10972 bytes --]

here's v2 w/Andy's feedback
-mike

.\" Written by Mike Frysinger <vapier-aBrp7R+bbdUdnm+yROfE0A@public.gmane.org>
.\"
.\" %%%LICENSE_START(PUBLIC_DOMAIN)
.\" This page is in the public domain.  Suck it.
.\" %%%LICENSE_END
.\"
.TH VDSO 7 2013-04-09 "Linux" "Linux Programmer's Manual"
.SH NAME
vDSO \- overview of the virtual ELF dynamic shared object
.SH SYNOPSIS
.B #include <sys/auxv.h>

.B void *vdso = (uintptr_t)getauxval(AT_SYSINFO_EHDR);
.SH DESCRIPTION
The "vDSO" is a small shared library that the kernel automatically maps into the
address space of all userspace applications.
Applications themselves usually need not concern themselves with this as it is
most commonly called by the C library.
This way you can write using standard functions and the C library will take care
of using any available functionality.

Why does this object exist at all?
There are some facilities the kernel provides that userspace ends up using
frequently to the point that such calls can dominate overall performance.
This is due both to the frequency of the call as well as the context overhead
from exiting userspace and entering the kernel.

The rest of this documentation is geared towards the curious and/or C library
writers rather than general developers.
If you're trying to call the vDSO in your own application rather than using
the C library, you're most likely doing it wrong.
.SS Example Background
Making syscalls can be slow.
In x86 32bit systems, you can trigger a software interrupt (int $0x80) to tell
the kernel you wish to make a syscall.
However, this instruction is expensive: it goes through the full interrupt
handling paths in the processor's microcode as well as in the kernel.
Newer processors have faster (but backwards incompatible) instructions to
initiate system calls.
Rather than require the C library to figure out if this functionality is
available at runtime itself, it can use functions provided by the kernel in
the vDSO.
Note that the terminology can be confusing.
On x86 systems, the vDSO function is named "__kernel_vsyscall", but on x86_64,
the term "vsyscall" also refers to an obsolete way to ask the kernel what time
it is or what cpu the caller is on.

Another frequent system call is gettimeofday().
This is called both directly by userspace applications as well as indirectly by
the C library.
Think timestamps or timing loops or polling -- all of these frequently need to
know what time it is right now.
This information is also not secret -- any application in any privilege mode
(root or any user) will get the same answer.
Thus the kernel arranges for the information required to answer this question
to be placed in memory the process can access.
Now a call to gettimeofday() changes from a syscall to a normal function call
and a few memory accesses.
.SS Finding The vDSO
The base address of the vDSO (if one exists) is passed by the kernel to each
program in the initial auxiliary vector.
Specifically, via the
.B AT_SYSINFO_EHDR
tag.

You must not assume the vDSO is mapped at any particular location in the
user's memory map.
The base address will usually be randomized at runtime every time a new is
processed (at
.BR execve (2)
time).
This is done for security reasons to prevent standard "return-to-libc" attacks.

For some architectures, there is also a
.B AT_SYSINFO
tag.
This is used only for locating the vsyscall entry point and is frequently
omitted or set to 0 (meaning it's not available).
It is a throw back to the initial vDSO work (see
.IR HISTORY
below) and should be avoided.

Refer to
.BR getauxval (3)
for more details on accessing these fields.
.SS File Format
Since the vDSO is a fully formed ELF, you can do symbol lookups on it.
This allows new symbols to be added with newer kernel releases, and for the
C library to detect available functionality at runtime when running under
different kernel versions.
Often times the C library will do detection with the first call and then
cache the result for subsequent calls.

All symbols are also versioned (using the GNU version format).
This allows the kernel (in the very unlikely situation) to update the function
signature without breaking backwards compatibility.
This means changing the arguments that it accepts as well as the return value.
When looking up a symbol in the vDSO, you must always include the version you
are writing against.

Typically the vDSO follows the naming convention of prefixing all symbols with
"__vdso_" or "__kernel_" so as to distinguish from other standard symbols.
e.g. The "gettimeofday" function is named "__vdso_gettimeofday".

You use the standard C calling conventions when calling any of these functions.
No need to worry about weird register or stack behavior.
.SH NOTES
.SS Source
When you compile the kernel, it will automatically compile and link the vDSO
code for you.
You will frequently find it under the arch specific dir:
.br
find arch/$ARCH/ -name '*vdso*.so*' -o -name '*gate*.so*'

Note that the vDSO that is used is based on the ABI of your userspace code
and not the ABI of the kernel.
i.e. If you run an i386 32bit ELF under an i386 32bit kernel or under an
x86_64 64bit kernel, you'll get the same vDSO.
So when referring to sections below, use the userspace ABI.
.SS vDSO Names
The name of this shared object varies across architectures.
It will often show up in things like glibc's `ldd` output.
The exact name should not matter to any code, so please do not hardcode it.
.if t \{\
.ft CW
\}
.TS
l l.
user ABI	vDSO name
_
aarch64	linux-vdso.so.1
ia64	linux-gate.so.1
ppc/32	linux-vdso32.so.1
ppc/64	linux-vdso64.so.1
s390	linux-vdso32.so.1
s390x	linux-vdso64.so.1
sh	linux-gate.so.1
i386	linux-gate.so.1
x86_64	linux-vdso.so.1
x86/x32	linux-vdso.so.1
.TE
.if t \{\
.in
.ft P
\}
.SS aarch64 functions
.\" See linux/arch/arm64/kernel/vdso/vdso.lds.S
.if t \{\
.ft CW
\}
.TS
l l.
symbol	version
_
__kernel_rt_sigreturn	LINUX_2.6.39
__kernel_gettimeofday	LINUX_2.6.39
__kernel_clock_gettime	LINUX_2.6.39
__kernel_clock_getres	LINUX_2.6.39
.TE
.if t \{\
.in
.ft P
\}
.SS bfin (Blackfin) functions
.\" See linux/arch/blackfin/kernel/fixed_code.S
.\" See http://docs.blackfin.uclinux.org/doku.php?id=linux-kernel:fixed-code
As this cpu lacks a MMU, it doesn't setup a vDSO in the normal sense.
Instead, it maps at boot time a few raw functions into a fixed location in
memory.
Userspace apps then call directly into that.
There is no provision for backwards compatibility beyond sniffing raw opcodes,
but as this is an embedded CPU, it can get away with things -- some of the
object formats it runs aren't even ELF based (they're bFLT/FLAT).

For documentation on this format, it's better you refer to the public docs:
.br
http://docs.blackfin.uclinux.org/doku.php?id=linux-kernel:fixed-code
.SS ia64 (Itanium) functions
.\" See linux/arch/ia64/kernel/gate.lds.S
.\" Also linux/arch/ia64/kernel/fsys.S and linux/Documentation/ia64/fsys.txt
.if t \{\
.ft CW
\}
.TS
l l.
symbol	version
_
__kernel_sigtramp	LINUX_2.5
__kernel_syscall_via_break	LINUX_2.5
__kernel_syscall_via_epc	LINUX_2.5
.TE
.if t \{\
.in
.ft P
\}

The Itanium port actually likes to get tricky.
In addition to the vDSO above, it also has "light-weight system calls" aka
"fast syscalls" aka "fsys".
You can invoke these via the __kernel_syscall_via_epc vDSO helper.
The system calls listed here have the same semantics as if you called them
directly via
.BR syscall (3),
so refer to the relevant
documentation for each.
The table below lists the functions available via this mechanism.
.if t \{\
.ft CW
\}
.TS
l.
function
_
clock_gettime
getcpu
getpid
getppid
gettimeofday
set_tid_address
.TE
.if t \{\
.in
.ft P
\}
.SS ppc/32 functions
.\" See linux/arch/powerpc/kernel/vdso32/vdso32.lds.S
The functions marked with a
.I *
below are only available when the kernel is
a powerpc64 (64bit) kernel.
.if t \{\
.ft CW
\}
.TS
l l.
symbol	version
_
__kernel_clock_getres	LINUX_2.6.15
__kernel_clock_gettime	LINUX_2.6.15
__kernel_datapage_offset	LINUX_2.6.15
__kernel_get_syscall_map	LINUX_2.6.15
__kernel_get_tbfreq	LINUX_2.6.15
__kernel_getcpu \fI*\fR	LINUX_2.6.15
__kernel_gettimeofday	LINUX_2.6.15
__kernel_sigtramp_rt32	LINUX_2.6.15
__kernel_sigtramp32	LINUX_2.6.15
__kernel_sync_dicache	LINUX_2.6.15
__kernel_sync_dicache_p5	LINUX_2.6.15
.TE
.if t \{\
.in
.ft P
\}
.SS ppc/64 functions
.\" See linux/arch/powerpc/kernel/vdso64/vdso64.lds.S
.if t \{\
.ft CW
\}
.TS
l l.
symbol	version
_
__kernel_clock_getres	LINUX_2.6.15
__kernel_clock_gettime	LINUX_2.6.15
__kernel_datapage_offset	LINUX_2.6.15
__kernel_get_syscall_map	LINUX_2.6.15
__kernel_get_tbfreq	LINUX_2.6.15
__kernel_getcpu	LINUX_2.6.15
__kernel_gettimeofday	LINUX_2.6.15
__kernel_sigtramp_rt64	LINUX_2.6.15
__kernel_sync_dicache	LINUX_2.6.15
__kernel_sync_dicache_p5	LINUX_2.6.15
.TE
.if t \{\
.in
.ft P
\}
.SS s390 functions
.\" See linux/arch/s390/kernel/vdso32/vdso32.lds.S
.if t \{\
.ft CW
\}
.TS
l l.
symbol	version
_
__kernel_clock_getres	LINUX_2.6.29
__kernel_clock_gettime	LINUX_2.6.29
__kernel_gettimeofday	LINUX_2.6.29
.TE
.if t \{\
.in
.ft P
\}
.SS s390x functions
.\" See linux/arch/s390/kernel/vdso64/vdso64.lds.S
.if t \{\
.ft CW
\}
.TS
l l.
symbol	version
_
__kernel_clock_getres	LINUX_2.6.29
__kernel_clock_gettime	LINUX_2.6.29
__kernel_gettimeofday	LINUX_2.6.29
.TE
.if t \{\
.in
.ft P
\}
.SS sh (SuperH) functions
.\" See linux/arch/sh/kernel/vsyscall/vsyscall.lds.S
.if t \{\
.ft CW
\}
.TS
l l.
symbol	version
_
__kernel_rt_sigreturn	LINUX_2.6
__kernel_sigreturn	LINUX_2.6
__kernel_vsyscall	LINUX_2.6
.TE
.if t \{\
.in
.ft P
\}
.SS i386 functions
.\" See linux/arch/x86/vdso/vdso32/vdso32.lds.S
.if t \{\
.ft CW
\}
.TS
l l.
symbol	version
_
__kernel_sigreturn	LINUX_2.5
__kernel_rt_sigreturn	LINUX_2.5
__kernel_vsyscall	LINUX_2.5
.TE
.if t \{\
.in
.ft P
\}
.SS x86_64 functions
.\" See linux/arch/x86/vdso/vdso.lds.S
Each of these symbols are also available without the "__vdso_" prefix, but
you should ignore those and stick to the names below.
.if t \{\
.ft CW
\}
.TS
l l.
symbol	version
_
__vdso_clock_gettime	LINUX_2.6
__vdso_getcpu	LINUX_2.6
__vdso_gettimeofday	LINUX_2.6
__vdso_time	LINUX_2.6
.TE
.if t \{\
.in
.ft P
\}
.SS x86/x32 functions
.\" See linux/arch/x86/vdso/vdso32.lds.S
.if t \{\
.ft CW
\}
.TS
l l.
symbol	version
_
__vdso_clock_gettime	LINUX_2.6
__vdso_getcpu	LINUX_2.6
__vdso_gettimeofday	LINUX_2.6
__vdso_time	LINUX_2.6
.TE
.if t \{\
.in
.ft P
\}
.SH HISTORY
The vDSO was originally just a single function -- the vsyscall.
In older kernels, you might see that in a process's memory map rather than vdso.
Overtime, people realized that this was a great way to pass more functionality
to userspace, so it was reconceived as a vDSO in the current format.
.SH SEE ALSO
.BR syscalls (2),
.BR getauxval (3),
.BR proc (5)

The docs/examples/sources in the Linux sources:
.nf
Documentation/ABI/stable/vdso
linux/Documentation/ia64/fsys.txt
Documentation/vDSO/* (includes examples of using the vDSO)
find arch/ -iname '*vdso*' -o -iname '*gate*'
.fi

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: vdso(7): new man page
       [not found]     ` <201304112128.47633.vapier-aBrp7R+bbdUdnm+yROfE0A@public.gmane.org>
@ 2013-05-22 13:22       ` Michael Kerrisk
       [not found]         ` <519CC681.6080502-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 15+ messages in thread
From: Michael Kerrisk @ 2013-05-22 13:22 UTC (permalink / raw)
  To: Mike Frysinger; +Cc: linux-man-u79uwXL29TY76Z2rM5mHXA, Andy Lutomirski

Hi Mike,

On 04/12/13 03:28, Mike Frysinger wrote:
> here's v2 w/Andy's feedback

Thanks for this--it's a nice piece of work. Could you take a 
look at my comments below and send a v3, please.

> .\" Written by Mike Frysinger <vapier-aBrp7R+bbdUdnm+yROfE0A@public.gmane.org>
> .\"
> .\" %%%LICENSE_START(PUBLIC_DOMAIN)
> .\" This page is in the public domain.  Suck it.

Okay -- not my first choice for a license, but so be it.
But, how about we lose the "Suck it."...

> .\" %%%LICENSE_END
> .\"
> .TH VDSO 7 2013-04-09 "Linux" "Linux Programmer's Manual"
> .SH NAME
> vDSO \- overview of the virtual ELF dynamic shared object
> .SH SYNOPSIS
> .B #include <sys/auxv.h>
> 
> .B void *vdso = (uintptr_t)getauxval(AT_SYSINFO_EHDR);

Add space before "getauxval". (Usual convention for casts in code examples
in man pages.)

> .SH DESCRIPTION
> The "vDSO" is a small shared library that the kernel automatically maps into the
> address space of all userspace applications.

1,$s/userspace applications/user-space applications/

> Applications themselves usually need not concern themselves with this as it is
> most commonly called by the C library.

This last sentence doesn't quite make sense, since "this" and "it" refer to 
different things (I believe). Do you want something like:

	Applications generally do not need to care about the details since 
	the vDSO is automatically employed by the C library

?

> This way you can write using standard functions and the C library will take care
> of using any available functionality.
> 
> Why does this object exist at all?

s/this object/the vDSO/

> There are some facilities the kernel provides that userspace ends up using

s/userspace/user space/ 

(When used as a noun, and in other places in the page as well)


> frequently to the point that such calls can dominate overall performance.
> This is due both to the frequency of the call as well as the context overhead
> from exiting userspace and entering the kernel.
> 
> The rest of this documentation is geared towards the curious and/or C library
> writers rather than general developers.
> If you're trying to call the vDSO in your own application rather than using
> the C library, you're most likely doing it wrong.
> .SS Example Background

Convention for SS headings is that only the first word is capitalized (unless
English usage dictates otherwise--e.g., for a proper noun)

> Making syscalls can be slow.

1,$s/syscall/system call/

(and other instances in the page)

> In x86 32bit systems, you can trigger a software interrupt (int $0x80) to tell

s/32bit/32-bit/

> the kernel you wish to make a syscall.
> However, this instruction is expensive: it goes through the full interrupt
> handling paths in the processor's microcode as well as in the kernel.
> Newer processors have faster (but backwards incompatible) instructions to
> initiate system calls.
> Rather than require the C library to figure out if this functionality is
> available at runtime itself, it can use functions provided by the kernel in
> the vDSO.

That last point (after the comma) is the most interesting (IMO) of the use 
cases of the vDSO. If you cared to expand on the details (i.e., are what 
are mechanics of the operation of those functions provided by the kernel),
I think that would be interesting for the reader.

> Note that the terminology can be confusing.
> On x86 systems, the vDSO function is named "__kernel_vsyscall", but on x86_64,
> the term "vsyscall" also refers to an obsolete way to ask the kernel what time
> it is or what cpu the caller is on.

s/cpu/CPU/

> Another frequent system call is gettimeofday().
> This is called both directly by userspace applications as well as indirectly by
> the C library.
> Think timestamps or timing loops or polling -- all of these frequently need to
> know what time it is right now.
> This information is also not secret -- any application in any privilege mode
> (root or any user) will get the same answer.
> Thus the kernel arranges for the information required to answer this question
> to be placed in memory the process can access.
> Now a call to gettimeofday() changes from a syscall to a normal function call
> and a few memory accesses.
> .SS Finding The vDSO

s/The/the/

> The base address of the vDSO (if one exists) is passed by the kernel to each
> program in the initial auxiliary vector.
> Specifically, via the
> .B AT_SYSINFO_EHDR
> tag.
> 
> You must not assume the vDSO is mapped at any particular location in the
> user's memory map.
> The base address will usually be randomized at runtime every time a new is

Missing word after "new".

> processed (at
> .BR execve (2)
> time).
> This is done for security reasons to prevent standard "return-to-libc" attacks.
> 
> For some architectures, there is also a
> .B AT_SYSINFO
> tag.
> This is used only for locating the vsyscall entry point and is frequently
> omitted or set to 0 (meaning it's not available).
> It is a throw back to the initial vDSO work (see

s/throw back/throwback/

> .IR HISTORY
> below) and should be avoided.
> 
> Refer to
> .BR getauxval (3)
> for more details on accessing these fields.
> .SS File Format

s/Format/format/

> Since the vDSO is a fully formed ELF, you can do symbol lookups on it.

Missing word after ELF.

> This allows new symbols to be added with newer kernel releases, and for the
> C library to detect available functionality at runtime when running under
> different kernel versions.
> Often times the C library will do detection with the first call and then
> cache the result for subsequent calls.
> 
> All symbols are also versioned (using the GNU version format).
> This allows the kernel (in the very unlikely situation) to update the function

s/situation/case that it is necessary/

> signature without breaking backwards compatibility.
> This means changing the arguments that it accepts as well as the return value.

What is "it" in the previous line? (Please replace with a suitable noun.)

> When looking up a symbol in the vDSO, you must always include the version you
> are writing against.
> 
> Typically the vDSO follows the naming convention of prefixing all symbols with
> "__vdso_" or "__kernel_" so as to distinguish from other standard symbols.

s/distinguish/distinguish them/

> e.g. The "gettimeofday" function is named "__vdso_gettimeofday".
> 
> You use the standard C calling conventions when calling any of these functions.
> No need to worry about weird register or stack behavior.

That last sentence is a little incomplete. Could you expand/reword a little 
please. 

> .SH NOTES
> .SS Source
> When you compile the kernel, it will automatically compile and link the vDSO
> code for you.
> You will frequently find it under the arch specific dir:

s/arch specific dir/architecture-specific directory/

> .br

Change that last to a blank line, and then indent the next line by 4 spaces.

> find arch/$ARCH/ -name '*vdso*.so*' -o -name '*gate*.so*'
> 
> Note that the vDSO that is used is based on the ABI of your userspace code
> and not the ABI of the kernel.
> i.e. If you run an i386 32bit ELF under an i386 32bit kernel or under an

s/i.e. If/In other words, if/
s/32bit/32-big/g

> x86_64 64bit kernel, you'll get the same vDSO.

s/64bit/64-bit/

> So when referring to sections below, use the userspace ABI.

It's not clear what you mean here when you say "use the userspace ABI."
Could you clarify?

> .SS vDSO Names

s/Names/names/

> The name of this shared object varies across architectures.
> It will often show up in things like glibc's `ldd` output.
> The exact name should not matter to any code, so please do not hardcode it.

s/please//

> .if t \{\
> .ft CW
> \}
> .TS
> l l.
> user ABI	vDSO name
> _
> aarch64	linux-vdso.so.1
> ia64	linux-gate.so.1
> ppc/32	linux-vdso32.so.1
> ppc/64	linux-vdso64.so.1
> s390	linux-vdso32.so.1
> s390x	linux-vdso64.so.1
> sh	linux-gate.so.1
> i386	linux-gate.so.1
> x86_64	linux-vdso.so.1
> x86/x32	linux-vdso.so.1
> .TE
> .if t \{\
> .in
> .ft P
> \}
> .SS aarch64 functions
> .\" See linux/arch/arm64/kernel/vdso/vdso.lds.S
> .if t \{\
> .ft CW
> \}
> .TS
> l l.
> symbol	version
> _
> __kernel_rt_sigreturn	LINUX_2.6.39
> __kernel_gettimeofday	LINUX_2.6.39
> __kernel_clock_gettime	LINUX_2.6.39
> __kernel_clock_getres	LINUX_2.6.39
> .TE
> .if t \{\
> .in
> .ft P
> \}
> .SS bfin (Blackfin) functions
> .\" See linux/arch/blackfin/kernel/fixed_code.S
> .\" See http://docs.blackfin.uclinux.org/doku.php?id=linux-kernel:fixed-code

Thanks -- adding references like the above in the source is helpful
for future maintenance.

> As this cpu lacks a MMU, it doesn't setup a vDSO in the normal sense.

s/cpu/CPU/
s/MMU/memory-management unit (MMU)/
s/setup/set up/

> Instead, it maps at boot time a few raw functions into a fixed location in
> memory.
> Userspace apps then call directly into that.

s/apps/applications/

> There is no provision for backwards compatibility beyond sniffing raw opcodes,
> but as this is an embedded CPU, it can get away with things -- some of the
> object formats it runs aren't even ELF based (they're bFLT/FLAT).
> 
> For documentation on this format, it's better you refer to the public docs:
> .br
> http://docs.blackfin.uclinux.org/doku.php?id=linux-kernel:fixed-code
> .SS ia64 (Itanium) functions
> .\" See linux/arch/ia64/kernel/gate.lds.S
> .\" Also linux/arch/ia64/kernel/fsys.S and linux/Documentation/ia64/fsys.txt
> .if t \{\
> .ft CW
> \}
> .TS
> l l.
> symbol	version
> _
> __kernel_sigtramp	LINUX_2.5
> __kernel_syscall_via_break	LINUX_2.5
> __kernel_syscall_via_epc	LINUX_2.5
> .TE
> .if t \{\
> .in
> .ft P
> \}
> 
> The Itanium port actually likes to get tricky.
> In addition to the vDSO above, it also has "light-weight system calls" aka

s/aka/also known as/

> "fast syscalls" aka "fsys".

s/aka/or/

> You can invoke these via the __kernel_syscall_via_epc vDSO helper.
> The system calls listed here have the same semantics as if you called them
> directly via
> .BR syscall (3),
> so refer to the relevant
> documentation for each.
> The table below lists the functions available via this mechanism.
> .if t \{\
> .ft CW
> \}
> .TS
> l.
> function
> _
> clock_gettime
> getcpu
> getpid
> getppid
> gettimeofday
> set_tid_address
> .TE
> .if t \{\
> .in
> .ft P
> \}
> .SS ppc/32 functions
> .\" See linux/arch/powerpc/kernel/vdso32/vdso32.lds.S
> The functions marked with a
> .I *
> below are only available when the kernel is
> a powerpc64 (64bit) kernel.

s/64bit/64-bit/

> .if t \{\
> .ft CW
> \}
> .TS
> l l.
> symbol	version
> _
> __kernel_clock_getres	LINUX_2.6.15
> __kernel_clock_gettime	LINUX_2.6.15
> __kernel_datapage_offset	LINUX_2.6.15
> __kernel_get_syscall_map	LINUX_2.6.15
> __kernel_get_tbfreq	LINUX_2.6.15
> __kernel_getcpu \fI*\fR	LINUX_2.6.15
> __kernel_gettimeofday	LINUX_2.6.15
> __kernel_sigtramp_rt32	LINUX_2.6.15
> __kernel_sigtramp32	LINUX_2.6.15
> __kernel_sync_dicache	LINUX_2.6.15
> __kernel_sync_dicache_p5	LINUX_2.6.15
> .TE
> .if t \{\
> .in
> .ft P
> \}
> .SS ppc/64 functions
> .\" See linux/arch/powerpc/kernel/vdso64/vdso64.lds.S
> .if t \{\
> .ft CW
> \}
> .TS
> l l.
> symbol	version
> _
> __kernel_clock_getres	LINUX_2.6.15
> __kernel_clock_gettime	LINUX_2.6.15
> __kernel_datapage_offset	LINUX_2.6.15
> __kernel_get_syscall_map	LINUX_2.6.15
> __kernel_get_tbfreq	LINUX_2.6.15
> __kernel_getcpu	LINUX_2.6.15
> __kernel_gettimeofday	LINUX_2.6.15
> __kernel_sigtramp_rt64	LINUX_2.6.15
> __kernel_sync_dicache	LINUX_2.6.15
> __kernel_sync_dicache_p5	LINUX_2.6.15
> .TE
> .if t \{\
> .in
> .ft P
> \}
> .SS s390 functions
> .\" See linux/arch/s390/kernel/vdso32/vdso32.lds.S
> .if t \{\
> .ft CW
> \}
> .TS
> l l.
> symbol	version
> _
> __kernel_clock_getres	LINUX_2.6.29
> __kernel_clock_gettime	LINUX_2.6.29
> __kernel_gettimeofday	LINUX_2.6.29
> .TE
> .if t \{\
> .in
> .ft P
> \}
> .SS s390x functions
> .\" See linux/arch/s390/kernel/vdso64/vdso64.lds.S
> .if t \{\
> .ft CW
> \}
> .TS
> l l.
> symbol	version
> _
> __kernel_clock_getres	LINUX_2.6.29
> __kernel_clock_gettime	LINUX_2.6.29
> __kernel_gettimeofday	LINUX_2.6.29
> .TE
> .if t \{\
> .in
> .ft P
> \}
> .SS sh (SuperH) functions
> .\" See linux/arch/sh/kernel/vsyscall/vsyscall.lds.S
> .if t \{\
> .ft CW
> \}
> .TS
> l l.
> symbol	version
> _
> __kernel_rt_sigreturn	LINUX_2.6
> __kernel_sigreturn	LINUX_2.6
> __kernel_vsyscall	LINUX_2.6
> .TE
> .if t \{\
> .in
> .ft P
> \}
> .SS i386 functions
> .\" See linux/arch/x86/vdso/vdso32/vdso32.lds.S
> .if t \{\
> .ft CW
> \}
> .TS
> l l.
> symbol	version
> _
> __kernel_sigreturn	LINUX_2.5
> __kernel_rt_sigreturn	LINUX_2.5
> __kernel_vsyscall	LINUX_2.5
> .TE
> .if t \{\
> .in
> .ft P
> \}
> .SS x86_64 functions
> .\" See linux/arch/x86/vdso/vdso.lds.S
> Each of these symbols are also available without the "__vdso_" prefix, but

Either:
s/Each of these symbols are/All of these symbols are/
or
s/Each of these symbols are/Each of these symbols is/

> you should ignore those and stick to the names below.
> .if t \{\
> .ft CW
> \}
> .TS
> l l.
> symbol	version
> _
> __vdso_clock_gettime	LINUX_2.6
> __vdso_getcpu	LINUX_2.6
> __vdso_gettimeofday	LINUX_2.6
> __vdso_time	LINUX_2.6
> .TE
> .if t \{\
> .in
> .ft P
> \}
> .SS x86/x32 functions
> .\" See linux/arch/x86/vdso/vdso32.lds.S
> .if t \{\
> .ft CW
> \}
> .TS
> l l.
> symbol	version
> _
> __vdso_clock_gettime	LINUX_2.6
> __vdso_getcpu	LINUX_2.6
> __vdso_gettimeofday	LINUX_2.6
> __vdso_time	LINUX_2.6
> .TE
> .if t \{\
> .in
> .ft P
> \}
> .SH HISTORY

Better to have this as 

.SS History

> The vDSO was originally just a single function -- the vsyscall.
> In older kernels, you might see that in a process's memory map rather than vdso.
> Overtime, people realized that this was a great way to pass more functionality

s/Overtime/Over time/

> to userspace, so it was reconceived as a vDSO in the current format.
> .SH SEE ALSO
> .BR syscalls (2),
> .BR getauxval (3),
> .BR proc (5)
> 
> The docs/examples/sources in the Linux sources:
> .nf
> Documentation/ABI/stable/vdso
> linux/Documentation/ia64/fsys.txt
> Documentation/vDSO/* (includes examples of using the vDSO)
> find arch/ -iname '*vdso*' -o -iname '*gate*'
> .fi
> 

In the next iteration, could you include a second (separate) patch to 
syscalls.2  and getauxval.3 that adds
.BR vdso (7)
under SEE ALSO.

Thanks,

Michael
--
To unsubscribe from this list: send the line "unsubscribe linux-man" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: vdso(7): new man page
       [not found]         ` <519CC681.6080502-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2013-06-27  0:00           ` Michael Kerrisk (man-pages)
       [not found]             ` <CAKgNAkgwmfBeyijCHj+y2FSQbgSDY8izW-9DAqbw4wgD2y1pAA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2013-12-31  7:32           ` Mike Frysinger
  1 sibling, 1 reply; 15+ messages in thread
From: Michael Kerrisk (man-pages) @ 2013-06-27  0:00 UTC (permalink / raw)
  To: Mike Frysinger; +Cc: linux-man, Andy Lutomirski

Hi Mike,

Ping!

Cheers,

Michael



On Wed, May 22, 2013 at 3:22 PM, Michael Kerrisk <mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> Hi Mike,
>
> On 04/12/13 03:28, Mike Frysinger wrote:
>> here's v2 w/Andy's feedback
>
> Thanks for this--it's a nice piece of work. Could you take a
> look at my comments below and send a v3, please.
>
>> .\" Written by Mike Frysinger <vapier-aBrp7R+bbdUdnm+yROfE0A@public.gmane.org>
>> .\"
>> .\" %%%LICENSE_START(PUBLIC_DOMAIN)
>> .\" This page is in the public domain.  Suck it.
>
> Okay -- not my first choice for a license, but so be it.
> But, how about we lose the "Suck it."...
>
>> .\" %%%LICENSE_END
>> .\"
>> .TH VDSO 7 2013-04-09 "Linux" "Linux Programmer's Manual"
>> .SH NAME
>> vDSO \- overview of the virtual ELF dynamic shared object
>> .SH SYNOPSIS
>> .B #include <sys/auxv.h>
>>
>> .B void *vdso = (uintptr_t)getauxval(AT_SYSINFO_EHDR);
>
> Add space before "getauxval". (Usual convention for casts in code examples
> in man pages.)
>
>> .SH DESCRIPTION
>> The "vDSO" is a small shared library that the kernel automatically maps into the
>> address space of all userspace applications.
>
> 1,$s/userspace applications/user-space applications/
>
>> Applications themselves usually need not concern themselves with this as it is
>> most commonly called by the C library.
>
> This last sentence doesn't quite make sense, since "this" and "it" refer to
> different things (I believe). Do you want something like:
>
>         Applications generally do not need to care about the details since
>         the vDSO is automatically employed by the C library
>
> ?
>
>> This way you can write using standard functions and the C library will take care
>> of using any available functionality.
>>
>> Why does this object exist at all?
>
> s/this object/the vDSO/
>
>> There are some facilities the kernel provides that userspace ends up using
>
> s/userspace/user space/
>
> (When used as a noun, and in other places in the page as well)
>
>
>> frequently to the point that such calls can dominate overall performance.
>> This is due both to the frequency of the call as well as the context overhead
>> from exiting userspace and entering the kernel.
>>
>> The rest of this documentation is geared towards the curious and/or C library
>> writers rather than general developers.
>> If you're trying to call the vDSO in your own application rather than using
>> the C library, you're most likely doing it wrong.
>> .SS Example Background
>
> Convention for SS headings is that only the first word is capitalized (unless
> English usage dictates otherwise--e.g., for a proper noun)
>
>> Making syscalls can be slow.
>
> 1,$s/syscall/system call/
>
> (and other instances in the page)
>
>> In x86 32bit systems, you can trigger a software interrupt (int $0x80) to tell
>
> s/32bit/32-bit/
>
>> the kernel you wish to make a syscall.
>> However, this instruction is expensive: it goes through the full interrupt
>> handling paths in the processor's microcode as well as in the kernel.
>> Newer processors have faster (but backwards incompatible) instructions to
>> initiate system calls.
>> Rather than require the C library to figure out if this functionality is
>> available at runtime itself, it can use functions provided by the kernel in
>> the vDSO.
>
> That last point (after the comma) is the most interesting (IMO) of the use
> cases of the vDSO. If you cared to expand on the details (i.e., are what
> are mechanics of the operation of those functions provided by the kernel),
> I think that would be interesting for the reader.
>
>> Note that the terminology can be confusing.
>> On x86 systems, the vDSO function is named "__kernel_vsyscall", but on x86_64,
>> the term "vsyscall" also refers to an obsolete way to ask the kernel what time
>> it is or what cpu the caller is on.
>
> s/cpu/CPU/
>
>> Another frequent system call is gettimeofday().
>> This is called both directly by userspace applications as well as indirectly by
>> the C library.
>> Think timestamps or timing loops or polling -- all of these frequently need to
>> know what time it is right now.
>> This information is also not secret -- any application in any privilege mode
>> (root or any user) will get the same answer.
>> Thus the kernel arranges for the information required to answer this question
>> to be placed in memory the process can access.
>> Now a call to gettimeofday() changes from a syscall to a normal function call
>> and a few memory accesses.
>> .SS Finding The vDSO
>
> s/The/the/
>
>> The base address of the vDSO (if one exists) is passed by the kernel to each
>> program in the initial auxiliary vector.
>> Specifically, via the
>> .B AT_SYSINFO_EHDR
>> tag.
>>
>> You must not assume the vDSO is mapped at any particular location in the
>> user's memory map.
>> The base address will usually be randomized at runtime every time a new is
>
> Missing word after "new".
>
>> processed (at
>> .BR execve (2)
>> time).
>> This is done for security reasons to prevent standard "return-to-libc" attacks.
>>
>> For some architectures, there is also a
>> .B AT_SYSINFO
>> tag.
>> This is used only for locating the vsyscall entry point and is frequently
>> omitted or set to 0 (meaning it's not available).
>> It is a throw back to the initial vDSO work (see
>
> s/throw back/throwback/
>
>> .IR HISTORY
>> below) and should be avoided.
>>
>> Refer to
>> .BR getauxval (3)
>> for more details on accessing these fields.
>> .SS File Format
>
> s/Format/format/
>
>> Since the vDSO is a fully formed ELF, you can do symbol lookups on it.
>
> Missing word after ELF.
>
>> This allows new symbols to be added with newer kernel releases, and for the
>> C library to detect available functionality at runtime when running under
>> different kernel versions.
>> Often times the C library will do detection with the first call and then
>> cache the result for subsequent calls.
>>
>> All symbols are also versioned (using the GNU version format).
>> This allows the kernel (in the very unlikely situation) to update the function
>
> s/situation/case that it is necessary/
>
>> signature without breaking backwards compatibility.
>> This means changing the arguments that it accepts as well as the return value.
>
> What is "it" in the previous line? (Please replace with a suitable noun.)
>
>> When looking up a symbol in the vDSO, you must always include the version you
>> are writing against.
>>
>> Typically the vDSO follows the naming convention of prefixing all symbols with
>> "__vdso_" or "__kernel_" so as to distinguish from other standard symbols.
>
> s/distinguish/distinguish them/
>
>> e.g. The "gettimeofday" function is named "__vdso_gettimeofday".
>>
>> You use the standard C calling conventions when calling any of these functions.
>> No need to worry about weird register or stack behavior.
>
> That last sentence is a little incomplete. Could you expand/reword a little
> please.
>
>> .SH NOTES
>> .SS Source
>> When you compile the kernel, it will automatically compile and link the vDSO
>> code for you.
>> You will frequently find it under the arch specific dir:
>
> s/arch specific dir/architecture-specific directory/
>
>> .br
>
> Change that last to a blank line, and then indent the next line by 4 spaces.
>
>> find arch/$ARCH/ -name '*vdso*.so*' -o -name '*gate*.so*'
>>
>> Note that the vDSO that is used is based on the ABI of your userspace code
>> and not the ABI of the kernel.
>> i.e. If you run an i386 32bit ELF under an i386 32bit kernel or under an
>
> s/i.e. If/In other words, if/
> s/32bit/32-big/g
>
>> x86_64 64bit kernel, you'll get the same vDSO.
>
> s/64bit/64-bit/
>
>> So when referring to sections below, use the userspace ABI.
>
> It's not clear what you mean here when you say "use the userspace ABI."
> Could you clarify?
>
>> .SS vDSO Names
>
> s/Names/names/
>
>> The name of this shared object varies across architectures.
>> It will often show up in things like glibc's `ldd` output.
>> The exact name should not matter to any code, so please do not hardcode it.
>
> s/please//
>
>> .if t \{\
>> .ft CW
>> \}
>> .TS
>> l l.
>> user ABI      vDSO name
>> _
>> aarch64       linux-vdso.so.1
>> ia64  linux-gate.so.1
>> ppc/32        linux-vdso32.so.1
>> ppc/64        linux-vdso64.so.1
>> s390  linux-vdso32.so.1
>> s390x linux-vdso64.so.1
>> sh    linux-gate.so.1
>> i386  linux-gate.so.1
>> x86_64        linux-vdso.so.1
>> x86/x32       linux-vdso.so.1
>> .TE
>> .if t \{\
>> .in
>> .ft P
>> \}
>> .SS aarch64 functions
>> .\" See linux/arch/arm64/kernel/vdso/vdso.lds.S
>> .if t \{\
>> .ft CW
>> \}
>> .TS
>> l l.
>> symbol        version
>> _
>> __kernel_rt_sigreturn LINUX_2.6.39
>> __kernel_gettimeofday LINUX_2.6.39
>> __kernel_clock_gettime        LINUX_2.6.39
>> __kernel_clock_getres LINUX_2.6.39
>> .TE
>> .if t \{\
>> .in
>> .ft P
>> \}
>> .SS bfin (Blackfin) functions
>> .\" See linux/arch/blackfin/kernel/fixed_code.S
>> .\" See http://docs.blackfin.uclinux.org/doku.php?id=linux-kernel:fixed-code
>
> Thanks -- adding references like the above in the source is helpful
> for future maintenance.
>
>> As this cpu lacks a MMU, it doesn't setup a vDSO in the normal sense.
>
> s/cpu/CPU/
> s/MMU/memory-management unit (MMU)/
> s/setup/set up/
>
>> Instead, it maps at boot time a few raw functions into a fixed location in
>> memory.
>> Userspace apps then call directly into that.
>
> s/apps/applications/
>
>> There is no provision for backwards compatibility beyond sniffing raw opcodes,
>> but as this is an embedded CPU, it can get away with things -- some of the
>> object formats it runs aren't even ELF based (they're bFLT/FLAT).
>>
>> For documentation on this format, it's better you refer to the public docs:
>> .br
>> http://docs.blackfin.uclinux.org/doku.php?id=linux-kernel:fixed-code
>> .SS ia64 (Itanium) functions
>> .\" See linux/arch/ia64/kernel/gate.lds.S
>> .\" Also linux/arch/ia64/kernel/fsys.S and linux/Documentation/ia64/fsys.txt
>> .if t \{\
>> .ft CW
>> \}
>> .TS
>> l l.
>> symbol        version
>> _
>> __kernel_sigtramp     LINUX_2.5
>> __kernel_syscall_via_break    LINUX_2.5
>> __kernel_syscall_via_epc      LINUX_2.5
>> .TE
>> .if t \{\
>> .in
>> .ft P
>> \}
>>
>> The Itanium port actually likes to get tricky.
>> In addition to the vDSO above, it also has "light-weight system calls" aka
>
> s/aka/also known as/
>
>> "fast syscalls" aka "fsys".
>
> s/aka/or/
>
>> You can invoke these via the __kernel_syscall_via_epc vDSO helper.
>> The system calls listed here have the same semantics as if you called them
>> directly via
>> .BR syscall (3),
>> so refer to the relevant
>> documentation for each.
>> The table below lists the functions available via this mechanism.
>> .if t \{\
>> .ft CW
>> \}
>> .TS
>> l.
>> function
>> _
>> clock_gettime
>> getcpu
>> getpid
>> getppid
>> gettimeofday
>> set_tid_address
>> .TE
>> .if t \{\
>> .in
>> .ft P
>> \}
>> .SS ppc/32 functions
>> .\" See linux/arch/powerpc/kernel/vdso32/vdso32.lds.S
>> The functions marked with a
>> .I *
>> below are only available when the kernel is
>> a powerpc64 (64bit) kernel.
>
> s/64bit/64-bit/
>
>> .if t \{\
>> .ft CW
>> \}
>> .TS
>> l l.
>> symbol        version
>> _
>> __kernel_clock_getres LINUX_2.6.15
>> __kernel_clock_gettime        LINUX_2.6.15
>> __kernel_datapage_offset      LINUX_2.6.15
>> __kernel_get_syscall_map      LINUX_2.6.15
>> __kernel_get_tbfreq   LINUX_2.6.15
>> __kernel_getcpu \fI*\fR       LINUX_2.6.15
>> __kernel_gettimeofday LINUX_2.6.15
>> __kernel_sigtramp_rt32        LINUX_2.6.15
>> __kernel_sigtramp32   LINUX_2.6.15
>> __kernel_sync_dicache LINUX_2.6.15
>> __kernel_sync_dicache_p5      LINUX_2.6.15
>> .TE
>> .if t \{\
>> .in
>> .ft P
>> \}
>> .SS ppc/64 functions
>> .\" See linux/arch/powerpc/kernel/vdso64/vdso64.lds.S
>> .if t \{\
>> .ft CW
>> \}
>> .TS
>> l l.
>> symbol        version
>> _
>> __kernel_clock_getres LINUX_2.6.15
>> __kernel_clock_gettime        LINUX_2.6.15
>> __kernel_datapage_offset      LINUX_2.6.15
>> __kernel_get_syscall_map      LINUX_2.6.15
>> __kernel_get_tbfreq   LINUX_2.6.15
>> __kernel_getcpu       LINUX_2.6.15
>> __kernel_gettimeofday LINUX_2.6.15
>> __kernel_sigtramp_rt64        LINUX_2.6.15
>> __kernel_sync_dicache LINUX_2.6.15
>> __kernel_sync_dicache_p5      LINUX_2.6.15
>> .TE
>> .if t \{\
>> .in
>> .ft P
>> \}
>> .SS s390 functions
>> .\" See linux/arch/s390/kernel/vdso32/vdso32.lds.S
>> .if t \{\
>> .ft CW
>> \}
>> .TS
>> l l.
>> symbol        version
>> _
>> __kernel_clock_getres LINUX_2.6.29
>> __kernel_clock_gettime        LINUX_2.6.29
>> __kernel_gettimeofday LINUX_2.6.29
>> .TE
>> .if t \{\
>> .in
>> .ft P
>> \}
>> .SS s390x functions
>> .\" See linux/arch/s390/kernel/vdso64/vdso64.lds.S
>> .if t \{\
>> .ft CW
>> \}
>> .TS
>> l l.
>> symbol        version
>> _
>> __kernel_clock_getres LINUX_2.6.29
>> __kernel_clock_gettime        LINUX_2.6.29
>> __kernel_gettimeofday LINUX_2.6.29
>> .TE
>> .if t \{\
>> .in
>> .ft P
>> \}
>> .SS sh (SuperH) functions
>> .\" See linux/arch/sh/kernel/vsyscall/vsyscall.lds.S
>> .if t \{\
>> .ft CW
>> \}
>> .TS
>> l l.
>> symbol        version
>> _
>> __kernel_rt_sigreturn LINUX_2.6
>> __kernel_sigreturn    LINUX_2.6
>> __kernel_vsyscall     LINUX_2.6
>> .TE
>> .if t \{\
>> .in
>> .ft P
>> \}
>> .SS i386 functions
>> .\" See linux/arch/x86/vdso/vdso32/vdso32.lds.S
>> .if t \{\
>> .ft CW
>> \}
>> .TS
>> l l.
>> symbol        version
>> _
>> __kernel_sigreturn    LINUX_2.5
>> __kernel_rt_sigreturn LINUX_2.5
>> __kernel_vsyscall     LINUX_2.5
>> .TE
>> .if t \{\
>> .in
>> .ft P
>> \}
>> .SS x86_64 functions
>> .\" See linux/arch/x86/vdso/vdso.lds.S
>> Each of these symbols are also available without the "__vdso_" prefix, but
>
> Either:
> s/Each of these symbols are/All of these symbols are/
> or
> s/Each of these symbols are/Each of these symbols is/
>
>> you should ignore those and stick to the names below.
>> .if t \{\
>> .ft CW
>> \}
>> .TS
>> l l.
>> symbol        version
>> _
>> __vdso_clock_gettime  LINUX_2.6
>> __vdso_getcpu LINUX_2.6
>> __vdso_gettimeofday   LINUX_2.6
>> __vdso_time   LINUX_2.6
>> .TE
>> .if t \{\
>> .in
>> .ft P
>> \}
>> .SS x86/x32 functions
>> .\" See linux/arch/x86/vdso/vdso32.lds.S
>> .if t \{\
>> .ft CW
>> \}
>> .TS
>> l l.
>> symbol        version
>> _
>> __vdso_clock_gettime  LINUX_2.6
>> __vdso_getcpu LINUX_2.6
>> __vdso_gettimeofday   LINUX_2.6
>> __vdso_time   LINUX_2.6
>> .TE
>> .if t \{\
>> .in
>> .ft P
>> \}
>> .SH HISTORY
>
> Better to have this as
>
> .SS History
>
>> The vDSO was originally just a single function -- the vsyscall.
>> In older kernels, you might see that in a process's memory map rather than vdso.
>> Overtime, people realized that this was a great way to pass more functionality
>
> s/Overtime/Over time/
>
>> to userspace, so it was reconceived as a vDSO in the current format.
>> .SH SEE ALSO
>> .BR syscalls (2),
>> .BR getauxval (3),
>> .BR proc (5)
>>
>> The docs/examples/sources in the Linux sources:
>> .nf
>> Documentation/ABI/stable/vdso
>> linux/Documentation/ia64/fsys.txt
>> Documentation/vDSO/* (includes examples of using the vDSO)
>> find arch/ -iname '*vdso*' -o -iname '*gate*'
>> .fi
>>
>
> In the next iteration, could you include a second (separate) patch to
> syscalls.2  and getauxval.3 that adds
> .BR vdso (7)
> under SEE ALSO.
>
> Thanks,
>
> Michael



-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Author of "The Linux Programming Interface"; http://man7.org/tlpi/
--
To unsubscribe from this list: send the line "unsubscribe linux-man" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: vdso(7): new man page
       [not found]             ` <CAKgNAkgwmfBeyijCHj+y2FSQbgSDY8izW-9DAqbw4wgD2y1pAA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2013-12-30 11:27               ` Michael Kerrisk (man-pages)
  0 siblings, 0 replies; 15+ messages in thread
From: Michael Kerrisk (man-pages) @ 2013-12-30 11:27 UTC (permalink / raw)
  To: Mike Frysinger
  Cc: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w, linux-man, Andy Lutomirski

Hi Mike,

This page seems to have fallen on the floor. Would you have time 
to look at my comments below and submit a new version of this page?

Cheers,

Michael


On 06/27/13 12:00, Michael Kerrisk (man-pages) wrote:
> Hi Mike,
> 
> Ping!
> 
> Cheers,
> 
> Michael
> 
> 
> 
> On Wed, May 22, 2013 at 3:22 PM, Michael Kerrisk <mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>> Hi Mike,
>>
>> On 04/12/13 03:28, Mike Frysinger wrote:
>>> here's v2 w/Andy's feedback
>>
>> Thanks for this--it's a nice piece of work. Could you take a
>> look at my comments below and send a v3, please.
>>
>>> .\" Written by Mike Frysinger <vapier-aBrp7R+bbdUdnm+yROfE0A@public.gmane.org>
>>> .\"
>>> .\" %%%LICENSE_START(PUBLIC_DOMAIN)
>>> .\" This page is in the public domain.  Suck it.
>>
>> Okay -- not my first choice for a license, but so be it.
>> But, how about we lose the "Suck it."...
>>
>>> .\" %%%LICENSE_END
>>> .\"
>>> .TH VDSO 7 2013-04-09 "Linux" "Linux Programmer's Manual"
>>> .SH NAME
>>> vDSO \- overview of the virtual ELF dynamic shared object
>>> .SH SYNOPSIS
>>> .B #include <sys/auxv.h>
>>>
>>> .B void *vdso = (uintptr_t)getauxval(AT_SYSINFO_EHDR);
>>
>> Add space before "getauxval". (Usual convention for casts in code examples
>> in man pages.)
>>
>>> .SH DESCRIPTION
>>> The "vDSO" is a small shared library that the kernel automatically maps into the
>>> address space of all userspace applications.
>>
>> 1,$s/userspace applications/user-space applications/
>>
>>> Applications themselves usually need not concern themselves with this as it is
>>> most commonly called by the C library.
>>
>> This last sentence doesn't quite make sense, since "this" and "it" refer to
>> different things (I believe). Do you want something like:
>>
>>         Applications generally do not need to care about the details since
>>         the vDSO is automatically employed by the C library
>>
>> ?
>>
>>> This way you can write using standard functions and the C library will take care
>>> of using any available functionality.
>>>
>>> Why does this object exist at all?
>>
>> s/this object/the vDSO/
>>
>>> There are some facilities the kernel provides that userspace ends up using
>>
>> s/userspace/user space/
>>
>> (When used as a noun, and in other places in the page as well)
>>
>>
>>> frequently to the point that such calls can dominate overall performance.
>>> This is due both to the frequency of the call as well as the context overhead
>>> from exiting userspace and entering the kernel.
>>>
>>> The rest of this documentation is geared towards the curious and/or C library
>>> writers rather than general developers.
>>> If you're trying to call the vDSO in your own application rather than using
>>> the C library, you're most likely doing it wrong.
>>> .SS Example Background
>>
>> Convention for SS headings is that only the first word is capitalized (unless
>> English usage dictates otherwise--e.g., for a proper noun)
>>
>>> Making syscalls can be slow.
>>
>> 1,$s/syscall/system call/
>>
>> (and other instances in the page)
>>
>>> In x86 32bit systems, you can trigger a software interrupt (int $0x80) to tell
>>
>> s/32bit/32-bit/
>>
>>> the kernel you wish to make a syscall.
>>> However, this instruction is expensive: it goes through the full interrupt
>>> handling paths in the processor's microcode as well as in the kernel.
>>> Newer processors have faster (but backwards incompatible) instructions to
>>> initiate system calls.
>>> Rather than require the C library to figure out if this functionality is
>>> available at runtime itself, it can use functions provided by the kernel in
>>> the vDSO.
>>
>> That last point (after the comma) is the most interesting (IMO) of the use
>> cases of the vDSO. If you cared to expand on the details (i.e., are what
>> are mechanics of the operation of those functions provided by the kernel),
>> I think that would be interesting for the reader.
>>
>>> Note that the terminology can be confusing.
>>> On x86 systems, the vDSO function is named "__kernel_vsyscall", but on x86_64,
>>> the term "vsyscall" also refers to an obsolete way to ask the kernel what time
>>> it is or what cpu the caller is on.
>>
>> s/cpu/CPU/
>>
>>> Another frequent system call is gettimeofday().
>>> This is called both directly by userspace applications as well as indirectly by
>>> the C library.
>>> Think timestamps or timing loops or polling -- all of these frequently need to
>>> know what time it is right now.
>>> This information is also not secret -- any application in any privilege mode
>>> (root or any user) will get the same answer.
>>> Thus the kernel arranges for the information required to answer this question
>>> to be placed in memory the process can access.
>>> Now a call to gettimeofday() changes from a syscall to a normal function call
>>> and a few memory accesses.
>>> .SS Finding The vDSO
>>
>> s/The/the/
>>
>>> The base address of the vDSO (if one exists) is passed by the kernel to each
>>> program in the initial auxiliary vector.
>>> Specifically, via the
>>> .B AT_SYSINFO_EHDR
>>> tag.
>>>
>>> You must not assume the vDSO is mapped at any particular location in the
>>> user's memory map.
>>> The base address will usually be randomized at runtime every time a new is
>>
>> Missing word after "new".
>>
>>> processed (at
>>> .BR execve (2)
>>> time).
>>> This is done for security reasons to prevent standard "return-to-libc" attacks.
>>>
>>> For some architectures, there is also a
>>> .B AT_SYSINFO
>>> tag.
>>> This is used only for locating the vsyscall entry point and is frequently
>>> omitted or set to 0 (meaning it's not available).
>>> It is a throw back to the initial vDSO work (see
>>
>> s/throw back/throwback/
>>
>>> .IR HISTORY
>>> below) and should be avoided.
>>>
>>> Refer to
>>> .BR getauxval (3)
>>> for more details on accessing these fields.
>>> .SS File Format
>>
>> s/Format/format/
>>
>>> Since the vDSO is a fully formed ELF, you can do symbol lookups on it.
>>
>> Missing word after ELF.
>>
>>> This allows new symbols to be added with newer kernel releases, and for the
>>> C library to detect available functionality at runtime when running under
>>> different kernel versions.
>>> Often times the C library will do detection with the first call and then
>>> cache the result for subsequent calls.
>>>
>>> All symbols are also versioned (using the GNU version format).
>>> This allows the kernel (in the very unlikely situation) to update the function
>>
>> s/situation/case that it is necessary/
>>
>>> signature without breaking backwards compatibility.
>>> This means changing the arguments that it accepts as well as the return value.
>>
>> What is "it" in the previous line? (Please replace with a suitable noun.)
>>
>>> When looking up a symbol in the vDSO, you must always include the version you
>>> are writing against.
>>>
>>> Typically the vDSO follows the naming convention of prefixing all symbols with
>>> "__vdso_" or "__kernel_" so as to distinguish from other standard symbols.
>>
>> s/distinguish/distinguish them/
>>
>>> e.g. The "gettimeofday" function is named "__vdso_gettimeofday".
>>>
>>> You use the standard C calling conventions when calling any of these functions.
>>> No need to worry about weird register or stack behavior.
>>
>> That last sentence is a little incomplete. Could you expand/reword a little
>> please.
>>
>>> .SH NOTES
>>> .SS Source
>>> When you compile the kernel, it will automatically compile and link the vDSO
>>> code for you.
>>> You will frequently find it under the arch specific dir:
>>
>> s/arch specific dir/architecture-specific directory/
>>
>>> .br
>>
>> Change that last to a blank line, and then indent the next line by 4 spaces.
>>
>>> find arch/$ARCH/ -name '*vdso*.so*' -o -name '*gate*.so*'
>>>
>>> Note that the vDSO that is used is based on the ABI of your userspace code
>>> and not the ABI of the kernel.
>>> i.e. If you run an i386 32bit ELF under an i386 32bit kernel or under an
>>
>> s/i.e. If/In other words, if/
>> s/32bit/32-big/g
>>
>>> x86_64 64bit kernel, you'll get the same vDSO.
>>
>> s/64bit/64-bit/
>>
>>> So when referring to sections below, use the userspace ABI.
>>
>> It's not clear what you mean here when you say "use the userspace ABI."
>> Could you clarify?
>>
>>> .SS vDSO Names
>>
>> s/Names/names/
>>
>>> The name of this shared object varies across architectures.
>>> It will often show up in things like glibc's `ldd` output.
>>> The exact name should not matter to any code, so please do not hardcode it.
>>
>> s/please//
>>
>>> .if t \{\
>>> .ft CW
>>> \}
>>> .TS
>>> l l.
>>> user ABI      vDSO name
>>> _
>>> aarch64       linux-vdso.so.1
>>> ia64  linux-gate.so.1
>>> ppc/32        linux-vdso32.so.1
>>> ppc/64        linux-vdso64.so.1
>>> s390  linux-vdso32.so.1
>>> s390x linux-vdso64.so.1
>>> sh    linux-gate.so.1
>>> i386  linux-gate.so.1
>>> x86_64        linux-vdso.so.1
>>> x86/x32       linux-vdso.so.1
>>> .TE
>>> .if t \{\
>>> .in
>>> .ft P
>>> \}
>>> .SS aarch64 functions
>>> .\" See linux/arch/arm64/kernel/vdso/vdso.lds.S
>>> .if t \{\
>>> .ft CW
>>> \}
>>> .TS
>>> l l.
>>> symbol        version
>>> _
>>> __kernel_rt_sigreturn LINUX_2.6.39
>>> __kernel_gettimeofday LINUX_2.6.39
>>> __kernel_clock_gettime        LINUX_2.6.39
>>> __kernel_clock_getres LINUX_2.6.39
>>> .TE
>>> .if t \{\
>>> .in
>>> .ft P
>>> \}
>>> .SS bfin (Blackfin) functions
>>> .\" See linux/arch/blackfin/kernel/fixed_code.S
>>> .\" See http://docs.blackfin.uclinux.org/doku.php?id=linux-kernel:fixed-code
>>
>> Thanks -- adding references like the above in the source is helpful
>> for future maintenance.
>>
>>> As this cpu lacks a MMU, it doesn't setup a vDSO in the normal sense.
>>
>> s/cpu/CPU/
>> s/MMU/memory-management unit (MMU)/
>> s/setup/set up/
>>
>>> Instead, it maps at boot time a few raw functions into a fixed location in
>>> memory.
>>> Userspace apps then call directly into that.
>>
>> s/apps/applications/
>>
>>> There is no provision for backwards compatibility beyond sniffing raw opcodes,
>>> but as this is an embedded CPU, it can get away with things -- some of the
>>> object formats it runs aren't even ELF based (they're bFLT/FLAT).
>>>
>>> For documentation on this format, it's better you refer to the public docs:
>>> .br
>>> http://docs.blackfin.uclinux.org/doku.php?id=linux-kernel:fixed-code
>>> .SS ia64 (Itanium) functions
>>> .\" See linux/arch/ia64/kernel/gate.lds.S
>>> .\" Also linux/arch/ia64/kernel/fsys.S and linux/Documentation/ia64/fsys.txt
>>> .if t \{\
>>> .ft CW
>>> \}
>>> .TS
>>> l l.
>>> symbol        version
>>> _
>>> __kernel_sigtramp     LINUX_2.5
>>> __kernel_syscall_via_break    LINUX_2.5
>>> __kernel_syscall_via_epc      LINUX_2.5
>>> .TE
>>> .if t \{\
>>> .in
>>> .ft P
>>> \}
>>>
>>> The Itanium port actually likes to get tricky.
>>> In addition to the vDSO above, it also has "light-weight system calls" aka
>>
>> s/aka/also known as/
>>
>>> "fast syscalls" aka "fsys".
>>
>> s/aka/or/
>>
>>> You can invoke these via the __kernel_syscall_via_epc vDSO helper.
>>> The system calls listed here have the same semantics as if you called them
>>> directly via
>>> .BR syscall (3),
>>> so refer to the relevant
>>> documentation for each.
>>> The table below lists the functions available via this mechanism.
>>> .if t \{\
>>> .ft CW
>>> \}
>>> .TS
>>> l.
>>> function
>>> _
>>> clock_gettime
>>> getcpu
>>> getpid
>>> getppid
>>> gettimeofday
>>> set_tid_address
>>> .TE
>>> .if t \{\
>>> .in
>>> .ft P
>>> \}
>>> .SS ppc/32 functions
>>> .\" See linux/arch/powerpc/kernel/vdso32/vdso32.lds.S
>>> The functions marked with a
>>> .I *
>>> below are only available when the kernel is
>>> a powerpc64 (64bit) kernel.
>>
>> s/64bit/64-bit/
>>
>>> .if t \{\
>>> .ft CW
>>> \}
>>> .TS
>>> l l.
>>> symbol        version
>>> _
>>> __kernel_clock_getres LINUX_2.6.15
>>> __kernel_clock_gettime        LINUX_2.6.15
>>> __kernel_datapage_offset      LINUX_2.6.15
>>> __kernel_get_syscall_map      LINUX_2.6.15
>>> __kernel_get_tbfreq   LINUX_2.6.15
>>> __kernel_getcpu \fI*\fR       LINUX_2.6.15
>>> __kernel_gettimeofday LINUX_2.6.15
>>> __kernel_sigtramp_rt32        LINUX_2.6.15
>>> __kernel_sigtramp32   LINUX_2.6.15
>>> __kernel_sync_dicache LINUX_2.6.15
>>> __kernel_sync_dicache_p5      LINUX_2.6.15
>>> .TE
>>> .if t \{\
>>> .in
>>> .ft P
>>> \}
>>> .SS ppc/64 functions
>>> .\" See linux/arch/powerpc/kernel/vdso64/vdso64.lds.S
>>> .if t \{\
>>> .ft CW
>>> \}
>>> .TS
>>> l l.
>>> symbol        version
>>> _
>>> __kernel_clock_getres LINUX_2.6.15
>>> __kernel_clock_gettime        LINUX_2.6.15
>>> __kernel_datapage_offset      LINUX_2.6.15
>>> __kernel_get_syscall_map      LINUX_2.6.15
>>> __kernel_get_tbfreq   LINUX_2.6.15
>>> __kernel_getcpu       LINUX_2.6.15
>>> __kernel_gettimeofday LINUX_2.6.15
>>> __kernel_sigtramp_rt64        LINUX_2.6.15
>>> __kernel_sync_dicache LINUX_2.6.15
>>> __kernel_sync_dicache_p5      LINUX_2.6.15
>>> .TE
>>> .if t \{\
>>> .in
>>> .ft P
>>> \}
>>> .SS s390 functions
>>> .\" See linux/arch/s390/kernel/vdso32/vdso32.lds.S
>>> .if t \{\
>>> .ft CW
>>> \}
>>> .TS
>>> l l.
>>> symbol        version
>>> _
>>> __kernel_clock_getres LINUX_2.6.29
>>> __kernel_clock_gettime        LINUX_2.6.29
>>> __kernel_gettimeofday LINUX_2.6.29
>>> .TE
>>> .if t \{\
>>> .in
>>> .ft P
>>> \}
>>> .SS s390x functions
>>> .\" See linux/arch/s390/kernel/vdso64/vdso64.lds.S
>>> .if t \{\
>>> .ft CW
>>> \}
>>> .TS
>>> l l.
>>> symbol        version
>>> _
>>> __kernel_clock_getres LINUX_2.6.29
>>> __kernel_clock_gettime        LINUX_2.6.29
>>> __kernel_gettimeofday LINUX_2.6.29
>>> .TE
>>> .if t \{\
>>> .in
>>> .ft P
>>> \}
>>> .SS sh (SuperH) functions
>>> .\" See linux/arch/sh/kernel/vsyscall/vsyscall.lds.S
>>> .if t \{\
>>> .ft CW
>>> \}
>>> .TS
>>> l l.
>>> symbol        version
>>> _
>>> __kernel_rt_sigreturn LINUX_2.6
>>> __kernel_sigreturn    LINUX_2.6
>>> __kernel_vsyscall     LINUX_2.6
>>> .TE
>>> .if t \{\
>>> .in
>>> .ft P
>>> \}
>>> .SS i386 functions
>>> .\" See linux/arch/x86/vdso/vdso32/vdso32.lds.S
>>> .if t \{\
>>> .ft CW
>>> \}
>>> .TS
>>> l l.
>>> symbol        version
>>> _
>>> __kernel_sigreturn    LINUX_2.5
>>> __kernel_rt_sigreturn LINUX_2.5
>>> __kernel_vsyscall     LINUX_2.5
>>> .TE
>>> .if t \{\
>>> .in
>>> .ft P
>>> \}
>>> .SS x86_64 functions
>>> .\" See linux/arch/x86/vdso/vdso.lds.S
>>> Each of these symbols are also available without the "__vdso_" prefix, but
>>
>> Either:
>> s/Each of these symbols are/All of these symbols are/
>> or
>> s/Each of these symbols are/Each of these symbols is/
>>
>>> you should ignore those and stick to the names below.
>>> .if t \{\
>>> .ft CW
>>> \}
>>> .TS
>>> l l.
>>> symbol        version
>>> _
>>> __vdso_clock_gettime  LINUX_2.6
>>> __vdso_getcpu LINUX_2.6
>>> __vdso_gettimeofday   LINUX_2.6
>>> __vdso_time   LINUX_2.6
>>> .TE
>>> .if t \{\
>>> .in
>>> .ft P
>>> \}
>>> .SS x86/x32 functions
>>> .\" See linux/arch/x86/vdso/vdso32.lds.S
>>> .if t \{\
>>> .ft CW
>>> \}
>>> .TS
>>> l l.
>>> symbol        version
>>> _
>>> __vdso_clock_gettime  LINUX_2.6
>>> __vdso_getcpu LINUX_2.6
>>> __vdso_gettimeofday   LINUX_2.6
>>> __vdso_time   LINUX_2.6
>>> .TE
>>> .if t \{\
>>> .in
>>> .ft P
>>> \}
>>> .SH HISTORY
>>
>> Better to have this as
>>
>> .SS History
>>
>>> The vDSO was originally just a single function -- the vsyscall.
>>> In older kernels, you might see that in a process's memory map rather than vdso.
>>> Overtime, people realized that this was a great way to pass more functionality
>>
>> s/Overtime/Over time/
>>
>>> to userspace, so it was reconceived as a vDSO in the current format.
>>> .SH SEE ALSO
>>> .BR syscalls (2),
>>> .BR getauxval (3),
>>> .BR proc (5)
>>>
>>> The docs/examples/sources in the Linux sources:
>>> .nf
>>> Documentation/ABI/stable/vdso
>>> linux/Documentation/ia64/fsys.txt
>>> Documentation/vDSO/* (includes examples of using the vDSO)
>>> find arch/ -iname '*vdso*' -o -iname '*gate*'
>>> .fi
>>>
>>
>> In the next iteration, could you include a second (separate) patch to
>> syscalls.2  and getauxval.3 that adds
>> .BR vdso (7)
>> under SEE ALSO.
>>
>> Thanks,
>>
>> Michael
> 
> 
> 


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
--
To unsubscribe from this list: send the line "unsubscribe linux-man" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: vdso(7): new man page
       [not found]         ` <519CC681.6080502-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  2013-06-27  0:00           ` Michael Kerrisk (man-pages)
@ 2013-12-31  7:32           ` Mike Frysinger
       [not found]             ` <201312310232.23392.vapier-aBrp7R+bbdUdnm+yROfE0A@public.gmane.org>
  1 sibling, 1 reply; 15+ messages in thread
From: Mike Frysinger @ 2013-12-31  7:32 UTC (permalink / raw)
  To: Michael Kerrisk; +Cc: linux-man-u79uwXL29TY76Z2rM5mHXA, Andy Lutomirski

[-- Attachment #1: Type: Text/Plain, Size: 2749 bytes --]

On Wednesday 22 May 2013 09:22:09 Michael Kerrisk wrote:
> On 04/12/13 03:28, Mike Frysinger wrote:
> > here's v2 w/Andy's feedback
> 
> Thanks for this--it's a nice piece of work. Could you take a
> look at my comments below and send a v3, please.

anything i didn't explicitly respond to below i merged with my version

> > the kernel you wish to make a syscall.
> > However, this instruction is expensive: it goes through the full
> > interrupt handling paths in the processor's microcode as well as in the
> > kernel. Newer processors have faster (but backwards incompatible)
> > instructions to initiate system calls.
> > Rather than require the C library to figure out if this functionality is
> > available at runtime itself, it can use functions provided by the kernel
> > in the vDSO.
> 
> That last point (after the comma) is the most interesting (IMO) of the use
> cases of the vDSO. If you cared to expand on the details (i.e., are what
> are mechanics of the operation of those functions provided by the kernel),
> I think that would be interesting for the reader.

i think the paragraph after this explains things somewhat as you'd like (where 
it talks about gettimeofday) ?

> > All symbols are also versioned (using the GNU version format).
> > This allows the kernel (in the very unlikely situation) to update the
> > function
> 
> s/situation/case that it is necessary/

hmm, i see what you mean, but i think your version isn't really better ... 
just different.  i'll just delete the (...) text.

> > You use the standard C calling conventions when calling any of these
> > functions. No need to worry about weird register or stack behavior.
> 
> That last sentence is a little incomplete. Could you expand/reword a little
> please.

it's meant as a follow up to the previous sentence.  so the implication is 
that there are no functions which violate the C ABI for your particular 
target.  arguments get passed in the standard way (registers/stack), and all 
the registers have corresponding behavior: scratch are scratch, caller-
preserved are caller-preserved, callee-preserved are callee-preserved, etc...

> > Note that the vDSO that is used is based on the ABI of your userspace
> > code and not the ABI of the kernel.
> > i.e. If you run an i386 32bit ELF under an i386 32bit kernel or under an
> 
> s/i.e. If/In other words, if/

i.e. shows up a lot in man pages as does e.g. (and both show up in this new 
vdso(7) page) ...

> > So when referring to sections below, use the userspace ABI.
> 
> It's not clear what you mean here when you say "use the userspace ABI."
> Could you clarify?

the two sentences that preceded this one explained things ...
-mike

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v3] vdso(7): new man page
       [not found] ` <201304092317.01590.vapier-aBrp7R+bbdUdnm+yROfE0A@public.gmane.org>
  2013-04-11 18:31   ` Andy Lutomirski
  2013-04-12  1:28   ` Mike Frysinger
@ 2013-12-31  7:41   ` Mike Frysinger
       [not found]     ` <1388475665-18491-1-git-send-email-vapier-aBrp7R+bbdUdnm+yROfE0A@public.gmane.org>
  2 siblings, 1 reply; 15+ messages in thread
From: Mike Frysinger @ 2013-12-31  7:41 UTC (permalink / raw)
  To: Michael Kerrisk; +Cc: linux-man-u79uwXL29TY76Z2rM5mHXA

---
 man2/syscall.2   |   6 +-
 man2/syscalls.2  |   3 +-
 man3/getauxval.3 |   4 +-
 man7/libc.7      |   5 +-
 man7/vdso.7      | 457 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 468 insertions(+), 7 deletions(-)
 create mode 100644 man7/vdso.7

diff --git a/man2/syscall.2 b/man2/syscall.2
index e712b41..fe5f86d 100644
--- a/man2/syscall.2
+++ b/man2/syscall.2
@@ -145,7 +145,8 @@ The details for various architectures are listed in the two tables below.
 
 The first table lists the instruction used to transition to kernel mode,
 (which might not be the fastest or best way to transition to the kernel,
-so you might have to refer to the VDSO),
+so you might have to refer to
+.BR vdso (7)),
 the register used to indicate the system call number,
 and the register used to return the system call result.
 .if t \{\
@@ -219,4 +220,5 @@ main(int argc, char *argv[])
 .SH SEE ALSO
 .BR _syscall (2),
 .BR intro (2),
-.BR syscalls (2)
+.BR syscalls (2),
+.BR vdso (7)
diff --git a/man2/syscalls.2 b/man2/syscalls.2
index 265c654..0d085e1 100644
--- a/man2/syscalls.2
+++ b/man2/syscalls.2
@@ -833,4 +833,5 @@ and similarly
 .SH SEE ALSO
 .BR syscall (2),
 .BR unimplemented (2),
-.BR libc (7)
+.BR libc (7),
+.BR vdso (7)
diff --git a/man3/getauxval.3 b/man3/getauxval.3
index 8f27932..09d5bdc 100755
--- a/man3/getauxval.3
+++ b/man3/getauxval.3
@@ -210,7 +210,5 @@ see
 for more information.
 .SH SEE ALSO
 .BR secure_getenv (3),
+.BR vdso (7),
 .BR ld-linux.so (8)
-
-The kernel source file
-.IR Documentation/ABI/stable/vdso
diff --git a/man7/libc.7 b/man7/libc.7
index a9aeba2..f687ced 100644
--- a/man7/libc.7
+++ b/man7/libc.7
@@ -98,6 +98,9 @@ Details of these libraries are generally not covered by the
 project.
 .SH SEE ALSO
 .BR syscalls (2),
+.BR getauxval (3),
+.BR proc (5),
 .BR feature_test_macros (7),
 .BR man-pages (7),
-.BR standards (7)
+.BR standards (7),
+.BR vdso (7)
diff --git a/man7/vdso.7 b/man7/vdso.7
new file mode 100644
index 0000000..3c4b7fb
--- /dev/null
+++ b/man7/vdso.7
@@ -0,0 +1,457 @@
+.\" Written by Mike Frysinger <vapier-aBrp7R+bbdUdnm+yROfE0A@public.gmane.org>
+.\"
+.\" %%%LICENSE_START(PUBLIC_DOMAIN)
+.\" This page is in the public domain.
+.\" %%%LICENSE_END
+.\"
+.TH VDSO 7 2013-04-09 "Linux" "Linux Programmer's Manual"
+.SH NAME
+vDSO \- overview of the virtual ELF dynamic shared object
+.SH SYNOPSIS
+.B #include <sys/auxv.h>
+
+.B void *vdso = (uintptr_t) getauxval(AT_SYSINFO_EHDR);
+.SH DESCRIPTION
+The "vDSO" is a small shared library that the kernel automatically maps into the
+address space of all user-space applications.
+Applications themselves usually need not concern themselves with these details
+as the vDSO is most commonly called by the C library.
+This way you can write using standard functions and the C library will take care
+of using any available functionality.
+
+Why does the vDSO exist at all?
+There are some facilities the kernel provides that user space ends up using
+frequently to the point that such calls can dominate overall performance.
+This is due both to the frequency of the call as well as the context overhead
+from exiting user space and entering the kernel.
+
+The rest of this documentation is geared towards the curious and/or C library
+writers rather than general developers.
+If you're trying to call the vDSO in your own application rather than using
+the C library, you're most likely doing it wrong.
+.SS Example background
+Making system calls can be slow.
+In x86 32-bit systems, you can trigger a software interrupt (int $0x80) to tell
+the kernel you wish to make a system call.
+However, this instruction is expensive: it goes through the full interrupt
+handling paths in the processor's microcode as well as in the kernel.
+Newer processors have faster (but backwards incompatible) instructions to
+initiate system calls.
+Rather than require the C library to figure out if this functionality is
+available at runtime itself, it can use functions provided by the kernel in
+the vDSO.
+
+Note that the terminology can be confusing.
+On x86 systems, the vDSO function is named "__kernel_vsyscall", but on x86_64,
+the term "vsyscall" also refers to an obsolete way to ask the kernel what time
+it is or what CPU the caller is on.
+
+One system call frequently called is gettimeofday().
+This is called both directly by user-space applications as well as indirectly by
+the C library.
+Think timestamps or timing loops or polling -- all of these frequently need to
+know what time it is right now.
+This information is also not secret -- any application in any privilege mode
+(root or any user) will get the same answer.
+Thus the kernel arranges for the information required to answer this question
+to be placed in memory the process can access.
+Now a call to gettimeofday() changes from a system call to a normal function
+call and a few memory accesses.
+.SS Finding the vDSO
+The base address of the vDSO (if one exists) is passed by the kernel to each
+program in the initial auxiliary vector.
+Specifically, via the
+.B AT_SYSINFO_EHDR
+tag.
+
+You must not assume the vDSO is mapped at any particular location in the
+user's memory map.
+The base address will usually be randomized at runtime every time a new
+process image is created (at
+.BR execve (2)
+time).
+This is done for security reasons to prevent standard "return-to-libc" attacks.
+
+For some architectures, there is also a
+.B AT_SYSINFO
+tag.
+This is used only for locating the vsyscall entry point and is frequently
+omitted or set to 0 (meaning it's not available).
+It is a throwback to the initial vDSO work (see
+.IR HISTORY
+below) and should be avoided.
+
+Refer to
+.BR getauxval (3)
+for more details on accessing these fields.
+.SS File format
+Since the vDSO is a fully formed ELF image, you can do symbol lookups on it.
+This allows new symbols to be added with newer kernel releases, and for the
+C library to detect available functionality at runtime when running under
+different kernel versions.
+Often times the C library will do detection with the first call and then
+cache the result for subsequent calls.
+
+All symbols are also versioned (using the GNU version format).
+This allows the kernel to update the function signature without breaking
+backwards compatibility.
+This means changing the arguments that the function accepts as well as the
+return value.
+Thus, when looking up a symbol in the vDSO, you must always include the version
+to match the ABI you expect.
+
+Typically the vDSO follows the naming convention of prefixing all symbols with
+"__vdso_" or "__kernel_" so as to distinguish them from other standard symbols.
+e.g. The "gettimeofday" function is named "__vdso_gettimeofday".
+
+You use the standard C calling conventions when calling any of these functions.
+No need to worry about weird register or stack behavior.
+.SH NOTES
+.SS Source
+When you compile the kernel, it will automatically compile and link the vDSO
+code for you.
+You will frequently find it under the architecture-specific dir:
+
+    find arch/$ARCH/ -name '*vdso*.so*' -o -name '*gate*.so*'
+
+Note that the vDSO that is used is based on the ABI of your user-space code
+and not the ABI of the kernel.
+i.e. If you run an i386 32-bit ELF under an i386 32-bit kernel or under an
+x86_64 64-bit kernel, you'll get the same vDSO.
+So when referring to sections below, use the user-space ABI.
+.SS vDSO names
+The name of this shared object varies across architectures.
+It will often show up in things like glibc's `ldd` output.
+The exact name should not matter to any code, so do not hardcode it.
+.if t \{\
+.ft CW
+\}
+.TS
+l l.
+user ABI	vDSO name
+_
+aarch64	linux-vdso.so.1
+ia64	linux-gate.so.1
+ppc/32	linux-vdso32.so.1
+ppc/64	linux-vdso64.so.1
+s390	linux-vdso32.so.1
+s390x	linux-vdso64.so.1
+sh	linux-gate.so.1
+i386	linux-gate.so.1
+x86_64	linux-vdso.so.1
+x86/x32	linux-vdso.so.1
+.TE
+.if t \{\
+.in
+.ft P
+\}
+.SS arm functions
+.\" See linux/arch/arm/kernel/entry-armv.S
+.\" See linux/Documentation/arm/kernel_user_helpers.txt
+The arm port has a code page full of utility functions.
+Since it's just a raw page of code, there is no ELF information for doing
+symbol lookups or versioning.
+It does provide support for different versions though.
+
+For documentation on this code page, it's better you refer to the kernel doc
+as it's extremely detailed and covers everything you need to know:
+.br
+Documentation/arm/kernel_user_helpers.txt
+.SS aarch64 functions
+.\" See linux/arch/arm64/kernel/vdso/vdso.lds.S
+.if t \{\
+.ft CW
+\}
+.TS
+l l.
+symbol	version
+_
+__kernel_rt_sigreturn	LINUX_2.6.39
+__kernel_gettimeofday	LINUX_2.6.39
+__kernel_clock_gettime	LINUX_2.6.39
+__kernel_clock_getres	LINUX_2.6.39
+.TE
+.if t \{\
+.in
+.ft P
+\}
+.SS bfin (Blackfin) functions
+.\" See linux/arch/blackfin/kernel/fixed_code.S
+.\" See http://docs.blackfin.uclinux.org/doku.php?id=linux-kernel:fixed-code
+As this CPU lacks a memory management unit (MMU), it doesn't set up a vDSO in
+the normal sense.
+Instead, it maps at boot time a few raw functions into a fixed location in
+memory.
+User-space applications then call directly into that region.
+There is no provision for backwards compatibility beyond sniffing raw opcodes,
+but as this is an embedded CPU, it can get away with things -- some of the
+object formats it runs aren't even ELF based (they're bFLT/FLAT).
+
+For documentation on this code page, it's better you refer to the public docs:
+.br
+http://docs.blackfin.uclinux.org/doku.php?id=linux-kernel:fixed-code
+.SS ia64 (Itanium) functions
+.\" See linux/arch/ia64/kernel/gate.lds.S
+.\" Also linux/arch/ia64/kernel/fsys.S and linux/Documentation/ia64/fsys.txt
+.if t \{\
+.ft CW
+\}
+.TS
+l l.
+symbol	version
+_
+__kernel_sigtramp	LINUX_2.5
+__kernel_syscall_via_break	LINUX_2.5
+__kernel_syscall_via_epc	LINUX_2.5
+.TE
+.if t \{\
+.in
+.ft P
+\}
+
+The Itanium port actually likes to get tricky.
+In addition to the vDSO above, it also has "light-weight system calls" (also
+known as "fast syscalls" or "fsys").
+You can invoke these via the __kernel_syscall_via_epc vDSO helper.
+The system calls listed here have the same semantics as if you called them
+directly via
+.BR syscall (3),
+so refer to the relevant
+documentation for each.
+The table below lists the functions available via this mechanism.
+.if t \{\
+.ft CW
+\}
+.TS
+l.
+function
+_
+clock_gettime
+getcpu
+getpid
+getppid
+gettimeofday
+set_tid_address
+.TE
+.if t \{\
+.in
+.ft P
+\}
+.SS parisc (hppa) functions
+.\" See linux/arch/parisc/kernel/syscall.S
+.\" See linux/Documentation/parisc/registers
+The parisc port has a code page full of utility functions called a gateway page.
+Rather than use the normal ELF aux vector approach, it passes the address of
+the page to the process via the SR2 register.
+The permissions on the page are such that merely executing those addresses
+automatically executes with kernel privileges and not in user-space.
+This is done to match the way HP-UX works.
+
+Since it's just a raw page of code, there is no ELF information for doing
+symbol lookups or versioning.
+Simply call into the appropriate offset via the branch instruction, e.g.:
+.br
+ble <offset>(%sr2, %r0)
+.if t \{\
+.ft CW
+\}
+.TS
+l l.
+offset	function
+_
+00b0	lws_entry
+00e0	set_thread_pointer
+0100	linux_gateway_entry (syscall)
+0268	syscall_nosys
+0274	tracesys
+0324	tracesys_next
+0368	tracesys_exit
+03a0	tracesys_sigexit
+03b8	lws_start
+03dc	lws_exit_nosys
+03e0	lws_exit
+03e4	lws_compare_and_swap64
+03e8	lws_compare_and_swap
+0404	cas_wouldblock
+0410	cas_action
+.TE
+.if t \{\
+.in
+.ft P
+\}
+.SS ppc/32 functions
+.\" See linux/arch/powerpc/kernel/vdso32/vdso32.lds.S
+The functions marked with a
+.I *
+below are only available when the kernel is
+a powerpc64 (64-bit) kernel.
+.if t \{\
+.ft CW
+\}
+.TS
+l l.
+symbol	version
+_
+__kernel_clock_getres	LINUX_2.6.15
+__kernel_clock_gettime	LINUX_2.6.15
+__kernel_datapage_offset	LINUX_2.6.15
+__kernel_get_syscall_map	LINUX_2.6.15
+__kernel_get_tbfreq	LINUX_2.6.15
+__kernel_getcpu \fI*\fR	LINUX_2.6.15
+__kernel_gettimeofday	LINUX_2.6.15
+__kernel_sigtramp_rt32	LINUX_2.6.15
+__kernel_sigtramp32	LINUX_2.6.15
+__kernel_sync_dicache	LINUX_2.6.15
+__kernel_sync_dicache_p5	LINUX_2.6.15
+.TE
+.if t \{\
+.in
+.ft P
+\}
+.SS ppc/64 functions
+.\" See linux/arch/powerpc/kernel/vdso64/vdso64.lds.S
+.if t \{\
+.ft CW
+\}
+.TS
+l l.
+symbol	version
+_
+__kernel_clock_getres	LINUX_2.6.15
+__kernel_clock_gettime	LINUX_2.6.15
+__kernel_datapage_offset	LINUX_2.6.15
+__kernel_get_syscall_map	LINUX_2.6.15
+__kernel_get_tbfreq	LINUX_2.6.15
+__kernel_getcpu	LINUX_2.6.15
+__kernel_gettimeofday	LINUX_2.6.15
+__kernel_sigtramp_rt64	LINUX_2.6.15
+__kernel_sync_dicache	LINUX_2.6.15
+__kernel_sync_dicache_p5	LINUX_2.6.15
+.TE
+.if t \{\
+.in
+.ft P
+\}
+.SS s390 functions
+.\" See linux/arch/s390/kernel/vdso32/vdso32.lds.S
+.if t \{\
+.ft CW
+\}
+.TS
+l l.
+symbol	version
+_
+__kernel_clock_getres	LINUX_2.6.29
+__kernel_clock_gettime	LINUX_2.6.29
+__kernel_gettimeofday	LINUX_2.6.29
+.TE
+.if t \{\
+.in
+.ft P
+\}
+.SS s390x functions
+.\" See linux/arch/s390/kernel/vdso64/vdso64.lds.S
+.if t \{\
+.ft CW
+\}
+.TS
+l l.
+symbol	version
+_
+__kernel_clock_getres	LINUX_2.6.29
+__kernel_clock_gettime	LINUX_2.6.29
+__kernel_gettimeofday	LINUX_2.6.29
+.TE
+.if t \{\
+.in
+.ft P
+\}
+.SS sh (SuperH) functions
+.\" See linux/arch/sh/kernel/vsyscall/vsyscall.lds.S
+.if t \{\
+.ft CW
+\}
+.TS
+l l.
+symbol	version
+_
+__kernel_rt_sigreturn	LINUX_2.6
+__kernel_sigreturn	LINUX_2.6
+__kernel_vsyscall	LINUX_2.6
+.TE
+.if t \{\
+.in
+.ft P
+\}
+.SS i386 functions
+.\" See linux/arch/x86/vdso/vdso32/vdso32.lds.S
+.if t \{\
+.ft CW
+\}
+.TS
+l l.
+symbol	version
+_
+__kernel_sigreturn	LINUX_2.5
+__kernel_rt_sigreturn	LINUX_2.5
+__kernel_vsyscall	LINUX_2.5
+.TE
+.if t \{\
+.in
+.ft P
+\}
+.SS x86_64 functions
+.\" See linux/arch/x86/vdso/vdso.lds.S
+All of these symbols are also available without the "__vdso_" prefix, but
+you should ignore those and stick to the names below.
+.if t \{\
+.ft CW
+\}
+.TS
+l l.
+symbol	version
+_
+__vdso_clock_gettime	LINUX_2.6
+__vdso_getcpu	LINUX_2.6
+__vdso_gettimeofday	LINUX_2.6
+__vdso_time	LINUX_2.6
+.TE
+.if t \{\
+.in
+.ft P
+\}
+.SS x86/x32 functions
+.\" See linux/arch/x86/vdso/vdso32.lds.S
+.if t \{\
+.ft CW
+\}
+.TS
+l l.
+symbol	version
+_
+__vdso_clock_gettime	LINUX_2.6
+__vdso_getcpu	LINUX_2.6
+__vdso_gettimeofday	LINUX_2.6
+__vdso_time	LINUX_2.6
+.TE
+.if t \{\
+.in
+.ft P
+\}
+.SS History
+The vDSO was originally just a single function -- the vsyscall.
+In older kernels, you might see that in a process's memory map rather than vdso.
+Over time, people realized that this was a great way to pass more functionality
+to user space, so it was reconceived as a vDSO in the current format.
+.SH SEE ALSO
+.BR syscalls (2),
+.BR getauxval (3),
+.BR proc (5)
+
+The docs/examples/sources in the Linux sources:
+.nf
+Documentation/ABI/stable/vdso
+linux/Documentation/ia64/fsys.txt
+Documentation/vDSO/* (includes examples of using the vDSO)
+find arch/ -iname '*vdso*' -o -iname '*gate*'
+.fi
-- 
1.8.4.3

--
To unsubscribe from this list: send the line "unsubscribe linux-man" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: vdso(7): new man page
       [not found]             ` <201312310232.23392.vapier-aBrp7R+bbdUdnm+yROfE0A@public.gmane.org>
@ 2014-01-01 10:36               ` Michael Kerrisk (man-pages)
  0 siblings, 0 replies; 15+ messages in thread
From: Michael Kerrisk (man-pages) @ 2014-01-01 10:36 UTC (permalink / raw)
  To: Mike Frysinger
  Cc: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w,
	linux-man-u79uwXL29TY76Z2rM5mHXA, Andy Lutomirski

Hi Mike,

Thanks for following up on this.

On 12/31/13 20:32, Mike Frysinger wrote:
> On Wednesday 22 May 2013 09:22:09 Michael Kerrisk wrote:
>> On 04/12/13 03:28, Mike Frysinger wrote:
>>> here's v2 w/Andy's feedback
>>
>> Thanks for this--it's a nice piece of work. Could you take a
>> look at my comments below and send a v3, please.
> 
> anything i didn't explicitly respond to below i merged with my version
> 
>>> the kernel you wish to make a syscall.
>>> However, this instruction is expensive: it goes through the full
>>> interrupt handling paths in the processor's microcode as well as in the
>>> kernel. Newer processors have faster (but backwards incompatible)
>>> instructions to initiate system calls.
>>> Rather than require the C library to figure out if this functionality is
>>> available at runtime itself, it can use functions provided by the kernel
>>> in the vDSO.
>>
>> That last point (after the comma) is the most interesting (IMO) of the use
>> cases of the vDSO. If you cared to expand on the details (i.e., are what
>> are mechanics of the operation of those functions provided by the kernel),
>> I think that would be interesting for the reader.
> 
> i think the paragraph after this explains things somewhat as you'd like (where 
> it talks about gettimeofday) ?

Yes, thanks.

>>> All symbols are also versioned (using the GNU version format).
>>> This allows the kernel (in the very unlikely situation) to update the
>>> function
>>
>> s/situation/case that it is necessary/
> 
> hmm, i see what you mean, but i think your version isn't really better ... 
> just different.  i'll just delete the (...) text.

Okay.

>>> You use the standard C calling conventions when calling any of these
>>> functions. No need to worry about weird register or stack behavior.
>>
>> That last sentence is a little incomplete. Could you expand/reword a little
>> please.
> 
> it's meant as a follow up to the previous sentence.  so the implication is 
> that there are no functions which violate the C ABI for your particular 
> target.  arguments get passed in the standard way (registers/stack), and all 
> the registers have corresponding behavior: scratch are scratch, caller-
> preserved are caller-preserved, callee-preserved are callee-preserved, etc...

>>> Note that the vDSO that is used is based on the ABI of your userspace
>>> code and not the ABI of the kernel.
>>> i.e. If you run an i386 32bit ELF under an i386 32bit kernel or under an
>>
>> s/i.e. If/In other words, if/
> 
> i.e. shows up a lot in man pages as does e.g. (and both show up in this new 
> vdso(7) page) ...

I should have been clearer. I disfavor the use of "e.g." and "i.e.", except
in parenthetical asides. There were a very few exceptions to that guideline,
and I just now went through and stamped out most of them. And I edited your
page to be consistent with the guideline.


>>> So when referring to sections below, use the userspace ABI.
>>
>> It's not clear what you mean here when you say "use the userspace ABI."
>> Could you clarify?
> 
> the two sentences that preceded this one explained things ...

Sorry -- I still don't get it... (See my other reply.)

Cheers,

Michael


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
--
To unsubscribe from this list: send the line "unsubscribe linux-man" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3] vdso(7): new man page
       [not found]     ` <1388475665-18491-1-git-send-email-vapier-aBrp7R+bbdUdnm+yROfE0A@public.gmane.org>
@ 2014-01-01 10:38       ` Michael Kerrisk (man-pages)
       [not found]         ` <52C3F01C.9080803-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 15+ messages in thread
From: Michael Kerrisk (man-pages) @ 2014-01-01 10:38 UTC (permalink / raw)
  To: Mike Frysinger
  Cc: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w,
	linux-man-u79uwXL29TY76Z2rM5mHXA

Hi Mike,

Thanks for the updated patch.

I've applied your patches for the next man-pages release, but would be happy if
you could answer the questions below.

On 12/31/13 20:41, Mike Frysinger wrote:
> ---
>  man2/syscall.2   |   6 +-
>  man2/syscalls.2  |   3 +-
>  man3/getauxval.3 |   4 +-
>  man7/libc.7      |   5 +-
>  man7/vdso.7      | 457 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  5 files changed, 468 insertions(+), 7 deletions(-)
>  create mode 100644 man7/vdso.7
> 
> diff --git a/man2/syscall.2 b/man2/syscall.2
> index e712b41..fe5f86d 100644
> --- a/man2/syscall.2
> +++ b/man2/syscall.2
> @@ -145,7 +145,8 @@ The details for various architectures are listed in the two tables below.
>  
>  The first table lists the instruction used to transition to kernel mode,
>  (which might not be the fastest or best way to transition to the kernel,
> -so you might have to refer to the VDSO),
> +so you might have to refer to
> +.BR vdso (7)),
>  the register used to indicate the system call number,
>  and the register used to return the system call result.
>  .if t \{\
> @@ -219,4 +220,5 @@ main(int argc, char *argv[])
>  .SH SEE ALSO
>  .BR _syscall (2),
>  .BR intro (2),
> -.BR syscalls (2)
> +.BR syscalls (2),
> +.BR vdso (7)
> diff --git a/man2/syscalls.2 b/man2/syscalls.2
> index 265c654..0d085e1 100644
> --- a/man2/syscalls.2
> +++ b/man2/syscalls.2
> @@ -833,4 +833,5 @@ and similarly
>  .SH SEE ALSO
>  .BR syscall (2),
>  .BR unimplemented (2),
> -.BR libc (7)
> +.BR libc (7),
> +.BR vdso (7)
> diff --git a/man3/getauxval.3 b/man3/getauxval.3
> index 8f27932..09d5bdc 100755
> --- a/man3/getauxval.3
> +++ b/man3/getauxval.3
> @@ -210,7 +210,5 @@ see
>  for more information.
>  .SH SEE ALSO
>  .BR secure_getenv (3),
> +.BR vdso (7),
>  .BR ld-linux.so (8)
> -
> -The kernel source file
> -.IR Documentation/ABI/stable/vdso
> diff --git a/man7/libc.7 b/man7/libc.7
> index a9aeba2..f687ced 100644
> --- a/man7/libc.7
> +++ b/man7/libc.7
> @@ -98,6 +98,9 @@ Details of these libraries are generally not covered by the
>  project.
>  .SH SEE ALSO
>  .BR syscalls (2),
> +.BR getauxval (3),
> +.BR proc (5),
>  .BR feature_test_macros (7),
>  .BR man-pages (7),
> -.BR standards (7)
> +.BR standards (7),
> +.BR vdso (7)
> diff --git a/man7/vdso.7 b/man7/vdso.7
> new file mode 100644
> index 0000000..3c4b7fb
> --- /dev/null
> +++ b/man7/vdso.7
> @@ -0,0 +1,457 @@
> +.\" Written by Mike Frysinger <vapier-aBrp7R+bbdUdnm+yROfE0A@public.gmane.org>
> +.\"
> +.\" %%%LICENSE_START(PUBLIC_DOMAIN)
> +.\" This page is in the public domain.
> +.\" %%%LICENSE_END
> +.\"
> +.TH VDSO 7 2013-04-09 "Linux" "Linux Programmer's Manual"
> +.SH NAME
> +vDSO \- overview of the virtual ELF dynamic shared object
> +.SH SYNOPSIS
> +.B #include <sys/auxv.h>
> +
> +.B void *vdso = (uintptr_t) getauxval(AT_SYSINFO_EHDR);
> +.SH DESCRIPTION
> +The "vDSO" is a small shared library that the kernel automatically maps into the
> +address space of all user-space applications.
> +Applications themselves usually need not concern themselves with these details
> +as the vDSO is most commonly called by the C library.
> +This way you can write using standard functions and the C library will take care

After "write" I added "programs". Okay?

> +of using any available functionality.

I made this piece:

    of using any functionality that is available via the vDSO.

Okay?

> +
> +Why does the vDSO exist at all?
> +There are some facilities the kernel provides that user space ends up using

I changed "facilities" to "system calls". Okay?

> +frequently to the point that such calls can dominate overall performance.
> +This is due both to the frequency of the call as well as the context overhead
> +from exiting user space and entering the kernel.
> +
> +The rest of this documentation is geared towards the curious and/or C library
> +writers rather than general developers.
> +If you're trying to call the vDSO in your own application rather than using
> +the C library, you're most likely doing it wrong.
> +.SS Example background
> +Making system calls can be slow.
> +In x86 32-bit systems, you can trigger a software interrupt (int $0x80) to tell
> +the kernel you wish to make a system call.
> +However, this instruction is expensive: it goes through the full interrupt
> +handling paths in the processor's microcode as well as in the kernel.
> +Newer processors have faster (but backwards incompatible) instructions to
> +initiate system calls.
> +Rather than require the C library to figure out if this functionality is
> +available at runtime itself, it can use functions provided by the kernel in
> +the vDSO.
> +
> +Note that the terminology can be confusing.
> +On x86 systems, the vDSO function is named "__kernel_vsyscall", but on x86_64,

After "function" I added

    used to determine the preferred method of making a system call is

Okay?

> +the term "vsyscall" also refers to an obsolete way to ask the kernel what time
> +it is or what CPU the caller is on.
> +
> +One system call frequently called is gettimeofday().
> +This is called both directly by user-space applications as well as indirectly by
> +the C library.
> +Think timestamps or timing loops or polling -- all of these frequently need to
> +know what time it is right now.
> +This information is also not secret -- any application in any privilege mode
> +(root or any user) will get the same answer.
> +Thus the kernel arranges for the information required to answer this question
> +to be placed in memory the process can access.
> +Now a call to gettimeofday() changes from a system call to a normal function
> +call and a few memory accesses.
> +.SS Finding the vDSO
> +The base address of the vDSO (if one exists) is passed by the kernel to each
> +program in the initial auxiliary vector.
> +Specifically, via the
> +.B AT_SYSINFO_EHDR
> +tag.
> +
> +You must not assume the vDSO is mapped at any particular location in the
> +user's memory map.
> +The base address will usually be randomized at runtime every time a new
> +process image is created (at
> +.BR execve (2)
> +time).
> +This is done for security reasons to prevent standard "return-to-libc" attacks.
> +
> +For some architectures, there is also a
> +.B AT_SYSINFO
> +tag.
> +This is used only for locating the vsyscall entry point and is frequently
> +omitted or set to 0 (meaning it's not available).
> +It is a throwback to the initial vDSO work (see
> +.IR HISTORY
> +below) and should be avoided.
> +
> +Refer to
> +.BR getauxval (3)
> +for more details on accessing these fields.
> +.SS File format
> +Since the vDSO is a fully formed ELF image, you can do symbol lookups on it.
> +This allows new symbols to be added with newer kernel releases, and for the
> +C library to detect available functionality at runtime when running under
> +different kernel versions.
> +Often times the C library will do detection with the first call and then
> +cache the result for subsequent calls.
> +
> +All symbols are also versioned (using the GNU version format).
> +This allows the kernel to update the function signature without breaking
> +backwards compatibility.
> +This means changing the arguments that the function accepts as well as the
> +return value.
> +Thus, when looking up a symbol in the vDSO, you must always include the version
> +to match the ABI you expect.
> +
> +Typically the vDSO follows the naming convention of prefixing all symbols with
> +"__vdso_" or "__kernel_" so as to distinguish them from other standard symbols.
> +e.g. The "gettimeofday" function is named "__vdso_gettimeofday".
> +
> +You use the standard C calling conventions when calling any of these functions.
> +No need to worry about weird register or stack behavior.
> +.SH NOTES
> +.SS Source
> +When you compile the kernel, it will automatically compile and link the vDSO
> +code for you.
> +You will frequently find it under the architecture-specific dir:
> +
> +    find arch/$ARCH/ -name '*vdso*.so*' -o -name '*gate*.so*'
> +
> +Note that the vDSO that is used is based on the ABI of your user-space code
> +and not the ABI of the kernel.
> +i.e. If you run an i386 32-bit ELF under an i386 32-bit kernel or under an
> +x86_64 64-bit kernel, you'll get the same vDSO.
> +So when referring to sections below, use the user-space ABI.

I still can't make any sense of that last sentence. What are "sections"
in this context? What does it mean to "*use* the user-space ABI"?

> +.SS vDSO names
> +The name of this shared object varies across architectures.
> +It will often show up in things like glibc's `ldd` output.
> +The exact name should not matter to any code, so do not hardcode it.
> +.if t \{\
> +.ft CW
> +\}
> +.TS
> +l l.
> +user ABI	vDSO name
> +_
> +aarch64	linux-vdso.so.1
> +ia64	linux-gate.so.1
> +ppc/32	linux-vdso32.so.1
> +ppc/64	linux-vdso64.so.1
> +s390	linux-vdso32.so.1
> +s390x	linux-vdso64.so.1
> +sh	linux-gate.so.1
> +i386	linux-gate.so.1
> +x86_64	linux-vdso.so.1
> +x86/x32	linux-vdso.so.1
> +.TE
> +.if t \{\
> +.in
> +.ft P
> +\}
> +.SS arm functions
> +.\" See linux/arch/arm/kernel/entry-armv.S
> +.\" See linux/Documentation/arm/kernel_user_helpers.txt
> +The arm port has a code page full of utility functions.
> +Since it's just a raw page of code, there is no ELF information for doing
> +symbol lookups or versioning.
> +It does provide support for different versions though.
> +
> +For documentation on this code page, it's better you refer to the kernel doc
> +as it's extremely detailed and covers everything you need to know:
> +.br
> +Documentation/arm/kernel_user_helpers.txt
> +.SS aarch64 functions
> +.\" See linux/arch/arm64/kernel/vdso/vdso.lds.S
> +.if t \{\
> +.ft CW
> +\}
> +.TS
> +l l.
> +symbol	version

You don't explicitly say what tables such as the below are about.
Could you provide me with a sentence to describe them?

Cheers,

Michael




> +_
> +__kernel_rt_sigreturn	LINUX_2.6.39
> +__kernel_gettimeofday	LINUX_2.6.39
> +__kernel_clock_gettime	LINUX_2.6.39
> +__kernel_clock_getres	LINUX_2.6.39
> +.TE
> +.if t \{\
> +.in
> +.ft P
> +\}
> +.SS bfin (Blackfin) functions
> +.\" See linux/arch/blackfin/kernel/fixed_code.S
> +.\" See http://docs.blackfin.uclinux.org/doku.php?id=linux-kernel:fixed-code
> +As this CPU lacks a memory management unit (MMU), it doesn't set up a vDSO in
> +the normal sense.
> +Instead, it maps at boot time a few raw functions into a fixed location in
> +memory.
> +User-space applications then call directly into that region.
> +There is no provision for backwards compatibility beyond sniffing raw opcodes,
> +but as this is an embedded CPU, it can get away with things -- some of the
> +object formats it runs aren't even ELF based (they're bFLT/FLAT).
> +
> +For documentation on this code page, it's better you refer to the public docs:
> +.br
> +http://docs.blackfin.uclinux.org/doku.php?id=linux-kernel:fixed-code
> +.SS ia64 (Itanium) functions
> +.\" See linux/arch/ia64/kernel/gate.lds.S
> +.\" Also linux/arch/ia64/kernel/fsys.S and linux/Documentation/ia64/fsys.txt
> +.if t \{\
> +.ft CW
> +\}
> +.TS
> +l l.
> +symbol	version
> +_
> +__kernel_sigtramp	LINUX_2.5
> +__kernel_syscall_via_break	LINUX_2.5
> +__kernel_syscall_via_epc	LINUX_2.5
> +.TE
> +.if t \{\
> +.in
> +.ft P
> +\}
> +
> +The Itanium port actually likes to get tricky.
> +In addition to the vDSO above, it also has "light-weight system calls" (also
> +known as "fast syscalls" or "fsys").
> +You can invoke these via the __kernel_syscall_via_epc vDSO helper.
> +The system calls listed here have the same semantics as if you called them
> +directly via
> +.BR syscall (3),
> +so refer to the relevant
> +documentation for each.
> +The table below lists the functions available via this mechanism.
> +.if t \{\
> +.ft CW
> +\}
> +.TS
> +l.
> +function
> +_
> +clock_gettime
> +getcpu
> +getpid
> +getppid
> +gettimeofday
> +set_tid_address
> +.TE
> +.if t \{\
> +.in
> +.ft P
> +\}
> +.SS parisc (hppa) functions
> +.\" See linux/arch/parisc/kernel/syscall.S
> +.\" See linux/Documentation/parisc/registers
> +The parisc port has a code page full of utility functions called a gateway page.
> +Rather than use the normal ELF aux vector approach, it passes the address of
> +the page to the process via the SR2 register.
> +The permissions on the page are such that merely executing those addresses
> +automatically executes with kernel privileges and not in user-space.
> +This is done to match the way HP-UX works.
> +
> +Since it's just a raw page of code, there is no ELF information for doing
> +symbol lookups or versioning.
> +Simply call into the appropriate offset via the branch instruction, e.g.:
> +.br
> +ble <offset>(%sr2, %r0)
> +.if t \{\
> +.ft CW
> +\}
> +.TS
> +l l.
> +offset	function
> +_
> +00b0	lws_entry
> +00e0	set_thread_pointer
> +0100	linux_gateway_entry (syscall)
> +0268	syscall_nosys
> +0274	tracesys
> +0324	tracesys_next
> +0368	tracesys_exit
> +03a0	tracesys_sigexit
> +03b8	lws_start
> +03dc	lws_exit_nosys
> +03e0	lws_exit
> +03e4	lws_compare_and_swap64
> +03e8	lws_compare_and_swap
> +0404	cas_wouldblock
> +0410	cas_action
> +.TE
> +.if t \{\
> +.in
> +.ft P
> +\}
> +.SS ppc/32 functions
> +.\" See linux/arch/powerpc/kernel/vdso32/vdso32.lds.S
> +The functions marked with a
> +.I *
> +below are only available when the kernel is
> +a powerpc64 (64-bit) kernel.
> +.if t \{\
> +.ft CW
> +\}
> +.TS
> +l l.
> +symbol	version
> +_
> +__kernel_clock_getres	LINUX_2.6.15
> +__kernel_clock_gettime	LINUX_2.6.15
> +__kernel_datapage_offset	LINUX_2.6.15
> +__kernel_get_syscall_map	LINUX_2.6.15
> +__kernel_get_tbfreq	LINUX_2.6.15
> +__kernel_getcpu \fI*\fR	LINUX_2.6.15
> +__kernel_gettimeofday	LINUX_2.6.15
> +__kernel_sigtramp_rt32	LINUX_2.6.15
> +__kernel_sigtramp32	LINUX_2.6.15
> +__kernel_sync_dicache	LINUX_2.6.15
> +__kernel_sync_dicache_p5	LINUX_2.6.15
> +.TE
> +.if t \{\
> +.in
> +.ft P
> +\}
> +.SS ppc/64 functions
> +.\" See linux/arch/powerpc/kernel/vdso64/vdso64.lds.S
> +.if t \{\
> +.ft CW
> +\}
> +.TS
> +l l.
> +symbol	version
> +_
> +__kernel_clock_getres	LINUX_2.6.15
> +__kernel_clock_gettime	LINUX_2.6.15
> +__kernel_datapage_offset	LINUX_2.6.15
> +__kernel_get_syscall_map	LINUX_2.6.15
> +__kernel_get_tbfreq	LINUX_2.6.15
> +__kernel_getcpu	LINUX_2.6.15
> +__kernel_gettimeofday	LINUX_2.6.15
> +__kernel_sigtramp_rt64	LINUX_2.6.15
> +__kernel_sync_dicache	LINUX_2.6.15
> +__kernel_sync_dicache_p5	LINUX_2.6.15
> +.TE
> +.if t \{\
> +.in
> +.ft P
> +\}
> +.SS s390 functions
> +.\" See linux/arch/s390/kernel/vdso32/vdso32.lds.S
> +.if t \{\
> +.ft CW
> +\}
> +.TS
> +l l.
> +symbol	version
> +_
> +__kernel_clock_getres	LINUX_2.6.29
> +__kernel_clock_gettime	LINUX_2.6.29
> +__kernel_gettimeofday	LINUX_2.6.29
> +.TE
> +.if t \{\
> +.in
> +.ft P
> +\}
> +.SS s390x functions
> +.\" See linux/arch/s390/kernel/vdso64/vdso64.lds.S
> +.if t \{\
> +.ft CW
> +\}
> +.TS
> +l l.
> +symbol	version
> +_
> +__kernel_clock_getres	LINUX_2.6.29
> +__kernel_clock_gettime	LINUX_2.6.29
> +__kernel_gettimeofday	LINUX_2.6.29
> +.TE
> +.if t \{\
> +.in
> +.ft P
> +\}
> +.SS sh (SuperH) functions
> +.\" See linux/arch/sh/kernel/vsyscall/vsyscall.lds.S
> +.if t \{\
> +.ft CW
> +\}
> +.TS
> +l l.
> +symbol	version
> +_
> +__kernel_rt_sigreturn	LINUX_2.6
> +__kernel_sigreturn	LINUX_2.6
> +__kernel_vsyscall	LINUX_2.6
> +.TE
> +.if t \{\
> +.in
> +.ft P
> +\}
> +.SS i386 functions
> +.\" See linux/arch/x86/vdso/vdso32/vdso32.lds.S
> +.if t \{\
> +.ft CW
> +\}
> +.TS
> +l l.
> +symbol	version
> +_
> +__kernel_sigreturn	LINUX_2.5
> +__kernel_rt_sigreturn	LINUX_2.5
> +__kernel_vsyscall	LINUX_2.5
> +.TE
> +.if t \{\
> +.in
> +.ft P
> +\}
> +.SS x86_64 functions
> +.\" See linux/arch/x86/vdso/vdso.lds.S
> +All of these symbols are also available without the "__vdso_" prefix, but
> +you should ignore those and stick to the names below.
> +.if t \{\
> +.ft CW
> +\}
> +.TS
> +l l.
> +symbol	version
> +_
> +__vdso_clock_gettime	LINUX_2.6
> +__vdso_getcpu	LINUX_2.6
> +__vdso_gettimeofday	LINUX_2.6
> +__vdso_time	LINUX_2.6
> +.TE
> +.if t \{\
> +.in
> +.ft P
> +\}
> +.SS x86/x32 functions
> +.\" See linux/arch/x86/vdso/vdso32.lds.S
> +.if t \{\
> +.ft CW
> +\}
> +.TS
> +l l.
> +symbol	version
> +_
> +__vdso_clock_gettime	LINUX_2.6
> +__vdso_getcpu	LINUX_2.6
> +__vdso_gettimeofday	LINUX_2.6
> +__vdso_time	LINUX_2.6
> +.TE
> +.if t \{\
> +.in
> +.ft P
> +\}
> +.SS History
> +The vDSO was originally just a single function -- the vsyscall.
> +In older kernels, you might see that in a process's memory map rather than vdso.
> +Over time, people realized that this was a great way to pass more functionality
> +to user space, so it was reconceived as a vDSO in the current format.
> +.SH SEE ALSO
> +.BR syscalls (2),
> +.BR getauxval (3),
> +.BR proc (5)
> +
> +The docs/examples/sources in the Linux sources:
> +.nf
> +Documentation/ABI/stable/vdso
> +linux/Documentation/ia64/fsys.txt
> +Documentation/vDSO/* (includes examples of using the vDSO)
> +find arch/ -iname '*vdso*' -o -iname '*gate*'
> +.fi
> 


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
--
To unsubscribe from this list: send the line "unsubscribe linux-man" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3] vdso(7): new man page
       [not found]         ` <52C3F01C.9080803-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2014-01-01 17:44           ` Mike Frysinger
       [not found]             ` <201401011244.13632.vapier-aBrp7R+bbdUdnm+yROfE0A@public.gmane.org>
  0 siblings, 1 reply; 15+ messages in thread
From: Mike Frysinger @ 2014-01-01 17:44 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages); +Cc: linux-man-u79uwXL29TY76Z2rM5mHXA


[-- Attachment #1.1: Type: Text/Plain, Size: 3221 bytes --]

On Wednesday 01 January 2014 05:38:20 Michael Kerrisk (man-pages) wrote:
> On 12/31/13 20:41, Mike Frysinger wrote:
> > +The "vDSO" is a small shared library that the kernel automatically maps
> > into the +address space of all user-space applications.
> > +Applications themselves usually need not concern themselves with these
> > details +as the vDSO is most commonly called by the C library.
> > +This way you can write using standard functions and the C library will
> > take care
> 
> After "write" I added "programs". Okay?

you can write libraries too, but i think either wording is fine.  or maybe 
change "write" to "code" ?

> > +of using any available functionality.
> 
> I made this piece:
> 
>     of using any functionality that is available via the vDSO.
> 
> Okay?

np

> > +Why does the vDSO exist at all?
> > +There are some facilities the kernel provides that user space ends up
> > using
> 
> I changed "facilities" to "system calls". Okay?

that wasn't exactly what i was going for, but the nuances are probably lost, 
so it doesn't matter (the vDSO isn't purely a replacement for syscalls).

> > +Note that the terminology can be confusing.
> > +On x86 systems, the vDSO function is named "__kernel_vsyscall", but on
> > x86_64,
> 
> After "function" I added
> 
>     used to determine the preferred method of making a system call is
> 
> Okay?

maybe put in paren ?  either works.

> > Note that the vDSO that is used is based on the ABI of your user-space
> > code and not the ABI of the kernel.
> > i.e. If you run an i386 32-bit ELF under an i386 32-bit kernel or under
> > an x86_64 64-bit kernel, you'll get the same vDSO.
> > So when referring to sections below, use the user-space ABI.
> 
> I still can't make any sense of that last sentence. What are "sections"
> in this context?

"sections" refers to the .SS stuff following this paragraph.  e.g.
	.SS i386 functions
	.SS x86_64 functions
	.SS x86/x32 functions

so if your userspace program is compiled as a 32bit i386 ELF, you should refer 
to the "i386 functions" section even if your kernel is a 64bit x86_64 build.  
but if your userspace program is a 64bit x86_64 program, then refer to the 
x86_64 section.   a single kernel can support many ABIs and execute them 
simultaneously.  but the vDSO that is available is determined by the format of 
your program, not the kernel.

> What does it mean to "*use* the user-space ABI"?

use the userspace ABI as the index into the following sections.

> > +.SS aarch64 functions
> > +.\" See linux/arch/arm64/kernel/vdso/vdso.lds.S
> > +.if t \{\
> > +.ft CW
> > +\}
> > +.TS
> > +l l.
> > +symbol	version
> 
> You don't explicitly say what tables such as the below are about.
> Could you provide me with a sentence to describe them?

i only documented the deviations as they don't follow the vDSO standards (ELF 
object that has dynamic symbol information available).  all the standard ones 
may follow Documentation/ABI/stable/vdso/ and Documentation/vDSO/*.  but i 
guess a one line sentence could be added to each of these telling people to 
look at the kernel's vDSO/ dir for more details.
-mike

[-- Attachment #1.2: 0xB902B5271325F892AC251AD441633B9FE837F581.asc --]
[-- Type: application/pgp-keys, Size: 46825 bytes --]

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3] vdso(7): new man page
       [not found]             ` <201401011244.13632.vapier-aBrp7R+bbdUdnm+yROfE0A@public.gmane.org>
@ 2014-01-01 19:56               ` Michael Kerrisk (man-pages)
       [not found]                 ` <52C472DF.8020107-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 15+ messages in thread
From: Michael Kerrisk (man-pages) @ 2014-01-01 19:56 UTC (permalink / raw)
  To: Mike Frysinger
  Cc: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w,
	linux-man-u79uwXL29TY76Z2rM5mHXA

On 01/02/14 06:44, Mike Frysinger wrote:
> On Wednesday 01 January 2014 05:38:20 Michael Kerrisk (man-pages) wrote:
>> On 12/31/13 20:41, Mike Frysinger wrote:
>>> +The "vDSO" is a small shared library that the kernel automatically maps
>>> into the +address space of all user-space applications.
>>> +Applications themselves usually need not concern themselves with these
>>> details +as the vDSO is most commonly called by the C library.
>>> +This way you can write using standard functions and the C library will
>>> take care
>>
>> After "write" I added "programs". Okay?
> 
> you can write libraries too, but i think either wording is fine.  or maybe 
> change "write" to "code" ?

Okay. I made it:

    This way you can code in the normal way using standard functions

>>> +Why does the vDSO exist at all?
>>> +There are some facilities the kernel provides that user space ends up
>>> using
>>
>> I changed "facilities" to "system calls". Okay?
> 
> that wasn't exactly what i was going for, but the nuances are probably lost, 
> so it doesn't matter (the vDSO isn't purely a replacement for syscalls).

I wondered whether you intended further nuances, but after mentioning 
"facilities", you seemed to discuss only system calls.

[...]

>>> Note that the vDSO that is used is based on the ABI of your user-space
>>> code and not the ABI of the kernel.
>>> i.e. If you run an i386 32-bit ELF under an i386 32-bit kernel or under
>>> an x86_64 64-bit kernel, you'll get the same vDSO.
>>> So when referring to sections below, use the user-space ABI.
>>
>> I still can't make any sense of that last sentence. What are "sections"
>> in this context?
> 
> "sections" refers to the .SS stuff following this paragraph.  e.g.
> 	.SS i386 functions
> 	.SS x86_64 functions
> 	.SS x86/x32 functions
> 
> so if your userspace program is compiled as a 32bit i386 ELF, you should refer 
> to the "i386 functions" section even if your kernel is a 64bit x86_64 build.  
> but if your userspace program is a 64bit x86_64 program, then refer to the 
> x86_64 section.   a single kernel can support many ABIs and execute them 
> simultaneously.  but the vDSO that is available is determined by the format of 
> your program, not the kernel.
> 
>> What does it mean to "*use* the user-space ABI"?
> 
> use the userspace ABI as the index into the following sections.

Thanks. I reworked that to:

    Thus, the name of the user-space ABI should be used to determine
    which of the sections below is relevant.

>>> +.SS aarch64 functions
>>> +.\" See linux/arch/arm64/kernel/vdso/vdso.lds.S
>>> +.if t \{\
>>> +.ft CW
>>> +\}
>>> +.TS
>>> +l l.
>>> +symbol	version
>>
>> You don't explicitly say what tables such as the below are about.
>> Could you provide me with a sentence to describe them?
> 
> i only documented the deviations as they don't follow the vDSO standards (ELF 
> object that has dynamic symbol information available).  all the standard ones 
> may follow Documentation/ABI/stable/vdso/ and Documentation/vDSO/*.  but i 
> guess a one line sentence could be added to each of these telling people to 
> look at the kernel's vDSO/ dir for more details.
> -mike

I reworked here somewhat, putting the arch-specific details under a
new .SH heading, and adding

    The table below lists the symbols exported by the vDSO.

at the start of many of the subsections.

I've appended the updated page. Could you please take a quick look
to make sure it's okay.

Cheers,

Michael


.\" Written by Mike Frysinger <vapier-aBrp7R+bbdUdnm+yROfE0A@public.gmane.org>
.\"
.\" %%%LICENSE_START(PUBLIC_DOMAIN)
.\" This page is in the public domain.
.\" %%%LICENSE_END
.\"
.\" Useful background:
.\"   http://articles.manugarg.com/systemcallinlinux2_6.html
.\"   https://lwn.net/Articles/446528/
.\"   http://www.linuxjournal.com/content/creating-vdso-colonels-other-chicken
.\"   http://www.trilithium.com/johan/2005/08/linux-gate/
.\"
.TH VDSO 7 2014-01-01 "Linux" "Linux Programmer's Manual"
.SH NAME
vDSO \- overview of the virtual ELF dynamic shared object
.SH SYNOPSIS
.B #include <sys/auxv.h>

.B void *vdso = (uintptr_t) getauxval(AT_SYSINFO_EHDR);
.SH DESCRIPTION
The "vDSO" is a small shared library that
the kernel automatically maps into the
address space of all user-space applications.
Applications usually do not need to concern themselves with these details
as the vDSO is most commonly called by the C library.
This way you can code in the normal way using standard functions
and the C library will take care
of using any functionality that is available via the vDSO.

Why does the vDSO exist at all?
There are some system calls the kernel provides that
user space code ends up using frequently,
to the point that such calls can dominate overall performance.
This is due both to the frequency of the call as well as the
context-switch overhead that results from 
from exiting user space and entering the kernel.

The rest of this documentation is geared toward the curious and/or
C library writers rather than general developers.
If you're trying to call the vDSO in your own application rather than using
the C library, you're most likely doing it wrong.
.SS Example background
Making system calls can be slow.
In x86 32-bit systems, you can trigger a software interrupt
.RI ( "int $0x80" )
to tell the kernel you wish to make a system call.
However, this instruction is expensive: it goes through
the full interrupt-handling paths
in the processor's microcode as well as in the kernel.
Newer processors have faster (but backward incompatible) instructions to
initiate system calls.
Rather than require the C library to figure out if this functionality is
available at run time,
the C library can use functions provided by the kernel in
the vDSO.

Note that the terminology can be confusing.
On x86 systems, the vDSO function
used to determine the preferred method of making a system call is
named "__kernel_vsyscall", but on x86_64,
the term "vsyscall" also refers to an obsolete way to ask the kernel
what time it is or what CPU the caller is on.

One frequently used system call is
.BR gettimeofday (2).
This system call is called both directly by user-space applications
as well as indirectly by
the C library.
Think timestamps or timing loops or polling\(emall of these
frequently need to know what time it is right now.
This information is also not secret\(emany application in any
privilege mode (root or any unprivileged user) will get the same answer.
Thus the kernel arranges for the information required to answer
this question to be placed in memory the process can access.
Now a call to
.BR gettimeofday (2)
changes from a system call to a normal function
call and a few memory accesses.
.SS Finding the vDSO
The base address of the vDSO (if one exists) is passed by the kernel to
each program in the initial auxiliary vector (see
.BR getauxval (3)), 
via the
.B AT_SYSINFO_EHDR
tag.

You must not assume the vDSO is mapped at any particular location in the
user's memory map.
The base address will usually be randomized at run time every time a new
process image is created (at
.BR execve (2)
time).
This is done for security reasons,
to prevent "return-to-libc" attacks.

For some architectures, there is also an
.B AT_SYSINFO
tag.
This is used only for locating the vsyscall entry point and is frequently
omitted or set to 0 (meaning it's not available).
This tag is a throwback to the initial vDSO work (see
.IR History
below) and its use should be avoided.
.SS File format
Since the vDSO is a fully formed ELF image, you can do symbol lookups on it.
This allows new symbols to be added with newer kernel releases,
and allows the C library to detect available functionality at
run time when running under different kernel versions.
Oftentimes the C library will do detection with the first call and then
cache the result for subsequent calls.

All symbols are also versioned (using the GNU version format).
This allows the kernel to update the function signature without breaking
backward compatibility.
This means changing the arguments that the function accepts as well as the
return value.
Thus, when looking up a symbol in the vDSO,
you must always include the version
to match the ABI you expect.

Typically the vDSO follows the naming convention of prefixing
all symbols with "__vdso_" or "__kernel_"
so as to distinguish them from other standard symbols.
For example, the "gettimeofday" function is named "__vdso_gettimeofday".

You use the standard C calling conventions when calling
any of these functions.
No need to worry about weird register or stack behavior.
.SH NOTES
.SS Source
When you compile the kernel,
it will automatically compile and link the vDSO code for you.
You will frequently find it under the architecture-specific directory:

    find arch/$ARCH/ -name '*vdso*.so*' -o -name '*gate*.so*'

.SS vDSO names
The name of vDSO shared object varies across architectures.
It will often show up in things like glibc's 
.BR ldd (1)
output.
The exact name should not matter to any code, so do not hardcode it.
.if t \{\
.ft CW
\}
.TS
l l.
user ABI	vDSO name
_
aarch64	linux-vdso.so.1
ia64	linux-gate.so.1
ppc/32	linux-vdso32.so.1
ppc/64	linux-vdso64.so.1
s390	linux-vdso32.so.1
s390x	linux-vdso64.so.1
sh	linux-gate.so.1
i386	linux-gate.so.1
x86_64	linux-vdso.so.1
x86/x32	linux-vdso.so.1
.TE
.if t \{\
.in
.ft P
\}
.SH ARCHITECTURE_SPECIFIC NOTES
The subsections below provide architecture-specific notes
on the vDSO.

Note that the vDSO that is used is based on the ABI of your user-space code
and not the ABI of the kernel.
Thus, for example,
when you run an i386 32-bit ELF binary,
you'll get the same vDSO regardless of whether you run it under
an i386 32-bit kernel or under an x86_64 64-bit kernel.
Thus, the name of the user-space ABI should be used to determine
which of the sections below is relevant.
.SS ARM functions
.\" See linux/arch/arm/kernel/entry-armv.S
.\" See linux/Documentation/arm/kernel_user_helpers.txt
The ARM port has a code page full of utility functions.
Since it's just a raw page of code, there is no ELF information for doing
symbol lookups or versioning.
It does provide support for different versions though.

For information on this code page,
it's best to refer to the kernel documentation
as it's extremely detailed and covers everything you need to know:
.IR Documentation/arm/kernel_user_helpers.txt .
.SS aarch64 functions
.\" See linux/arch/arm64/kernel/vdso/vdso.lds.S
The table below lists the symbols exported by the vDSO.
.if t \{\
.ft CW
\}
.TS
l l.
symbol	version
_
__kernel_rt_sigreturn	LINUX_2.6.39
__kernel_gettimeofday	LINUX_2.6.39
__kernel_clock_gettime	LINUX_2.6.39
__kernel_clock_getres	LINUX_2.6.39
.TE
.if t \{\
.in
.ft P
\}
.SS bfin (Blackfin) functions
.\" See linux/arch/blackfin/kernel/fixed_code.S
.\" See http://docs.blackfin.uclinux.org/doku.php?id=linux-kernel:fixed-code
As this CPU lacks a memory management unit (MMU),
it doesn't set up a vDSO in the normal sense.
Instead, it maps at boot time a few raw functions into
a fixed location in memory.
User-space applications then call directly into that region.
There is no provision for backward compatibility
beyond sniffing raw opcodes,
but as this is an embedded CPU, it can get away with things\(emsome of the
object formats it runs aren't even ELF based (they're bFLT/FLAT).

For information on this code page,
it's best to refer to the public documentation:
.br
http://docs.blackfin.uclinux.org/doku.php?id=linux-kernel:fixed-code
.SS ia64 (Itanium) functions
.\" See linux/arch/ia64/kernel/gate.lds.S
.\" Also linux/arch/ia64/kernel/fsys.S and linux/Documentation/ia64/fsys.txt
The table below lists the symbols exported by the vDSO.
.if t \{\
.ft CW
\}
.TS
l l.
symbol	version
_
__kernel_sigtramp	LINUX_2.5
__kernel_syscall_via_break	LINUX_2.5
__kernel_syscall_via_epc	LINUX_2.5
.TE
.if t \{\
.in
.ft P
\}

The Itanium port is somewhat tricky.
In addition to the vDSO above, it also has "light-weight system calls"
(also known as "fast syscalls" or "fsys").
You can invoke these via the
.I __kernel_syscall_via_epc
vDSO helper.
The system calls listed here have the same semantics as if you called them
directly via
.BR syscall (2),
so refer to the relevant
documentation for each.
The table below lists the functions available via this mechanism.
.if t \{\
.ft CW
\}
.TS
l.
function
_
clock_gettime
getcpu
getpid
getppid
gettimeofday
set_tid_address
.TE
.if t \{\
.in
.ft P
\}
.SS parisc (hppa) functions
.\" See linux/arch/parisc/kernel/syscall.S
.\" See linux/Documentation/parisc/registers
The parisc port has a code page full of utility functions
called a gateway page.
Rather than use the normal ELF auxiliary vector approach,
it passes the address of
the page to the process via the SR2 register.
The permissions on the page are such that merely executing those addresses
automatically executes with kernel privileges and not in user-space.
This is done to match the way HP-UX works.

Since it's just a raw page of code, there is no ELF information for doing
symbol lookups or versioning.
Simply call into the appropriate offset via the branch instruction,
for example:

    ble <offset>(%sr2, %r0)
.if t \{\
.ft CW
\}
.TS
l l.
offset	function
_
00b0	lws_entry
00e0	set_thread_pointer
0100	linux_gateway_entry (syscall)
0268	syscall_nosys
0274	tracesys
0324	tracesys_next
0368	tracesys_exit
03a0	tracesys_sigexit
03b8	lws_start
03dc	lws_exit_nosys
03e0	lws_exit
03e4	lws_compare_and_swap64
03e8	lws_compare_and_swap
0404	cas_wouldblock
0410	cas_action
.TE
.if t \{\
.in
.ft P
\}
.SS ppc/32 functions
.\" See linux/arch/powerpc/kernel/vdso32/vdso32.lds.S
The table below lists the symbols exported by the vDSO.
The functions marked with a
.I *
are available only when the kernel is
a PowerPC64 (64-bit) kernel.
.if t \{\
.ft CW
\}
.TS
l l.
symbol	version
_
__kernel_clock_getres	LINUX_2.6.15
__kernel_clock_gettime	LINUX_2.6.15
__kernel_datapage_offset	LINUX_2.6.15
__kernel_get_syscall_map	LINUX_2.6.15
__kernel_get_tbfreq	LINUX_2.6.15
__kernel_getcpu \fI*\fR	LINUX_2.6.15
__kernel_gettimeofday	LINUX_2.6.15
__kernel_sigtramp_rt32	LINUX_2.6.15
__kernel_sigtramp32	LINUX_2.6.15
__kernel_sync_dicache	LINUX_2.6.15
__kernel_sync_dicache_p5	LINUX_2.6.15
.TE
.if t \{\
.in
.ft P
\}
.SS ppc/64 functions
.\" See linux/arch/powerpc/kernel/vdso64/vdso64.lds.S
The table below lists the symbols exported by the vDSO.
.if t \{\
.ft CW
\}
.TS
l l.
symbol	version
_
__kernel_clock_getres	LINUX_2.6.15
__kernel_clock_gettime	LINUX_2.6.15
__kernel_datapage_offset	LINUX_2.6.15
__kernel_get_syscall_map	LINUX_2.6.15
__kernel_get_tbfreq	LINUX_2.6.15
__kernel_getcpu	LINUX_2.6.15
__kernel_gettimeofday	LINUX_2.6.15
__kernel_sigtramp_rt64	LINUX_2.6.15
__kernel_sync_dicache	LINUX_2.6.15
__kernel_sync_dicache_p5	LINUX_2.6.15
.TE
.if t \{\
.in
.ft P
\}
.SS s390 functions
.\" See linux/arch/s390/kernel/vdso32/vdso32.lds.S
The table below lists the symbols exported by the vDSO.
.if t \{\
.ft CW
\}
.TS
l l.
symbol	version
_
__kernel_clock_getres	LINUX_2.6.29
__kernel_clock_gettime	LINUX_2.6.29
__kernel_gettimeofday	LINUX_2.6.29
.TE
.if t \{\
.in
.ft P
\}
.SS s390x functions
.\" See linux/arch/s390/kernel/vdso64/vdso64.lds.S
The table below lists the symbols exported by the vDSO.
.if t \{\
.ft CW
\}
.TS
l l.
symbol	version
_
__kernel_clock_getres	LINUX_2.6.29
__kernel_clock_gettime	LINUX_2.6.29
__kernel_gettimeofday	LINUX_2.6.29
.TE
.if t \{\
.in
.ft P
\}
.SS sh (SuperH) functions
.\" See linux/arch/sh/kernel/vsyscall/vsyscall.lds.S
The table below lists the symbols exported by the vDSO.
.if t \{\
.ft CW
\}
.TS
l l.
symbol	version
_
__kernel_rt_sigreturn	LINUX_2.6
__kernel_sigreturn	LINUX_2.6
__kernel_vsyscall	LINUX_2.6
.TE
.if t \{\
.in
.ft P
\}
.SS i386 functions
.\" See linux/arch/x86/vdso/vdso32/vdso32.lds.S
The table below lists the symbols exported by the vDSO.
.if t \{\
.ft CW
\}
.TS
l l.
symbol	version
_
__kernel_sigreturn	LINUX_2.5
__kernel_rt_sigreturn	LINUX_2.5
__kernel_vsyscall	LINUX_2.5
.TE
.if t \{\
.in
.ft P
\}
.SS x86_64 functions
.\" See linux/arch/x86/vdso/vdso.lds.S
The table below lists the symbols exported by the vDSO.
All of these symbols are also available without the "__vdso_" prefix, but
you should ignore those and stick to the names below.
.if t \{\
.ft CW
\}
.TS
l l.
symbol	version
_
__vdso_clock_gettime	LINUX_2.6
__vdso_getcpu	LINUX_2.6
__vdso_gettimeofday	LINUX_2.6
__vdso_time	LINUX_2.6
.TE
.if t \{\
.in
.ft P
\}
.SS x86/x32 functions
.\" See linux/arch/x86/vdso/vdso32.lds.S
The table below lists the symbols exported by the vDSO.
.if t \{\
.ft CW
\}
.TS
l l.
symbol	version
_
__vdso_clock_gettime	LINUX_2.6
__vdso_getcpu	LINUX_2.6
__vdso_gettimeofday	LINUX_2.6
__vdso_time	LINUX_2.6
.TE
.if t \{\
.in
.ft P
\}
.SS History
The vDSO was originally just a single function\(emthe vsyscall.
In older kernels, you might see that name
in a process's memory map rather than "vdso".
Over time, people realized that this mechanism 
was a great way to pass more functionality
to user space, so it was reconceived as a vDSO in the current format.
.SH SEE ALSO
.BR syscalls (2),
.BR getauxval (3),
.BR proc (5)

The documents, examples, and source code in the Linux source code tree:
.in +4n
.nf

Documentation/ABI/stable/vdso
Documentation/ia64/fsys.txt
Documentation/vDSO/* (includes examples of using the vDSO)

find arch/ -iname '*vdso*' -o -iname '*gate*'
.fi
.in
--
To unsubscribe from this list: send the line "unsubscribe linux-man" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3] vdso(7): new man page
       [not found]                 ` <52C472DF.8020107-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2014-01-02 12:29                   ` Mike Frysinger
       [not found]                     ` <201401020729.05590.vapier-aBrp7R+bbdUdnm+yROfE0A@public.gmane.org>
  0 siblings, 1 reply; 15+ messages in thread
From: Mike Frysinger @ 2014-01-02 12:29 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages); +Cc: linux-man-u79uwXL29TY76Z2rM5mHXA

[-- Attachment #1: Type: Text/Plain, Size: 3449 bytes --]

On Wednesday 01 January 2014 14:56:15 Michael Kerrisk (man-pages) wrote:
> .SH DESCRIPTION
> The "vDSO" is a small shared library that
> the kernel automatically maps into the
> address space of all user-space applications.
> Applications usually do not need to concern themselves with these details
> as the vDSO is most commonly called by the C library.
> This way you can code in the normal way using standard functions
> and the C library will take care
> of using any functionality that is available via the vDSO.

seems like sentences in this new version are excessively wrapped.  for 
example, this first one will easily fit into two lines.  is this just due to the 
editing process ?  content changed but things weren't re-wrapped ?  or do you 
not wrap to 80 cols ?  (this is beyond the rule of "wrap to commas and 
periods").

> There are some system calls the kernel provides that
> user space code ends up using frequently,

shouldn't this be "user-space" now ?

> .SH ARCHITECTURE_SPECIFIC NOTES

change the _ to a space ?

> The subsections below provide architecture-specific notes
> on the vDSO.

another example of a sentence easily fitting on one line (there are many)

> Note that the vDSO that is used is based on the ABI of your user-space code
> and not the ABI of the kernel.
> Thus, for example,
> when you run an i386 32-bit ELF binary,
> you'll get the same vDSO regardless of whether you run it under
> an i386 32-bit kernel or under an x86_64 64-bit kernel.
> Thus, the name of the user-space ABI should be used to determine
> which of the sections below is relevant.

having two sentences in a row start with "Thus" is a little funny sounding.  
could change one to "So" and largely be the same.

> The system calls listed here have the same semantics as if you called them
> directly via
> .BR syscall (2),
> so refer to the relevant
> documentation for each.
> The table below lists the functions available via this mechanism.
> .if t \{\
> .ft CW
> \}
> .TS
> l.
> function
> _
> clock_gettime
> getcpu
> getpid
> getppid
> gettimeofday
> set_tid_address
> .TE
> .if t \{\
> .in
> .ft P
> \}

my troff foo is not strong.  this section renders funny for me -- there's three 
blank lines above the table.  do you see the same thing ?

	The  Itanium port is somewhat tricky.  In addition to the vDSO above, it
	also has "light-weight system calls" (also known as "fast syscalls" or
	"fsys").  You can invoke these via the __kernel_syscall_via_epc vDSO
	helper.  The system calls listed here have the same semantics as if you
	called them directly via syscall(2), so refer to the relevant
	documentation for each.  The table below lists the functions available
	via this mechanism.



       function
       ────────────────
       clock_gettime
       getcpu

> .SS parisc (hppa) functions
> .\" See linux/arch/parisc/kernel/syscall.S
> .\" See linux/Documentation/parisc/registers
> The parisc port has a code page full of utility functions
> called a gateway page.
> Rather than use the normal ELF auxiliary vector approach,
> it passes the address of
> the page to the process via the SR2 register.
> The permissions on the page are such that merely executing those addresses
> automatically executes with kernel privileges and not in user-space.

should be "user space" i think.
-mike

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3] vdso(7): new man page
       [not found]                     ` <201401020729.05590.vapier-aBrp7R+bbdUdnm+yROfE0A@public.gmane.org>
@ 2014-01-02 19:13                       ` Michael Kerrisk (man-pages)
  0 siblings, 0 replies; 15+ messages in thread
From: Michael Kerrisk (man-pages) @ 2014-01-02 19:13 UTC (permalink / raw)
  To: Mike Frysinger; +Cc: linux-man

Hi Mike,

On Fri, Jan 3, 2014 at 1:29 AM, Mike Frysinger <vapier-aBrp7R+bbdUdnm+yROfE0A@public.gmane.org> wrote:
> On Wednesday 01 January 2014 14:56:15 Michael Kerrisk (man-pages) wrote:
>> .SH DESCRIPTION
>> The "vDSO" is a small shared library that
>> the kernel automatically maps into the
>> address space of all user-space applications.
>> Applications usually do not need to concern themselves with these details
>> as the vDSO is most commonly called by the C library.
>> This way you can code in the normal way using standard functions
>> and the C library will take care
>> of using any functionality that is available via the vDSO.
>
> seems like sentences in this new version are excessively wrapped.  for
> example, this first one will easily fit into two lines.  is this just due to the
> editing process ?  content changed but things weren't re-wrapped ?  or do you
> not wrap to 80 cols ?  (this is beyond the rule of "wrap to commas and
> periods").

I prefer wrapping to about 75 columns or less. (I document this in
man-pages(7), but perhaps that's not prominently enough.) Reduces the
chances of wrapping problems with patches in some mailers. (And I may
have been overenthusiastic in wrapping lines that were close to 75
chars.)

>> There are some system calls the kernel provides that
>> user space code ends up using frequently,
>
> shouldn't this be "user-space" now ?

Yep.

>> .SH ARCHITECTURE_SPECIFIC NOTES
>
> change the _ to a space ?

Typo. Should have been "-"

>> The subsections below provide architecture-specific notes
>> on the vDSO.
>
> another example of a sentence easily fitting on one line (there are many)
>
>> Note that the vDSO that is used is based on the ABI of your user-space code
>> and not the ABI of the kernel.
>> Thus, for example,
>> when you run an i386 32-bit ELF binary,
>> you'll get the same vDSO regardless of whether you run it under
>> an i386 32-bit kernel or under an x86_64 64-bit kernel.
>> Thus, the name of the user-space ABI should be used to determine
>> which of the sections below is relevant.
>
> having two sentences in a row start with "Thus" is a little funny sounding.
> could change one to "So" and largely be the same.

Thanks. Fixed now.

>> The system calls listed here have the same semantics as if you called them
>> directly via
>> .BR syscall (2),
>> so refer to the relevant
>> documentation for each.
>> The table below lists the functions available via this mechanism.
>> .if t \{\
>> .ft CW
>> \}
>> .TS
>> l.
>> function
>> _
>> clock_gettime
>> getcpu
>> getpid
>> getppid
>> gettimeofday
>> set_tid_address
>> .TE
>> .if t \{\
>> .in
>> .ft P
>> \}
>
> my troff foo is not strong.  this section renders funny for me -- there's three
> blank lines above the table.  do you see the same thing ?

Strange. I don't.

>         The  Itanium port is somewhat tricky.  In addition to the vDSO above, it
>         also has "light-weight system calls" (also known as "fast syscalls" or
>         "fsys").  You can invoke these via the __kernel_syscall_via_epc vDSO
>         helper.  The system calls listed here have the same semantics as if you
>         called them directly via syscall(2), so refer to the relevant
>         documentation for each.  The table below lists the functions available
>         via this mechanism.
>
>
>
>        function
>        ────────────────
>        clock_gettime
>        getcpu
>
>> .SS parisc (hppa) functions
>> .\" See linux/arch/parisc/kernel/syscall.S
>> .\" See linux/Documentation/parisc/registers
>> The parisc port has a code page full of utility functions
>> called a gateway page.
>> Rather than use the normal ELF auxiliary vector approach,
>> it passes the address of
>> the page to the process via the SR2 register.
>> The permissions on the page are such that merely executing those addresses
>> automatically executes with kernel privileges and not in user-space.
>
> should be "user space" i think.

Yup.

Thanks for checking the page, Mike.

Cheers,

Michael



-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
--
To unsubscribe from this list: send the line "unsubscribe linux-man" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2014-01-02 19:13 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-04-10  3:17 vdso(7): new man page Mike Frysinger
     [not found] ` <201304092317.01590.vapier-aBrp7R+bbdUdnm+yROfE0A@public.gmane.org>
2013-04-11 18:31   ` Andy Lutomirski
     [not found]     ` <CALCETrXwfpH=dRZ82MqjWWL0oFohigcUHgLPnRPpnisOHYxKQQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-04-12  1:28       ` Mike Frysinger
2013-04-12  1:28   ` Mike Frysinger
     [not found]     ` <201304112128.47633.vapier-aBrp7R+bbdUdnm+yROfE0A@public.gmane.org>
2013-05-22 13:22       ` Michael Kerrisk
     [not found]         ` <519CC681.6080502-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2013-06-27  0:00           ` Michael Kerrisk (man-pages)
     [not found]             ` <CAKgNAkgwmfBeyijCHj+y2FSQbgSDY8izW-9DAqbw4wgD2y1pAA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-12-30 11:27               ` Michael Kerrisk (man-pages)
2013-12-31  7:32           ` Mike Frysinger
     [not found]             ` <201312310232.23392.vapier-aBrp7R+bbdUdnm+yROfE0A@public.gmane.org>
2014-01-01 10:36               ` Michael Kerrisk (man-pages)
2013-12-31  7:41   ` [PATCH v3] " Mike Frysinger
     [not found]     ` <1388475665-18491-1-git-send-email-vapier-aBrp7R+bbdUdnm+yROfE0A@public.gmane.org>
2014-01-01 10:38       ` Michael Kerrisk (man-pages)
     [not found]         ` <52C3F01C.9080803-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2014-01-01 17:44           ` Mike Frysinger
     [not found]             ` <201401011244.13632.vapier-aBrp7R+bbdUdnm+yROfE0A@public.gmane.org>
2014-01-01 19:56               ` Michael Kerrisk (man-pages)
     [not found]                 ` <52C472DF.8020107-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2014-01-02 12:29                   ` Mike Frysinger
     [not found]                     ` <201401020729.05590.vapier-aBrp7R+bbdUdnm+yROfE0A@public.gmane.org>
2014-01-02 19:13                       ` Michael Kerrisk (man-pages)

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).