From: Jeremy Fitzhardinge <jeremy@goop.org>
To: Andi Kleen <ak@suse.de>
Cc: patches@x86-64.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] [8/30] x86_64: Add vDSO for x86-64 with gettimeofday/clock_gettime/getcpu
Date: Mon, 30 Apr 2007 22:57:57 -0700
Message-ID: <4636D6E5.8090006@goop.org>
In-Reply-To: <20070501035805.9FA9513CAF@wotan.suse.de>
Andi Kleen wrote:
> This implements a new vDSO for x86-64. The concept is similar
> to the existing vDSOs on i386 and PPC. x86-64 has had static
> vsyscalls before, but these are not flexible enough anymore.
>
> A vDSO is an ELF shared library supplied by the kernel that is mapped into
> user address space. The vDSO mapping is randomized for each process
> for security reasons.
>
> Doing this was needed for clock_gettime, because clock_gettime
> always needs a syscall fallback and having one at a fixed
> address would have made buffer overflow exploits too easy to write.
>
> The vdso can be disabled with vdso=0
>
> It currently includes a new gettimeofday implementation and optimized
> clock_gettime(). The gettimeofday implementation is slightly faster
> than the one in the old vsyscall. clock_gettime is significantly faster
> than the syscall for CLOCK_MONOTONIC and CLOCK_REALTIME.
>
> The new calls are generally faster than the old vsyscall.
>
> TBD: add new benchmarks
>
> Advantages over the old x86-64 vsyscalls:
> - Extensible
> - Randomized
> - Cleaner
> - Easier to virtualize (the old static address range causes
> overhead e.g. for Xen because it has to create special page tables for it)
>
> Weak points:
> - glibc support still to be written
>
> The VM interface is partly based on Ingo Molnar's i386 version.
>
> Signed-off-by: Andi Kleen <ak@suse.de>
>
> ---
> Documentation/kernel-parameters.txt | 2
> arch/x86_64/Makefile | 3
> arch/x86_64/ia32/ia32_binfmt.c | 1
> arch/x86_64/kernel/time.c | 1
> arch/x86_64/kernel/vmlinux.lds.S | 12 +++
> arch/x86_64/kernel/vsyscall.c | 22 +----
> arch/x86_64/mm/init.c | 17 ++++
> arch/x86_64/vdso/Makefile | 49 ++++++++++++
> arch/x86_64/vdso/vclock_gettime.c | 120 +++++++++++++++++++++++++++++++
> arch/x86_64/vdso/vdso-note.S | 25 ++++++
> arch/x86_64/vdso/vdso-start.S | 2
> arch/x86_64/vdso/vdso.S | 2
> arch/x86_64/vdso/vdso.lds.S | 77 ++++++++++++++++++++
> arch/x86_64/vdso/vextern.h | 16 ++++
> arch/x86_64/vdso/vgetcpu.c | 50 +++++++++++++
> arch/x86_64/vdso/vma.c | 137 ++++++++++++++++++++++++++++++++++++
> arch/x86_64/vdso/voffset.h | 1
> arch/x86_64/vdso/vvar.c | 12 +++
> include/asm-x86_64/auxvec.h | 2
> include/asm-x86_64/elf.h | 13 +++
> include/asm-x86_64/mmu.h | 1
> include/asm-x86_64/pgtable.h | 8 +-
> include/asm-x86_64/vgtod.h | 29 +++++++
> include/asm-x86_64/vsyscall.h | 3
> 24 files changed, 583 insertions(+), 22 deletions(-)
>
> Index: linux/arch/x86_64/ia32/ia32_binfmt.c
> ===================================================================
> --- linux.orig/arch/x86_64/ia32/ia32_binfmt.c
> +++ linux/arch/x86_64/ia32/ia32_binfmt.c
> @@ -38,6 +38,7 @@
>
> int sysctl_vsyscall32 = 1;
>
> +#undef ARCH_DLINFO
> #define ARCH_DLINFO do { \
> if (sysctl_vsyscall32) { \
> NEW_AUX_ENT(AT_SYSINFO, (u32)(u64)VSYSCALL32_VSYSCALL); \
> Index: linux/arch/x86_64/kernel/vmlinux.lds.S
> ===================================================================
> --- linux.orig/arch/x86_64/kernel/vmlinux.lds.S
> +++ linux/arch/x86_64/kernel/vmlinux.lds.S
> @@ -94,6 +94,9 @@ SECTIONS
> .vsyscall_gtod_data : AT(VLOAD(.vsyscall_gtod_data))
> { *(.vsyscall_gtod_data) }
> vsyscall_gtod_data = VVIRT(.vsyscall_gtod_data);
> + .vsyscall_clock : AT(VLOAD(.vsyscall_clock))
> + { *(.vsyscall_clock) }
> + vsyscall_clock = VVIRT(.vsyscall_clock);
>
>
> .vsyscall_1 ADDR(.vsyscall_0) + 1024: AT(VLOAD(.vsyscall_1))
> @@ -153,6 +156,8 @@ SECTIONS
>
> . = ALIGN(4096); /* Init code and data */
> __init_begin = .;
> +
> +
> .init.text : AT(ADDR(.init.text) - LOAD_OFFSET) {
> _sinittext = .;
> *(.init.text)
> @@ -190,6 +195,12 @@ SECTIONS
> .exit.text : AT(ADDR(.exit.text) - LOAD_OFFSET) { *(.exit.text) }
> .exit.data : AT(ADDR(.exit.data) - LOAD_OFFSET) { *(.exit.data) }
>
> +/* vdso blob that is mapped into user space */
> + vdso_start = . ;
> + .vdso : AT(ADDR(.vdso) - LOAD_OFFSET) { *(.vdso) }
> + . = ALIGN(4096);
> + vdso_end = .;
> +
> #ifdef CONFIG_BLK_DEV_INITRD
> . = ALIGN(4096);
> __initramfs_start = .;
> @@ -202,6 +213,7 @@ SECTIONS
> .data.percpu : AT(ADDR(.data.percpu) - LOAD_OFFSET) { *(.data.percpu) }
> __per_cpu_end = .;
> . = ALIGN(4096);
> +
> __init_end = .;
>
> . = ALIGN(4096);
> Index: linux/arch/x86_64/mm/init.c
> ===================================================================
> --- linux.orig/arch/x86_64/mm/init.c
> +++ linux/arch/x86_64/mm/init.c
> @@ -159,6 +159,14 @@ static __init void set_pte_phys(unsigned
> __flush_tlb_one(vaddr);
> }
>
> +void __init
> +set_kernel_map(void *vaddr,unsigned long len,unsigned long phys,pgprot_t prot)
> +{
> + void *end = vaddr + ALIGN(len, PAGE_SIZE);
> + for (; vaddr < end; vaddr += PAGE_SIZE, phys += PAGE_SIZE)
> + set_pte_phys((unsigned long)vaddr, phys, prot);
> +}
> +
> /* NOTE: this is meant to be run only at boot */
> void __init
> __set_fixmap (enum fixed_addresses idx, unsigned long phys, pgprot_t prot)
> @@ -756,3 +764,12 @@ int in_gate_area_no_task(unsigned long a
> {
> return (addr >= VSYSCALL_START) && (addr < VSYSCALL_END);
> }
> +
> +const char *arch_vma_name(struct vm_area_struct *vma)
> +{
> + if (vma->vm_mm && vma->vm_start == (long)vma->vm_mm->context.vdso)
> + return "[vdso]";
> + if (vma == &gate_vma)
> + return "[vsyscall]";
> + return NULL;
> +}
> Index: linux/arch/x86_64/vdso/vdso-note.S
> ===================================================================
> --- /dev/null
> +++ linux/arch/x86_64/vdso/vdso-note.S
> @@ -0,0 +1,25 @@
> +/*
> + * This supplies .note.* sections to go into the PT_NOTE inside the vDSO text.
> + * Here we can supply some information useful to userland.
> + */
> +
> +#include <linux/uts.h>
> +#include <linux/version.h>
> +
> +#define ASM_ELF_NOTE_BEGIN(name, flags, vendor, type) \
>
Use linux/elfnote.h?
J