* [patch 0/2] Updates to compat VDSOs @ 2007-04-05 4:58 Jeremy Fitzhardinge 2007-04-05 4:58 ` [patch 1/2] Relocate VDSO ELF headers to match mapped location with COMPAT_VDSO Jeremy Fitzhardinge 2007-04-05 4:58 ` [patch 2/2] Make COMPAT_VDSO runtime selectable Jeremy Fitzhardinge 0 siblings, 2 replies; 15+ messages in thread From: Jeremy Fitzhardinge @ 2007-04-05 4:58 UTC (permalink / raw) To: Andi Kleen; +Cc: Andrew Morton, virtualization, lkml Hi Andi, Here's a couple of patches to fix up COMPAT_VDSO: The first is a straightforward implementation of Jan's original idea of relocating the VDSO to match its mapped location. Unlike Jan and Zach's version, I changed it to relocate based on the phdrs rather than the sections; the result is pleasantly compact. The second patch takes advantage of the fact that all the COMPAT_VDSO work happens at runtime now, and allows compat mode to be enabled dynamically. If you specify vdso=2 on the kernel command line, it comes up in compat mode; vdso=1 is normal vdso mode, and vdso=0 disables vdso altogether. You can also switch modes with sysctl. Thanks, J -- ^ permalink raw reply [flat|nested] 15+ messages in thread
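The three `vdso=` settings described in the cover letter can be pictured with a small userspace sketch. This is only an illustration under stated assumptions: `enum vdso_mode` and `parse_vdso_mode()` are hypothetical names that do not appear in the patches, and the kernel's own `__setup("vdso=", ...)` handler may parse the value differently.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical names; the values follow the cover letter: 0, 1, 2. */
enum vdso_mode {
	VDSO_DISABLED = 0,	/* vdso=0: no vdso at all */
	VDSO_ENABLED  = 1,	/* vdso=1: normal vdso mode */
	VDSO_COMPAT   = 2	/* vdso=2: compat mode, fixed address */
};

/* Parse the text after "vdso="; returns 0 on success, -1 on bad input. */
static int parse_vdso_mode(const char *arg, enum vdso_mode *out)
{
	char *end;
	unsigned long v;

	errno = 0;
	v = strtoul(arg, &end, 0);
	/* Reject empty strings, trailing junk, and values other than 0..2. */
	if (errno != 0 || end == arg || *end != '\0' || v > VDSO_COMPAT)
		return -1;

	*out = (enum vdso_mode)v;
	return 0;
}
```

Rejecting out-of-range values instead of clamping them is a design choice of this sketch, not something the cover letter specifies.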
* [patch 1/2] Relocate VDSO ELF headers to match mapped location with COMPAT_VDSO 2007-04-05 4:58 [patch 0/2] Updates to compat VDSOs Jeremy Fitzhardinge @ 2007-04-05 4:58 ` Jeremy Fitzhardinge 2007-04-05 6:31 ` Roland McGrath 2007-04-05 7:10 ` Jan Beulich 2007-04-05 4:58 ` [patch 2/2] Make COMPAT_VDSO runtime selectable Jeremy Fitzhardinge 1 sibling, 2 replies; 15+ messages in thread From: Jeremy Fitzhardinge @ 2007-04-05 4:58 UTC (permalink / raw) To: Andi Kleen Cc: Andrew Morton, virtualization, lkml, Zachary Amsden, Jan Beulich, Eric W. Biederman, Ingo Molnar, Roland McGrath [-- Attachment #1: relocate-COMPAT_VDSO.patch --] [-- Type: text/plain, Size: 8707 bytes --] Some versions of libc can't deal with a VDSO which doesn't have its ELF headers matching its mapped address. COMPAT_VDSO maps the VDSO at a specific system-wide fixed address. Previously this was all done at build time, on the grounds that the fixed VDSO address is always at the top of the address space. However, a hypervisor may reserve some of that address space, pushing the fixmap address down. This patch does the adjustment dynamically at runtime, depending on the runtime location of the VDSO fixmap. [ Patch has been through several hands: Jan Beulich wrote the original version; Zach reworked it, and Jeremy converted it to relocate phdrs rather than sections. ] Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: Zachary Amsden <zach@vmware.com> Cc: "Jan Beulich" <JBeulich@novell.com> Cc: Eric W.
Biederman <ebiederm@xmission.com> Cc: Andi Kleen <ak@suse.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Roland McGrath <roland@redhat.com> --- arch/i386/kernel/entry.S | 4 - arch/i386/kernel/sysenter.c | 95 ++++++++++++++++++++++++++++++++++++------- arch/i386/mm/pgtable.c | 6 -- include/asm-i386/elf.h | 28 ++++-------- include/asm-i386/fixmap.h | 8 --- include/linux/elf.h | 3 + 6 files changed, 95 insertions(+), 49 deletions(-) =================================================================== --- a/arch/i386/kernel/entry.S +++ b/arch/i386/kernel/entry.S @@ -305,16 +305,12 @@ sysenter_past_esp: pushl $(__USER_CS) CFI_ADJUST_CFA_OFFSET 4 /*CFI_REL_OFFSET cs, 0*/ -#ifndef CONFIG_COMPAT_VDSO /* * Push current_thread_info()->sysenter_return to the stack. * A tiny bit of offset fixup is necessary - 4*4 means the 4 words * pushed above; +8 corresponds to copy_thread's esp0 setting. */ pushl (TI_sysenter_return-THREAD_SIZE+8+4*4)(%esp) -#else - pushl $SYSENTER_RETURN -#endif CFI_ADJUST_CFA_OFFSET 4 CFI_REL_OFFSET eip, 0 =================================================================== --- a/arch/i386/kernel/sysenter.c +++ b/arch/i386/kernel/sysenter.c @@ -22,6 +22,7 @@ #include <asm/msr.h> #include <asm/pgtable.h> #include <asm/unistd.h> +#include <asm/elf.h> /* * Should the kernel map a VDSO page into processes and pass its @@ -41,6 +42,54 @@ __setup("vdso=", vdso_setup); __setup("vdso=", vdso_setup); extern asmlinkage void sysenter_entry(void); + +#ifdef CONFIG_COMPAT_VDSO +static __cpuinit void reloc_dyn(Elf32_Ehdr *ehdr, unsigned offset) +{ + Elf32_Dyn *dyn = (void *)ehdr + offset; + + for(; dyn->d_tag != DT_NULL; dyn++) + switch(dyn->d_tag) { + case DT_PLTGOT: + case DT_HASH: + case DT_STRTAB: + case DT_SYMTAB: + case DT_RELA: + case DT_INIT: + case DT_FINI: + case DT_REL: + case DT_JMPREL: + case DT_VERSYM: + case DT_VERDEF: + case DT_VERNEED: + dyn->d_un.d_val += VDSO_HIGH_BASE; + } +} + +static __cpuinit void relocate_vdso(Elf32_Ehdr *ehdr) +{ + Elf32_Phdr *phdr; + 
int i; + + BUG_ON(memcmp(ehdr->e_ident, ELFMAG, 4) != 0 || + !elf_check_arch(ehdr) || + ehdr->e_type != ET_DYN); + + ehdr->e_entry += VDSO_HIGH_BASE; + + phdr = (void *)ehdr + ehdr->e_phoff; + for (i = 0; i < ehdr->e_phnum; i++) { + phdr[i].p_vaddr += VDSO_HIGH_BASE; + + if (phdr[i].p_type == PT_DYNAMIC) + reloc_dyn(ehdr, phdr[i].p_offset); + } +} +#else +static inline void relocate_vdso(Elf32_Ehdr *ehdr) +{ +} +#endif /* COMPAT_VDSO */ void enable_sep_cpu(void) { @@ -71,6 +120,9 @@ int __cpuinit sysenter_setup(void) int __cpuinit sysenter_setup(void) { void *syscall_page = (void *)get_zeroed_page(GFP_ATOMIC); + const void *vsyscall; + size_t vsyscall_len; + syscall_pages[0] = virt_to_page(syscall_page); #ifdef CONFIG_COMPAT_VDSO @@ -79,23 +131,23 @@ int __cpuinit sysenter_setup(void) #endif if (!boot_cpu_has(X86_FEATURE_SEP)) { - memcpy(syscall_page, - &vsyscall_int80_start, - &vsyscall_int80_end - &vsyscall_int80_start); - return 0; - } - - memcpy(syscall_page, - &vsyscall_sysenter_start, - &vsyscall_sysenter_end - &vsyscall_sysenter_start); - - return 0; -} - -#ifndef CONFIG_COMPAT_VDSO + vsyscall = &vsyscall_int80_start; + vsyscall_len = &vsyscall_int80_end - &vsyscall_int80_start; + } else { + vsyscall = &vsyscall_sysenter_start; + vsyscall_len = &vsyscall_sysenter_end - &vsyscall_sysenter_start; + } + + memcpy(syscall_page, vsyscall, vsyscall_len); + relocate_vdso(syscall_page); + + return 0; +} + /* Defined in vsyscall-sysenter.S */ extern void SYSENTER_RETURN; +#ifdef __HAVE_ARCH_GATE_AREA /* Setup a VMA at program startup for the vsyscall page */ int arch_setup_additional_pages(struct linux_binprm *bprm, int exstack) { @@ -155,4 +207,17 @@ int in_gate_area_no_task(unsigned long a { return 0; } -#endif +#else /* !__HAVE_ARCH_GATE_AREA */ +int arch_setup_additional_pages(struct linux_binprm *bprm, int exstack) +{ + /* + * If not creating userspace VMA, simply set vdso to point to + * fixmap page. 
+ */ + current->mm->context.vdso = (void *)VDSO_HIGH_BASE; + current_thread_info()->sysenter_return = + (void *)VDSO_SYM(&SYSENTER_RETURN); + + return 0; +} +#endif /* __HAVE_ARCH_GATE_AREA */ =================================================================== --- a/arch/i386/mm/pgtable.c +++ b/arch/i386/mm/pgtable.c @@ -144,10 +144,8 @@ void set_pmd_pfn(unsigned long vaddr, un } static int fixmaps; -#ifndef CONFIG_COMPAT_VDSO unsigned long __FIXADDR_TOP = 0xfffff000; EXPORT_SYMBOL(__FIXADDR_TOP); -#endif void __set_fixmap (enum fixed_addresses idx, unsigned long phys, pgprot_t flags) { @@ -173,12 +171,8 @@ void reserve_top_address(unsigned long r BUG_ON(fixmaps > 0); printk(KERN_INFO "Reserving virtual address space above 0x%08x\n", (int)-reserve); -#ifdef CONFIG_COMPAT_VDSO - BUG_ON(reserve != 0); -#else __FIXADDR_TOP = -reserve - PAGE_SIZE; __VMALLOC_RESERVE += reserve; -#endif } pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address) =================================================================== --- a/include/asm-i386/elf.h +++ b/include/asm-i386/elf.h @@ -133,39 +133,31 @@ extern int dump_task_extended_fpu (struc #define ELF_CORE_COPY_XFPREGS(tsk, elf_xfpregs) dump_task_extended_fpu(tsk, elf_xfpregs) #define VDSO_HIGH_BASE (__fix_to_virt(FIX_VDSO)) -#define VDSO_BASE ((unsigned long)current->mm->context.vdso) - -#ifdef CONFIG_COMPAT_VDSO -# define VDSO_COMPAT_BASE VDSO_HIGH_BASE -# define VDSO_PRELINK VDSO_HIGH_BASE -#else -# define VDSO_COMPAT_BASE VDSO_BASE -# define VDSO_PRELINK 0 -#endif +#define VDSO_CURRENT_BASE ((unsigned long)current->mm->context.vdso) +#define VDSO_PRELINK 0 #define VDSO_SYM(x) \ - (VDSO_COMPAT_BASE + (unsigned long)(x) - VDSO_PRELINK) + (VDSO_CURRENT_BASE + (unsigned long)(x) - VDSO_PRELINK) #define VDSO_HIGH_EHDR ((const struct elfhdr *) VDSO_HIGH_BASE) -#define VDSO_EHDR ((const struct elfhdr *) VDSO_COMPAT_BASE) +#define VDSO_EHDR ((const struct elfhdr *) VDSO_CURRENT_BASE) extern void __kernel_vsyscall; 
#define VDSO_ENTRY VDSO_SYM(&__kernel_vsyscall) -#ifndef CONFIG_COMPAT_VDSO +struct linux_binprm; + #define ARCH_HAS_SETUP_ADDITIONAL_PAGES -struct linux_binprm; extern int arch_setup_additional_pages(struct linux_binprm *bprm, int executable_stack); -#endif extern unsigned int vdso_enabled; -#define ARCH_DLINFO \ -do if (vdso_enabled) { \ - NEW_AUX_ENT(AT_SYSINFO, VDSO_ENTRY); \ - NEW_AUX_ENT(AT_SYSINFO_EHDR, VDSO_COMPAT_BASE); \ +#define ARCH_DLINFO \ +do if (vdso_enabled) { \ + NEW_AUX_ENT(AT_SYSINFO, VDSO_ENTRY); \ + NEW_AUX_ENT(AT_SYSINFO_EHDR, VDSO_CURRENT_BASE); \ } while (0) #endif =================================================================== --- a/include/asm-i386/fixmap.h +++ b/include/asm-i386/fixmap.h @@ -19,13 +19,9 @@ * Leave one empty page between vmalloc'ed areas and * the start of the fixmap. */ -#ifndef CONFIG_COMPAT_VDSO extern unsigned long __FIXADDR_TOP; -#else -#define __FIXADDR_TOP 0xfffff000 -#define FIXADDR_USER_START __fix_to_virt(FIX_VDSO) -#define FIXADDR_USER_END __fix_to_virt(FIX_VDSO - 1) -#endif +#define FIXADDR_USER_START __fix_to_virt(FIX_VDSO) +#define FIXADDR_USER_END __fix_to_virt(FIX_VDSO - 1) #ifndef __ASSEMBLY__ #include <linux/kernel.h> =================================================================== --- a/include/linux/elf.h +++ b/include/linux/elf.h @@ -83,6 +83,9 @@ typedef __s64 Elf64_Sxword; #define DT_DEBUG 21 #define DT_TEXTREL 22 #define DT_JMPREL 23 +#define DT_VERSYM 0x6ffffff0 +#define DT_VERDEF 0x6ffffffc +#define DT_VERNEED 0x6ffffffe #define DT_LOPROC 0x70000000 #define DT_HIPROC 0x7fffffff -- ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [patch 1/2] Relocate VDSO ELF headers to match mapped location with COMPAT_VDSO 2007-04-05 4:58 ` [patch 1/2] Relocate VDSO ELF headers to match mapped location with COMPAT_VDSO Jeremy Fitzhardinge @ 2007-04-05 6:31 ` Roland McGrath 2007-04-05 6:46 ` Jeremy Fitzhardinge 2007-04-05 7:10 ` Jan Beulich 1 sibling, 1 reply; 15+ messages in thread From: Roland McGrath @ 2007-04-05 6:31 UTC (permalink / raw) To: Jeremy Fitzhardinge Cc: virtualization, Eric W. Biederman, Andrew Morton, Ingo Molnar, Jan Beulich, lkml The patch looks nice and clean. However, it does not relocate the symbol table(s) values. I thought that was done in an earlier version of this I saw, but I might be misremembering. Though not fatal, this is a regression from the previous CONFIG_COMPAT_VDSO behavior. It will show up in things like __kernel_* name display in backtraces. If with your other patch CONFIG_COMPAT_VDSO will become other than a rarely-used compatibility option, then this should be fixed. Note that with your second patch this will also break the symbol values in the randomly-located vma vdso; non-ancient glibc doesn't care if the vdso isn't mapped where its phdrs say, but everything does still care that the symbol tables in an ELF file use addresses matching the phdrs in the same file. Thanks, Roland ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [patch 1/2] Relocate VDSO ELF headers to match mapped location with COMPAT_VDSO 2007-04-05 6:31 ` Roland McGrath @ 2007-04-05 6:46 ` Jeremy Fitzhardinge 2007-04-05 8:14 ` Roland McGrath 0 siblings, 1 reply; 15+ messages in thread From: Jeremy Fitzhardinge @ 2007-04-05 6:46 UTC (permalink / raw) To: Roland McGrath Cc: Andi Kleen, Andrew Morton, virtualization, lkml, Zachary Amsden, Jan Beulich, Eric W. Biederman, Ingo Molnar Roland McGrath wrote: > The patch looks nice and clean. However, it does not relocate the symbol > table(s) values. I thought that was done in an earlier version of this I > saw, but I might be misremembering. Though not fatal, this is a regression > from the previous CONFIG_COMPAT_VDSO behavior. It will show up in things > like __kernel_* name display in backtraces. Hm, OK. It does, but I wasn't sure if it would matter. It should be fairly simple to fix up. > If with your other patch > CONFIG_COMPAT_VDSO will become other than a rarely-used compatibility > option, then this should be fixed. Note that with your second patch this > will also break the symbol values in the randomly-located vma vdso; > non-ancient glibc doesn't care if the vdso isn't mapped where its phdrs > say, but everything does still care that the symbol tables in an ELF file > use addresses matching the phdrs in the same file. > I did the second patch because I could, and to see if it would provoke some comment. But effectively removing a kernel config option seems like a good idea to me. J ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [patch 1/2] Relocate VDSO ELF headers to match mapped location with COMPAT_VDSO 2007-04-05 6:46 ` Jeremy Fitzhardinge @ 2007-04-05 8:14 ` Roland McGrath 0 siblings, 0 replies; 15+ messages in thread From: Roland McGrath @ 2007-04-05 8:14 UTC (permalink / raw) To: Jeremy Fitzhardinge Cc: virtualization, Eric W. Biederman, Andrew Morton, Ingo Molnar, Jan Beulich, lkml > I did the second patch because I could, and to see if it would provoke > some comment. But effectively removing a kernel config option seems > like a good idea to me. Well, it provoked me to care whether the first patch gets the relocation "really right". Thanks, Roland ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [patch 1/2] Relocate VDSO ELF headers to match mapped location with COMPAT_VDSO 2007-04-05 4:58 ` [patch 1/2] Relocate VDSO ELF headers to match mapped location with COMPAT_VDSO Jeremy Fitzhardinge 2007-04-05 6:31 ` Roland McGrath @ 2007-04-05 7:10 ` Jan Beulich 2007-04-05 7:31 ` Jeremy Fitzhardinge 2007-04-05 8:14 ` Roland McGrath 1 sibling, 2 replies; 15+ messages in thread From: Jan Beulich @ 2007-04-05 7:10 UTC (permalink / raw) To: Jeremy Fitzhardinge Cc: Ingo Molnar, Andrew Morton, virtualization, Roland McGrath, Andi Kleen, lkml, Zachary Amsden, Eric W. Biederman >+static __cpuinit void reloc_dyn(Elf32_Ehdr *ehdr, unsigned offset) >+{ >+ Elf32_Dyn *dyn = (void *)ehdr + offset; >+ >+ for(; dyn->d_tag != DT_NULL; dyn++) >+ switch(dyn->d_tag) { >+ case DT_PLTGOT: >+ case DT_HASH: >+ case DT_STRTAB: >+ case DT_SYMTAB: >+ case DT_RELA: >+ case DT_INIT: >+ case DT_FINI: >+ case DT_REL: >+ case DT_JMPREL: >+ case DT_VERSYM: >+ case DT_VERDEF: >+ case DT_VERNEED: >+ dyn->d_un.d_val += VDSO_HIGH_BASE; >+ } >+} While there's a certain level of control on what DT_* may appear in the vDSO, not even considering other than the above types seems fragile to me. Since future additions to the set are supposedly following a fixed scheme (distinguishing pointers and values via the low bit when below OLD_DT_LOOS, and using sub-ranges when between DT_HIOS and OLD_DT_HIOS), at least also handling those would seem like a good idea, as would warning about unrecognized types. Also, even though it shouldn't matter for the final result, if doing things spec-conforming here you should use d_un.d_ptr. In addition to Roland's remarks about missing symbol table relocation, I would also assume section headers, if present, should be relocated. Jan ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [patch 1/2] Relocate VDSO ELF headers to match mapped location with COMPAT_VDSO 2007-04-05 7:10 ` Jan Beulich @ 2007-04-05 7:31 ` Jeremy Fitzhardinge 2007-04-05 7:45 ` Raharjo, Cahyo (cahr) 2007-04-05 7:47 ` Jan Beulich 2007-04-05 8:14 ` Roland McGrath 1 sibling, 2 replies; 15+ messages in thread From: Jeremy Fitzhardinge @ 2007-04-05 7:31 UTC (permalink / raw) To: Jan Beulich Cc: virtualization, Eric W. Biederman, Andrew Morton, Ingo Molnar, lkml, Roland McGrath Jan Beulich wrote: > While there's a certain level of control on what DT_* may appear in the > vDSO, not even considering other than the above types seems fragile to > me. Since future additions to the set are supposedly following a fixed > scheme (distinguishing pointers and values via the low bit when below > OLD_DT_LOOS, and using sub-ranges when between DT_HIOS and > OLD_DT_HIOS), at least also handling those would seem like a good > idea, as would warning about unrecognized types. > I wasn't aware of this scheme. Where is it documented? > Also, even though it shouldn't matter for the final result, if doing things > spec-conforming here you should use d_un.d_ptr. > Yes, I've already fixed that. > In addition to Roland's remarks about missing symbol table relocation, I > would also assume section headers, if present, should be relocated. > Yes, I suppose that's easy enough to add. J ^ permalink raw reply [flat|nested] 15+ messages in thread
* RE: [patch 1/2] Relocate VDSO ELF headers to match mapped location with COMPAT_VDSO 2007-04-05 7:31 ` Jeremy Fitzhardinge @ 2007-04-05 7:45 ` Raharjo, Cahyo (cahr) 2007-04-05 7:47 ` Jan Beulich 1 sibling, 0 replies; 15+ messages in thread From: Raharjo, Cahyo (cahr) @ 2007-04-05 7:45 UTC (permalink / raw) To: Jeremy Fitzhardinge, Jan Beulich Cc: virtualization, Eric W. Biederman, Andrew Morton, Ingo Molnar, lkml, Roland McGrath Dear All, In this few days, I got emails from this list. I tried to exclude thru majordomo, but it failed. Kindly need your help to exclude me from this list. Thank You, CR ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [patch 1/2] Relocate VDSO ELF headers to match mapped location with COMPAT_VDSO 2007-04-05 7:31 ` Jeremy Fitzhardinge 2007-04-05 7:45 ` Raharjo, Cahyo (cahr) @ 2007-04-05 7:47 ` Jan Beulich 1 sibling, 0 replies; 15+ messages in thread From: Jan Beulich @ 2007-04-05 7:47 UTC (permalink / raw) To: Jeremy Fitzhardinge Cc: Ingo Molnar, Andrew Morton, virtualization, Roland McGrath, Andi Kleen, lkml, Zachary Amsden, Eric W. Biederman >>> Jeremy Fitzhardinge <jeremy@goop.org> 05.04.07 09:31 >>> >Jan Beulich wrote: >> While there's a certain level of control on what DT_* may appear in the >> vDSO, not even considering other than the above types seems fragile to >> me. Since future additions to the set are supposedly following a fixed >> scheme (distinguishing pointers and values via the low bit when below >> OLD_DT_LOOS, and using sub-ranges when between DT_HIOS and >> OLD_DT_HIOS), at least also handling those would seem like a good >> idea, as would warning about unrecognized types. > >I wasn't aware of this scheme. Where is it documented? Regarding the low bit part, quoting from working gABI chapter 5: "To make it simpler for tools to interpret the contents of dynamic section entries, the value of each tag, except for those in two special compatibility ranges, will determine the interpretation of the d_un union. A tag whose value is an even number indicates a dynamic section entry that uses d_ptr. A tag whose value is an odd number indicates a dynamic section entry that uses d_val or that uses neither d_ptr nor d_val. Tags whose values are less than the special value DT_ENCODING and tags whose values fall between DT_HIOS and DT_LOPROC do not follow these rules." Regarding the OS range, all I can point you to is binutils' include/elf/common.h. Jan ^ permalink raw reply [flat|nested] 15+ messages in thread
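The gABI rule quoted above can be sketched as a small classifier. This is a minimal userspace illustration, not the kernel code under review: it assumes the constants from the C library's `<elf.h>` (including the GNU ranges `DT_ADDRRNGLO`..`DT_ADDRRNGHI` and `DT_VALRNGLO`..`DT_VALRNGHI` as defined there), and `enum dyn_kind` and `classify_dyn_tag()` are hypothetical names. It applies the quoted text literally and reports everything it cannot classify, so that a caller can warn instead of silently skipping the entry.

```c
#include <elf.h>
#include <stdint.h>

enum dyn_kind { DYN_VALUE, DYN_POINTER, DYN_UNKNOWN };

/* Decide whether a dynamic entry with this d_tag holds an address (d_ptr). */
static enum dyn_kind classify_dyn_tag(uint32_t tag)
{
	/*
	 * Tags below DT_ENCODING, and those between DT_HIOS and DT_LOPROC,
	 * do not follow the even/odd rule, so the known ones are listed.
	 */
	switch (tag) {
	case DT_PLTGOT: case DT_HASH: case DT_STRTAB: case DT_SYMTAB:
	case DT_RELA: case DT_INIT: case DT_FINI: case DT_REL:
	case DT_DEBUG: case DT_JMPREL:
	case DT_INIT_ARRAY: case DT_FINI_ARRAY:
	case DT_VERSYM: case DT_VERDEF: case DT_VERNEED:
		return DYN_POINTER;

	case DT_NULL: case DT_NEEDED: case DT_PLTRELSZ: case DT_RELASZ:
	case DT_RELAENT: case DT_STRSZ: case DT_SYMENT: case DT_SONAME:
	case DT_RPATH: case DT_SYMBOLIC: case DT_RELSZ: case DT_RELENT:
	case DT_PLTREL: case DT_TEXTREL: case DT_BIND_NOW:
	case DT_INIT_ARRAYSZ: case DT_FINI_ARRAYSZ:
	case DT_RUNPATH: case DT_FLAGS:
	case DT_RELACOUNT: case DT_RELCOUNT: case DT_FLAGS_1:
	case DT_VERDEFNUM: case DT_VERNEEDNUM:
		return DYN_VALUE;
	}

	/* GNU sub-ranges inside the compatibility range. */
	if (tag >= DT_ADDRRNGLO && tag <= DT_ADDRRNGHI)
		return DYN_POINTER;
	if (tag >= DT_VALRNGLO && tag <= DT_VALRNGHI)
		return DYN_VALUE;

	/* Anything else in the two special ranges is unrecognized. */
	if (tag < DT_ENCODING || (tag > DT_HIOS && tag < DT_LOPROC))
		return DYN_UNKNOWN;

	/* Everywhere else: even tags use d_ptr, odd tags use d_val. */
	return (tag & 1) ? DYN_VALUE : DYN_POINTER;
}
```

A relocation loop built on this would add the base only for `DYN_POINTER` entries and print a warning for `DYN_UNKNOWN` ones.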
* Re: [patch 1/2] Relocate VDSO ELF headers to match mapped location with COMPAT_VDSO 2007-04-05 7:10 ` Jan Beulich 2007-04-05 7:31 ` Jeremy Fitzhardinge @ 2007-04-05 8:14 ` Roland McGrath 2007-04-05 8:18 ` Jeremy Fitzhardinge 1 sibling, 1 reply; 15+ messages in thread From: Roland McGrath @ 2007-04-05 8:14 UTC (permalink / raw) To: Jan Beulich Cc: virtualization, Eric W. Biederman, Andrew Morton, Ingo Molnar, lkml > In addition to Roland's remarks about missing symbol table relocation, I > would also assume section headers, if present, should be relocated. Yes, and also the .symtab as well as .dynsym just in case one ever has one (though I think they are built stripped now, it's not hard to check sh_type for SHT_SYMTAB and call the same routine that works from DT_SYMTAB). Thanks, Roland ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [patch 1/2] Relocate VDSO ELF headers to match mapped location with COMPAT_VDSO 2007-04-05 8:14 ` Roland McGrath @ 2007-04-05 8:18 ` Jeremy Fitzhardinge 2007-04-05 8:54 ` Jan Beulich 2007-04-05 8:58 ` Roland McGrath 0 siblings, 2 replies; 15+ messages in thread From: Jeremy Fitzhardinge @ 2007-04-05 8:18 UTC (permalink / raw) To: Roland McGrath Cc: Jan Beulich, Ingo Molnar, Andrew Morton, virtualization, Andi Kleen, lkml, Zachary Amsden, Eric W. Biederman Roland McGrath wrote: >> In addition to Roland's remarks about missing symbol table relocation, I >> would also assume section headers, if present, should be relocated. >> > > Yes, and also the .symtab as well as .dynsym just in case one ever has one > (though I think they are built stripped now, it's not hard to check sh_type > for SHT_SYMTAB and call the same routine that works from DT_SYMTAB). > OK, how does this look? Does it need to worry about DT_REL/DT_RELA tags? (Not that there are any.) J Subject: Relocate VDSO ELF headers to match mapped location with COMPAT_VDSO Some versions of libc can't deal with a VDSO which doesn't have its ELF headers matching its mapped address. COMPAT_VDSO maps the VDSO at a specific system-wide fixed address. Previously this was all done at build time, on the grounds that the fixed VDSO address is always at the top of the address space. However, a hypervisor may reserve some of that address space, pushing the fixmap address down. This patch does the adjustment dynamically at runtime, depending on the runtime location of the VDSO fixmap. [ Patch has been through several hands: Jan Beulich wrote the original version; Zach reworked it, and Jeremy converted it to relocate phdrs rather than sections. ] Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: Zachary Amsden <zach@vmware.com> Cc: "Jan Beulich" <JBeulich@novell.com> Cc: Eric W.
Biederman <ebiederm@xmission.com> Cc: Andi Kleen <ak@suse.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Roland McGrath <roland@redhat.com> --- arch/i386/kernel/entry.S | 4 - arch/i386/kernel/sysenter.c | 95 ++++++++++++++++++++++++++++++++++++------- arch/i386/mm/pgtable.c | 6 -- include/asm-i386/elf.h | 28 ++++-------- include/asm-i386/fixmap.h | 8 --- include/linux/elf.h | 3 + 6 files changed, 95 insertions(+), 49 deletions(-) diff -r 97a792fb6127 arch/i386/kernel/entry.S --- a/arch/i386/kernel/entry.S Wed Apr 04 23:26:37 2007 -0700 +++ b/arch/i386/kernel/entry.S Thu Apr 05 01:11:25 2007 -0700 @@ -305,16 +305,12 @@ sysenter_past_esp: pushl $(__USER_CS) CFI_ADJUST_CFA_OFFSET 4 /*CFI_REL_OFFSET cs, 0*/ -#ifndef CONFIG_COMPAT_VDSO /* * Push current_thread_info()->sysenter_return to the stack. * A tiny bit of offset fixup is necessary - 4*4 means the 4 words * pushed above; +8 corresponds to copy_thread's esp0 setting. */ pushl (TI_sysenter_return-THREAD_SIZE+8+4*4)(%esp) -#else - pushl $SYSENTER_RETURN -#endif CFI_ADJUST_CFA_OFFSET 4 CFI_REL_OFFSET eip, 0 diff -r 97a792fb6127 arch/i386/kernel/sysenter.c --- a/arch/i386/kernel/sysenter.c Wed Apr 04 23:26:37 2007 -0700 +++ b/arch/i386/kernel/sysenter.c Thu Apr 05 01:11:25 2007 -0700 @@ -22,6 +22,7 @@ #include <asm/msr.h> #include <asm/pgtable.h> #include <asm/unistd.h> +#include <asm/elf.h> /* * Should the kernel map a VDSO page into processes and pass its @@ -41,6 +42,102 @@ __setup("vdso=", vdso_setup); __setup("vdso=", vdso_setup); extern asmlinkage void sysenter_entry(void); + +#ifdef CONFIG_COMPAT_VDSO +static __cpuinit void reloc_symtab(Elf32_Ehdr *ehdr, + unsigned offset, unsigned size) +{ + Elf32_Sym *sym = (void *)ehdr + offset; + unsigned nsym = size / sizeof(*sym); + unsigned i; + + for(i = 0; i < nsym; i++, sym++) { + if (sym->st_shndx == SHN_UNDEF || + sym->st_shndx == SHN_ABS) + continue; /* skip */ + + switch(ELF_ST_TYPE(sym->st_info)) { + case STT_OBJECT: + case STT_FUNC: + case STT_SECTION: + case STT_FILE: 
+ sym->st_value += VDSO_HIGH_BASE; + } + } +} + +struct elfhash { + Elf32_Word nbucket, nchain; +}; + +static __cpuinit void reloc_dyn(Elf32_Ehdr *ehdr, unsigned offset) +{ + Elf32_Dyn *dyn = (void *)ehdr + offset; + + for(; dyn->d_tag != DT_NULL; dyn++) + switch(dyn->d_tag) { + case DT_PLTGOT: + case DT_HASH: + case DT_STRTAB: + case DT_SYMTAB: + case DT_RELA: + case DT_INIT: + case DT_FINI: + case DT_REL: + case DT_DEBUG: + case DT_JMPREL: + case DT_VERSYM: + case DT_VERDEF: + case DT_VERNEED: + case DT_ENCODING ... DT_HIOS: + /* tags above DT_ENCODING are even if they're + a pointer, so skip odd ones */ + if (dyn->d_tag >= DT_ENCODING && + (dyn->d_tag & 1) == 1) + break; + + dyn->d_un.d_ptr += VDSO_HIGH_BASE; + } +} + +static __cpuinit void relocate_vdso(Elf32_Ehdr *ehdr) +{ + Elf32_Phdr *phdr; + Elf32_Shdr *shdr; + int i; + + BUG_ON(memcmp(ehdr->e_ident, ELFMAG, 4) != 0 || + !elf_check_arch(ehdr) || + ehdr->e_type != ET_DYN); + + ehdr->e_entry += VDSO_HIGH_BASE; + + /* rebase phdrs */ + phdr = (void *)ehdr + ehdr->e_phoff; + for (i = 0; i < ehdr->e_phnum; i++) { + phdr[i].p_vaddr += VDSO_HIGH_BASE; + + /* relocate dynamic stuff */ + if (phdr[i].p_type == PT_DYNAMIC) + reloc_dyn(ehdr, phdr[i].p_offset); + } + + /* rebase sections */ + shdr = (void *)ehdr + ehdr->e_shoff; + for(i = 0; i < ehdr->e_shnum; i++) { + shdr[i].sh_addr += VDSO_HIGH_BASE; + + if (shdr[i].sh_type == SHT_SYMTAB || + shdr[i].sh_type == SHT_DYNSYM) + reloc_symtab(ehdr, shdr[i].sh_offset, + shdr[i].sh_size); + } +} +#else +static inline void relocate_vdso(Elf32_Ehdr *ehdr) +{ +} +#endif /* COMPAT_VDSO */ void enable_sep_cpu(void) { @@ -71,6 +168,9 @@ int __cpuinit sysenter_setup(void) int __cpuinit sysenter_setup(void) { void *syscall_page = (void *)get_zeroed_page(GFP_ATOMIC); + const void *vsyscall; + size_t vsyscall_len; + syscall_pages[0] = virt_to_page(syscall_page); #ifdef CONFIG_COMPAT_VDSO @@ -79,23 +179,23 @@ int __cpuinit sysenter_setup(void) #endif if 
(!boot_cpu_has(X86_FEATURE_SEP)) { - memcpy(syscall_page, - &vsyscall_int80_start, - &vsyscall_int80_end - &vsyscall_int80_start); - return 0; - } - - memcpy(syscall_page, - &vsyscall_sysenter_start, - &vsyscall_sysenter_end - &vsyscall_sysenter_start); - - return 0; -} - -#ifndef CONFIG_COMPAT_VDSO + vsyscall = &vsyscall_int80_start; + vsyscall_len = &vsyscall_int80_end - &vsyscall_int80_start; + } else { + vsyscall = &vsyscall_sysenter_start; + vsyscall_len = &vsyscall_sysenter_end - &vsyscall_sysenter_start; + } + + memcpy(syscall_page, vsyscall, vsyscall_len); + relocate_vdso(syscall_page); + + return 0; +} + /* Defined in vsyscall-sysenter.S */ extern void SYSENTER_RETURN; +#ifdef __HAVE_ARCH_GATE_AREA /* Setup a VMA at program startup for the vsyscall page */ int arch_setup_additional_pages(struct linux_binprm *bprm, int exstack) { @@ -155,4 +255,17 @@ int in_gate_area_no_task(unsigned long a { return 0; } -#endif +#else /* !__HAVE_ARCH_GATE_AREA */ +int arch_setup_additional_pages(struct linux_binprm *bprm, int exstack) +{ + /* + * If not creating userspace VMA, simply set vdso to point to + * fixmap page. 
+ */ + current->mm->context.vdso = (void *)VDSO_HIGH_BASE; + current_thread_info()->sysenter_return = + (void *)VDSO_SYM(&SYSENTER_RETURN); + + return 0; +} +#endif /* __HAVE_ARCH_GATE_AREA */ diff -r 97a792fb6127 arch/i386/mm/pgtable.c --- a/arch/i386/mm/pgtable.c Wed Apr 04 23:26:37 2007 -0700 +++ b/arch/i386/mm/pgtable.c Thu Apr 05 01:11:25 2007 -0700 @@ -144,10 +144,8 @@ void set_pmd_pfn(unsigned long vaddr, un } static int fixmaps; -#ifndef CONFIG_COMPAT_VDSO unsigned long __FIXADDR_TOP = 0xfffff000; EXPORT_SYMBOL(__FIXADDR_TOP); -#endif void __set_fixmap (enum fixed_addresses idx, unsigned long phys, pgprot_t flags) { @@ -173,12 +171,8 @@ void reserve_top_address(unsigned long r BUG_ON(fixmaps > 0); printk(KERN_INFO "Reserving virtual address space above 0x%08x\n", (int)-reserve); -#ifdef CONFIG_COMPAT_VDSO - BUG_ON(reserve != 0); -#else __FIXADDR_TOP = -reserve - PAGE_SIZE; __VMALLOC_RESERVE += reserve; -#endif } pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address) diff -r 97a792fb6127 include/asm-i386/elf.h --- a/include/asm-i386/elf.h Wed Apr 04 23:26:37 2007 -0700 +++ b/include/asm-i386/elf.h Thu Apr 05 01:11:25 2007 -0700 @@ -133,39 +133,31 @@ extern int dump_task_extended_fpu (struc #define ELF_CORE_COPY_XFPREGS(tsk, elf_xfpregs) dump_task_extended_fpu(tsk, elf_xfpregs) #define VDSO_HIGH_BASE (__fix_to_virt(FIX_VDSO)) -#define VDSO_BASE ((unsigned long)current->mm->context.vdso) - -#ifdef CONFIG_COMPAT_VDSO -# define VDSO_COMPAT_BASE VDSO_HIGH_BASE -# define VDSO_PRELINK VDSO_HIGH_BASE -#else -# define VDSO_COMPAT_BASE VDSO_BASE -# define VDSO_PRELINK 0 -#endif +#define VDSO_CURRENT_BASE ((unsigned long)current->mm->context.vdso) +#define VDSO_PRELINK 0 #define VDSO_SYM(x) \ - (VDSO_COMPAT_BASE + (unsigned long)(x) - VDSO_PRELINK) + (VDSO_CURRENT_BASE + (unsigned long)(x) - VDSO_PRELINK) #define VDSO_HIGH_EHDR ((const struct elfhdr *) VDSO_HIGH_BASE) -#define VDSO_EHDR ((const struct elfhdr *) VDSO_COMPAT_BASE) +#define VDSO_EHDR 
((const struct elfhdr *) VDSO_CURRENT_BASE) extern void __kernel_vsyscall; #define VDSO_ENTRY VDSO_SYM(&__kernel_vsyscall) -#ifndef CONFIG_COMPAT_VDSO +struct linux_binprm; + #define ARCH_HAS_SETUP_ADDITIONAL_PAGES -struct linux_binprm; extern int arch_setup_additional_pages(struct linux_binprm *bprm, int executable_stack); -#endif extern unsigned int vdso_enabled; -#define ARCH_DLINFO \ -do if (vdso_enabled) { \ - NEW_AUX_ENT(AT_SYSINFO, VDSO_ENTRY); \ - NEW_AUX_ENT(AT_SYSINFO_EHDR, VDSO_COMPAT_BASE); \ +#define ARCH_DLINFO \ +do if (vdso_enabled) { \ + NEW_AUX_ENT(AT_SYSINFO, VDSO_ENTRY); \ + NEW_AUX_ENT(AT_SYSINFO_EHDR, VDSO_CURRENT_BASE); \ } while (0) #endif diff -r 97a792fb6127 include/asm-i386/fixmap.h --- a/include/asm-i386/fixmap.h Wed Apr 04 23:26:37 2007 -0700 +++ b/include/asm-i386/fixmap.h Thu Apr 05 01:11:25 2007 -0700 @@ -19,13 +19,9 @@ * Leave one empty page between vmalloc'ed areas and * the start of the fixmap. */ -#ifndef CONFIG_COMPAT_VDSO extern unsigned long __FIXADDR_TOP; -#else -#define __FIXADDR_TOP 0xfffff000 -#define FIXADDR_USER_START __fix_to_virt(FIX_VDSO) -#define FIXADDR_USER_END __fix_to_virt(FIX_VDSO - 1) -#endif +#define FIXADDR_USER_START __fix_to_virt(FIX_VDSO) +#define FIXADDR_USER_END __fix_to_virt(FIX_VDSO - 1) #ifndef __ASSEMBLY__ #include <linux/kernel.h> diff -r 97a792fb6127 include/linux/elf.h --- a/include/linux/elf.h Wed Apr 04 23:26:37 2007 -0700 +++ b/include/linux/elf.h Thu Apr 05 01:11:25 2007 -0700 @@ -83,6 +83,12 @@ typedef __s64 Elf64_Sxword; #define DT_DEBUG 21 #define DT_TEXTREL 22 #define DT_JMPREL 23 +#define DT_ENCODING 32 +#define DT_LOOS 0x6000000D +#define DT_HIOS 0x6ffff000 +#define DT_VERSYM 0x6ffffff0 +#define DT_VERDEF 0x6ffffffc +#define DT_VERNEED 0x6ffffffe #define DT_LOPROC 0x70000000 #define DT_HIPROC 0x7fffffff ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [patch 1/2] Relocate VDSO ELF headers to match mapped location with COMPAT_VDSO
  2007-04-05  8:18 ` Jeremy Fitzhardinge
@ 2007-04-05  8:54   ` Jan Beulich
  2007-04-05  8:58   ` Roland McGrath
  1 sibling, 0 replies; 15+ messages in thread
From: Jan Beulich @ 2007-04-05  8:54 UTC
  To: Jeremy Fitzhardinge
  Cc: Ingo Molnar, Andrew Morton, virtualization, Roland McGrath,
      Andi Kleen, lkml, Zachary Amsden, Eric W. Biederman

>+    for(; dyn->d_tag != DT_NULL; dyn++)
>+        switch(dyn->d_tag) {
>+        case DT_PLTGOT:
>+        case DT_HASH:
>+        case DT_STRTAB:
>+        case DT_SYMTAB:
>+        case DT_RELA:
>+        case DT_INIT:
>+        case DT_FINI:
>+        case DT_REL:
>+        case DT_DEBUG:
>+        case DT_JMPREL:
>+        case DT_VERSYM:
>+        case DT_VERDEF:
>+        case DT_VERNEED:
>+        case DT_ENCODING ... DT_HIOS:
>+            /* tags above DT_ENCODING are even if they're
>+               a pointer, so skip odd ones */
>+            if (dyn->d_tag >= DT_ENCODING &&
>+                (dyn->d_tag & 1) == 1)
>+                break;
>+
>+            dyn->d_un.d_ptr += VDSO_HIGH_BASE;
>+        }

I'm pretty certain the range OLD_DT_LOOS ... DT_LOOS must be excluded
here. The document version I'm looking at is internally inconsistent on
this point: it says in one place to stop at DT_LOOS, in a second to stop
at DT_HIOS, and in a third to include DT_LOOS ... DT_HIOS. The
inconsistency goes away if one assumes that "stop at DT_LOOS" really
means stop at OLD_DT_LOOS, and that "stop at DT_HIOS" fails to
special-case the OLD_DT_LOOS ... DT_LOOS range.

Additionally, I'm a little worried about excluding
DT_ADDRRNGLO ... DT_ADDRRNGHI, in case future binutils ever start
generating any of these by default (namely DT_GNU_HASH).

>+#define DT_ENCODING    32

Hmm, I was about to say this ought to be 31 when I realized the
discrepancy between the document and binutils. I'll have to ask about
this...

Jan
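The tag classification discussed in this reply can be sketched in standalone C. This is a minimal sketch under stated assumptions, not the kernel's reloc_dyn(): the helper name dyn_tag_is_pointer is hypothetical, the tag values are the standard low-numbered ones plus those the patch adds to include/linux/elf.h, and treating both endpoints of OLD_DT_LOOS ... DT_LOOS as excluded is one reading of Jan's remark rather than something the thread settles.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Tag values from the generic ELF ABI and from the patch's
 * include/linux/elf.h hunk. Do not combine this with <elf.h>, which
 * defines most of these names as macros.
 */
enum {
    DT_PLTGOT = 3, DT_HASH = 4, DT_STRTAB = 5, DT_SYMTAB = 6, DT_RELA = 7,
    DT_INIT = 12, DT_FINI = 13, DT_REL = 17, DT_DEBUG = 21, DT_JMPREL = 23,
    DT_ENCODING  = 32,
    OLD_DT_LOOS  = 0x60000000,
    DT_LOOS      = 0x6000000d,
    DT_HIOS      = 0x6ffff000,
    DT_ADDRRNGLO = 0x6ffffe00,
    DT_ADDRRNGHI = 0x6ffffeff,
    DT_VERSYM    = 0x6ffffff0,
    DT_VERDEF    = 0x6ffffffc,
    DT_VERNEED   = 0x6ffffffe,
};

static_assert(OLD_DT_LOOS < DT_LOOS, "excluded range must be non-empty");

/* Hypothetical helper: does this d_tag carry an address in d_un.d_ptr? */
static bool dyn_tag_is_pointer(uint32_t tag)
{
    /* Tags explicitly known to hold addresses. */
    switch (tag) {
    case DT_PLTGOT: case DT_HASH: case DT_STRTAB: case DT_SYMTAB:
    case DT_RELA: case DT_INIT: case DT_FINI: case DT_REL:
    case DT_DEBUG: case DT_JMPREL:
    case DT_VERSYM: case DT_VERDEF: case DT_VERNEED:
        return true;
    default:
        break;
    }

    /* The address range Jan is worried about (it contains DT_GNU_HASH). */
    if (tag >= DT_ADDRRNGLO && tag <= DT_ADDRRNGHI)
        return true;

    /* Assumption: both endpoints of OLD_DT_LOOS ... DT_LOOS are excluded. */
    if (tag >= OLD_DT_LOOS && tag <= DT_LOOS)
        return false;

    /* Elsewhere from DT_ENCODING up to DT_HIOS, even tags are pointers. */
    if (tag >= DT_ENCODING && tag <= DT_HIOS)
        return (tag & 1) == 0;

    return false;
}
```

A caller would add the load base to d_un.d_ptr only when this returns true, and could log any tag it does not recognise, as suggested elsewhere in the thread.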
* Re: [patch 1/2] Relocate VDSO ELF headers to match mapped location with COMPAT_VDSO
  2007-04-05  8:18 ` Jeremy Fitzhardinge
  2007-04-05  8:54   ` Jan Beulich
@ 2007-04-05  8:58   ` Roland McGrath
  1 sibling, 0 replies; 15+ messages in thread
From: Roland McGrath @ 2007-04-05  8:58 UTC
  To: Jeremy Fitzhardinge
  Cc: Jan Beulich, Ingo Molnar, Andrew Morton, virtualization,
      Andi Kleen, lkml, Zachary Amsden, Eric W. Biederman

Don't change sh_addr in sections with (sh_flags & SHF_ALLOC) == 0.

Also, I'd concur with Jan's suggestion about paranoia warnings for
unexpected DT_* types (and for unexpected st_shndx >= SHN_LORESERVE).

Otherwise, looks ok to me.

Thanks,
Roland
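Roland's two points can be illustrated with a small user-space sketch. It assumes a Linux/glibc <elf.h>; the function names rebase_alloc_sections and classify_symbol are hypothetical and are not taken from the patch. The symbol check uses >= SHN_LORESERVE, following the wording of this reply.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Add `base` to sh_addr of every section that is actually loaded
 * (SHF_ALLOC set); leave all other sections untouched.
 * Returns the number of sections that were rebased.
 */
static size_t rebase_alloc_sections(Elf32_Shdr *shdr, size_t n, uint32_t base)
{
    size_t moved = 0;

    assert(shdr != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if ((shdr[i].sh_flags & SHF_ALLOC) == 0)
            continue;           /* not mapped, so no address to fix */
        shdr[i].sh_addr += base;
        moved++;
    }
    return moved;
}

/*
 * Classify a symbol by its section index:
 *    1  ordinary section index, st_value should be rebased
 *    0  SHN_UNDEF or SHN_ABS, leave st_value alone
 *   -1  any other reserved index (>= SHN_LORESERVE); unexpected,
 *       so the caller should warn and skip it
 */
static int classify_symbol(const Elf32_Sym *sym)
{
    assert(sym != NULL);
    if (sym->st_shndx == SHN_UNDEF || sym->st_shndx == SHN_ABS)
        return 0;
    if (sym->st_shndx >= SHN_LORESERVE)
        return -1;
    return 1;
}
```

SHN_ABS is itself inside the reserved range, which is why it is tested before the >= SHN_LORESERVE comparison.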
* [patch 2/2] Make COMPAT_VDSO runtime selectable. 2007-04-05 4:58 [patch 0/2] Updates to compat VDSOs Jeremy Fitzhardinge 2007-04-05 4:58 ` [patch 1/2] Relocate VDSO ELF headers to match mapped location with COMPAT_VDSO Jeremy Fitzhardinge @ 2007-04-05 4:58 ` Jeremy Fitzhardinge 1 sibling, 0 replies; 15+ messages in thread From: Jeremy Fitzhardinge @ 2007-04-05 4:58 UTC (permalink / raw) To: Andi Kleen Cc: Andrew Morton, virtualization, lkml, Zachary Amsden, Jan Beulich, Eric W. Biederman, Ingo Molnar, Roland McGrath [-- Attachment #1: runtime-COMPAT_VDSO-selection.patch --] [-- Type: text/plain, Size: 7163 bytes --] Now that relocation of the VDSO for COMPAT_VDSO users is done at runtime rather than compile time, it is possible to enable/disable compat mode at runtime. This patch allows you to enable COMPAT_VDSO mode with "vdso=2" on the kernel command line, or via sysctl. The COMPAT_VDSO config option still exists, but if enabled it just makes vdso_enabled default to VDSO_COMPAT. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: Zachary Amsden <zach@vmware.com> Cc: "Jan Beulich" <JBeulich@novell.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Andi Kleen <ak@suse.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Roland McGrath <roland@redhat.com> --- Documentation/kernel-parameters.txt | 1 arch/i386/kernel/sysenter.c | 131 ++++++++++++++++++++++------------- include/asm-i386/page.h | 2 3 files changed, 84 insertions(+), 50 deletions(-) =================================================================== --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -1807,6 +1807,7 @@ and is between 256 and 4096 characters. [USBHID] The interval which mice are to be polled at. 
vdso= [IA-32,SH] + vdso=2: enable compat VDSO (default with COMPAT_VDSO) vdso=1: enable VDSO (default) vdso=0: disable VDSO mapping =================================================================== --- a/arch/i386/kernel/sysenter.c +++ b/arch/i386/kernel/sysenter.c @@ -24,11 +24,23 @@ #include <asm/unistd.h> #include <asm/elf.h> +enum { + VDSO_DISABLED = 0, + VDSO_ENABLED = 1, + VDSO_COMPAT = 2, +}; + +#ifdef CONFIG_COMPAT_VDSO +#define VDSO_DEFAULT VDSO_COMPAT +#else +#define VDSO_DEFAULT VDSO_ENABLED +#endif + /* * Should the kernel map a VDSO page into processes and pass its * address down to glibc upon exec()? */ -unsigned int __read_mostly vdso_enabled = 1; +unsigned int __read_mostly vdso_enabled = VDSO_DEFAULT; EXPORT_SYMBOL_GPL(vdso_enabled); @@ -43,7 +55,6 @@ __setup("vdso=", vdso_setup); extern asmlinkage void sysenter_entry(void); -#ifdef CONFIG_COMPAT_VDSO static __cpuinit void reloc_dyn(Elf32_Ehdr *ehdr, unsigned offset) { Elf32_Dyn *dyn = (void *)ehdr + offset; @@ -85,11 +96,6 @@ static __cpuinit void relocate_vdso(Elf3 reloc_dyn(ehdr, phdr[i].p_offset); } } -#else -static inline void relocate_vdso(Elf32_Ehdr *ehdr) -{ -} -#endif /* COMPAT_VDSO */ void enable_sep_cpu(void) { @@ -109,6 +115,25 @@ void enable_sep_cpu(void) put_cpu(); } +static struct vm_area_struct gate_vma; + +static int __cpuinit gate_vma_init(void) +{ + gate_vma.vm_mm = NULL; + gate_vma.vm_start = FIXADDR_USER_START; + gate_vma.vm_end = FIXADDR_USER_END; + gate_vma.vm_flags = VM_READ | VM_MAYREAD | VM_EXEC | VM_MAYEXEC; + gate_vma.vm_page_prot = __P101; + /* + * Make sure the vDSO gets into every core dump. + * Dumping its contents makes post-mortem fully interpretable later + * without matching up the same kernel and hardware config to see + * what PC values meant. + */ + gate_vma.vm_flags |= VM_ALWAYSDUMP; + return 0; +} + /* * These symbols are defined by vsyscall.o to mark the bounds * of the ELF DSO images included therein. 
@@ -117,6 +142,19 @@ extern const char vsyscall_sysenter_star extern const char vsyscall_sysenter_start, vsyscall_sysenter_end; static struct page *syscall_pages[1]; +static void map_compat_vdso(int map) +{ + static int vdso_mapped; + + if (map == vdso_mapped) + return; + + vdso_mapped = map; + + __set_fixmap(FIX_VDSO, page_to_pfn(syscall_pages[0]) << PAGE_SHIFT, + map ? PAGE_READONLY_EXEC : PAGE_NONE); +} + int __cpuinit sysenter_setup(void) { void *syscall_page = (void *)get_zeroed_page(GFP_ATOMIC); @@ -125,10 +163,9 @@ int __cpuinit sysenter_setup(void) syscall_pages[0] = virt_to_page(syscall_page); -#ifdef CONFIG_COMPAT_VDSO - __set_fixmap(FIX_VDSO, __pa(syscall_page), PAGE_READONLY_EXEC); + gate_vma_init(); + printk("Compat vDSO mapped to %08lx.\n", __fix_to_virt(FIX_VDSO)); -#endif if (!boot_cpu_has(X86_FEATURE_SEP)) { vsyscall = &vsyscall_int80_start; @@ -147,7 +184,6 @@ int __cpuinit sysenter_setup(void) /* Defined in vsyscall-sysenter.S */ extern void SYSENTER_RETURN; -#ifdef __HAVE_ARCH_GATE_AREA /* Setup a VMA at program startup for the vsyscall page */ int arch_setup_additional_pages(struct linux_binprm *bprm, int exstack) { @@ -156,33 +192,44 @@ int arch_setup_additional_pages(struct l int ret; down_write(&mm->mmap_sem); - addr = get_unmapped_area(NULL, 0, PAGE_SIZE, 0, 0); - if (IS_ERR_VALUE(addr)) { - ret = addr; - goto up_fail; - } - - /* - * MAYWRITE to allow gdb to COW and set breakpoints - * - * Make sure the vDSO gets into every core dump. - * Dumping its contents makes post-mortem fully interpretable later - * without matching up the same kernel and hardware config to see - * what PC values meant. 
- */ - ret = install_special_mapping(mm, addr, PAGE_SIZE, - VM_READ|VM_EXEC| - VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC| - VM_ALWAYSDUMP, - syscall_pages); - if (ret) - goto up_fail; + + map_compat_vdso(vdso_enabled == VDSO_COMPAT); + + if (vdso_enabled == VDSO_COMPAT) + addr = VDSO_HIGH_BASE; + else { + addr = get_unmapped_area(NULL, 0, PAGE_SIZE, 0, 0); + if (IS_ERR_VALUE(addr)) { + ret = addr; + goto up_fail; + } + + /* + * MAYWRITE to allow gdb to COW and set breakpoints + * + * Make sure the vDSO gets into every core dump. + * Dumping its contents makes post-mortem fully + * interpretable later without matching up the same + * kernel and hardware config to see what PC values + * meant. + */ + ret = install_special_mapping(mm, addr, PAGE_SIZE, + VM_READ|VM_EXEC| + VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC| + VM_ALWAYSDUMP, + syscall_pages); + + if (ret) + goto up_fail; + } current->mm->context.vdso = (void *)addr; current_thread_info()->sysenter_return = - (void *)VDSO_SYM(&SYSENTER_RETURN); -up_fail: + (void *)VDSO_SYM(&SYSENTER_RETURN); + + up_fail: up_write(&mm->mmap_sem); + return ret; } @@ -195,6 +242,8 @@ const char *arch_vma_name(struct vm_area struct vm_area_struct *get_gate_vma(struct task_struct *tsk) { + if (vdso_enabled == VDSO_COMPAT) + return &gate_vma; return NULL; } @@ -207,17 +256,3 @@ int in_gate_area_no_task(unsigned long a { return 0; } -#else /* !__HAVE_ARCH_GATE_AREA */ -int arch_setup_additional_pages(struct linux_binprm *bprm, int exstack) -{ - /* - * If not creating userspace VMA, simply set vdso to point to - * fixmap page. 
- */ - current->mm->context.vdso = (void *)VDSO_HIGH_BASE; - current_thread_info()->sysenter_return = - (void *)VDSO_SYM(&SYSENTER_RETURN); - - return 0; -} -#endif /* __HAVE_ARCH_GATE_AREA */ =================================================================== --- a/include/asm-i386/page.h +++ b/include/asm-i386/page.h @@ -143,9 +143,7 @@ extern int page_is_ram(unsigned long pag #include <asm-generic/memory_model.h> #include <asm-generic/page.h> -#ifndef CONFIG_COMPAT_VDSO #define __HAVE_ARCH_GATE_AREA 1 -#endif #endif /* __KERNEL__ */ #endif /* _I386_PAGE_H */ -- ^ permalink raw reply [flat|nested] 15+ messages in thread
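The three vdso= modes described in this patch can be sketched as a small user-space parser. This is an illustration under stated assumptions rather than the kernel's vdso_setup(), which is not shown in this message: parse_vdso_mode and vdso_uses_fixmap are hypothetical names, and rejecting values other than 0, 1 and 2 is an assumption. Only the enum values come from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Mode values as introduced by the patch in arch/i386/kernel/sysenter.c. */
enum {
    VDSO_DISABLED = 0,
    VDSO_ENABLED  = 1,
    VDSO_COMPAT   = 2,
};

/*
 * Hypothetical helper: parse the text that follows "vdso=".
 * Returns the mode, or -1 if the text is not exactly 0, 1 or 2
 * (rejecting other values is an assumption of this sketch).
 */
static int parse_vdso_mode(const char *arg)
{
    char *end;
    unsigned long v;

    assert(arg != NULL);
    errno = 0;
    v = strtoul(arg, &end, 0);
    if (errno != 0 || end == arg || *end != '\0' || v > VDSO_COMPAT)
        return -1;
    return (int)v;
}

/*
 * Mirrors the patch's map_compat_vdso(vdso_enabled == VDSO_COMPAT):
 * only compat mode places the vDSO at the fixed fixmap address.
 */
static bool vdso_uses_fixmap(int mode)
{
    return mode == VDSO_COMPAT;
}
```

With this split, a caller would pick VDSO_HIGH_BASE when vdso_uses_fixmap() is true and otherwise ask for an unmapped area, which is the branch structure the patch gives arch_setup_additional_pages().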
* [patch 0/2] Updates to compat VDSOs
@ 2007-04-05 15:53 Jeremy Fitzhardinge
  2007-04-05 15:53 ` [patch 1/2] Relocate VDSO ELF headers to match mapped location with COMPAT_VDSO Jeremy Fitzhardinge
  0 siblings, 1 reply; 15+ messages in thread
From: Jeremy Fitzhardinge @ 2007-04-05 15:53 UTC
  To: Andi Kleen
  Cc: Andrew Morton, virtualization, lkml, Zachary Amsden, Jan Beulich,
      Eric W. Biederman, Ingo Molnar, Roland McGrath

Hi Andi,

Here's a couple of patches to fix up COMPAT_VDSO:

The first is a straightforward implementation of Jan's original idea
of relocating the VDSO to match its mapped location. Unlike Jan and
Zach's version, I changed it to relocate based on the phdrs rather than
the sections; the result is pleasantly compact.

The second patch takes advantage of the fact that all the COMPAT_VDSO
work happens at runtime now, and allows compat mode to be enabled
dynamically. If you specify vdso=2 on the kernel command line, it
comes up in compat mode; vdso=1 is normal vdso mode, and vdso=0
disables vdso altogether. You can also switch modes with sysctl.

Changes since last posting:
 - rebase sections as well as phdrs
 - relocate symbols
 - more robust DT_tag handling
 - handle dynamic compat mode switches a bit better

Thanks to Jan and Roland for the review.

Thanks,
    J

--
* [patch 1/2] Relocate VDSO ELF headers to match mapped location with COMPAT_VDSO
  2007-04-05 15:53 [patch 0/2] Updates to compat VDSOs Jeremy Fitzhardinge
@ 2007-04-05 15:53 ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 15+ messages in thread
From: Jeremy Fitzhardinge @ 2007-04-05 15:53 UTC
  To: Andi Kleen
  Cc: virtualization, Eric W. Biederman, Roland McGrath, Andrew Morton,
      Ingo Molnar, lkml, Jan Beulich

[-- Attachment #1: relocate-COMPAT_VDSO.patch --]
[-- Type: text/plain, Size: 10762 bytes --]

Some versions of libc can't deal with a VDSO which doesn't have its
ELF headers matching its mapped address. COMPAT_VDSO maps the VDSO at
a specific system-wide fixed address. Previously this was all done at
build time, on the grounds that the fixed VDSO address is always at
the top of the address space. However, a hypervisor may reserve some
of that address space, pushing the fixmap address down.

This patch does the adjustment dynamically at runtime, depending on
the runtime location of the VDSO fixmap.

[ Patch has been through several hands: Jan Beulich wrote the original
  version; Zach reworked it, and Jeremy converted it to relocate phdrs
  as well as sections. ]

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Zachary Amsden <zach@vmware.com>
Cc: "Jan Beulich" <JBeulich@novell.com>
Cc: Eric W.
Biederman <ebiederm@xmission.com> Cc: Andi Kleen <ak@suse.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Roland McGrath <roland@redhat.com> --- arch/i386/kernel/entry.S | 4 - arch/i386/kernel/sysenter.c | 156 ++++++++++++++++++++++++++++++++++++++----- arch/i386/mm/pgtable.c | 6 - include/asm-i386/elf.h | 28 ++----- include/asm-i386/fixmap.h | 8 -- include/linux/elf.h | 10 ++ 6 files changed, 163 insertions(+), 49 deletions(-) =================================================================== --- a/arch/i386/kernel/entry.S +++ b/arch/i386/kernel/entry.S @@ -305,16 +305,12 @@ sysenter_past_esp: pushl $(__USER_CS) CFI_ADJUST_CFA_OFFSET 4 /*CFI_REL_OFFSET cs, 0*/ -#ifndef CONFIG_COMPAT_VDSO /* * Push current_thread_info()->sysenter_return to the stack. * A tiny bit of offset fixup is necessary - 4*4 means the 4 words * pushed above; +8 corresponds to copy_thread's esp0 setting. */ pushl (TI_sysenter_return-THREAD_SIZE+8+4*4)(%esp) -#else - pushl $SYSENTER_RETURN -#endif CFI_ADJUST_CFA_OFFSET 4 CFI_REL_OFFSET eip, 0 =================================================================== --- a/arch/i386/kernel/sysenter.c +++ b/arch/i386/kernel/sysenter.c @@ -22,6 +22,7 @@ #include <asm/msr.h> #include <asm/pgtable.h> #include <asm/unistd.h> +#include <asm/elf.h> /* * Should the kernel map a VDSO page into processes and pass its @@ -41,6 +42,115 @@ __setup("vdso=", vdso_setup); __setup("vdso=", vdso_setup); extern asmlinkage void sysenter_entry(void); + +#ifdef CONFIG_COMPAT_VDSO +static __cpuinit void reloc_symtab(Elf32_Ehdr *ehdr, + unsigned offset, unsigned size) +{ + Elf32_Sym *sym = (void *)ehdr + offset; + unsigned nsym = size / sizeof(*sym); + unsigned i; + + for(i = 0; i < nsym; i++, sym++) { + if (sym->st_shndx == SHN_UNDEF || + sym->st_shndx == SHN_ABS) + continue; /* skip */ + + if (sym->st_shndx > SHN_LORESERVE) { + printk(KERN_INFO "VDSO: unexpected st_shndx %x\n", + sym->st_shndx); + continue; + } + + switch(ELF_ST_TYPE(sym->st_info)) { + case STT_OBJECT: + case 
STT_FUNC: + case STT_SECTION: + case STT_FILE: + sym->st_value += VDSO_HIGH_BASE; + } + } +} + +static __cpuinit void reloc_dyn(Elf32_Ehdr *ehdr, unsigned offset) +{ + Elf32_Dyn *dyn = (void *)ehdr + offset; + + for(; dyn->d_tag != DT_NULL; dyn++) + switch(dyn->d_tag) { + case DT_PLTGOT: + case DT_HASH: + case DT_STRTAB: + case DT_SYMTAB: + case DT_RELA: + case DT_INIT: + case DT_FINI: + case DT_REL: + case DT_DEBUG: + case DT_JMPREL: + case DT_VERSYM: + case DT_VERDEF: + case DT_VERNEED: + case DT_ADDRRNGLO ... DT_ADDRRNGHI: + dyn->d_un.d_ptr += VDSO_HIGH_BASE; + break; + + case DT_ENCODING ... OLD_DT_LOOS: + case DT_LOOS ... DT_HIOS: + /* Tags above DT_ENCODING are pointers if + they're even */ + if (dyn->d_tag >= DT_ENCODING && + (dyn->d_tag & 1) == 0) + dyn->d_un.d_ptr += VDSO_HIGH_BASE; + break; + + default: + printk(KERN_INFO "VDSO: unexpected DT_tag %d\n", + dyn->d_tag); + } +} + +static __cpuinit void relocate_vdso(Elf32_Ehdr *ehdr) +{ + Elf32_Phdr *phdr; + Elf32_Shdr *shdr; + int i; + + BUG_ON(memcmp(ehdr->e_ident, ELFMAG, 4) != 0 || + !elf_check_arch(ehdr) || + ehdr->e_type != ET_DYN); + + ehdr->e_entry += VDSO_HIGH_BASE; + + /* rebase phdrs */ + phdr = (void *)ehdr + ehdr->e_phoff; + for (i = 0; i < ehdr->e_phnum; i++) { + phdr[i].p_vaddr += VDSO_HIGH_BASE; + + /* relocate dynamic stuff */ + if (phdr[i].p_type == PT_DYNAMIC) + reloc_dyn(ehdr, phdr[i].p_offset); + } + + /* rebase sections */ + shdr = (void *)ehdr + ehdr->e_shoff; + for(i = 0; i < ehdr->e_shnum; i++) { + if (!(shdr[i].sh_flags & SHF_ALLOC)) + continue; + + shdr[i].sh_addr += VDSO_HIGH_BASE; + + if (shdr[i].sh_type == SHT_SYMTAB || + shdr[i].sh_type == SHT_DYNSYM) + reloc_symtab(ehdr, shdr[i].sh_offset, + shdr[i].sh_size); + } +} +#else +static inline void relocate_vdso(Elf32_Ehdr *ehdr) +{ +} +#endif /* COMPAT_VDSO */ void enable_sep_cpu(void) { @@ -71,6 +181,9 @@ int __cpuinit sysenter_setup(void) int __cpuinit sysenter_setup(void) { void *syscall_page = (void 
*)get_zeroed_page(GFP_ATOMIC); + const void *vsyscall; + size_t vsyscall_len; + syscall_pages[0] = virt_to_page(syscall_page); #ifdef CONFIG_COMPAT_VDSO @@ -79,23 +192,23 @@ int __cpuinit sysenter_setup(void) #endif if (!boot_cpu_has(X86_FEATURE_SEP)) { - memcpy(syscall_page, - &vsyscall_int80_start, - &vsyscall_int80_end - &vsyscall_int80_start); - return 0; - } - - memcpy(syscall_page, - &vsyscall_sysenter_start, - &vsyscall_sysenter_end - &vsyscall_sysenter_start); - - return 0; -} - -#ifndef CONFIG_COMPAT_VDSO + vsyscall = &vsyscall_int80_start; + vsyscall_len = &vsyscall_int80_end - &vsyscall_int80_start; + } else { + vsyscall = &vsyscall_sysenter_start; + vsyscall_len = &vsyscall_sysenter_end - &vsyscall_sysenter_start; + } + + memcpy(syscall_page, vsyscall, vsyscall_len); + relocate_vdso(syscall_page); + + return 0; +} + /* Defined in vsyscall-sysenter.S */ extern void SYSENTER_RETURN; +#ifdef __HAVE_ARCH_GATE_AREA /* Setup a VMA at program startup for the vsyscall page */ int arch_setup_additional_pages(struct linux_binprm *bprm, int exstack) { @@ -155,4 +268,17 @@ int in_gate_area_no_task(unsigned long a { return 0; } -#endif +#else /* !__HAVE_ARCH_GATE_AREA */ +int arch_setup_additional_pages(struct linux_binprm *bprm, int exstack) +{ + /* + * If not creating userspace VMA, simply set vdso to point to + * fixmap page. 
+ */ + current->mm->context.vdso = (void *)VDSO_HIGH_BASE; + current_thread_info()->sysenter_return = + (void *)VDSO_SYM(&SYSENTER_RETURN); + + return 0; +} +#endif /* __HAVE_ARCH_GATE_AREA */ =================================================================== --- a/arch/i386/mm/pgtable.c +++ b/arch/i386/mm/pgtable.c @@ -144,10 +144,8 @@ void set_pmd_pfn(unsigned long vaddr, un } static int fixmaps; -#ifndef CONFIG_COMPAT_VDSO unsigned long __FIXADDR_TOP = 0xfffff000; EXPORT_SYMBOL(__FIXADDR_TOP); -#endif void __set_fixmap (enum fixed_addresses idx, unsigned long phys, pgprot_t flags) { @@ -173,12 +171,8 @@ void reserve_top_address(unsigned long r BUG_ON(fixmaps > 0); printk(KERN_INFO "Reserving virtual address space above 0x%08x\n", (int)-reserve); -#ifdef CONFIG_COMPAT_VDSO - BUG_ON(reserve != 0); -#else __FIXADDR_TOP = -reserve - PAGE_SIZE; __VMALLOC_RESERVE += reserve; -#endif } pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address) =================================================================== --- a/include/asm-i386/elf.h +++ b/include/asm-i386/elf.h @@ -133,39 +133,31 @@ extern int dump_task_extended_fpu (struc #define ELF_CORE_COPY_XFPREGS(tsk, elf_xfpregs) dump_task_extended_fpu(tsk, elf_xfpregs) #define VDSO_HIGH_BASE (__fix_to_virt(FIX_VDSO)) -#define VDSO_BASE ((unsigned long)current->mm->context.vdso) - -#ifdef CONFIG_COMPAT_VDSO -# define VDSO_COMPAT_BASE VDSO_HIGH_BASE -# define VDSO_PRELINK VDSO_HIGH_BASE -#else -# define VDSO_COMPAT_BASE VDSO_BASE -# define VDSO_PRELINK 0 -#endif +#define VDSO_CURRENT_BASE ((unsigned long)current->mm->context.vdso) +#define VDSO_PRELINK 0 #define VDSO_SYM(x) \ - (VDSO_COMPAT_BASE + (unsigned long)(x) - VDSO_PRELINK) + (VDSO_CURRENT_BASE + (unsigned long)(x) - VDSO_PRELINK) #define VDSO_HIGH_EHDR ((const struct elfhdr *) VDSO_HIGH_BASE) -#define VDSO_EHDR ((const struct elfhdr *) VDSO_COMPAT_BASE) +#define VDSO_EHDR ((const struct elfhdr *) VDSO_CURRENT_BASE) extern void __kernel_vsyscall; 
#define VDSO_ENTRY VDSO_SYM(&__kernel_vsyscall) -#ifndef CONFIG_COMPAT_VDSO +struct linux_binprm; + #define ARCH_HAS_SETUP_ADDITIONAL_PAGES -struct linux_binprm; extern int arch_setup_additional_pages(struct linux_binprm *bprm, int executable_stack); -#endif extern unsigned int vdso_enabled; -#define ARCH_DLINFO \ -do if (vdso_enabled) { \ - NEW_AUX_ENT(AT_SYSINFO, VDSO_ENTRY); \ - NEW_AUX_ENT(AT_SYSINFO_EHDR, VDSO_COMPAT_BASE); \ +#define ARCH_DLINFO \ +do if (vdso_enabled) { \ + NEW_AUX_ENT(AT_SYSINFO, VDSO_ENTRY); \ + NEW_AUX_ENT(AT_SYSINFO_EHDR, VDSO_CURRENT_BASE); \ } while (0) #endif =================================================================== --- a/include/asm-i386/fixmap.h +++ b/include/asm-i386/fixmap.h @@ -19,13 +19,9 @@ * Leave one empty page between vmalloc'ed areas and * the start of the fixmap. */ -#ifndef CONFIG_COMPAT_VDSO extern unsigned long __FIXADDR_TOP; -#else -#define __FIXADDR_TOP 0xfffff000 -#define FIXADDR_USER_START __fix_to_virt(FIX_VDSO) -#define FIXADDR_USER_END __fix_to_virt(FIX_VDSO - 1) -#endif +#define FIXADDR_USER_START __fix_to_virt(FIX_VDSO) +#define FIXADDR_USER_END __fix_to_virt(FIX_VDSO - 1) #ifndef __ASSEMBLY__ #include <linux/kernel.h> =================================================================== --- a/include/linux/elf.h +++ b/include/linux/elf.h @@ -83,6 +83,16 @@ typedef __s64 Elf64_Sxword; #define DT_DEBUG 21 #define DT_TEXTREL 22 #define DT_JMPREL 23 +#define DT_ENCODING 32 +#define OLD_DT_LOOS 0x60000000 +#define DT_LOOS 0x6000000d +#define DT_HIOS 0x6ffff000 +#define DT_ADDRRNGLO 0x6ffffe00 +#define DT_ADDRRNGHI 0x6ffffeff +#define DT_VERSYM 0x6ffffff0 +#define DT_VERDEF 0x6ffffffc +#define DT_VERNEED 0x6ffffffe +#define OLD_DT_HIOS 0x6fffffff #define DT_LOPROC 0x70000000 #define DT_HIPROC 0x7fffffff -- ^ permalink raw reply [flat|nested] 15+ messages in thread
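The core of the relocation in this patch, adding the mapped base to the ELF header's entry point and to each program header's p_vaddr, can be sketched in user space. This is a simplified illustration assuming a Linux/glibc <elf.h>, not the kernel's relocate_vdso(): the name rebase_image is hypothetical, it returns -1 where the kernel code uses BUG_ON, it adds bounds checks the patch does not have, and it leaves out the PT_DYNAMIC, section and symbol handling.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Rebase an in-memory 32-bit ELF shared object so that its headers
 * describe the address it is mapped at. Only e_entry and p_vaddr are
 * adjusted here. Returns 0 on success, or -1 if the image is not an
 * ET_DYN ELF file or its program header table does not fit in `len`.
 */
static int rebase_image(void *image, size_t len, uint32_t base)
{
    Elf32_Ehdr *ehdr = image;
    Elf32_Phdr *phdr;

    assert(image != NULL);
    if (len < sizeof(*ehdr) ||
        memcmp(ehdr->e_ident, ELFMAG, SELFMAG) != 0 ||
        ehdr->e_type != ET_DYN)
        return -1;

    /* Make sure the whole program header table lies inside the image. */
    if (ehdr->e_phoff > len ||
        (size_t)ehdr->e_phnum * sizeof(*phdr) > len - ehdr->e_phoff)
        return -1;

    ehdr->e_entry += base;

    phdr = (Elf32_Phdr *)((unsigned char *)image + ehdr->e_phoff);
    for (size_t i = 0; i < ehdr->e_phnum; i++)
        phdr[i].p_vaddr += base;

    return 0;
}
```

Because the vDSO image is linked at address 0 after this patch (VDSO_PRELINK is 0), adding the base once is enough; an image prelinked elsewhere would need the difference between the two addresses instead.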
end of thread, other threads:[~2007-04-05 15:53 UTC | newest]

Thread overview: 15+ messages
2007-04-05  4:58 [patch 0/2] Updates to compat VDSOs Jeremy Fitzhardinge
2007-04-05  4:58 ` [patch 1/2] Relocate VDSO ELF headers to match mapped location with COMPAT_VDSO Jeremy Fitzhardinge
2007-04-05  6:31 ` Roland McGrath
2007-04-05  6:46 ` Jeremy Fitzhardinge
2007-04-05  8:14 ` Roland McGrath
2007-04-05  7:10 ` Jan Beulich
2007-04-05  7:31 ` Jeremy Fitzhardinge
2007-04-05  7:45 ` Raharjo, Cahyo (cahr)
2007-04-05  7:47 ` Jan Beulich
2007-04-05  8:14 ` Roland McGrath
2007-04-05  8:18 ` Jeremy Fitzhardinge
2007-04-05  8:54 ` Jan Beulich
2007-04-05  8:58 ` Roland McGrath
2007-04-05  4:58 ` [patch 2/2] Make COMPAT_VDSO runtime selectable Jeremy Fitzhardinge
-- strict thread matches above, loose matches on Subject: below --
2007-04-05 15:53 [patch 0/2] Updates to compat VDSOs Jeremy Fitzhardinge
2007-04-05 15:53 ` [patch 1/2] Relocate VDSO ELF headers to match mapped location with COMPAT_VDSO Jeremy Fitzhardinge