* [PATCH v2] x86: Remove compat vdso support
@ 2014-03-11 1:03 Andy Lutomirski
2014-03-11 1:39 ` Linus Torvalds
0 siblings, 1 reply; 39+ messages in thread
From: Andy Lutomirski @ 2014-03-11 1:03 UTC (permalink / raw)
To: H. Peter Anvin, Linus Torvalds, x86
Cc: Stefani Seibold, Andreas Brief, Martin Runge,
Linux Kernel Mailing List, Dave Jones, Andy Lutomirski
The compat vDSO is a complicated hack that's needed to maintain
compatibility with a small range of never-released glibc versions.
This removes it and replaces it with a much simpler hack: a config
option to disable the 32-bit vDSO by default.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
---
This is a bit of an abuse of the no-breaking-userspace policy. It
breaks OpenSUSE 9. The breakage can be fixed up with a config option or
with a boot parameter, though.
This applies to Linus' tree. It does not apply to tip/x86/vdso (and IMO
if it's accepted then the 32-bit vDSO stuff should be rebased on top,
since a couple of the 32-bit vDSO changes will be unnecessary).
I've tested this lightly in 32-bit and 64-bit configurations. It might
be nice to get this into -next to see what shakes loose. The behavior
is as expected on OpenSUSE 9.
The renaming of the config option is very much intentional --
CONFIG_COMPAT_VDSO currently defaults to y and I want everyone to
rethink their default, since I suspect that >99% of people using the
default are doing so incorrectly.
If something like this works, it has a major benefit: we never have to
think about the compat vDSO again :)
Changes from v1: Note that OpenSUSE 9 is affected in the config text.
Documentation/kernel-parameters.txt | 11 +-
arch/x86/Kconfig | 23 ++--
arch/x86/include/asm/elf.h | 4 -
arch/x86/include/asm/fixmap.h | 8 --
arch/x86/include/asm/pgtable_types.h | 7 +-
arch/x86/include/asm/vdso.h | 5 +-
arch/x86/vdso/vdso-layout.lds.S | 2 +-
arch/x86/vdso/vdso32-setup.c | 234 ++++-------------------------------
arch/x86/vdso/vdso32/vdso32.lds.S | 2 -
9 files changed, 47 insertions(+), 249 deletions(-)
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 7116fda..67d8e7b 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -3409,14 +3409,15 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
of CONFIG_HIGHPTE.
vdso= [X86,SH]
- vdso=2: enable compat VDSO (default with COMPAT_VDSO)
- vdso=1: enable VDSO (default)
+ On X86_32, this is an alias for vdso32=. Otherwise:
+
+ vdso=1: enable VDSO (the default)
vdso=0: disable VDSO mapping
vdso32= [X86]
- vdso32=2: enable compat VDSO (default with COMPAT_VDSO)
- vdso32=1: enable 32-bit VDSO (default)
- vdso32=0: disable 32-bit VDSO mapping
+ vdso32=1: enable 32-bit VDSO (default if
+ ENABLE_VDSO32_BY_DEFAULT)
+ vdso32=0: disable 32-bit VDSO
vector= [IA-64,SMP]
vector=percpu: enable percpu vector domain
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 0af5250..14461ef 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1835,18 +1835,27 @@ config DEBUG_HOTPLUG_CPU0
If unsure, say N.
-config COMPAT_VDSO
+config ENABLE_VDSO32_BY_DEFAULT
def_bool y
- prompt "Compat VDSO support"
+ prompt "Enable the 32-bit vDSO by default"
depends on X86_32 || IA32_EMULATION
---help---
- Map the 32-bit VDSO to the predictable old-style address too.
+ Certain buggy versions of glibc (those that support a vDSO but
+ do not include commit 49ad572a70b8aeb91e57483a11dd1b) will
+ crash if they are presented with a 32-bit vDSO that is not mapped
+ at the address indicated in its segment table.
- Say N here if you are running a sufficiently recent glibc
- version (2.3.3 or later), to remove the high-mapped
- VDSO mapping and to exclusively use the randomized VDSO.
+ There is no released version of glibc that has this problem.
+ OpenSUSE 9, however, uses a buggy glibc.
- If unsure, say Y.
+ While it is theoretically possible for the kernel to provide a
+ usable vDSO for these versions of glibc, doing so is not
+ currently supported.
+
+ Disabling this option will change the kernel's default to
+ vdso32=0 to work around the problem. Doing so may cause a
+ performance loss on all system calls. Unless you have a buggy
+ glibc, say Y.
config CMDLINE_BOOL
bool "Built-in kernel command line"
diff --git a/arch/x86/include/asm/elf.h b/arch/x86/include/asm/elf.h
index 9c999c1..2c71182 100644
--- a/arch/x86/include/asm/elf.h
+++ b/arch/x86/include/asm/elf.h
@@ -281,16 +281,12 @@ do { \
#define STACK_RND_MASK (0x7ff)
-#define VDSO_HIGH_BASE (__fix_to_virt(FIX_VDSO))
-
#define ARCH_DLINFO ARCH_DLINFO_IA32(vdso_enabled)
/* update AT_VECTOR_SIZE_ARCH if the number of NEW_AUX_ENT entries changes */
#else /* CONFIG_X86_32 */
-#define VDSO_HIGH_BASE 0xffffe000U /* CONFIG_COMPAT_VDSO address */
-
/* 1GB for 64bit, 8MB for 32bit */
#define STACK_RND_MASK (test_thread_flag(TIF_ADDR32) ? 0x7ff : 0x3fffff)
diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h
index 7252cd3..2377f56 100644
--- a/arch/x86/include/asm/fixmap.h
+++ b/arch/x86/include/asm/fixmap.h
@@ -40,15 +40,8 @@
*/
extern unsigned long __FIXADDR_TOP;
#define FIXADDR_TOP ((unsigned long)__FIXADDR_TOP)
-
-#define FIXADDR_USER_START __fix_to_virt(FIX_VDSO)
-#define FIXADDR_USER_END __fix_to_virt(FIX_VDSO - 1)
#else
#define FIXADDR_TOP (VSYSCALL_END-PAGE_SIZE)
-
-/* Only covers 32bit vsyscalls currently. Need another set for 64bit. */
-#define FIXADDR_USER_START ((unsigned long)VSYSCALL32_VSYSCALL)
-#define FIXADDR_USER_END (FIXADDR_USER_START + PAGE_SIZE)
#endif
@@ -74,7 +67,6 @@ extern unsigned long __FIXADDR_TOP;
enum fixed_addresses {
#ifdef CONFIG_X86_32
FIX_HOLE,
- FIX_VDSO,
#else
VSYSCALL_LAST_PAGE,
VSYSCALL_FIRST_PAGE = VSYSCALL_LAST_PAGE
diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index 1aa9ccd..943f166 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -214,13 +214,8 @@
#ifdef CONFIG_X86_64
#define __PAGE_KERNEL_IDENT_LARGE_EXEC __PAGE_KERNEL_LARGE_EXEC
#else
-/*
- * For PDE_IDENT_ATTR include USER bit. As the PDE and PTE protection
- * bits are combined, this will alow user to access the high address mapped
- * VDSO in the presence of CONFIG_COMPAT_VDSO
- */
#define PTE_IDENT_ATTR 0x003 /* PRESENT+RW */
-#define PDE_IDENT_ATTR 0x067 /* PRESENT+RW+USER+DIRTY+ACCESSED */
+#define PDE_IDENT_ATTR 0x063 /* PRESENT+RW+DIRTY+ACCESSED */
#define PGD_IDENT_ATTR 0x001 /* PRESENT (no other attributes) */
#endif
diff --git a/arch/x86/include/asm/vdso.h b/arch/x86/include/asm/vdso.h
index fddb53d..5594e84 100644
--- a/arch/x86/include/asm/vdso.h
+++ b/arch/x86/include/asm/vdso.h
@@ -2,8 +2,6 @@
#define _ASM_X86_VDSO_H
#if defined CONFIG_X86_32 || defined CONFIG_COMPAT
-extern const char VDSO32_PRELINK[];
-
/*
* Given a pointer to the vDSO image, find the pointer to VDSO32_name
* as that symbol is defined in the vDSO sources or linker script.
@@ -11,8 +9,7 @@ extern const char VDSO32_PRELINK[];
#define VDSO32_SYMBOL(base, name) \
({ \
extern const char VDSO32_##name[]; \
- (void __user *)(VDSO32_##name - VDSO32_PRELINK + \
- (unsigned long)(base)); \
+ (void __user *)(VDSO32_##name + (unsigned long)(base)); \
})
#endif
diff --git a/arch/x86/vdso/vdso-layout.lds.S b/arch/x86/vdso/vdso-layout.lds.S
index 634a2cf..8c550c1 100644
--- a/arch/x86/vdso/vdso-layout.lds.S
+++ b/arch/x86/vdso/vdso-layout.lds.S
@@ -6,7 +6,7 @@
SECTIONS
{
- . = VDSO_PRELINK + SIZEOF_HEADERS;
+ . = SIZEOF_HEADERS;
.hash : { *(.hash) } :text
.gnu.hash : { *(.gnu.hash) }
diff --git a/arch/x86/vdso/vdso32-setup.c b/arch/x86/vdso/vdso32-setup.c
index d6bfb87..4290ff7 100644
--- a/arch/x86/vdso/vdso32-setup.c
+++ b/arch/x86/vdso/vdso32-setup.c
@@ -26,16 +26,10 @@
#include <asm/vdso.h>
#include <asm/proto.h>
-enum {
- VDSO_DISABLED = 0,
- VDSO_ENABLED = 1,
- VDSO_COMPAT = 2,
-};
-
-#ifdef CONFIG_COMPAT_VDSO
-#define VDSO_DEFAULT VDSO_COMPAT
+#ifdef CONFIG_ENABLE_VDSO32_BY_DEFAULT
+#define VDSO_DEFAULT 1
#else
-#define VDSO_DEFAULT VDSO_ENABLED
+#define VDSO_DEFAULT 0
#endif
#ifdef CONFIG_X86_64
@@ -44,13 +38,6 @@ enum {
#endif
/*
- * This is the difference between the prelinked addresses in the vDSO images
- * and the VDSO_HIGH_BASE address where CONFIG_COMPAT_VDSO places the vDSO
- * in the user address space.
- */
-#define VDSO_ADDR_ADJUST (VDSO_HIGH_BASE - (unsigned long)VDSO32_PRELINK)
-
-/*
* Should the kernel map a VDSO page into processes and pass its
* address down to glibc upon exec()?
*/
@@ -60,6 +47,9 @@ static int __init vdso_setup(char *s)
{
vdso_enabled = simple_strtoul(s, NULL, 0);
+ if (vdso_enabled > 1)
+ printk(KERN_WARNING "vdso32 values other than 0 and 1 are no longer allowed; vdso disabled\n");
+
return 1;
}
@@ -76,123 +66,6 @@ __setup_param("vdso=", vdso32_setup, vdso_setup, 0);
EXPORT_SYMBOL_GPL(vdso_enabled);
#endif
-static __init void reloc_symtab(Elf32_Ehdr *ehdr,
- unsigned offset, unsigned size)
-{
- Elf32_Sym *sym = (void *)ehdr + offset;
- unsigned nsym = size / sizeof(*sym);
- unsigned i;
-
- for(i = 0; i < nsym; i++, sym++) {
- if (sym->st_shndx == SHN_UNDEF ||
- sym->st_shndx == SHN_ABS)
- continue; /* skip */
-
- if (sym->st_shndx > SHN_LORESERVE) {
- printk(KERN_INFO "VDSO: unexpected st_shndx %x\n",
- sym->st_shndx);
- continue;
- }
-
- switch(ELF_ST_TYPE(sym->st_info)) {
- case STT_OBJECT:
- case STT_FUNC:
- case STT_SECTION:
- case STT_FILE:
- sym->st_value += VDSO_ADDR_ADJUST;
- }
- }
-}
-
-static __init void reloc_dyn(Elf32_Ehdr *ehdr, unsigned offset)
-{
- Elf32_Dyn *dyn = (void *)ehdr + offset;
-
- for(; dyn->d_tag != DT_NULL; dyn++)
- switch(dyn->d_tag) {
- case DT_PLTGOT:
- case DT_HASH:
- case DT_STRTAB:
- case DT_SYMTAB:
- case DT_RELA:
- case DT_INIT:
- case DT_FINI:
- case DT_REL:
- case DT_DEBUG:
- case DT_JMPREL:
- case DT_VERSYM:
- case DT_VERDEF:
- case DT_VERNEED:
- case DT_ADDRRNGLO ... DT_ADDRRNGHI:
- /* definitely pointers needing relocation */
- dyn->d_un.d_ptr += VDSO_ADDR_ADJUST;
- break;
-
- case DT_ENCODING ... OLD_DT_LOOS-1:
- case DT_LOOS ... DT_HIOS-1:
- /* Tags above DT_ENCODING are pointers if
- they're even */
- if (dyn->d_tag >= DT_ENCODING &&
- (dyn->d_tag & 1) == 0)
- dyn->d_un.d_ptr += VDSO_ADDR_ADJUST;
- break;
-
- case DT_VERDEFNUM:
- case DT_VERNEEDNUM:
- case DT_FLAGS_1:
- case DT_RELACOUNT:
- case DT_RELCOUNT:
- case DT_VALRNGLO ... DT_VALRNGHI:
- /* definitely not pointers */
- break;
-
- case OLD_DT_LOOS ... DT_LOOS-1:
- case DT_HIOS ... DT_VALRNGLO-1:
- default:
- if (dyn->d_tag > DT_ENCODING)
- printk(KERN_INFO "VDSO: unexpected DT_tag %x\n",
- dyn->d_tag);
- break;
- }
-}
-
-static __init void relocate_vdso(Elf32_Ehdr *ehdr)
-{
- Elf32_Phdr *phdr;
- Elf32_Shdr *shdr;
- int i;
-
- BUG_ON(memcmp(ehdr->e_ident, ELFMAG, SELFMAG) != 0 ||
- !elf_check_arch_ia32(ehdr) ||
- ehdr->e_type != ET_DYN);
-
- ehdr->e_entry += VDSO_ADDR_ADJUST;
-
- /* rebase phdrs */
- phdr = (void *)ehdr + ehdr->e_phoff;
- for (i = 0; i < ehdr->e_phnum; i++) {
- phdr[i].p_vaddr += VDSO_ADDR_ADJUST;
-
- /* relocate dynamic stuff */
- if (phdr[i].p_type == PT_DYNAMIC)
- reloc_dyn(ehdr, phdr[i].p_offset);
- }
-
- /* rebase sections */
- shdr = (void *)ehdr + ehdr->e_shoff;
- for(i = 0; i < ehdr->e_shnum; i++) {
- if (!(shdr[i].sh_flags & SHF_ALLOC))
- continue;
-
- shdr[i].sh_addr += VDSO_ADDR_ADJUST;
-
- if (shdr[i].sh_type == SHT_SYMTAB ||
- shdr[i].sh_type == SHT_DYNSYM)
- reloc_symtab(ehdr, shdr[i].sh_offset,
- shdr[i].sh_size);
- }
-}
-
static struct page *vdso32_pages[1];
#ifdef CONFIG_X86_64
@@ -212,12 +85,6 @@ void syscall32_cpu_init(void)
wrmsrl(MSR_CSTAR, ia32_cstar_target);
}
-#define compat_uses_vma 1
-
-static inline void map_compat_vdso(int map)
-{
-}
-
#else /* CONFIG_X86_32 */
#define vdso32_sysenter() (boot_cpu_has(X86_FEATURE_SEP))
@@ -241,37 +108,6 @@ void enable_sep_cpu(void)
put_cpu();
}
-static struct vm_area_struct gate_vma;
-
-static int __init gate_vma_init(void)
-{
- gate_vma.vm_mm = NULL;
- gate_vma.vm_start = FIXADDR_USER_START;
- gate_vma.vm_end = FIXADDR_USER_END;
- gate_vma.vm_flags = VM_READ | VM_MAYREAD | VM_EXEC | VM_MAYEXEC;
- gate_vma.vm_page_prot = __P101;
-
- return 0;
-}
-
-#define compat_uses_vma 0
-
-static void map_compat_vdso(int map)
-{
- static int vdso_mapped;
-
- if (map == vdso_mapped)
- return;
-
- vdso_mapped = map;
-
- __set_fixmap(FIX_VDSO, page_to_pfn(vdso32_pages[0]) << PAGE_SHIFT,
- map ? PAGE_READONLY_EXEC : PAGE_NONE);
-
- /* flush stray tlbs */
- flush_tlb_all();
-}
-
#endif /* CONFIG_X86_64 */
int __init sysenter_setup(void)
@@ -282,10 +118,6 @@ int __init sysenter_setup(void)
vdso32_pages[0] = virt_to_page(syscall_page);
-#ifdef CONFIG_X86_32
- gate_vma_init();
-#endif
-
if (vdso32_syscall()) {
vsyscall = &vdso32_syscall_start;
vsyscall_len = &vdso32_syscall_end - &vdso32_syscall_start;
@@ -298,7 +130,6 @@ int __init sysenter_setup(void)
}
memcpy(syscall_page, vsyscall, vsyscall_len);
- relocate_vdso(syscall_page);
return 0;
}
@@ -309,48 +140,35 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
struct mm_struct *mm = current->mm;
unsigned long addr;
int ret = 0;
- bool compat;
#ifdef CONFIG_X86_X32_ABI
if (test_thread_flag(TIF_X32))
return x32_setup_additional_pages(bprm, uses_interp);
#endif
- if (vdso_enabled == VDSO_DISABLED)
+ if (vdso_enabled != 1) /* Other values all mean "disabled" */
return 0;
down_write(&mm->mmap_sem);
- /* Test compat mode once here, in case someone
- changes it via sysctl */
- compat = (vdso_enabled == VDSO_COMPAT);
-
- map_compat_vdso(compat);
-
- if (compat)
- addr = VDSO_HIGH_BASE;
- else {
- addr = get_unmapped_area(NULL, 0, PAGE_SIZE, 0, 0);
- if (IS_ERR_VALUE(addr)) {
- ret = addr;
- goto up_fail;
- }
+ addr = get_unmapped_area(NULL, 0, PAGE_SIZE, 0, 0);
+ if (IS_ERR_VALUE(addr)) {
+ ret = addr;
+ goto up_fail;
}
current->mm->context.vdso = (void *)addr;
- if (compat_uses_vma || !compat) {
- /*
- * MAYWRITE to allow gdb to COW and set breakpoints
- */
- ret = install_special_mapping(mm, addr, PAGE_SIZE,
- VM_READ|VM_EXEC|
- VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC,
- vdso32_pages);
-
- if (ret)
- goto up_fail;
- }
+ /*
+ * MAYWRITE to allow gdb to COW and set breakpoints
+ */
+ ret = install_special_mapping(mm, addr, PAGE_SIZE,
+ VM_READ|VM_EXEC|
+ VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC,
+ vdso32_pages);
+
+ if (ret)
+ goto up_fail;
current_thread_info()->sysenter_return =
VDSO32_SYMBOL(addr, SYSENTER_RETURN);
@@ -411,20 +229,12 @@ const char *arch_vma_name(struct vm_area_struct *vma)
struct vm_area_struct *get_gate_vma(struct mm_struct *mm)
{
- /*
- * Check to see if the corresponding task was created in compat vdso
- * mode.
- */
- if (mm && mm->context.vdso == (void *)VDSO_HIGH_BASE)
- return &gate_vma;
return NULL;
}
int in_gate_area(struct mm_struct *mm, unsigned long addr)
{
- const struct vm_area_struct *vma = get_gate_vma(mm);
-
- return vma && addr >= vma->vm_start && addr < vma->vm_end;
+ return 0;
}
int in_gate_area_no_mm(unsigned long addr)
diff --git a/arch/x86/vdso/vdso32/vdso32.lds.S b/arch/x86/vdso/vdso32/vdso32.lds.S
index 976124b..90e7aa9 100644
--- a/arch/x86/vdso/vdso32/vdso32.lds.S
+++ b/arch/x86/vdso/vdso32/vdso32.lds.S
@@ -8,7 +8,6 @@
* values visible using the asm-x86/vdso.h macros from the kernel proper.
*/
-#define VDSO_PRELINK 0
#include "../vdso-layout.lds.S"
/* The ELF entry point can be used to set the AT_SYSINFO value. */
@@ -31,7 +30,6 @@ VERSION
/*
* Symbols we define here called VDSO* get their values into vdso32-syms.h.
*/
-VDSO32_PRELINK = VDSO_PRELINK;
VDSO32_vsyscall = __kernel_vsyscall;
VDSO32_sigreturn = __kernel_sigreturn;
VDSO32_rt_sigreturn = __kernel_rt_sigreturn;
--
1.8.5.3
* Re: [PATCH v2] x86: Remove compat vdso support
2014-03-11 1:03 [PATCH v2] x86: Remove compat vdso support Andy Lutomirski
@ 2014-03-11 1:39 ` Linus Torvalds
2014-03-11 2:37 ` Andy Lutomirski
0 siblings, 1 reply; 39+ messages in thread
From: Linus Torvalds @ 2014-03-11 1:39 UTC (permalink / raw)
To: Andy Lutomirski
Cc: H. Peter Anvin, the arch/x86 maintainers, Stefani Seibold,
Andreas Brief, Martin Runge, Linux Kernel Mailing List,
Dave Jones
On Mon, Mar 10, 2014 at 6:03 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>
> This is a bit of an abuse of the no-breaking-userspace policy.
No it's not, because it won't be applied.
You need to fix it.
I'm not sure what goes wrong, since it *looks* like you handle the
"vdso_enabled" thing correctly, so I find it surprising that you say
that
echo 0 >/proc/sys/abi/vsyscall32
makes it work, since it should be zero already, and that echo should
be a no-op. But maybe I'm missing something.
Maybe you can just fake the boot parameter and fix the OpenSuSE
breakage that way (presumably that "init" sees it if it's some
user-space setup thing), but I'd like to know why that "echo 0" works,
but just initializing it to zero does not?
Linus
* Re: [PATCH v2] x86: Remove compat vdso support
2014-03-11 1:39 ` Linus Torvalds
@ 2014-03-11 2:37 ` Andy Lutomirski
2014-03-11 3:09 ` Linus Torvalds
0 siblings, 1 reply; 39+ messages in thread
From: Andy Lutomirski @ 2014-03-11 2:37 UTC (permalink / raw)
To: Linus Torvalds
Cc: H. Peter Anvin, the arch/x86 maintainers, Stefani Seibold,
Andreas Brief, Martin Runge, Linux Kernel Mailing List,
Dave Jones
On Mon, Mar 10, 2014 at 6:39 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Mon, Mar 10, 2014 at 6:03 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>>
>> This is a bit of an abuse of the no-breaking-userspace policy.
>
> No it's not, because it won't be applied.
>
> You need to fix it.
>
> I'm not sure what goes wrong, since it *looks* like you handle the
> "vdso_enabled" thing correctly, so I find it surprising that you say
> that
>
> echo 0 >/proc/sys/abi/vsyscall32
>
> makes it work, since it should be zero already, and that echo should
> be a no-op. But maybe I'm missing something.
>
> Maybe you can just fake the boot parameter and fix the OpenSuSE
> breakage that way (presumably that "init" sees it if it's some
> user-space setup thing), but I'd like to know why that "echo 0" works,
> but just initializing it to zero does not?
It does. My patch breaks OpenSuSE 9 when
CONFIG_ENABLE_VDSO32_BY_DEFAULT=y unless it's overridden by sysctl or
boot option.
The behavior with my patch is:
If ENABLE_VDSO32_BY_DEFAULT (which is the default), then OpenSuSE 9
breaks. Everything else works. Booting with vdso=0, vdso=2,
vdso32=0, or vdso32=2, or setting abi.vsyscall32=0 will switch to the
no-vDSO behavior.
If !ENABLE_VDSO32_BY_DEFAULT, then OpenSuSE 9 works and other 32-bit
code runs without a vDSO, which slows it down a bit.
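The rules above can be sketched as a tiny model. This is not the kernel
code: vdso32_setting() and vdso32_is_mapped() are hypothetical names, and
the only behavior taken from the patch is that any value other than 1
means the 32-bit vDSO is not mapped.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of the vdso32 setting after this patch.
 * config_default: 1 if CONFIG_ENABLE_VDSO32_BY_DEFAULT=y, else 0.
 * overridden:     true if vdso=, vdso32= or abi.vsyscall32 supplied a value.
 * value:          the value supplied by that override.
 */
static unsigned long vdso32_setting(int config_default, bool overridden,
				    unsigned long value)
{
	assert(config_default == 0 || config_default == 1);
	return overridden ? value : (unsigned long)config_default;
}

/* Only the value 1 maps the vDSO; 0, 2 and anything else disable it. */
static bool vdso32_is_mapped(unsigned long setting)
{
	return setting == 1;
}
```

In this model vdso32=2 behaves like vdso32=0, which is why the old
compat value falls back to the no-vDSO behavior.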
I did that because I seem to remember that it's not so bad to break
small amounts of userspace as long as there's a backwards
compatibility path (there are plenty of kernel options that turn on
"legacy" things needed for small numbers of users).
If this is not okay, then I can redo the patch, leaving it with
CONFIG_COMPAT_VDSO as the option name, so that anyone with a working
config will keep working if they run 'make oldconfig' (as opposed to
being prompted). If I do that, I'd still prefer to make the
non-compatible version be the default, since it's the right choice for
the vast majority of users. Currently CONFIG_COMPAT_VDSO is default
y, which seems like an odd choice to me.
--Andy
* Re: [PATCH v2] x86: Remove compat vdso support
2014-03-11 2:37 ` Andy Lutomirski
@ 2014-03-11 3:09 ` Linus Torvalds
2014-03-11 4:10 ` Andy Lutomirski
0 siblings, 1 reply; 39+ messages in thread
From: Linus Torvalds @ 2014-03-11 3:09 UTC (permalink / raw)
To: Andy Lutomirski
Cc: H. Peter Anvin, the arch/x86 maintainers, Stefani Seibold,
Andreas Brief, Martin Runge, Linux Kernel Mailing List,
Dave Jones
On Mon, Mar 10, 2014 at 7:37 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>
> It does. My patch breaks OpenSuSE 9 when
> CONFIG_ENABLE_VDSO32_BY_DEFAULT=y unless it's overridden by sysctl or
> boot option.
Oh, I missed that "when =y" part.
But why do we then want to have that "=y" as an option at all?
If the situation is that everybody is fine with that being disabled by
default, let's just make it the default. And I'd even be ok with
removing it as an option *entirely*.
That would seem to be *much* preferable than having an option that
nobody really wants anyway, and where the default value would break
some users. THAT just seems completely insane.
Linus
* Re: [PATCH v2] x86: Remove compat vdso support
2014-03-11 3:09 ` Linus Torvalds
@ 2014-03-11 4:10 ` Andy Lutomirski
2014-03-11 8:37 ` Ingo Molnar
2014-03-11 9:36 ` Linus Torvalds
0 siblings, 2 replies; 39+ messages in thread
From: Andy Lutomirski @ 2014-03-11 4:10 UTC (permalink / raw)
To: Linus Torvalds
Cc: H. Peter Anvin, the arch/x86 maintainers, Stefani Seibold,
Andreas Brief, Martin Runge, Linux Kernel Mailing List,
Dave Jones
On Mon, Mar 10, 2014 at 8:09 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Mon, Mar 10, 2014 at 7:37 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>>
>> It does. My patch breaks OpenSuSE 9 when
>> CONFIG_ENABLE_VDSO32_BY_DEFAULT=y unless it's overridden by sysctl or
>> boot option.
>
> Oh, I missed that "when =y" part.
>
> But why do we then want to have that "=y" as an option at all?
>
> If the situation is that everybody is fine with that being disabled by
> default, let's just make it the default. And I'd even be ok with
> removing it as an option *entirely*.
>
> That would seem to be *much* preferable than having an option that
> nobody really wants anyway, and where the default value would break
> some users. THAT just seems completely insane.
I suspect that a lot of 32-bit Linux users want syscall and/or
sysenter, and Stefani certainly wants the fast timing that the vDSO
can provide. Also, presumably __kernel_sigreturn serves some purpose
:)
The basic issue is that most old glibc versions (and all versions that
were ever tagged in a release) work correctly regardless of whether
there is a vDSO, and newer glibc versions (since 2004) will take
advantage of a vDSO if one exists, but OpenSuSE 9 shipped with a
creatively broken version that blows up if presented with a vDSO
that's not prelinked to its actual address.
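The failure mode can be sketched as follows. The struct and function
names are hypothetical, not glibc's. The assumption, taken from the
Kconfig text in the patch, is that the buggy loader expects the vDSO to
sit exactly at the address recorded in its segment table, while a fixed
loader applies a load offset instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the one program-header field that matters here. */
struct seg {
	uint32_t p_vaddr;	/* address the image was linked (prelinked) at */
};

/* Fixed loader: accept any mapping address and compute the offset to apply. */
static uint32_t load_bias(const struct seg *ph, uint32_t mapped_at)
{
	assert(ph != NULL);
	return mapped_at - ph->p_vaddr;	/* wraps modulo 2^32, which is intended */
}

/* Buggy loader: only copes when the image sits at its linked address. */
static bool buggy_loader_accepts(const struct seg *ph, uint32_t mapped_at)
{
	assert(ph != NULL);
	return ph->p_vaddr == mapped_at;
}
```

An image linked at 0 and mapped at a randomized address fails the buggy
check, while an image whose p_vaddr has been adjusted to 0xffffe000 and
which is mapped there passes it. That adjustment is what the
relocate_vdso() code removed by the patch used to do.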
Currently there are three options: sane vDSO, no vDSO, and OpenSuSE
9-compatible vDSO. The latter is a mess to maintain and breaks ASLR
(even for users of modern glibc), and having a vDSO is apparently
important enough that people are willing to pay to enhance it. The
default is OpenSuSE 9-compatible vDSO, which is IMO an odd choice.
ISTM the right solution is to make OpenSuSE 9 users turn off the vDSO
(which is a performance hit for them, but not a correctness issue) and
let everyone else have a simpler kernel that has no ASLR issues.
I'm a bit heartened by the fact that the failure mode on OpenSuSE 9 is
a rather distinctive and easily Googlable message. Most of the hits
offer abi.vsyscall32=0 or vdso=0 as suggestions, both of which
continue to work with my patch.
--Andy
* Re: [PATCH v2] x86: Remove compat vdso support
2014-03-11 4:10 ` Andy Lutomirski
@ 2014-03-11 8:37 ` Ingo Molnar
2014-03-11 9:36 ` Linus Torvalds
1 sibling, 0 replies; 39+ messages in thread
From: Ingo Molnar @ 2014-03-11 8:37 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Linus Torvalds, H. Peter Anvin, the arch/x86 maintainers,
Stefani Seibold, Andreas Brief, Martin Runge,
Linux Kernel Mailing List, Dave Jones
* Andy Lutomirski <luto@amacapital.net> wrote:
> [...]
>
> Currently there are three options: sane vDSO, no vDSO, and OpenSuSE
> 9-compatible vDSO. The latter is a mess to maintain and breaks ASLR
> (even for users of modern glibc), and having a vDSO is apparently
> important enough that people are willing to pay to enhance it. The
> default is OpenSuSE 9-compatible vDSO, which is IMO an odd choice.
The 'odd choice' was to not break the ABI by default...
> ISTM the right solution is to make OpenSuSE 9 users turn off the
> vDSO (which is a performance hit for them, but not a correctness
> issue) and let everyone else have a simpler kernel that has no ASLR
> issues.
Could we just remove the option and automagically disable the vdso on
OpenSuSE-9, without any boot flags? Is the segfault distinctive enough
to base a disable-vdso quirk on, either to disable the vdso, or to map
it into the compatibility position on demand?
That would remove most of this complication. Being somewhat slower on
an old distro with a new kernel is perfectly OK. The question is, can
this be done easily enough - chances are that it cannot be done.
Thanks,
Ingo
* Re: [PATCH v2] x86: Remove compat vdso support
2014-03-11 4:10 ` Andy Lutomirski
2014-03-11 8:37 ` Ingo Molnar
@ 2014-03-11 9:36 ` Linus Torvalds
2014-03-11 14:53 ` Andy Lutomirski
1 sibling, 1 reply; 39+ messages in thread
From: Linus Torvalds @ 2014-03-11 9:36 UTC (permalink / raw)
To: Andy Lutomirski
Cc: H. Peter Anvin, the arch/x86 maintainers, Stefani Seibold,
Andreas Brief, Martin Runge, Linux Kernel Mailing List,
Dave Jones
On Mon, Mar 10, 2014 at 9:10 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>
> I suspect that a lot of 32-bit Linux users want syscall and/or
> sysenter, and Stefani certainly wants the fast timing that the vDSO
> can provide. Also, presumably __kernel_sigreturn serves some purpose
> :)
Are we talking about the same thing at all?
We're talking about the *COMPAT* vdso, right?
The one you were just told Fedora had never _ever_ enabled? And you
are seriously arguing that "performance" is relevant, while at the same
time trying to claim that the fact that it DOES NOT WORK on SuSE -
which *did* enable it - is not such a big deal, and that we should
ignore the "don't break user space" rule?
Seriously?
WTF?
Linus
* Re: [PATCH v2] x86: Remove compat vdso support
2014-03-11 9:36 ` Linus Torvalds
@ 2014-03-11 14:53 ` Andy Lutomirski
2014-03-11 15:30 ` Linus Torvalds
0 siblings, 1 reply; 39+ messages in thread
From: Andy Lutomirski @ 2014-03-11 14:53 UTC (permalink / raw)
To: Linus Torvalds
Cc: linux-kernel@vger.kernel.org, H. Peter Anvin, Stefani Seibold,
the arch/x86 maintainers, Dave Jones, Martin Runge, Andreas Brief
On Mar 11, 2014 2:36 AM, "Linus Torvalds" <torvalds@linux-foundation.org> wrote:
>
> On Mon, Mar 10, 2014 at 9:10 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> >
> > I suspect that a lot of 32-bit Linux users want syscall and/or
> > sysenter, and Stefani certainly wants the fast timing that the vDSO
> > can provide. Also, presumably __kernel_sigreturn serves some purpose
> > :)
>
> Are we talking about the same thing at all?
>
> We're talking about the *COMPAT* vdso, right?
>
> The one you were just told Fedora had never _ever_ enabled? And you
> are seriously arguing that "performance" is relevant, while at the same
> time trying to claim that the fact that it DOES NOT WORK on SuSE -
> which *did* enable it - is not such a big deal, and that we should
> ignore the "don't break user space" rule?
There isn't a separate "compat" vdso. There may or may not be a vdso,
and that vdso may or may not be "compat". Since the kernel can't
easily tell whether ld.so has the bug, the kernel can't easily decide
which vdso to present, if any. So current kernels use only the compat
vdso by default.
The compat vdso *does* work on opensuse 9. I'm arguing that the number
of people who use opensuse 9 is low enough that we should support them
by offering to turn off the vdso instead of by offering to fudge the
address at which the vdso is mapped. The compat vdso also works on
modern glibc, but that comes at the expense of ASLR, some fixmap
entries that no one likes, and general maintainability of the code.
I wonder if we can actually detect buggy glibc versions at runtime.
The relevant commits are:
f866314b89d56845f55e6f365e18b31ec978ec3a: (Sun May 4 04:30:13 2003
+0000): add vdso support
3b3ddb4f7db98ec9e912ccdf54d35df4aa30e04a (Thu Feb 26 20:06:58 2004
+0000): remove the buggy assertion
49ad572a70b8aeb91e57483a11dd1b77e31c4468 (Sat Feb 28 17:56:22 2004
+0000): actually fix the code
Unfortunately, I was wrong about the affected versions. Glibc 2.3.3
was released with the bug; glibc 2.3.2 and 2.3.4 appear to be okay.
It's okay for a quirk to incorrectly flag older glibc versions -- they
won't use a vdso regardless of what aux entries we set. Incorrectly
flagging versions that are too new as quirky is unfortunate.
Checking for the text "(void *) ph->p_vaddr ==
_rtld_local._dl_sysinfo_dso" in ld.so will detect all but two days
worth of bad glibc versions. I don't think that we actually want to
fault in the whole ELF loader on each program load, though.
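For illustration only, the string check described above amounts to a
byte search over the loader image. The sketch below is userspace C with
hypothetical names, operating on a buffer that is already in memory. It
says nothing about how, or whether, the kernel would do this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assertion text quoted above; its presence marks a suspect ld.so. */
static const char buggy_assert_text[] =
	"(void *) ph->p_vaddr == _rtld_local._dl_sysinfo_dso";

/* Return true if the NUL-terminated needle occurs anywhere in buf[0..len). */
static bool image_contains(const unsigned char *buf, size_t len,
			   const char *needle)
{
	size_t nlen = strlen(needle);
	size_t i;

	assert(buf != NULL && nlen > 0);
	if (len < nlen)
		return false;
	for (i = 0; i + nlen <= len; i++)
		if (memcmp(buf + i, needle, nlen) == 0)
			return true;
	return false;
}
```

A negative answer requires reading every byte of the image, which is
the cost the paragraph above objects to.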
We can detect glibc 2.3 in general by checking the symbol version
definitions. That will incorrectly penalize a large range of later
versions, though.
We could do something really weird: we could look at the *name* of the
ELF loader. This would have to resolve symlinks, which is rather ugly
and probably a net loss of sanity.
--Andy
>
> Seriously?
>
> WTF?
>
> Linus
* Re: [PATCH v2] x86: Remove compat vdso support
2014-03-11 14:53 ` Andy Lutomirski
@ 2014-03-11 15:30 ` Linus Torvalds
2014-03-11 16:14 ` H. Peter Anvin
0 siblings, 1 reply; 39+ messages in thread
From: Linus Torvalds @ 2014-03-11 15:30 UTC (permalink / raw)
To: Andy Lutomirski
Cc: linux-kernel@vger.kernel.org, H. Peter Anvin, Stefani Seibold,
the arch/x86 maintainers, Dave Jones, Martin Runge, Andreas Brief
On Tue, Mar 11, 2014 at 7:53 AM, Andy Lutomirski <luto@amacapital.net> wrote:
>
> I wonder if we can actually detect buggy glibc versions at runtime.
No, don't do that. That way lies madness.
What might be acceptable then is to just keep the old config name, and
if the COMPAT_VDSO config is enabled, you just disable the non-compat
vdso. At least that way, presumably any opensuse people would have
their kernel config continue working.
Then if people have that enabled but didn't need it, you can enable
it at runtime with
echo 1 > /proc/sys/abi/vsyscall32
which presumably would need to be exposed on 32-bit kernels too (it
looks like a x86-64-only thing right now)
The important thing is that we do *not* break user space. Not ever.
Not knowingly.
Linus
* Re: [PATCH v2] x86: Remove compat vdso support
2014-03-11 15:30 ` Linus Torvalds
@ 2014-03-11 16:14 ` H. Peter Anvin
2014-03-11 16:30 ` Linus Torvalds
0 siblings, 1 reply; 39+ messages in thread
From: H. Peter Anvin @ 2014-03-11 16:14 UTC (permalink / raw)
To: Linus Torvalds, Andy Lutomirski
Cc: linux-kernel@vger.kernel.org, Stefani Seibold,
the arch/x86 maintainers, Dave Jones, Martin Runge, Andreas Brief
On 03/11/2014 08:30 AM, Linus Torvalds wrote:
> On Tue, Mar 11, 2014 at 7:53 AM, Andy Lutomirski <luto@amacapital.net> wrote:
>>
>> I wonder if we can actually detect buggy glibc versions at runtime.
>
> No, don't do that. That way lies madness.
>
> What might be acceptable then is to just keep the old config name, and
> if the COMPAT_VDSO config is enabled, you just disable the non-compat
> vdso. At least that way, presumably any opensuse people would have
> their kernel config continue working.
>
> Then if people have that enabled but didn't need it, you can enable
> it at runtime with
>
> echo 1 > /proc/sys/abi/vsyscall32
>
> which presumably would need to be exposed on 32-bit kernels too (it
> looks like a x86-64-only thing right now)
>
> The important thing is that we do *not* break user space. Not ever.
> Not knowingly.
>
As much as I wouldn't mind getting rid of the compat vdso, I really
don't understand why the trivial solution is being ruled out -- the
trivial solution being to just reserve a little more space in the fixmap
area.
I know Andy wants to move the vdso into a normal vma, which I certainly
support, but it is definitely the non-compat case.
-hpa
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2] x86: Remove compat vdso support
2014-03-11 16:14 ` H. Peter Anvin
@ 2014-03-11 16:30 ` Linus Torvalds
2014-03-11 16:42 ` Andy Lutomirski
2014-03-11 16:42 ` H. Peter Anvin
0 siblings, 2 replies; 39+ messages in thread
From: Linus Torvalds @ 2014-03-11 16:30 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Andy Lutomirski, linux-kernel@vger.kernel.org, Stefani Seibold,
the arch/x86 maintainers, Dave Jones, Martin Runge, Andreas Brief
On Tue, Mar 11, 2014 at 9:14 AM, H. Peter Anvin <hpa@linux.intel.com> wrote:
>
> As much as I wouldn't mind getting rid of the compat vdso, I really
> don't understand why the trivial solution is being ruled out -- the
> trivial solution being to just reserve a little more space in the fixmap
> area.
No, the trivial solution is to stop adding crap to it.
And no, "just reserve a little more space for it" is neither trivial
nor a good idea. The fixed VDSO address is very much at the top of the
address space, so you can't allocate more space for it unless you do
one of
(a) make it non-contiguous
(b) get rid of the hole that is the very last page
(c) mess with the vsyscall pages and make it contiguous "backwards"
all of which sound like *horrible* ideas. Certainly not "trivial solution".
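The address arithmetic behind that list can be sketched in a few lines of
C. This is only an illustration under stated assumptions: a 4K page size,
a fixmap top of 0xfffff000 (believed to be the 32-bit default), slot 0
being the last-page hole and slot 1 the compat vDSO page. The SKETCH_
names are invented for this example.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT  12
#define SKETCH_FIXADDR_TOP 0xfffff000u  /* assumed default fixmap top */

/* Fixmap slots grow downward: slot idx sits idx pages below the top. */
uint32_t sketch_fix_to_virt(uint32_t idx)
{
    assert(idx < (SKETCH_FIXADDR_TOP >> SKETCH_PAGE_SHIFT));
    return SKETCH_FIXADDR_TOP - (idx << SKETCH_PAGE_SHIFT);
}
```

With these assumptions slot 0 (the hole) is 0xfffff000 and slot 1 (the
vDSO) is 0xffffe000, so a second contiguous vDSO page could only come from
the hole above it or from whatever slot sits below it.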
No, the trivial solution is to not mess with that legacy page at all.
Why is *that* trivial solution not on the table? Why the heck are
people hell-bent on changing this stupid legacy page around?
I find this whole thread very annoying. We shouldn't care about
x86-32, and certainly not from a performance angle - we should
consider it a "it's done, don't touch it" issue.
Linus
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2] x86: Remove compat vdso support
2014-03-11 16:30 ` Linus Torvalds
@ 2014-03-11 16:42 ` Andy Lutomirski
2014-03-11 16:42 ` H. Peter Anvin
1 sibling, 0 replies; 39+ messages in thread
From: Andy Lutomirski @ 2014-03-11 16:42 UTC (permalink / raw)
To: Linus Torvalds
Cc: H. Peter Anvin, linux-kernel@vger.kernel.org, Stefani Seibold,
the arch/x86 maintainers, Dave Jones, Martin Runge, Andreas Brief
On Tue, Mar 11, 2014 at 9:30 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Tue, Mar 11, 2014 at 9:14 AM, H. Peter Anvin <hpa@linux.intel.com> wrote:
>>
>> As much as I wouldn't mind getting rid of the compat vdso, I really
>> don't understand why the trivial solution is being ruled out -- the
>> trivial solution being to just reserve a little more space in the fixmap
>> area.
>
> No, the trivial solution is to stop adding crap to it.
>
> And no, "just reserve a little more space for it" is neither trivial
> nor a good idea. The fixed VDSO address is very much at the top of the
> address space, so you can't allocate more space for it unless you do
> one of
>
> (a) make it non-contiguous
> (b) get rid of the hole that is the very last page
> (c) mess with the vsyscall pages and make it contiguous "backwards"
>
> all of which sound like *horrible* ideas. Certainly not "trivial solution".
We can move it freely -- we just have to move it *once* when the
system boots. There isn't even any real requirement for it to live in
the kernel range.
Is there an address where it's more or less guaranteed to be possible
to stick a vma? The top of the non-randomized stack sounds like a
decent place, but I'm not really familiar with the address space
layout.
>
> No, the trivial solution is to not mess with that legacy page at all.
>
> Why is *that* trivial solution not on the table? Why the heck are
> people hell-bent on changing this stupid legacy page around?
>
> I find this whole thread very annoying. We shouldn't care about
> x86-32, and certainly not from a performance angle - we should
> consider it a "it's done, don't touch it" issue.
I'll let other people fight this particular battle :)
>
> Linus
--
Andy Lutomirski
AMA Capital Management, LLC
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2] x86: Remove compat vdso support
2014-03-11 16:30 ` Linus Torvalds
2014-03-11 16:42 ` Andy Lutomirski
@ 2014-03-11 16:42 ` H. Peter Anvin
2014-03-11 16:45 ` Andy Lutomirski
1 sibling, 1 reply; 39+ messages in thread
From: H. Peter Anvin @ 2014-03-11 16:42 UTC (permalink / raw)
To: Linus Torvalds
Cc: Andy Lutomirski, linux-kernel@vger.kernel.org, Stefani Seibold,
the arch/x86 maintainers, Dave Jones, Martin Runge, Andreas Brief
On 03/11/2014 09:30 AM, Linus Torvalds wrote:
>
> No, the trivial solution is to stop adding crap to it.
>
> And no, "just reserve a little more space for it" is neither trivial
> nor a good idea. The fixed VDSO address is very much at the top of the
> address space, so you can't allocate more space for it unless you do
> one of
>
> (a) make it non-contiguous
> (b) get rid of the hole that is the very last page
> (c) mess with the vsyscall pages and make it contiguous "backwards"
>
> all of which sound like *horrible* ideas. Certainly not "trivial solution".
>
> No, the trivial solution is to not mess with that legacy page at all.
>
> Why is *that* trivial solution not on the table? Why the heck are
> people hell-bent on changing this stupid legacy page around?
>
> I find this whole thread very annoying. We shouldn't care about
> x86-32, and certainly not from a performance angle - we should
> consider it a "it's done, don't touch it" issue.
>
Andy actually did the research, and found that even the legacy VDSO
doesn't have to live at any one particular address, it just has to live
at the address it is linked at. So we can move it just fine, but we
have to change the link address to match.
That gives us a lot more maneuvering room than saying it has to be at
one specific address.
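A small C sketch of the "mapped where it was linked" condition, for
illustration only. It walks a set of 32-bit ELF program headers, takes the
link-time address of the first PT_LOAD segment, and compares it with the
address the image was mapped at. The function names are hypothetical.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

/* Store the link-time address of the first PT_LOAD segment in *out.
 * Returns 0 on success, -1 if no PT_LOAD segment is present. */
int first_load_vaddr(const Elf32_Phdr *phdr, size_t count, Elf32_Addr *out)
{
    assert(phdr != NULL && out != NULL);
    for (size_t i = 0; i < count; i++) {
        if (phdr[i].p_type == PT_LOAD) {
            *out = phdr[i].p_vaddr;
            return 0;
        }
    }
    return -1;
}

/* A prelinked image can run unrelocated only if both addresses agree. */
int runs_without_relocation(uint32_t mapped_base, Elf32_Addr link_vaddr)
{
    return mapped_base == link_vaddr;
}
```

In a real process the vDSO's ELF header can be located through
getauxval(AT_SYSINFO_EHDR); the sketch takes the program headers as an
argument so that it stays independent of the running kernel.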
-hpa
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2] x86: Remove compat vdso support
2014-03-11 16:42 ` H. Peter Anvin
@ 2014-03-11 16:45 ` Andy Lutomirski
2014-03-11 16:50 ` Andy Lutomirski
` (2 more replies)
0 siblings, 3 replies; 39+ messages in thread
From: Andy Lutomirski @ 2014-03-11 16:45 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Linus Torvalds, linux-kernel@vger.kernel.org, Stefani Seibold,
the arch/x86 maintainers, Dave Jones, Martin Runge, Andreas Brief
On Tue, Mar 11, 2014 at 9:42 AM, H. Peter Anvin <hpa@linux.intel.com> wrote:
> On 03/11/2014 09:30 AM, Linus Torvalds wrote:
>>
>> No, the trivial solution is to stop adding crap to it.
>>
>> And no, "just reserve a little more space for it" is neither trivial
>> nor a good idea. The fixed VDSO address is very much at the top of the
>> address space, so you can't allocate more space for it unless you do
>> one of
>>
>> (a) make it non-contiguous
>> (b) get rid of the hole that is the very last page
>> (c) mess with the vsyscall pages and make it contiguous "backwards"
>>
>> all of which sound like *horrible* ideas. Certainly not "trivial solution".
>>
>> No, the trivial solution is to not mess with that legacy page at all.
>>
>> Why is *that* trivial solution not on the table? Why the heck are
>> people hell-bent on changing this stupid legacy page around?
>>
>> I find this whole thread very annoying. We shouldn't care about
>> x86-32, and certainly not from a performance angle - we should
>> consider it a "it's done, don't touch it" issue.
>>
>
> Andy actually did the research, and found that even the legacy VDSO
> doesn't have to live at any one particular address, it just has to live
> at the address it is linked at. So we can move it just fine, but we
> have to change the link address to match.
>
> That gives us a lot more maneuvering room than saying it has to be at
> one specific address.
>
We could even just relocate the damn thing wherever it ends up. That
will waste one page of memory per process, though.
--Andy
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2] x86: Remove compat vdso support
2014-03-11 16:45 ` Andy Lutomirski
@ 2014-03-11 16:50 ` Andy Lutomirski
2014-03-11 16:52 ` H. Peter Anvin
2014-03-11 17:09 ` Linus Torvalds
2014-03-11 17:03 ` H. Peter Anvin
2014-03-11 17:07 ` Linus Torvalds
2 siblings, 2 replies; 39+ messages in thread
From: Andy Lutomirski @ 2014-03-11 16:50 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Linus Torvalds, linux-kernel@vger.kernel.org, Stefani Seibold,
the arch/x86 maintainers, Dave Jones, Martin Runge, Andreas Brief
Looking forward, would it be reasonable to have an extensible set of
flags that live in the ELF interpreter's headers somewhere that
indicate compatibility hacks that the program in question doesn't
need? There are at least two things I can think of:
- no_compat_vdso32: indicates an interpreter that can load a modern
non-prelinked vdso
- no_vsyscall64: indicates that the libc will not attempt to call
into the vsyscall page on x86_64.
I'm sure that there are more. Think PT_GNU_STACK but for more than
just the stack.
If we do something like this, there should probably be a prctl or
similar that can change some of the flags at runtime, too.
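For readers unfamiliar with the PT_GNU_STACK precedent, here is a minimal
C sketch of how that existing marker is read from a set of program
headers. The no_compat_vdso32 and no_vsyscall64 flags proposed above are
ideas only and have no ELF encoding; the function name below is also
invented for this example.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>

/* Inspect the PT_GNU_STACK program header, if any.
 * Returns 1 if an executable stack is requested, 0 if a non-executable
 * stack is requested, and -1 if the header is absent. */
int sketch_gnu_stack_exec(const Elf32_Phdr *phdr, size_t count)
{
    assert(phdr != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (phdr[i].p_type == PT_GNU_STACK)
            return (phdr[i].p_flags & PF_X) ? 1 : 0;
    }
    return -1;
}
```

A flag set of the kind suggested above would presumably be looked up the
same way: one extra program header (or note) whose bits the kernel checks
at exec time.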
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2] x86: Remove compat vdso support
2014-03-11 16:50 ` Andy Lutomirski
@ 2014-03-11 16:52 ` H. Peter Anvin
2014-03-11 17:09 ` Linus Torvalds
1 sibling, 0 replies; 39+ messages in thread
From: H. Peter Anvin @ 2014-03-11 16:52 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Linus Torvalds, linux-kernel@vger.kernel.org, Stefani Seibold,
the arch/x86 maintainers, Dave Jones, Martin Runge, Andreas Brief
On 03/11/2014 09:50 AM, Andy Lutomirski wrote:
> Looking forward, would it be reasonable to have an extensible set of
> flags that live in the ELF interpreter's headers somewhere that
> indicate compatibility hacks that the program in question doesn't
> need? There are at least two things I can think of:
>
> - no_compat_vdso32: indicates an interpreter that can load a modern
> non-prelinked vdso
> - no_vsyscall64: indicates that the libc will not attempt to call
> into the vsyscall page on x86_64.
>
> I'm sure that there are more. Think PT_GNU_STACK but for more than
> just the stack.
>
> If we do something like this, there should probably be a prctl or
> similar that can change some of the flags at runtime, too.
>
This comes many years too late for this purpose. Such flags might have
a use, but at this point it is rather meaningless, I think.
-hpa
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2] x86: Remove compat vdso support
2014-03-11 16:50 ` Andy Lutomirski
2014-03-11 16:52 ` H. Peter Anvin
@ 2014-03-11 17:09 ` Linus Torvalds
2014-03-11 17:14 ` H. Peter Anvin
` (4 more replies)
1 sibling, 5 replies; 39+ messages in thread
From: Linus Torvalds @ 2014-03-11 17:09 UTC (permalink / raw)
To: Andy Lutomirski
Cc: H. Peter Anvin, linux-kernel@vger.kernel.org, Stefani Seibold,
the arch/x86 maintainers, Dave Jones, Martin Runge, Andreas Brief
On Tue, Mar 11, 2014 at 9:50 AM, Andy Lutomirski <luto@amacapital.net> wrote:
> Looking forward, would it be reasonable to have an extensible set of
> flags that live in the ELF interpreter's headers somewhere
No. Not reasonable. The whole "32-bit x86" and "looking forward"
combination makes absolutely zero sense.
I can pretty much guarantee that even *phones* will be 64-bit if/when
x86 ever gets there. They'll need it just for ARM emulation, I bet.
So 32-bit x86 is dead, dead, dead. There's absolutely no future to it.
We're not adding new stuff to "future-proof" it.
Linus
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2] x86: Remove compat vdso support
2014-03-11 17:09 ` Linus Torvalds
@ 2014-03-11 17:14 ` H. Peter Anvin
2014-03-11 17:16 ` Andy Lutomirski
` (3 subsequent siblings)
4 siblings, 0 replies; 39+ messages in thread
From: H. Peter Anvin @ 2014-03-11 17:14 UTC (permalink / raw)
To: Linus Torvalds, Andy Lutomirski
Cc: linux-kernel@vger.kernel.org, Stefani Seibold,
the arch/x86 maintainers, Dave Jones, Martin Runge, Andreas Brief
On 03/11/2014 10:09 AM, Linus Torvalds wrote:
>
> So 32-bit x86 is dead, dead, dead. There's absolutely no future to it.
> We're not adding new stuff to "future-proof" it.
>
Quark and its derivatives will probably be 32 bit for some time to come.
Now, I don't know what the motivation was for Stefani to start this work
specifically. Stefani, would you like to fill us in?
-hpa
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2] x86: Remove compat vdso support
2014-03-11 17:09 ` Linus Torvalds
2014-03-11 17:14 ` H. Peter Anvin
@ 2014-03-11 17:16 ` Andy Lutomirski
2014-03-12 8:30 ` Stefani Seibold
` (2 subsequent siblings)
4 siblings, 0 replies; 39+ messages in thread
From: Andy Lutomirski @ 2014-03-11 17:16 UTC (permalink / raw)
To: Linus Torvalds
Cc: H. Peter Anvin, linux-kernel@vger.kernel.org, Stefani Seibold,
the arch/x86 maintainers, Dave Jones, Martin Runge, Andreas Brief
On Tue, Mar 11, 2014 at 10:09 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Tue, Mar 11, 2014 at 9:50 AM, Andy Lutomirski <luto@amacapital.net> wrote:
>> Looking forward, would it be reasonable to have an extensible set of
>> flags that live in the ELF interpreter's headers somewhere
>
> No. Not reasonable. The whole "32-bit x86" and "looking forward"
> combination makes absolutely zero sense.
>
> I can pretty much guarantee that even *phones* will be 64-bit if/when
> x86 ever gets there. They'll need it just for ARM emulation, I bet.
>
> So 32-bit x86 is dead, dead, dead. There's absolutely no future to it.
> We're not adding new stuff to "future-proof" it.
One of those flags is for 64-bit code, though -- it would be nice if
the tasks that don't use the legacy vsyscall page could tell the
kernel, which would allow them to opt into full ASLR. (There are lots
of legacy programs around that need vsyscall emulation, so there's no
practical way to turn it off with a config option for a very long
time, but vsyscall emulation would be trivial to turn on and off
per-process.)
--Andy
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2] x86: Remove compat vdso support
2014-03-11 17:09 ` Linus Torvalds
2014-03-11 17:14 ` H. Peter Anvin
2014-03-11 17:16 ` Andy Lutomirski
@ 2014-03-12 8:30 ` Stefani Seibold
2014-03-12 14:41 ` Linus Torvalds
2014-03-12 13:55 ` One Thousand Gnomes
2014-03-13 7:08 ` George Spelvin
4 siblings, 1 reply; 39+ messages in thread
From: Stefani Seibold @ 2014-03-12 8:30 UTC (permalink / raw)
To: Linus Torvalds
Cc: Andy Lutomirski, H. Peter Anvin, linux-kernel@vger.kernel.org,
the arch/x86 maintainers, Dave Jones, Martin Runge, Andreas Brief,
Greg Kroah-Hartman
Am Dienstag, den 11.03.2014, 10:09 -0700 schrieb Linus Torvalds:
> On Tue, Mar 11, 2014 at 9:50 AM, Andy Lutomirski <luto@amacapital.net> wrote:
> > Looking forward, would it be reasonable to have an extensible set of
> > flags that live in the ELF interpreter's headers somewhere
>
> No. Not reasonable. The whole "32-bit x86" and "looking forward"
> combination makes absolutely zero sense.
>
> I can pretty much guarantee that even *phones* will be 64-bit if/when
> x86 ever gets there. They'll need it just for ARM emulation, I bet.
>
> So 32-bit x86 is dead, dead, dead. There's absolutely no future to it.
> We're not adding new stuff to "future-proof" it.
>
Quite frankly this sounds like the mad scientist in an old Marvel
comic: "dead, dead, dead".
Is it possible to calm down and have a more technical discussion, rather
than blame and threats not to accept patches?
Can we also stop using harsh words like "WTF"? I don't like this style,
and neither do other developers, especially women.
32-bit is not dead. I think 98 percent of all computers running Linux
are embedded devices, and a lot of them are not capable of 64-bit
support. So that is your opinion, but there are also developers who do
not share it.
As for me, I still work with old Celeron Pentium III devices, and the
lifetime of these devices will end in 7 years.
A lot of people (including main kernel hackers) asked me to do this patch
because the time functions on a 32-bit kernel are so slow compared to
64-bit Linux. And as far as I can see, most of the involved kernel
developers are not opposed to this patch.
On the other side, many embedded developers use hand-crafted time
functions based on the TSC or similar to get fast time functions, but do
not know the pitfalls (C- and P-states) well enough to handle this
correctly. So the reliable way is to use the kernel functions, because
the kernel knows the state of the CPU and always returns the correct
time. But this slows down the application, which adds latency.
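A minimal C sketch of the "use the kernel functions" side of that
trade-off: it asks for the time through clock_gettime() instead of reading
the TSC directly. The wrapper name monotonic_ns is invented for this
example. Whether the call is served by the vDSO or falls back to a real
system call depends on the kernel, which is exactly the latency question
discussed here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Monotonic time in nanoseconds, obtained from the kernel's clock.
 * Unlike a raw TSC read, this does not depend on the caller knowing
 * about C-states, P-states or TSC calibration. */
int64_t monotonic_ns(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);    /* CLOCK_MONOTONIC is expected to be available */
    (void)rc;           /* silence unused warning when NDEBUG is set */
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}
```

Timing a tight loop of monotonic_ns() calls is a simple way to see how
expensive the time functions are on a given kernel configuration.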
We have used this kind of patch for a long time, and it decreased the
latency of our applications notably.
The current solution is quite clean, but there was an issue with the size
of the vDSO, which does not fit into one page with some kernel
configurations. There is a solution for this: #undef
CONFIG_OPTIMIZE_INLINING and CONFIG_X86_PPRO_FENCE in
arch/x86/vdso/vdso32/vclock_gettime.c.
To prevent issues with future kernel releases, we now have two ideas to
solve this:
One is Andy's kicking out of the compat vDSO. There is already a patch
for this.
The other one is (thanks to Andy's archaeological investigations) to
increase the size of the vDSO fixmap space, which according to Andy has
no side effects. This can be done in a very clean and easy way. The code
is already there, since the fixmap area is not fixed:
lguest, Xen, OLPC and the reservetop option move the fixmap during boot,
so we can easily get additional space by adjusting __FIXADDR_TOP.
I will write a patch for the latter one.
- Stefani
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2] x86: Remove compat vdso support
2014-03-12 8:30 ` Stefani Seibold
@ 2014-03-12 14:41 ` Linus Torvalds
2014-03-12 15:46 ` Linus Torvalds
0 siblings, 1 reply; 39+ messages in thread
From: Linus Torvalds @ 2014-03-12 14:41 UTC (permalink / raw)
To: Stefani Seibold
Cc: Andy Lutomirski, H. Peter Anvin, linux-kernel@vger.kernel.org,
the arch/x86 maintainers, Dave Jones, Martin Runge, Andreas Brief,
Greg Kroah-Hartman
On Wed, Mar 12, 2014 at 1:30 AM, Stefani Seibold <stefani@seibold.net> wrote:
>
> Is it possible to calm down and have a more technical discussion, rather
> than blame and threats not to accept patches?
I'm just asking for an upside to the changes, and fighting changing
things "just because".
32-bit is not dead in the sense that it doesn't exist any more, but it
*is* dead in the sense that there is absolutely zero point in treating
it as a developing platform. That was very clearly also the context in
which I said "dead, dead, dead", I was objecting to trying to
future-proof things that are not worth future-proofing.
The VDSO code has worked for us for a long time, and I'm upset and
annoyed that people want to do "improvements" to it that are not
improvements at all. They are ugly (just look at that
remap_pfn_range() call in your patch - why?), and they cause problems,
and instead of people saying "ok, fix the source of the problem",
people are running around like headless chickens and saying "ok, let's
work around all these problems".
WHY?
Nobody has even explained why we want this at all, and why we want
this headache. Nobody has explained why the solution is not to "just
don't do that then". Instead, people are piling up *more* complexity
because the patch had a problem.
That's a technical issue, Stefani. And the threat to not apply patches
is a technical solution, and I'm getting more and more convinced it is
the *right* technical solution.
And when Fengguang's automatic bug tester found the problem, YOU
STARTED ARGUING WITH HIM. Christ, well *excuuse* me for being fed up
with this pointless discussion.
Linus
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2] x86: Remove compat vdso support
2014-03-12 14:41 ` Linus Torvalds
@ 2014-03-12 15:46 ` Linus Torvalds
2014-03-12 16:04 ` Linus Torvalds
` (2 more replies)
0 siblings, 3 replies; 39+ messages in thread
From: Linus Torvalds @ 2014-03-12 15:46 UTC (permalink / raw)
To: Stefani Seibold
Cc: Andy Lutomirski, H. Peter Anvin, linux-kernel@vger.kernel.org,
the arch/x86 maintainers, Dave Jones, Martin Runge, Andreas Brief,
Greg Kroah-Hartman
On Wed, Mar 12, 2014 at 7:41 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Nobody has even explained why we want this at all, and why we want
> this headache. Nobody has explained why the solution is not to "just
> don't do that then". Instead, people are piling up *more* complexity
> because the patch had a problem.
>
> That's a technical issue, Stefani. And the threat to not apply patches
> is a technical solution, and I'm getting more and more convinced it is
> the *right* technical solution.
.. which is not to say that there aren't possible ways forward.
But quite frankly, I do *not* believe that the way forward is "let's
pile on more complexity to hide the problems this patch caused".
And I absolutely do *not* believe that it is a good idea to make big
fundamental changes to x86-32 that may have lots of legacy users (and
potential embedded future users), but that new installations and
developers have largely left behind.
So I'd very strongly suggest that people go back to square one, and
look at the original patch that caused the problem. There's two
issues:
- the fact that it grows the code sufficiently that our single-page
approach doesn't work under certain (unusual) configurations
This is the obvious problem and the thing that people seem to have
worked most on. I don't think it's right to be so eager to work around
the problem, when presumably it should be straightforward to make the
damn code smaller in the first place.
- the patch really is ugly, and already adds random stuff to map the
vvar/hpet pages into user memory, using absolutely disgusting code.
So quite frankly, if I had looked at that patch *before* hearing of
the size issues, I would personally still have NAK'ed it. The games it
plays are just nasty.
Having looked at it a bit more, I think the correct solution is:
- leave the legacy compat-vdso FIXMAP entry at a single page
- do *not* add the HPET/VVAR page games to the legacy case. Get rid
of the remap_pfn_range() games entirely.
- do the new thing only for the non-compat case, where we can just
use the "vdso_pages[]" array and not play any games.
IOW, just cut the cord. Separate out the legacy one-page FIXMAP case.
Get rid of the magic pfn_remap stuff entirely. It was ugly and it was
a mistake.
Hmm? That way, people can use the vdso and it really looks the same
between x86-32 and -64, and we can leave the legacy fixmap case alone.
Don't touch it.
But I think that "x86, vdso: Add 32 bit VDSO time support for 32 bit
kernel" patch needs to die. And the helper patches building up to it
(just because that patch used [io_]remap_pfn_range()) should die too.
Why weren't those pages in the vdso*_pages[] array anyway?
Linus
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2] x86: Remove compat vdso support
2014-03-12 15:46 ` Linus Torvalds
@ 2014-03-12 16:04 ` Linus Torvalds
2014-03-12 16:18 ` Brian Gerst
2014-03-12 16:18 ` Andy Lutomirski
2014-03-12 19:41 ` Linus Torvalds
2 siblings, 1 reply; 39+ messages in thread
From: Linus Torvalds @ 2014-03-12 16:04 UTC (permalink / raw)
To: Stefani Seibold
Cc: Andy Lutomirski, H. Peter Anvin, linux-kernel@vger.kernel.org,
the arch/x86 maintainers, Dave Jones, Martin Runge, Andreas Brief,
Greg Kroah-Hartman
On Wed, Mar 12, 2014 at 8:46 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> But I think that "x86, vdso: Add 32 bit VDSO time support for 32 bit
> kernel" patch needs to die. And the helper patches building up to it
> (just because that patch used [io_]remap_pfn_range()) should die too.
> Why weren't those pages in the vdso*_pages[] array anyway?
Sorry, confused. FIXMAP, not vdso_pages.
Anyway, that's what x86-64 does. Why not 32?
Linus
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2] x86: Remove compat vdso support
2014-03-12 16:04 ` Linus Torvalds
@ 2014-03-12 16:18 ` Brian Gerst
0 siblings, 0 replies; 39+ messages in thread
From: Brian Gerst @ 2014-03-12 16:18 UTC (permalink / raw)
To: Linus Torvalds
Cc: Stefani Seibold, Andy Lutomirski, H. Peter Anvin,
linux-kernel@vger.kernel.org, the arch/x86 maintainers,
Dave Jones, Martin Runge, Andreas Brief, Greg Kroah-Hartman
On Wed, Mar 12, 2014 at 12:04 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Wed, Mar 12, 2014 at 8:46 AM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>>
>> But I think that "x86, vdso: Add 32 bit VDSO time support for 32 bit
>> kernel" patch needs to die. And the helper patches building up to it
>> (just because that patch used [io_]remap_pfn_range()) should die too.
>> Why weren't those pages in the vdso*_pages[] array anyway?
>
> Sorry, confused. FIXMAP, not vdso_pages.
>
> Anyway, that's what x86-64 does. Why not 32?
A 32-bit process running on a 64-bit kernel cannot access the high
fixmap mappings. The VVAR/HPET pages have to be mapped below 4GB.
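The constraint is plain pointer-width arithmetic, sketched below in C. The
function name is invented, and 0xffffffffff600000 (the x86-64 vsyscall
page address) is used only as a stand-in for a high fixmap address.

```c
#include <assert.h>
#include <stdint.h>

/* True when the range [addr, addr + len) lies entirely below 4GB,
 * i.e. every byte of it can be named by a 32-bit pointer. */
int sketch_below_4g(uint64_t addr, uint64_t len)
{
    assert(len > 0);
    const uint64_t limit = UINT64_C(1) << 32;
    return addr < limit && len <= limit - addr;
}
```

sketch_below_4g(0xffffffffff600000, 4096) is false, which is why a 32-bit
task on a 64-bit kernel needs its own low mapping of those pages.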
--
Brian Gerst
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2] x86: Remove compat vdso support
2014-03-12 15:46 ` Linus Torvalds
2014-03-12 16:04 ` Linus Torvalds
@ 2014-03-12 16:18 ` Andy Lutomirski
2014-03-12 19:41 ` Linus Torvalds
2 siblings, 0 replies; 39+ messages in thread
From: Andy Lutomirski @ 2014-03-12 16:18 UTC (permalink / raw)
To: Linus Torvalds
Cc: Stefani Seibold, H. Peter Anvin, linux-kernel@vger.kernel.org,
the arch/x86 maintainers, Dave Jones, Martin Runge, Andreas Brief,
Greg Kroah-Hartman
On Wed, Mar 12, 2014 at 8:46 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Wed, Mar 12, 2014 at 7:41 AM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>
> Having looked at it a bit more, I think the correct solution is:
>
> - leave the legacy compat-vdso FIXMAP entry at a single page
>
> - do *not* add the HPET/VVAR page games to the legacy case. Get rid
> of the remap_pfn_range() games entirely.
If I understand your suggestion right, you're saying that we
should have two cases:
1. CONFIG_COMPAT_VDSO or vdso32=2: Use a one-page vdso in the fixmap.
2. !CONFIG_COMPAT_VDSO or vdso32=1: Use a multiple-page vdso + vvar
area in a vma (or maybe a couple of vmas).
The current 32-bit vdso is actually three separate .so files, selected
at boot time. So now we'd need six. (Yes, this can be reduced to
two. But still.) I think I'd prefer:
1. CONFIG_COMPAT_VDSO or vdso32=2: No vdso at all
2. !CONFIG_COMPAT_VDSO or vdso32=1: Use a multiple-page vdso + vvar
area in a vma (or maybe a couple of vmas).
This variant is a *lot* less code, and it avoids the need to generate
a further combinatorial blowup in the number of vdso images that the
kernel builds. (This is more or less the same thing as my
thoroughly-nakked patch, except with the default flipped.)
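The preferred variant boils down to a two-way decision, sketched here in
C. All names are invented for illustration; the values 1 and 2 follow the
vdso32= cases listed above, and treating 0 as "disabled" is an assumption
of this sketch rather than something stated in this message.

```c
#include <assert.h>

enum sketch_vdso32 {
    SKETCH_VDSO_NONE,   /* no 32-bit vDSO is mapped at all */
    SKETCH_VDSO_VMA     /* multi-page vDSO + vvar area in a vma */
};

/* Map a vdso32= style setting to the behaviour of the preferred variant:
 * only 1 yields a vDSO; the compat setting 2 (and the assumed 0) yield
 * none, so no fixmap-based image has to be built or maintained. */
enum sketch_vdso32 sketch_pick_vdso32(int param)
{
    assert(param >= 0 && param <= 2);
    return param == 1 ? SKETCH_VDSO_VMA : SKETCH_VDSO_NONE;
}
```

The point of the sketch is that there is a single vDSO implementation
left; the compat setting merely switches it off.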
The only downside that I can think of is that people who are using
CONFIG_COMPAT_VDSO unnecessarily take a small performance hit.
IOW, I don't think that anyone really wants to support two
implementations of a non-trivial 32-bit vdso.
Re: leaving the hpet and such in the fixmap unconditionally, I'm not
sure I see the advantage. For one thing, it can't support the compat
case, it's not really clear to me how it fits in with x32, and it
prevents future improvements that involve adjusting those mappings per
process. Doing it with vmas is IMO not so bad. (Doing it with vmas
or with fixmaps depending on kernel configuration, on the other hand,
is a mess.)
--Andy
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2] x86: Remove compat vdso support
2014-03-12 15:46 ` Linus Torvalds
2014-03-12 16:04 ` Linus Torvalds
2014-03-12 16:18 ` Andy Lutomirski
@ 2014-03-12 19:41 ` Linus Torvalds
2014-03-12 20:52 ` Andy Lutomirski
` (2 more replies)
2 siblings, 3 replies; 39+ messages in thread
From: Linus Torvalds @ 2014-03-12 19:41 UTC (permalink / raw)
To: Stefani Seibold
Cc: Andy Lutomirski, H. Peter Anvin, linux-kernel@vger.kernel.org,
the arch/x86 maintainers, Dave Jones, Martin Runge, Andreas Brief,
Greg Kroah-Hartman
On Wed, Mar 12, 2014 at 8:46 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> - do *not* add the HPET/VVAR page games to the legacy case. Get rid
> of the remap_pfn_range() games entirely.
.. actually, another approach would be to do the HPET/VVAR page games,
but make them non-legacy.
The reason I hate seeing those remap_pfn_range() things is because
it's nasty code for a legacy case that I think shouldn't have new code
written for it, especially when it won't get testing by developers.
So my reaction was "don't do that".
But people pointing out that we can't do what x86-64 does made me
think: we could avoid the whole "nasty code for a legacy case" by
making it the *non*-legacy case. We could get rid of the fixmap
HPET/VVAR entirely - on x86-64 (which can use those addresses) a
PC-relative addressing is probably actually better anyway, so mapping
them together with the vdso code shouldn't hurt.
That would remove my objections to doing all this stuff for a case
that developers won't see and use (the whole "It's dead, Jim"
objection). And it would unify the 32-bit and 64-bit cases.
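A tiny C sketch of why mapping the data next to the code makes the fixed
address unnecessary. The layout is hypothetical: one data (vvar) page is
assumed to sit directly below the vDSO text, and the names are invented
for this example.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u

/* Address of a variable in the data page, given where the text landed.
 * Only the distance between text and data is fixed, so the same code
 * works wherever the pair is mapped. */
uintptr_t sketch_vvar_addr(uintptr_t text_base, uintptr_t var_offset)
{
    assert(var_offset < SKETCH_PAGE_SIZE);
    return text_base - SKETCH_PAGE_SIZE + var_offset;
}
```

Moving the text base moves the result by the same amount, which is what
PC-relative addressing gives the vDSO code for free on x86-64.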
Together with Andy's "remove legacy 32-bit fixmap vdso", I'd feel that
this is actually an _improvement_ to the current situation.
Would something like that be more acceptable to everybody?
Linus
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2] x86: Remove compat vdso support
2014-03-12 19:41 ` Linus Torvalds
@ 2014-03-12 20:52 ` Andy Lutomirski
2014-03-12 21:37 ` H. Peter Anvin
2014-03-13 16:23 ` Thomas Gleixner
2 siblings, 0 replies; 39+ messages in thread
From: Andy Lutomirski @ 2014-03-12 20:52 UTC (permalink / raw)
To: Linus Torvalds
Cc: Stefani Seibold, H. Peter Anvin, linux-kernel@vger.kernel.org,
the arch/x86 maintainers, Dave Jones, Martin Runge, Andreas Brief,
Greg Kroah-Hartman
On Wed, Mar 12, 2014 at 12:41 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Wed, Mar 12, 2014 at 8:46 AM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>>
>> - do *not* add the HPET/VVAR page games to the legacy case. Get rid
>> of the remap_pfn_range() games entirely.
>
> .. actually, another approach would be to do the HPET/VVAR page games,
> but make them non-legacy.
>
> The reason I hate seeing those remap_pfn_range() things is because
> it's nasty code for a legacy case that I think shouldn't have new code
> written for it, especially when it won't get testing by developers.
>
> So my reaction was "don't do that".
>
> But people pointing out that we can't do what x86-64 does made me
> think: we could avoid the whole "nasty code for a legacy case" by
> making it the *non*-legacy case. We could get rid of the fixmap
> HPET/VVAR entirely - on x86-64 (which can use those addresses) a
> PC-relative addressing is probably actually better anyway, so mapping
> them together with the vdso code shouldn't hurt.
I think this is approximately what I was suggesting as a long-term solution :)
>
> Together with Andy's "remove legacy 32-bit fixmap vdso", I'd feel that
> this is actually an _improvement_ to the current situation.
>
> Would something like that be more acceptable to everybody?
I like it.
This has the added benefit that the vvar symbols can be ordinary
symbols as far as the compiler is concerned. Unfortunately, I don't
know how to get the linker to play along without hardcoding the
offsets of all the variables into the linker script. I'll play around
a bit.
--Andy
It would be nice to get rid of the vvar declaration stuff, too
>
> Linus
--
Andy Lutomirski
AMA Capital Management, LLC
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2] x86: Remove compat vdso support
2014-03-12 19:41 ` Linus Torvalds
2014-03-12 20:52 ` Andy Lutomirski
@ 2014-03-12 21:37 ` H. Peter Anvin
2014-03-12 21:45 ` Andy Lutomirski
2014-03-12 21:46 ` Linus Torvalds
2014-03-13 16:23 ` Thomas Gleixner
2 siblings, 2 replies; 39+ messages in thread
From: H. Peter Anvin @ 2014-03-12 21:37 UTC (permalink / raw)
To: Linus Torvalds, Stefani Seibold
Cc: Andy Lutomirski, linux-kernel@vger.kernel.org,
the arch/x86 maintainers, Dave Jones, Martin Runge, Andreas Brief,
Greg Kroah-Hartman
On 03/12/2014 12:41 PM, Linus Torvalds wrote:
>
> So my reaction was "don't do that".
>
> But people pointing out that we can't do what x86-64 does made me
> think: we could avoid the whole "nasty code for a legacy case" by
> making it the *non*-legacy case. We could get rid of the fixmap
> HPET/VVAR entirely - on x86-64 (which can use those addresses) a
> PC-relative addressing is probably actually better anyway, so mapping
> them together with the vdso code shouldn't hurt.
>
How would that deal with the legacy vsyscall case for x86-64? Just rely
on the "legacy vsyscall emulation" (which seems to have its own class of
problems...)?
-hpa
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2] x86: Remove compat vdso support
2014-03-12 21:37 ` H. Peter Anvin
@ 2014-03-12 21:45 ` Andy Lutomirski
2014-03-12 21:46 ` Linus Torvalds
1 sibling, 0 replies; 39+ messages in thread
From: Andy Lutomirski @ 2014-03-12 21:45 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Linus Torvalds, Stefani Seibold, linux-kernel@vger.kernel.org,
the arch/x86 maintainers, Dave Jones, Martin Runge, Andreas Brief,
Greg Kroah-Hartman
On Wed, Mar 12, 2014 at 2:37 PM, H. Peter Anvin <hpa@linux.intel.com> wrote:
> On 03/12/2014 12:41 PM, Linus Torvalds wrote:
>>
>> So my reaction was "don't do that".
>>
>> But people pointing out that we can't do what x86-64 does made me
>> think: we could avoid the whole "nasty code for a legacy case" by
>> making it the *non*-legacy case. We could get rid of the fixmap
>> HPET/VVAR entirely - on x86-64 (which can use those addresses) a
>> PC-relative addressing is probably actually better anyway, so mapping
>> them together with the vdso code shouldn't hurt.
>>
>
> How would that deal with the legacy vsyscall case for x86-64? Just rely
> on the "legacy vsyscall emulation" (which seems to have its own class of
> problems...)?
The emulated 64-bit vsyscall is completely independent of the vdso and
any user-accessible data whatsoever -- it's just a self-contained bit
of code that runs in kernel space. There is no user-executable code
involved at all. (There is code that *looks* executable to a dynamic
recompiler, but that code consists of ordinary system calls.) As far
as I know, all of the problems with it (other than the fact that it
exists at all) were solved a couple years ago.
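One way to see from userspace that the vsyscall page and the vdso are separate mappings is to look for their labels in /proc/self/maps. The following is a small sketch under stated assumptions: the helper names special_mapping_label and print_special_mappings are made up for illustration, and which of the labels [vdso], [vvar] and [vsyscall] actually appear depends on the kernel version, architecture and boot parameters.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Return the special-mapping label contained in one line of
 * /proc/<pid>/maps, or NULL if the line names none of them.
 */
const char *special_mapping_label(const char *maps_line)
{
    static const char *const labels[] = { "[vdso]", "[vvar]", "[vsyscall]" };
    size_t i;

    assert(maps_line != NULL);
    for (i = 0; i < sizeof(labels) / sizeof(labels[0]); i++) {
        if (strstr(maps_line, labels[i]) != NULL)
            return labels[i];
    }
    return NULL;
}

/*
 * Print every labelled special mapping of the calling process.
 * Returns the number of lines printed, or -1 if the maps file
 * could not be opened.
 */
int print_special_mappings(void)
{
    char line[512];
    int count = 0;
    FILE *maps = fopen("/proc/self/maps", "r");

    if (maps == NULL)
        return -1;
    while (fgets(line, sizeof(line), maps) != NULL) {
        if (special_mapping_label(line) != NULL) {
            fputs(line, stdout);
            count++;
        }
    }
    fclose(maps);
    return count;
}
```

Calling print_special_mappings() from a test program lists each such mapping on its own line with its address range and permissions, which makes it easy to compare a vsyscall=emulate boot against a vsyscall=native one.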
--Andy
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2] x86: Remove compat vdso support
2014-03-12 21:37 ` H. Peter Anvin
2014-03-12 21:45 ` Andy Lutomirski
@ 2014-03-12 21:46 ` Linus Torvalds
2014-03-12 21:49 ` Andy Lutomirski
1 sibling, 1 reply; 39+ messages in thread
From: Linus Torvalds @ 2014-03-12 21:46 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Stefani Seibold, Andy Lutomirski, linux-kernel@vger.kernel.org,
the arch/x86 maintainers, Dave Jones, Martin Runge, Andreas Brief,
Greg Kroah-Hartman
On Wed, Mar 12, 2014 at 2:37 PM, H. Peter Anvin <hpa@linux.intel.com> wrote:
>
> How would that deal with the legacy vsyscall case for x86-64? Just rely
> on the "legacy vsyscall emulation" (which seems to have its own class of
> problems...)?
It does?
We *default* to emulation, and have for over two years now (since
v3.4). If there are problems with it, we need to fix those.
Linus
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2] x86: Remove compat vdso support
2014-03-12 21:46 ` Linus Torvalds
@ 2014-03-12 21:49 ` Andy Lutomirski
2014-03-12 23:06 ` H. Peter Anvin
0 siblings, 1 reply; 39+ messages in thread
From: Andy Lutomirski @ 2014-03-12 21:49 UTC (permalink / raw)
To: Linus Torvalds
Cc: H. Peter Anvin, Stefani Seibold, linux-kernel@vger.kernel.org,
the arch/x86 maintainers, Dave Jones, Martin Runge, Andreas Brief,
Greg Kroah-Hartman
On Wed, Mar 12, 2014 at 2:46 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Wed, Mar 12, 2014 at 2:37 PM, H. Peter Anvin <hpa@linux.intel.com> wrote:
>>
>> How would that deal with the legacy vsyscall case for x86-64? Just rely
>> on the "legacy vsyscall emulation" (which seems to have its own class of
>> problems...)?
>
> It does?
>
> We *default* to emulation, and have for over two years now (since
> v3.4). If there are problems with it, we need to fix those.
Even in the non-default "vsyscall=native" case, the vsyscall page
just contains syscalls. It does not need to access the vvar page, the
hpet, or anything else that the vdso uses.
>
> Linus
--
Andy Lutomirski
AMA Capital Management, LLC
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2] x86: Remove compat vdso support
2014-03-12 21:49 ` Andy Lutomirski
@ 2014-03-12 23:06 ` H. Peter Anvin
2014-03-12 23:43 ` Andy Lutomirski
0 siblings, 1 reply; 39+ messages in thread
From: H. Peter Anvin @ 2014-03-12 23:06 UTC (permalink / raw)
To: Andy Lutomirski, Linus Torvalds
Cc: Stefani Seibold, linux-kernel@vger.kernel.org,
the arch/x86 maintainers, Dave Jones, Martin Runge, Andreas Brief,
Greg Kroah-Hartman
On 03/12/2014 02:49 PM, Andy Lutomirski wrote:
> On Wed, Mar 12, 2014 at 2:46 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>> On Wed, Mar 12, 2014 at 2:37 PM, H. Peter Anvin <hpa@linux.intel.com> wrote:
>>>
>>> How would that deal with the legacy vsyscall case for x86-64? Just rely
>>> on the "legacy vsyscall emulation" (which seems to have its own class of
>>> problems...)?
>>
>> It does?
>>
>> We *default* to emulation, and have for over two years now (since
>> v3.4). If there are problems with it, we need to fix those.
>
> Even in the non-default "vsyscall=native" case, the vsyscall page
> just contains syscalls. It does not need to access the vvar page, the
> hpet, or anything else that the vdso uses.
>
Ah, right. I let that detail slip my mind.
I do hear vsyscall=native still being used as a workaround for problems,
but yes, just making it call the kernel is fine, of course.
So yes, this does make it all better.
-hpa
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2] x86: Remove compat vdso support
2014-03-12 23:06 ` H. Peter Anvin
@ 2014-03-12 23:43 ` Andy Lutomirski
2014-03-12 23:46 ` H. Peter Anvin
0 siblings, 1 reply; 39+ messages in thread
From: Andy Lutomirski @ 2014-03-12 23:43 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Linus Torvalds, Stefani Seibold, linux-kernel@vger.kernel.org,
the arch/x86 maintainers, Dave Jones, Martin Runge, Andreas Brief,
Greg Kroah-Hartman
On Wed, Mar 12, 2014 at 4:06 PM, H. Peter Anvin <hpa@linux.intel.com> wrote:
> On 03/12/2014 02:49 PM, Andy Lutomirski wrote:
>> On Wed, Mar 12, 2014 at 2:46 PM, Linus Torvalds
>> <torvalds@linux-foundation.org> wrote:
>>> On Wed, Mar 12, 2014 at 2:37 PM, H. Peter Anvin <hpa@linux.intel.com> wrote:
>>>>
>>>> How would that deal with the legacy vsyscall case for x86-64? Just rely
>>>> on the "legacy vsyscall emulation" (which seems to have its own class of
>>>> problems...)?
>>>
>>> It does?
>>>
>>> We *default* to emulation, and have for over two years now (since
>>> v3.4). If there are problems with it, we need to fix those.
>>
>> Even in the non-default "vsyscall=native" case, the vsyscall page
>> just contains syscalls. It does not need to access the vvar page, the
>> hpet, or anything else that the vdso uses.
>>
>
> Ah, right. I let that detail slip my mind.
>
> I do hear vsyscall=native still being used as a workaround for problems,
> but yes, just making it call the kernel is fine, of course.
Next time you hear that, can you let me know? I haven't heard of any
issues since 3.4 IIRC.
--Andy
>
> So yes, this does make it all better.
>
> -hpa
>
>
--
Andy Lutomirski
AMA Capital Management, LLC
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2] x86: Remove compat vdso support
2014-03-12 23:43 ` Andy Lutomirski
@ 2014-03-12 23:46 ` H. Peter Anvin
0 siblings, 0 replies; 39+ messages in thread
From: H. Peter Anvin @ 2014-03-12 23:46 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Linus Torvalds, Stefani Seibold, linux-kernel@vger.kernel.org,
the arch/x86 maintainers, Dave Jones, Martin Runge, Andreas Brief,
Greg Kroah-Hartman
On 03/12/2014 04:43 PM, Andy Lutomirski wrote:
>>
>> I do hear vsyscall=native still being used as a workaround for problems,
>> but yes, just making it call the kernel is fine, of course.
>
> Next time you hear that, can you let me know? I haven't heard of any
> issues since 3.4 IIRC.
>
Will do.
-hpa
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2] x86: Remove compat vdso support
2014-03-12 19:41 ` Linus Torvalds
2014-03-12 20:52 ` Andy Lutomirski
2014-03-12 21:37 ` H. Peter Anvin
@ 2014-03-13 16:23 ` Thomas Gleixner
2 siblings, 0 replies; 39+ messages in thread
From: Thomas Gleixner @ 2014-03-13 16:23 UTC (permalink / raw)
To: Linus Torvalds
Cc: Stefani Seibold, Andy Lutomirski, H. Peter Anvin,
linux-kernel@vger.kernel.org, the arch/x86 maintainers,
Dave Jones, Martin Runge, Andreas Brief, Greg Kroah-Hartman
On Wed, 12 Mar 2014, Linus Torvalds wrote:
> On Wed, Mar 12, 2014 at 8:46 AM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > - do *not* add the HPET/VVAR page games to the legacy case. Get rid
> > of the remap_pfn_range() games entirely.
>
> .. actually, another approach would be to do the HPET/VVAR page games,
> but make them non-legacy.
>
> The reason I hate seeing those remap_pfn_range() things is because
> it's nasty code for a legacy case that I think shouldn't have new code
> written for it, especially when it won't get testing by developers.
>
> So my reaction was "don't do that".
>
> But people pointing out that we can't do what x86-64 does made me
> think: we could avoid the whole "nasty code for a legacy case" by
> making it the *non*-legacy case. We could get rid of the fixmap
> HPET/VVAR entirely - on x86-64 (which can use those addresses) a
> PC-relative addressing is probably actually better anyway, so mapping
> them together with the vdso code shouldn't hurt.
>
> That would remove my objections to doing all this stuff for a case
> that developers won't see and use (the whole "It's dead, Jim"
> objection) . And it would unify the 32-bit and 64-bit cases.
>
> Together with Andy's "remove legacy 32-bit fixmap vdso", I'd feel that
> this is actually an _improvement_ to the current situation.
>
> Would something like that be more acceptable to everybody?
Definitely yes.
Thanks,
tglx
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2] x86: Remove compat vdso support
2014-03-11 17:09 ` Linus Torvalds
` (2 preceding siblings ...)
2014-03-12 8:30 ` Stefani Seibold
@ 2014-03-12 13:55 ` One Thousand Gnomes
2014-03-13 7:08 ` George Spelvin
4 siblings, 0 replies; 39+ messages in thread
From: One Thousand Gnomes @ 2014-03-12 13:55 UTC (permalink / raw)
To: Linus Torvalds
Cc: Andy Lutomirski, H. Peter Anvin, linux-kernel@vger.kernel.org,
Stefani Seibold, the arch/x86 maintainers, Dave Jones,
Martin Runge, Andreas Brief
> So 32-bit x86 is dead, dead, dead. There's absolutely no future to it.
> We're not adding new stuff to "future-proof" it.
I think you underestimate how long it'll be present given the advantages
of 32bit in certain situations like very very small devices. Intel is
still releasing new 32bit processors (SoC X1020D etc - 'Quark')
Alan
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2] x86: Remove compat vdso support
2014-03-11 17:09 ` Linus Torvalds
` (3 preceding siblings ...)
2014-03-12 13:55 ` One Thousand Gnomes
@ 2014-03-13 7:08 ` George Spelvin
4 siblings, 0 replies; 39+ messages in thread
From: George Spelvin @ 2014-03-13 7:08 UTC (permalink / raw)
To: torvalds; +Cc: hpa, linux, linux-kernel, luto
Dear Leader spake:
> So 32-bit x86 is dead, dead, dead. There's absolutely no future to it.
> We're not adding new stuff to "future-proof" it.
I think you have a small error of perspective here. I agree the future
for 32-bit kernels is very limited. But this is about 32-bit binary
support, and lots of people run 32-bit binaries, even on 64-bit kernels.
The great majority of applications have no need for more than 4G of
address space and frequently people would rather avoid the pointer
bloat that going to 64 bits would cause.
The whole reason that the x32 ABI has been added to the kernel is to
better support this kind of application. But non-long-mode binaries
are still a lowest common denominator.
E.g., Firefox full releases come in 32- and 64-bit versions, but the
precompiled betas are 32-bit only.
It still may not be worth it, but 32-bit binaries will be with us
for a long time.
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2] x86: Remove compat vdso support
2014-03-11 16:45 ` Andy Lutomirski
2014-03-11 16:50 ` Andy Lutomirski
@ 2014-03-11 17:03 ` H. Peter Anvin
2014-03-11 17:07 ` Linus Torvalds
2 siblings, 0 replies; 39+ messages in thread
From: H. Peter Anvin @ 2014-03-11 17:03 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Linus Torvalds, linux-kernel@vger.kernel.org, Stefani Seibold,
the arch/x86 maintainers, Dave Jones, Martin Runge, Andreas Brief
On 03/11/2014 09:45 AM, Andy Lutomirski wrote:
>
> We could even just relocate the damn thing wherever it ends up. That
> will waste one page of memory per process, though.
>
We could definitely relocate it once and use the address across all
processes (e.g. top of the user address space.)
However, just changing one reserved page to two really is a trivial
change which preserves the status quo.
-hpa
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v2] x86: Remove compat vdso support
2014-03-11 16:45 ` Andy Lutomirski
2014-03-11 16:50 ` Andy Lutomirski
2014-03-11 17:03 ` H. Peter Anvin
@ 2014-03-11 17:07 ` Linus Torvalds
2 siblings, 0 replies; 39+ messages in thread
From: Linus Torvalds @ 2014-03-11 17:07 UTC (permalink / raw)
To: Andy Lutomirski
Cc: H. Peter Anvin, linux-kernel@vger.kernel.org, Stefani Seibold,
the arch/x86 maintainers, Dave Jones, Martin Runge, Andreas Brief
On Tue, Mar 11, 2014 at 9:45 AM, Andy Lutomirski <luto@amacapital.net> wrote:
>
> We could even just relocate the damn thing wherever it ends up. That
> will waste one page of memory per process, though.
I don't think we need to worry about the one page per process. So
*if* it is true that nobody actually cares about the exact address,
maybe that's the right solution.
I still don't get why people want to mess with the page in the first
place, though.
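If nobody depends on the exact address, userspace can ask where the vDSO ended up instead of assuming a fixed location. A minimal sketch, assuming a libc that provides getauxval() (glibc 2.16 or later, for example); the helper names has_elf_magic and vdso_base are made up for illustration.

```c
#include <assert.h>
#include <elf.h>
#include <stdint.h>
#include <string.h>
#include <sys/auxv.h>

/* Return 1 if the bytes at p start with the ELF magic number, else 0. */
int has_elf_magic(const unsigned char *p)
{
    assert(p != NULL);
    return memcmp(p, ELFMAG, SELFMAG) == 0;
}

/*
 * Ask the kernel where it mapped the vDSO for this process.
 * AT_SYSINFO_EHDR in the auxiliary vector holds the address of the
 * vDSO's ELF header; getauxval() returns 0 when the entry is absent,
 * in which case this returns NULL.
 */
const unsigned char *vdso_base(void)
{
    unsigned long addr = getauxval(AT_SYSINFO_EHDR);

    return (const unsigned char *)(uintptr_t)addr;
}
```

A caller would check vdso_base() for NULL and, if it is non-NULL, can confirm with has_elf_magic() that the address really points at an ELF image before parsing it. Nothing in this lookup depends on the mapping sitting at any particular address.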
Linus
^ permalink raw reply [flat|nested] 39+ messages in thread
end of thread, other threads:[~2014-03-13 16:23 UTC | newest]
Thread overview: 39+ messages
2014-03-11 1:03 [PATCH v2] x86: Remove compat vdso support Andy Lutomirski
2014-03-11 1:39 ` Linus Torvalds
2014-03-11 2:37 ` Andy Lutomirski
2014-03-11 3:09 ` Linus Torvalds
2014-03-11 4:10 ` Andy Lutomirski
2014-03-11 8:37 ` Ingo Molnar
2014-03-11 9:36 ` Linus Torvalds
2014-03-11 14:53 ` Andy Lutomirski
2014-03-11 15:30 ` Linus Torvalds
2014-03-11 16:14 ` H. Peter Anvin
2014-03-11 16:30 ` Linus Torvalds
2014-03-11 16:42 ` Andy Lutomirski
2014-03-11 16:42 ` H. Peter Anvin
2014-03-11 16:45 ` Andy Lutomirski
2014-03-11 16:50 ` Andy Lutomirski
2014-03-11 16:52 ` H. Peter Anvin
2014-03-11 17:09 ` Linus Torvalds
2014-03-11 17:14 ` H. Peter Anvin
2014-03-11 17:16 ` Andy Lutomirski
2014-03-12 8:30 ` Stefani Seibold
2014-03-12 14:41 ` Linus Torvalds
2014-03-12 15:46 ` Linus Torvalds
2014-03-12 16:04 ` Linus Torvalds
2014-03-12 16:18 ` Brian Gerst
2014-03-12 16:18 ` Andy Lutomirski
2014-03-12 19:41 ` Linus Torvalds
2014-03-12 20:52 ` Andy Lutomirski
2014-03-12 21:37 ` H. Peter Anvin
2014-03-12 21:45 ` Andy Lutomirski
2014-03-12 21:46 ` Linus Torvalds
2014-03-12 21:49 ` Andy Lutomirski
2014-03-12 23:06 ` H. Peter Anvin
2014-03-12 23:43 ` Andy Lutomirski
2014-03-12 23:46 ` H. Peter Anvin
2014-03-13 16:23 ` Thomas Gleixner
2014-03-12 13:55 ` One Thousand Gnomes
2014-03-13 7:08 ` George Spelvin
2014-03-11 17:03 ` H. Peter Anvin
2014-03-11 17:07 ` Linus Torvalds