linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v4 RESEND] x86/asm/entry/32, selftests: Add 'test_syscall_vdso' test
@ 2015-09-07 15:56 Denys Vlasenko
  2015-09-09 22:01 ` Andy Lutomirski
  2015-09-15  6:05 ` Ingo Molnar
  0 siblings, 2 replies; 12+ messages in thread
From: Denys Vlasenko @ 2015-09-07 15:56 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Denys Vlasenko, Linus Torvalds, Steven Rostedt, Borislav Petkov,
	H. Peter Anvin, Andy Lutomirski, Oleg Nesterov,
	Frederic Weisbecker, Alexei Starovoitov, Will Drewry, Kees Cook,
	x86, linux-kernel

This new test checks that all x86 registers are preserved across
32-bit syscalls. It tests syscalls through VDSO (if available)
and through INT 0x80, normally and under ptrace.

If kernel is a 64-bit one, high registers (r8..r15) are poisoned
before the syscall is called and are checked afterwards.

They must be either preserved, or cleared to zero (but r11 is special);
r12..15 must be preserved for INT 0x80.

EFLAGS is checked for changes too, but change there is not
considered to be a bug (paravirt kernels do not preserve
arithmetic flags).

Run-tested on 64-bit kernel:

$ ./test_syscall_vdso_32
[RUN]	Executing 6-argument 32-bit syscall via VDSO
[OK]	Arguments are preserved across syscall
[NOTE]	R11 has changed:0000000000200ed7 - assuming clobbered by SYSRET insn
[OK]	R8..R15 did not leak kernel data
[RUN]	Executing 6-argument 32-bit syscall via INT 80
[OK]	Arguments are preserved across syscall
[OK]	R8..R15 did not leak kernel data
[RUN]	Running tests under ptrace
[RUN]	Executing 6-argument 32-bit syscall via VDSO
[OK]	Arguments are preserved across syscall
[OK]	R8..R15 did not leak kernel data
[RUN]	Executing 6-argument 32-bit syscall via INT 80
[OK]	Arguments are preserved across syscall
[OK]	R8..R15 did not leak kernel data

On 32-bit paravirt kernel:

$ ./test_syscall_vdso_32
[NOTE]	Not a 64-bit kernel, won't test R8..R15 leaks
[RUN]	Executing 6-argument 32-bit syscall via VDSO
[WARN]	Flags before=0000000000200ed7 id 0 00 o d i s z 0 a 0 p 1 c
[WARN]	Flags  after=0000000000200246 id 0 00 i z 0 0 p 1
[WARN]	Flags change=0000000000000c91 0 00 o d s 0 a 0 0 c
[OK]	Arguments are preserved across syscall
[RUN]	Executing 6-argument 32-bit syscall via INT 80
[OK]	Arguments are preserved across syscall
[RUN]	Running tests under ptrace
[RUN]	Executing 6-argument 32-bit syscall via VDSO
[OK]	Arguments are preserved across syscall
[RUN]	Executing 6-argument 32-bit syscall via INT 80
[OK]	Arguments are preserved across syscall

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: Ingo Molnar <mingo@kernel.org>
CC: Borislav Petkov <bp@alien8.de>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Oleg Nesterov <oleg@redhat.com>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: Alexei Starovoitov <ast@plumgrid.com>
CC: Will Drewry <wad@chromium.org>
CC: Kees Cook <keescook@chromium.org>
CC: x86@kernel.org
CC: linux-kernel@vger.kernel.org
---

Changes in v2:
 does not fail if VDSO can't be found;
 tests INT 80 syscall method;
 tests syscalls under ptrace;
 switched to /* */ comments

Changes in v3:
 added checking for r8..r15 info leaks

Changes in v4:
 re-added Makefile change

 tools/testing/selftests/x86/Makefile            |   2 +-
 tools/testing/selftests/x86/test_syscall_vdso.c | 401 ++++++++++++++++++++++++
 tools/testing/selftests/x86/thunks_32.S         |  55 ++++
 3 files changed, 457 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/x86/test_syscall_vdso.c
 create mode 100644 tools/testing/selftests/x86/thunks_32.S

diff --git a/tools/testing/selftests/x86/Makefile b/tools/testing/selftests/x86/Makefile
index caa60d5..84effa6 100644
--- a/tools/testing/selftests/x86/Makefile
+++ b/tools/testing/selftests/x86/Makefile
@@ -5,7 +5,7 @@ include ../lib.mk
 .PHONY: all all_32 all_64 warn_32bit_failure clean
 
 TARGETS_C_BOTHBITS := single_step_syscall sysret_ss_attrs ldt_gdt syscall_nt
-TARGETS_C_32BIT_ONLY := entry_from_vm86 syscall_arg_fault sigreturn
+TARGETS_C_32BIT_ONLY := entry_from_vm86 syscall_arg_fault sigreturn test_syscall_vdso
 
 TARGETS_C_32BIT_ALL := $(TARGETS_C_BOTHBITS) $(TARGETS_C_32BIT_ONLY)
 BINARIES_32 := $(TARGETS_C_32BIT_ALL:%=%_32)
@@ -60,3 +60,4 @@ endif
 
 # Some tests have additional dependencies.
 sysret_ss_attrs_64: thunks.S
+test_syscall_vdso_32: thunks_32.S
diff --git a/tools/testing/selftests/x86/test_syscall_vdso.c b/tools/testing/selftests/x86/test_syscall_vdso.c
new file mode 100644
index 0000000..0792aef
--- /dev/null
+++ b/tools/testing/selftests/x86/test_syscall_vdso.c
@@ -0,0 +1,401 @@
+/*
+ * 32-bit syscall ABI conformance test.
+ *
+ * Copyright (c) 2015 Denys Vlasenko
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ */
+/*
+ * Can be built statically:
+ * gcc -Os -Wall -static -m32 test_syscall_vdso.c thunks_32.S
+ */
+#undef _GNU_SOURCE
+#define _GNU_SOURCE 1
+#undef __USE_GNU
+#define __USE_GNU 1
+#include <unistd.h>
+#include <stdlib.h>
+#include <string.h>
+#include <stdio.h>
+#include <signal.h>
+#include <sys/types.h>
+#include <sys/select.h>
+#include <sys/time.h>
+#include <elf.h>
+#include <sys/ptrace.h>
+#include <sys/wait.h>
+
+#if !defined(__i386__)
+int main(int argc, char **argv, char **envp)
+{
+	printf("[SKIP]\tNot a 32-bit x86 userspace\n");
+	return 0;
+}
+#else
+
+long syscall_addr;
+long get_syscall(char **envp)
+{
+	Elf32_auxv_t *auxv;
+	while (*envp++ != NULL)
+		continue;
+	for (auxv = (void *)envp; auxv->a_type != AT_NULL; auxv++)
+		if (auxv->a_type == AT_SYSINFO)
+			return auxv->a_un.a_val;
+	printf("[WARN]\tAT_SYSINFO not supplied\n");
+	return 0;
+}
+
+asm (
+	"	.pushsection .text\n"
+	"	.global	int80\n"
+	"int80:\n"
+	"	int	$0x80\n"
+	"	ret\n"
+	"	.popsection\n"
+);
+extern char int80;
+
+struct regs64 {
+	uint64_t rax, rbx, rcx, rdx;
+	uint64_t rsi, rdi, rbp, rsp;
+	uint64_t r8,  r9,  r10, r11;
+	uint64_t r12, r13, r14, r15;
+};
+struct regs64 regs64;
+int kernel_is_64bit;
+
+asm (
+	"	.pushsection .text\n"
+	"	.code64\n"
+	"get_regs64:\n"
+	"	push	%rax\n"
+	"	mov	$regs64, %eax\n"
+	"	pop	0*8(%rax)\n"
+	"	movq	%rbx, 1*8(%rax)\n"
+	"	movq	%rcx, 2*8(%rax)\n"
+	"	movq	%rdx, 3*8(%rax)\n"
+	"	movq	%rsi, 4*8(%rax)\n"
+	"	movq	%rdi, 5*8(%rax)\n"
+	"	movq	%rbp, 6*8(%rax)\n"
+	"	movq	%rsp, 7*8(%rax)\n"
+	"	movq	%r8,  8*8(%rax)\n"
+	"	movq	%r9,  9*8(%rax)\n"
+	"	movq	%r10, 10*8(%rax)\n"
+	"	movq	%r11, 11*8(%rax)\n"
+	"	movq	%r12, 12*8(%rax)\n"
+	"	movq	%r13, 13*8(%rax)\n"
+	"	movq	%r14, 14*8(%rax)\n"
+	"	movq	%r15, 15*8(%rax)\n"
+	"	ret\n"
+	"poison_regs64:\n"
+	"	movq	$0x7f7f7f7f, %r8\n"
+	"	shl	$32, %r8\n"
+	"	orq	$0x7f7f7f7f, %r8\n"
+	"	movq	%r8, %r9\n"
+	"	movq	%r8, %r10\n"
+	"	movq	%r8, %r11\n"
+	"	movq	%r8, %r12\n"
+	"	movq	%r8, %r13\n"
+	"	movq	%r8, %r14\n"
+	"	movq	%r8, %r15\n"
+	"	ret\n"
+	"	.code32\n"
+	"	.popsection\n"
+);
+extern void get_regs64(void);
+extern void poison_regs64(void);
+extern unsigned long call64_from_32(void (*function)(void));
+void print_regs64(void)
+{
+	if (!kernel_is_64bit)
+		return;
+	printf("ax:%016llx bx:%016llx cx:%016llx dx:%016llx\n", regs64.rax,  regs64.rbx,  regs64.rcx,  regs64.rdx);
+	printf("si:%016llx di:%016llx bp:%016llx sp:%016llx\n", regs64.rsi,  regs64.rdi,  regs64.rbp,  regs64.rsp);
+	printf(" 8:%016llx  9:%016llx 10:%016llx 11:%016llx\n", regs64.r8 ,  regs64.r9 ,  regs64.r10,  regs64.r11);
+	printf("12:%016llx 13:%016llx 14:%016llx 15:%016llx\n", regs64.r12,  regs64.r13,  regs64.r14,  regs64.r15);
+}
+
+int check_regs64(void)
+{
+	int err = 0;
+	int num = 8;
+	uint64_t *r64 = &regs64.r8;
+
+	if (!kernel_is_64bit)
+		return 0;
+
+	do {
+		if (*r64 == 0x7f7f7f7f7f7f7f7fULL)
+			continue; /* register did not change */
+		if (syscall_addr != (long)&int80) {
+			/*
+			 * Non-INT80 syscall entrypoints are allowed to clobber R8+ regs:
+			 * either clear them to 0, or for R11, load EFLAGS.
+			 */
+			if (*r64 == 0)
+				continue;
+			if (num == 11) {
+				printf("[NOTE]\tR11 has changed:%016llx - assuming clobbered by SYSRET insn\n", *r64);
+				continue;
+			}
+		} else {
+			/* INT80 syscall entrypoint can be used by
+			 * 64-bit programs too, unlike SYSCALL/SYSENTER.
+			 * Therefore it must preserve R12+
+			 * (they are callee-saved registers in 64-bit C ABI).
+			 *
+			 * This was probably historically not intended,
+			 * but R8..11 are clobbered (cleared to 0).
+			 * IOW: they are the only registers which aren't
+			 * preserved across INT80 syscall.
+			 */
+			if (*r64 == 0 && num <= 11)
+				continue;
+		}
+		printf("[FAIL]\tR%d has changed:%016llx\n", num, *r64);
+		err++;
+	} while (r64++, ++num < 16);
+
+	if (!err)
+		printf("[OK]\tR8..R15 did not leak kernel data\n");
+	return err;
+}
+
+int nfds;
+fd_set rfds;
+fd_set wfds;
+fd_set efds;
+struct timespec timeout;
+sigset_t sigmask;
+struct {
+	sigset_t *sp;
+	int sz;
+} sigmask_desc;
+
+void prep_args()
+{
+	nfds = 42;
+	FD_ZERO(&rfds);
+	FD_ZERO(&wfds);
+	FD_ZERO(&efds);
+	FD_SET(0, &rfds);
+	FD_SET(1, &wfds);
+	FD_SET(2, &efds);
+	timeout.tv_sec = 0;
+	timeout.tv_nsec = 123;
+	sigemptyset(&sigmask);
+	sigaddset(&sigmask, SIGINT);
+	sigaddset(&sigmask, SIGUSR2);
+	sigaddset(&sigmask, SIGRTMAX);
+	sigmask_desc.sp = &sigmask;
+	sigmask_desc.sz = 8; /* bytes */
+}
+
+static void print_flags(const char *name, unsigned long r)
+{
+	static const char *bitarray[] = {
+	"\n" ,"c\n" ,/* Carry Flag */
+	"0 " ,"1 "  ,/* Bit 1 - always on */
+	""   ,"p "  ,/* Parity Flag */
+	"0 " ,"3? " ,
+	""   ,"a "  ,/* Auxiliary carry Flag */
+	"0 " ,"5? " ,
+	""   ,"z "  ,/* Zero Flag */
+	""   ,"s "  ,/* Sign Flag */
+	""   ,"t "  ,/* Trap Flag */
+	""   ,"i "  ,/* Interrupt Flag */
+	""   ,"d "  ,/* Direction Flag */
+	""   ,"o "  ,/* Overflow Flag */
+	"0 " ,"1 "  ,/* I/O Privilege Level (2 bits) */
+	"0"  ,"1"   ,/* I/O Privilege Level (2 bits) */
+	""   ,"n "  ,/* Nested Task */
+	"0 " ,"15? ",
+	""   ,"r "  ,/* Resume Flag */
+	""   ,"v "  ,/* Virtual Mode */
+	""   ,"ac " ,/* Alignment Check/Access Control */
+	""   ,"vif ",/* Virtual Interrupt Flag */
+	""   ,"vip ",/* Virtual Interrupt Pending */
+	""   ,"id " ,/* CPUID detection */
+	NULL
+	};
+	const char **bitstr;
+	int bit;
+
+	printf("%s=%016lx ", name, r);
+	bitstr = bitarray + 42;
+	bit = 21;
+	if ((r >> 22) != 0)
+		printf("(extra bits are set) ");
+	do {
+		if (bitstr[(r >> bit) & 1][0])
+			printf(bitstr[(r >> bit) & 1]);
+		bitstr -= 2;
+		bit--;
+	} while (bit >= 0);
+}
+
+int run_syscall(void)
+{
+	long flags, bad_arg;
+
+	prep_args();
+
+	if (kernel_is_64bit)
+		call64_from_32(poison_regs64);
+	/*print_regs64();*/
+
+	asm("\n"
+	/* Try 6-arg syscall: pselect. It should return quickly */
+	"	push	%%ebp\n"
+	"	mov	$308, %%eax\n"     /* PSELECT */
+	"	mov	nfds, %%ebx\n"     /* ebx  arg1 */
+	"	mov	$rfds, %%ecx\n"    /* ecx  arg2 */
+	"	mov	$wfds, %%edx\n"    /* edx  arg3 */
+	"	mov	$efds, %%esi\n"    /* esi  arg4 */
+	"	mov	$timeout, %%edi\n" /* edi  arg5 */
+	"	mov	$sigmask_desc, %%ebp\n" /* %ebp arg6 */
+	"	push	$0x200ed7\n"      /* set almost all flags */
+	"	popf\n"		/* except TF, IOPL, NT, RF, VM, AC, VIF, VIP */
+	"	call	*syscall_addr\n"
+	/* Check that registers are not clobbered */
+	"	pushf\n"
+	"	pop	%%eax\n"
+	"	cld\n"
+	"	cmp	nfds, %%ebx\n"     /* ebx  arg1 */
+	"	mov	$1, %%ebx\n"
+	"	jne	1f\n"
+	"	cmp	$rfds, %%ecx\n"    /* ecx  arg2 */
+	"	mov	$2, %%ebx\n"
+	"	jne	1f\n"
+	"	cmp	$wfds, %%edx\n"    /* edx  arg3 */
+	"	mov	$3, %%ebx\n"
+	"	jne	1f\n"
+	"	cmp	$efds, %%esi\n"    /* esi  arg4 */
+	"	mov	$4, %%ebx\n"
+	"	jne	1f\n"
+	"	cmp	$timeout, %%edi\n" /* edi  arg5 */
+	"	mov	$5, %%ebx\n"
+	"	jne	1f\n"
+	"	cmpl	$sigmask_desc, %%ebp\n" /* %ebp arg6 */
+	"	mov	$6, %%ebx\n"
+	"	jne	1f\n"
+	"	mov	$0, %%ebx\n"
+	"1:\n"
+	"	pop	%%ebp\n"
+	: "=a" (flags), "=b" (bad_arg)
+	:
+	: "cx", "dx", "si", "di"
+	);
+
+	if (kernel_is_64bit) {
+		memset(&regs64, 0x77, sizeof(regs64));
+		call64_from_32(get_regs64);
+		/*print_regs64();*/
+	}
+
+	/*
+	 * On paravirt kernels, flags are not preserved across syscalls.
+	 * Thus, we do not consider it a bug if some are changed.
+	 * We just show ones which do.
+	 */
+	if ((0x200ed7 ^ flags) != 0) {
+		print_flags("[WARN]\tFlags before", 0x200ed7);
+		print_flags("[WARN]\tFlags  after", flags);
+		print_flags("[WARN]\tFlags change", (0x200ed7 ^ flags));
+	}
+
+	if (bad_arg) {
+		printf("[FAIL]\targ#%ld clobbered\n", bad_arg);
+		return 1;
+	}
+	printf("[OK]\tArguments are preserved across syscall\n");
+
+	return check_regs64();
+}
+
+int run_syscall_twice()
+{
+	int exitcode = 0;
+	long sv;
+
+	if (syscall_addr) {
+		printf("[RUN]\tExecuting 6-argument 32-bit syscall via VDSO\n");
+		exitcode = run_syscall();
+	}
+	sv = syscall_addr;
+	syscall_addr = (long)&int80;
+	printf("[RUN]\tExecuting 6-argument 32-bit syscall via INT 80\n");
+	exitcode += run_syscall();
+	syscall_addr = sv;
+	return exitcode;
+}
+
+void ptrace_me()
+{
+	pid_t pid;
+
+	fflush(NULL);
+	pid = fork();
+	if (pid < 0)
+		exit(1);
+	if (pid == 0) {
+		/* child */
+		if (ptrace(PTRACE_TRACEME, 0L, 0L, 0L) != 0)
+			exit(0);
+		raise(SIGSTOP);
+		return;
+	}
+	/* parent */
+	printf("[RUN]\tRunning tests under ptrace\n");
+	while (1) {
+		int status;
+		pid = waitpid(-1, &status, __WALL);
+		if (WIFEXITED(status))
+			exit(WEXITSTATUS(status));
+		if (WIFSIGNALED(status))
+			exit(WTERMSIG(status));
+		if (pid <= 0 || !WIFSTOPPED(status)) /* paranoia */
+			exit(255);
+		/*
+		 * Note: we do not inject sig = WSTOPSIG(status).
+		 * We probably should, but careful: do not inject SIGTRAP
+		 * generated by syscall entry/exit stops.
+		 * That kills the child.
+		 */
+		ptrace(PTRACE_SYSCALL, pid, 0L, 0L /*sig*/);
+	}
+}
+
+int main(int argc, char **argv, char **envp)
+{
+	int exitcode = 0;
+	int cs;
+
+	asm("\n"
+	"	movl	%%cs, %%eax\n"
+	: "=a" (cs)
+	);
+	kernel_is_64bit = (cs == 0x23);
+	if (!kernel_is_64bit)
+		printf("[NOTE]\tNot a 64-bit kernel, won't test R8..R15 leaks\n");
+
+	/* This only works for non-static builds:
+	 * syscall_addr = dlsym(dlopen("linux-gate.so.1", RTLD_NOW), "__kernel_vsyscall");
+	 */
+	syscall_addr = get_syscall(envp);
+
+	exitcode += run_syscall_twice();
+	ptrace_me();
+	exitcode += run_syscall_twice();
+
+	return exitcode;
+}
+#endif
diff --git a/tools/testing/selftests/x86/thunks_32.S b/tools/testing/selftests/x86/thunks_32.S
new file mode 100644
index 0000000..77e9159
--- /dev/null
+++ b/tools/testing/selftests/x86/thunks_32.S
@@ -0,0 +1,55 @@
+/*
+ * thunks_32.S - assembly helpers for mixed-bitness code
+ * Copyright (c) 2015 Denys Vlasenko
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * These are little helpers that make it easier to switch bitness on
+ * the fly.
+ */
+
+	.text
+	.code32
+
+	.global call64_from_32
+	.type call32_from_64, @function
+
+	// 4(%esp): function to call
+call64_from_32:
+	// Fetch function address
+	mov	4(%esp), %eax
+
+	// Save registers which are callee-clobbered by 64-bit ABI
+	push	%ecx
+	push	%edx
+	push	%esi
+	push	%edi
+
+	// Switch to long mode
+	jmp	$0x33,$1f
+1:	.code64
+
+	// Call the function
+	call	*%rax
+
+	// Switch to compatibility mode
+	push	$0x23  /* USER32_CS */
+	.code32; push $1f; .code64 /* hack: can't have X86_64_32S relocation in 32-bit ELF */
+	lretq
+1:	.code32
+
+	pop	%edi
+	pop	%esi
+	pop	%edx
+	pop	%ecx
+
+	ret
+
+.size call64_from_32, .-call64_from_32
-- 
1.8.1.4


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v4 RESEND] x86/asm/entry/32, selftests: Add 'test_syscall_vdso' test
  2015-09-07 15:56 [PATCH v4 RESEND] x86/asm/entry/32, selftests: Add 'test_syscall_vdso' test Denys Vlasenko
@ 2015-09-09 22:01 ` Andy Lutomirski
  2015-09-10 19:04   ` Denys Vlasenko
  2015-09-15  6:05 ` Ingo Molnar
  1 sibling, 1 reply; 12+ messages in thread
From: Andy Lutomirski @ 2015-09-09 22:01 UTC (permalink / raw)
  To: Denys Vlasenko
  Cc: Ingo Molnar, Linus Torvalds, Steven Rostedt, Borislav Petkov,
	H. Peter Anvin, Oleg Nesterov, Frederic Weisbecker,
	Alexei Starovoitov, Will Drewry, Kees Cook, X86 ML,
	linux-kernel@vger.kernel.org

On Mon, Sep 7, 2015 at 8:56 AM, Denys Vlasenko <dvlasenk@redhat.com> wrote:
> This new test checks that all x86 registers are preserved across
> 32-bit syscalls. It tests syscalls through VDSO (if available)
> and through INT 0x80, normally and under ptrace.
>
> If kernel is a 64-bit one, high registers (r8..r15) are poisoned
> before the syscall is called and are checked afterwards.
>
> They must be either preserved, or cleared to zero (but r11 is special);
> r12..15 must be preserved for INT 0x80.
>
> EFLAGS is checked for changes too, but change there is not
> considered to be a bug (paravirt kernels do not preserve
> arithmetic flags).
>
> Run-tested on 64-bit kernel:
>
> $ ./test_syscall_vdso_32
> [RUN]   Executing 6-argument 32-bit syscall via VDSO
> [OK]    Arguments are preserved across syscall
> [NOTE]  R11 has changed:0000000000200ed7 - assuming clobbered by SYSRET insn
> [OK]    R8..R15 did not leak kernel data
> [RUN]   Executing 6-argument 32-bit syscall via INT 80
> [OK]    Arguments are preserved across syscall
> [OK]    R8..R15 did not leak kernel data
> [RUN]   Running tests under ptrace
> [RUN]   Executing 6-argument 32-bit syscall via VDSO
> [OK]    Arguments are preserved across syscall
> [OK]    R8..R15 did not leak kernel data
> [RUN]   Executing 6-argument 32-bit syscall via INT 80
> [OK]    Arguments are preserved across syscall
> [OK]    R8..R15 did not leak kernel data
>
> On 32-bit paravirt kernel:
>
> $ ./test_syscall_vdso_32
> [NOTE]  Not a 64-bit kernel, won't test R8..R15 leaks
> [RUN]   Executing 6-argument 32-bit syscall via VDSO
> [WARN]  Flags before=0000000000200ed7 id 0 00 o d i s z 0 a 0 p 1 c
> [WARN]  Flags  after=0000000000200246 id 0 00 i z 0 0 p 1
> [WARN]  Flags change=0000000000000c91 0 00 o d s 0 a 0 0 c
> [OK]    Arguments are preserved across syscall
> [RUN]   Executing 6-argument 32-bit syscall via INT 80
> [OK]    Arguments are preserved across syscall
> [RUN]   Running tests under ptrace
> [RUN]   Executing 6-argument 32-bit syscall via VDSO
> [OK]    Arguments are preserved across syscall
> [RUN]   Executing 6-argument 32-bit syscall via INT 80
> [OK]    Arguments are preserved across syscall
>
> Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
> CC: Linus Torvalds <torvalds@linux-foundation.org>
> CC: Steven Rostedt <rostedt@goodmis.org>
> CC: Ingo Molnar <mingo@kernel.org>
> CC: Borislav Petkov <bp@alien8.de>
> CC: "H. Peter Anvin" <hpa@zytor.com>
> CC: Andy Lutomirski <luto@amacapital.net>
> CC: Oleg Nesterov <oleg@redhat.com>
> CC: Frederic Weisbecker <fweisbec@gmail.com>
> CC: Alexei Starovoitov <ast@plumgrid.com>
> CC: Will Drewry <wad@chromium.org>
> CC: Kees Cook <keescook@chromium.org>
> CC: x86@kernel.org
> CC: linux-kernel@vger.kernel.org

Acked-by: Andy Lutomirski <luto@kernel.org>

with minor caveats below, none of which are show-stoppers...

> +                       /* INT80 syscall entrypoint can be used by
> +                        * 64-bit programs too, unlike SYSCALL/SYSENTER.
> +                        * Therefore it must preserve R12+
> +                        * (they are callee-saved registers in 64-bit C ABI).
> +                        *
> +                        * This was probably historically not intended,
> +                        * but R8..11 are clobbered (cleared to 0).
> +                        * IOW: they are the only registers which aren't
> +                        * preserved across INT80 syscall.
> +                        */
> +                       if (*r64 == 0 && num <= 11)
> +                               continue;

Ugh.  I'll change my big entry patchset to preserve these and maybe to
preserve all of the 64-bit regs.

> +int main(int argc, char **argv, char **envp)
> +{
> +       int exitcode = 0;
> +       int cs;
> +
> +       asm("\n"
> +       "       movl    %%cs, %%eax\n"
> +       : "=a" (cs)
> +       );
> +       kernel_is_64bit = (cs == 0x23);
> +       if (!kernel_is_64bit)
> +               printf("[NOTE]\tNot a 64-bit kernel, won't test R8..R15 leaks\n");
> +

Does this work on Xen?  IIRC Xen is quite likely to land you in a
funny Xen-specific CS.

If you want to be fancier, you could borrow cs_bitness from sigreturn.c.

> +       /* This only works for non-static builds:
> +        * syscall_addr = dlsym(dlopen("linux-gate.so.1", RTLD_NOW), "__kernel_vsyscall");
> +        */
> +       syscall_addr = get_syscall(envp);
> +
> +       exitcode += run_syscall_twice();
> +       ptrace_me();
> +       exitcode += run_syscall_twice();
> +
> +       return exitcode;
> +}
> +#endif
> diff --git a/tools/testing/selftests/x86/thunks_32.S b/tools/testing/selftests/x86/thunks_32.S
> new file mode 100644
> index 0000000..77e9159
> --- /dev/null
> +++ b/tools/testing/selftests/x86/thunks_32.S

As a followup, we should rename thunks.S to thunks_64.S or just merge
both of them into one file.

--Andy

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v4 RESEND] x86/asm/entry/32, selftests: Add 'test_syscall_vdso' test
  2015-09-09 22:01 ` Andy Lutomirski
@ 2015-09-10 19:04   ` Denys Vlasenko
  2015-09-10 19:17     ` Andy Lutomirski
  2015-09-14  8:15     ` Ingo Molnar
  0 siblings, 2 replies; 12+ messages in thread
From: Denys Vlasenko @ 2015-09-10 19:04 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Ingo Molnar, Linus Torvalds, Steven Rostedt, Borislav Petkov,
	H. Peter Anvin, Oleg Nesterov, Frederic Weisbecker,
	Alexei Starovoitov, Will Drewry, Kees Cook, X86 ML,
	linux-kernel@vger.kernel.org

On 09/10/2015 12:01 AM, Andy Lutomirski wrote:
> On Mon, Sep 7, 2015 at 8:56 AM, Denys Vlasenko <dvlasenk@redhat.com> wrote:
>> This new test checks that all x86 registers are preserved across
>> 32-bit syscalls. It tests syscalls through VDSO (if available)
>> and through INT 0x80, normally and under ptrace.
>>
>> If kernel is a 64-bit one, high registers (r8..r15) are poisoned
>> before the syscall is called and are checked afterwards.
>>
>> They must be either preserved, or cleared to zero (but r11 is special);
>> r12..15 must be preserved for INT 0x80.
>>
>> EFLAGS is checked for changes too, but change there is not
>> considered to be a bug (paravirt kernels do not preserve
>> arithmetic flags).
>>
>> Run-tested on 64-bit kernel:
>>
>> $ ./test_syscall_vdso_32
>> [RUN]   Executing 6-argument 32-bit syscall via VDSO
>> [OK]    Arguments are preserved across syscall
>> [NOTE]  R11 has changed:0000000000200ed7 - assuming clobbered by SYSRET insn
>> [OK]    R8..R15 did not leak kernel data
>> [RUN]   Executing 6-argument 32-bit syscall via INT 80
>> [OK]    Arguments are preserved across syscall
>> [OK]    R8..R15 did not leak kernel data
>> [RUN]   Running tests under ptrace
>> [RUN]   Executing 6-argument 32-bit syscall via VDSO
>> [OK]    Arguments are preserved across syscall
>> [OK]    R8..R15 did not leak kernel data
>> [RUN]   Executing 6-argument 32-bit syscall via INT 80
>> [OK]    Arguments are preserved across syscall
>> [OK]    R8..R15 did not leak kernel data
>>
>> On 32-bit paravirt kernel:
>>
>> $ ./test_syscall_vdso_32
>> [NOTE]  Not a 64-bit kernel, won't test R8..R15 leaks
>> [RUN]   Executing 6-argument 32-bit syscall via VDSO
>> [WARN]  Flags before=0000000000200ed7 id 0 00 o d i s z 0 a 0 p 1 c
>> [WARN]  Flags  after=0000000000200246 id 0 00 i z 0 0 p 1
>> [WARN]  Flags change=0000000000000c91 0 00 o d s 0 a 0 0 c
>> [OK]    Arguments are preserved across syscall
>> [RUN]   Executing 6-argument 32-bit syscall via INT 80
>> [OK]    Arguments are preserved across syscall
>> [RUN]   Running tests under ptrace
>> [RUN]   Executing 6-argument 32-bit syscall via VDSO
>> [OK]    Arguments are preserved across syscall
>> [RUN]   Executing 6-argument 32-bit syscall via INT 80
>> [OK]    Arguments are preserved across syscall
>>
>> Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
>> CC: Linus Torvalds <torvalds@linux-foundation.org>
>> CC: Steven Rostedt <rostedt@goodmis.org>
>> CC: Ingo Molnar <mingo@kernel.org>
>> CC: Borislav Petkov <bp@alien8.de>
>> CC: "H. Peter Anvin" <hpa@zytor.com>
>> CC: Andy Lutomirski <luto@amacapital.net>
>> CC: Oleg Nesterov <oleg@redhat.com>
>> CC: Frederic Weisbecker <fweisbec@gmail.com>
>> CC: Alexei Starovoitov <ast@plumgrid.com>
>> CC: Will Drewry <wad@chromium.org>
>> CC: Kees Cook <keescook@chromium.org>
>> CC: x86@kernel.org
>> CC: linux-kernel@vger.kernel.org
> 
> Acked-by: Andy Lutomirski <luto@kernel.org>
> 
> with minor caveats below, none of which are show-stoppers...
> 
>> +                       /* INT80 syscall entrypoint can be used by
>> +                        * 64-bit programs too, unlike SYSCALL/SYSENTER.
>> +                        * Therefore it must preserve R12+
>> +                        * (they are callee-saved registers in 64-bit C ABI).
>> +                        *
>> +                        * This was probably historically not intended,
>> +                        * but R8..11 are clobbered (cleared to 0).
>> +                        * IOW: they are the only registers which aren't
>> +                        * preserved across INT80 syscall.
>> +                        */
>> +                       if (*r64 == 0 && num <= 11)
>> +                               continue;
> 
> Ugh.  I'll change my big entry patchset to preserve these and maybe to
> preserve all of the 64-bit regs.

If you do that, this won't change the ABI: we don't _promise_
to save them. If we accidentally do, that means nothing.

If you do that, the test won't fail. The code above does
not require regs to be 0 - there is further code which
also allow them to be unchanged.

(I'm not very comfortable about additional six push/pops
which are necessary for this to happen. I'm surprised
maintainers tentatively agreed to that -
I was grilled and asked to prove with measurements
that *one* additional push+pop wasn't adding significant overhead).

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v4 RESEND] x86/asm/entry/32, selftests: Add 'test_syscall_vdso' test
  2015-09-10 19:04   ` Denys Vlasenko
@ 2015-09-10 19:17     ` Andy Lutomirski
  2015-09-14  8:26       ` Ingo Molnar
  2015-09-14  8:15     ` Ingo Molnar
  1 sibling, 1 reply; 12+ messages in thread
From: Andy Lutomirski @ 2015-09-10 19:17 UTC (permalink / raw)
  To: Denys Vlasenko
  Cc: Ingo Molnar, Linus Torvalds, Steven Rostedt, Borislav Petkov,
	H. Peter Anvin, Oleg Nesterov, Frederic Weisbecker,
	Alexei Starovoitov, Will Drewry, Kees Cook, X86 ML,
	linux-kernel@vger.kernel.org

On Thu, Sep 10, 2015 at 12:04 PM, Denys Vlasenko <dvlasenk@redhat.com> wrote:
> On 09/10/2015 12:01 AM, Andy Lutomirski wrote:
>> On Mon, Sep 7, 2015 at 8:56 AM, Denys Vlasenko <dvlasenk@redhat.com> wrote:
>>> This new test checks that all x86 registers are preserved across
>>> 32-bit syscalls. It tests syscalls through VDSO (if available)
>>> and through INT 0x80, normally and under ptrace.
>>>
>>> If kernel is a 64-bit one, high registers (r8..r15) are poisoned
>>> before the syscall is called and are checked afterwards.
>>>
>>> They must be either preserved, or cleared to zero (but r11 is special);
>>> r12..15 must be preserved for INT 0x80.
>>>
>>> EFLAGS is checked for changes too, but change there is not
>>> considered to be a bug (paravirt kernels do not preserve
>>> arithmetic flags).
>>>
>>> Run-tested on 64-bit kernel:
>>>
>>> $ ./test_syscall_vdso_32
>>> [RUN]   Executing 6-argument 32-bit syscall via VDSO
>>> [OK]    Arguments are preserved across syscall
>>> [NOTE]  R11 has changed:0000000000200ed7 - assuming clobbered by SYSRET insn
>>> [OK]    R8..R15 did not leak kernel data
>>> [RUN]   Executing 6-argument 32-bit syscall via INT 80
>>> [OK]    Arguments are preserved across syscall
>>> [OK]    R8..R15 did not leak kernel data
>>> [RUN]   Running tests under ptrace
>>> [RUN]   Executing 6-argument 32-bit syscall via VDSO
>>> [OK]    Arguments are preserved across syscall
>>> [OK]    R8..R15 did not leak kernel data
>>> [RUN]   Executing 6-argument 32-bit syscall via INT 80
>>> [OK]    Arguments are preserved across syscall
>>> [OK]    R8..R15 did not leak kernel data
>>>
>>> On 32-bit paravirt kernel:
>>>
>>> $ ./test_syscall_vdso_32
>>> [NOTE]  Not a 64-bit kernel, won't test R8..R15 leaks
>>> [RUN]   Executing 6-argument 32-bit syscall via VDSO
>>> [WARN]  Flags before=0000000000200ed7 id 0 00 o d i s z 0 a 0 p 1 c
>>> [WARN]  Flags  after=0000000000200246 id 0 00 i z 0 0 p 1
>>> [WARN]  Flags change=0000000000000c91 0 00 o d s 0 a 0 0 c
>>> [OK]    Arguments are preserved across syscall
>>> [RUN]   Executing 6-argument 32-bit syscall via INT 80
>>> [OK]    Arguments are preserved across syscall
>>> [RUN]   Running tests under ptrace
>>> [RUN]   Executing 6-argument 32-bit syscall via VDSO
>>> [OK]    Arguments are preserved across syscall
>>> [RUN]   Executing 6-argument 32-bit syscall via INT 80
>>> [OK]    Arguments are preserved across syscall
>>>
>>> Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
>>> CC: Linus Torvalds <torvalds@linux-foundation.org>
>>> CC: Steven Rostedt <rostedt@goodmis.org>
>>> CC: Ingo Molnar <mingo@kernel.org>
>>> CC: Borislav Petkov <bp@alien8.de>
>>> CC: "H. Peter Anvin" <hpa@zytor.com>
>>> CC: Andy Lutomirski <luto@amacapital.net>
>>> CC: Oleg Nesterov <oleg@redhat.com>
>>> CC: Frederic Weisbecker <fweisbec@gmail.com>
>>> CC: Alexei Starovoitov <ast@plumgrid.com>
>>> CC: Will Drewry <wad@chromium.org>
>>> CC: Kees Cook <keescook@chromium.org>
>>> CC: x86@kernel.org
>>> CC: linux-kernel@vger.kernel.org
>>
>> Acked-by: Andy Lutomirski <luto@kernel.org>
>>
>> with minor caveats below, none of which are show-stoppers...
>>
>>> +                       /* INT80 syscall entrypoint can be used by
>>> +                        * 64-bit programs too, unlike SYSCALL/SYSENTER.
>>> +                        * Therefore it must preserve R12+
>>> +                        * (they are callee-saved registers in 64-bit C ABI).
>>> +                        *
>>> +                        * This was probably historically not intended,
>>> +                        * but R8..11 are clobbered (cleared to 0).
>>> +                        * IOW: they are the only registers which aren't
>>> +                        * preserved across INT80 syscall.
>>> +                        */
>>> +                       if (*r64 == 0 && num <= 11)
>>> +                               continue;
>>
>> Ugh.  I'll change my big entry patchset to preserve these and maybe to
>> preserve all of the 64-bit regs.
>
> If you do that, this won't change the ABI: we don't _promise_
> to save them. If we accidentally do, that means nothing.
>
> If you do that, the test won't fail. The code above does
> not require regs to be 0 - there is further code which
> also allow them to be unchanged.
>
> (I'm not very comfortable about additional six push/pops
> which are necessary for this to happen. I'm surprised
> maintainers tentatively agreed to that -
> I was grilled and asked to prove with measurements
> that *one* additional push+pop wasn't adding significant overhead).

I suspect that I need to make the series faster.

Also, int $0x80 isn't a fast path for any legitimate use case except
Debian, and I would argue that Debian is just buggy.

--Andy

-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v4 RESEND] x86/asm/entry/32, selftests: Add 'test_syscall_vdso' test
  2015-09-10 19:04   ` Denys Vlasenko
  2015-09-10 19:17     ` Andy Lutomirski
@ 2015-09-14  8:15     ` Ingo Molnar
  2015-09-14 17:55       ` Andy Lutomirski
  1 sibling, 1 reply; 12+ messages in thread
From: Ingo Molnar @ 2015-09-14  8:15 UTC (permalink / raw)
  To: Denys Vlasenko
  Cc: Andy Lutomirski, Linus Torvalds, Steven Rostedt, Borislav Petkov,
	H. Peter Anvin, Oleg Nesterov, Frederic Weisbecker,
	Alexei Starovoitov, Will Drewry, Kees Cook, X86 ML,
	linux-kernel@vger.kernel.org


* Denys Vlasenko <dvlasenk@redhat.com> wrote:

> >> +                       /* INT80 syscall entrypoint can be used by
> >> +                        * 64-bit programs too, unlike SYSCALL/SYSENTER.
> >> +                        * Therefore it must preserve R12+
> >> +                        * (they are callee-saved registers in 64-bit C ABI).
> >> +                        *
> >> +                        * This was probably historically not intended,
> >> +                        * but R8..11 are clobbered (cleared to 0).
> >> +                        * IOW: they are the only registers which aren't
> >> +                        * preserved across INT80 syscall.
> >> +                        */
> >> +                       if (*r64 == 0 && num <= 11)
> >> +                               continue;
> > 
> > Ugh.  I'll change my big entry patchset to preserve these and maybe to
> > preserve all of the 64-bit regs.
> 
> If you do that, this won't change the ABI: we don't _promise_
> to save them. If we accidentally do, that means nothing.

Argh, that's dangerous nonsense! You _still_ don't seem to understand what the 
Linux ABI means and how to change code that implements it...

Let me try one more time, and please make the mental effort to understand and 
internalize it, because I'm not going to apply any more ABI affecting patches from 
you if you don't...

When you read 'ABI' you always need to write it out mentally:

    _Application_ Binary Interface

Do you see the word 'kernel developer' in that phrase? I don't, and this really 
matters: what an ABI is does not depend on what the kernel developers think or 
thought it means. Our state of mind, our design preferences, our documentation and 
our intentions are absolutely irrelevant.

What an ABI is purely depends on is whether applications break or change behavior 
as a result of us changing it.

What an ABI depends on is how applications (pretty much any real application or 
library, no matter how rare) learned to use the kernel's previous implementation 
of the ABI, in released kernels - in every tiny detail of that ABI.

If _any_ application/library functionally relied on saving registers in this place 
- even if that behavior was 'accidental' from the kernel developer's POV, and if 
the app would break or misbehave if we changed that detail, then that's an 
'accidental ABI', which is an ABI just as much!!!

This is why we preserve old quirks and this is why we try to be 110% careful 
around ABIs, especially low level ones that expose a big set of hardware registers 
that applications could rely on in unexpected ways.

Not because we couldn't "fix" them, but because an ABI is a long term contract: we 
promise absolutely no breakage to applications, no matter what.

Mistaken ABIs, unintended ABIs, deterministically quirky ABIs that apps/libs 
learned to rely on, even if the ABI usage in the app itself is just a bug or is 
otherwise unintended, has _exactly_ the same ABI strength as our sys_close() ABI.

There are some very, very rare expections, but none of them apply here:

 - If an ABI component behaved non-deterministically (relying on random stack
   content for example) then we cannot always fully preserve it when fixing it - 
   but we'll generally gravitate towards trying to fix it while preserving 
   existing apps.

 - Sometimes, 1 out of 100,000 kernel bugs, preserving an old, rare ABI would mean
   introducing or allowing a serious security hole. There's been less than 3 such 
   cases in all of the 500,000+ commits in the Linux kernel that I'm aware of, so 
   this is not a case you need to even know about.

 - Testcases whose only purpose is to bake in every single silly detail from an
   exposed ABI generally don't count. Only if a real app/lib that people use is
   affected counts. (There are exceptions even to this exception: if the ABI
   change that breaks a test-case is obviously gratuitous then the ABI counts: 
   because reasonably there _might_ be apps out there that rely on the reasonable 
   old behavior, they just haven't been reported yet.) Obviously an ABI test-case 
   failing is a red flag, and by default we should assume we messed up and need to 
   unbreak the ABI - unless we are absolutely convinced it's OK. And even if we 
   think it's absolutely OK it takes just a single regression report of an 
   application breaking to revert an ABI change - no questions asked.

> (I'm not very comfortable about additional six push/pops which are necessary for 
> this to happen. I'm surprised maintainers tentatively agreed to that - I was 
> grilled and asked to prove with measurements that *one* additional push+pop 
> wasn't adding significant overhead).

Yeah, so here's our thinking: the code is still not as maintainable as we wish it 
to be, so we are willing to temporarily accept those pop/pushes to help along the 
C conversion effort, to make the code (much!) more maintainable.

Other changes still have to be justified on an instruction granular manner.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v4 RESEND] x86/asm/entry/32, selftests: Add 'test_syscall_vdso' test
  2015-09-10 19:17     ` Andy Lutomirski
@ 2015-09-14  8:26       ` Ingo Molnar
  2015-09-14 17:54         ` Andy Lutomirski
  0 siblings, 1 reply; 12+ messages in thread
From: Ingo Molnar @ 2015-09-14  8:26 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Denys Vlasenko, Linus Torvalds, Steven Rostedt, Borislav Petkov,
	H. Peter Anvin, Oleg Nesterov, Frederic Weisbecker,
	Alexei Starovoitov, Will Drewry, Kees Cook, X86 ML,
	linux-kernel@vger.kernel.org


* Andy Lutomirski <luto@amacapital.net> wrote:

> > (I'm not very comfortable about additional six push/pops
> > which are necessary for this to happen. I'm surprised
> > maintainers tentatively agreed to that -
> > I was grilled and asked to prove with measurements
> > that *one* additional push+pop wasn't adding significant overhead).
> 
> I suspect that I need to make the series faster.
> 
> Also, int $0x80 isn't a fast path for any legitimate use case except
> Debian, and I would argue that Debian is just buggy.

So buggy in the sense of not making use of faster syscalls, right? It won't break 
in any visible way, correct?

So if this heavy int80 syscall use happens even with the latest version of Debian 
as well then it would be nice to figure out what's wrong there, and provide an 
optimization patch to their libc guys or so - to make sure we fully understand the 
problem.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v4 RESEND] x86/asm/entry/32, selftests: Add 'test_syscall_vdso' test
  2015-09-14  8:26       ` Ingo Molnar
@ 2015-09-14 17:54         ` Andy Lutomirski
  0 siblings, 0 replies; 12+ messages in thread
From: Andy Lutomirski @ 2015-09-14 17:54 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Frederic Weisbecker, Denys Vlasenko, Kees Cook, Borislav Petkov,
	X86 ML, Alexei Starovoitov, Oleg Nesterov, Will Drewry,
	Steven Rostedt, linux-kernel@vger.kernel.org, H. Peter Anvin,
	Linus Torvalds

On Sep 14, 2015 1:26 AM, "Ingo Molnar" <mingo@kernel.org> wrote:
>
>
> * Andy Lutomirski <luto@amacapital.net> wrote:
>
> > > (I'm not very comfortable about additional six push/pops
> > > which are necessary for this to happen. I'm surprised
> > > maintainers tentatively agreed to that -
> > > I was grilled and asked to prove with measurements
> > > that *one* additional push+pop wasn't adding significant overhead).
> >
> > I suspect that I need to make the series faster.
> >
> > Also, int $0x80 isn't a fast path for any legitimate use case except
> > Debian, and I would argue that Debian is just buggy.
>
> So buggy in the sense of not making use of faster syscalls, right? It won't break
> in any visible way, correct?

Correct.  The only visible breakage I know of is that my unwind
selftest fails if compiled by a Debian toolchain, and I'll fix that in
v2.

>
> So if this heavy int80 syscall use happens even with the latest version of Debian
> as well then it would be nice to figure out what's wrong there, and provide an
> optimization patch to their libc guys or so - to make sure we fully understand the
> problem.

I looked for a bit and drew a blank, but I know I'm bad at
understanding Debian packaging and even worse at understanding the
mess that is glibc's pile of multiply compiled files.  I'll file a bug
or something.

--Andy

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v4 RESEND] x86/asm/entry/32, selftests: Add 'test_syscall_vdso' test
  2015-09-14  8:15     ` Ingo Molnar
@ 2015-09-14 17:55       ` Andy Lutomirski
  2015-09-15  5:59         ` Ingo Molnar
  0 siblings, 1 reply; 12+ messages in thread
From: Andy Lutomirski @ 2015-09-14 17:55 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Frederic Weisbecker, Denys Vlasenko, Kees Cook, Borislav Petkov,
	X86 ML, Alexei Starovoitov, Oleg Nesterov, Will Drewry,
	Steven Rostedt, linux-kernel@vger.kernel.org, H. Peter Anvin,
	Linus Torvalds

On Sep 14, 2015 1:15 AM, "Ingo Molnar" <mingo@kernel.org> wrote:
>
>
> * Denys Vlasenko <dvlasenk@redhat.com> wrote:
>
> > >> +                       /* INT80 syscall entrypoint can be used by
> > >> +                        * 64-bit programs too, unlike SYSCALL/SYSENTER.
> > >> +                        * Therefore it must preserve R12+
> > >> +                        * (they are callee-saved registers in 64-bit C ABI).
> > >> +                        *
> > >> +                        * This was probably historically not intended,
> > >> +                        * but R8..11 are clobbered (cleared to 0).
> > >> +                        * IOW: they are the only registers which aren't
> > >> +                        * preserved across INT80 syscall.
> > >> +                        */
> > >> +                       if (*r64 == 0 && num <= 11)
> > >> +                               continue;
> > >
> > > Ugh.  I'll change my big entry patchset to preserve these and maybe to
> > > preserve all of the 64-bit regs.
> >
> > If you do that, this won't change the ABI: we don't _promise_
> > to save them. If we accidentally do, that means nothing.
>
> Argh, that's dangerous nonsense! You _still_ don't seem to understand what the
> Linux ABI means and how to change code that implements it...

I think Denys might be taking about R8-R11 here.  If we change them
from clobbered to saved, that's probably fine.  Certainly I have to
save R12-R15 -- my v1 is just buggy there.  I was too deep in
__kernel_vsyscall when I wrote that code, and I wasn't thinking about
the raw int $0x80 entry variant.

I'd be rather surprised if anything broke if we started preserving
R8-R11 instead of zeroing them.

--Andy

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v4 RESEND] x86/asm/entry/32, selftests: Add 'test_syscall_vdso' test
  2015-09-14 17:55       ` Andy Lutomirski
@ 2015-09-15  5:59         ` Ingo Molnar
  2015-09-15 18:14           ` Andy Lutomirski
  0 siblings, 1 reply; 12+ messages in thread
From: Ingo Molnar @ 2015-09-15  5:59 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Frederic Weisbecker, Denys Vlasenko, Kees Cook, Borislav Petkov,
	X86 ML, Alexei Starovoitov, Oleg Nesterov, Will Drewry,
	Steven Rostedt, linux-kernel@vger.kernel.org, H. Peter Anvin,
	Linus Torvalds


* Andy Lutomirski <luto@amacapital.net> wrote:

> On Sep 14, 2015 1:15 AM, "Ingo Molnar" <mingo@kernel.org> wrote:
> >
> >
> > * Denys Vlasenko <dvlasenk@redhat.com> wrote:
> >
> > > >> +                       /* INT80 syscall entrypoint can be used by
> > > >> +                        * 64-bit programs too, unlike SYSCALL/SYSENTER.
> > > >> +                        * Therefore it must preserve R12+
> > > >> +                        * (they are callee-saved registers in 64-bit C ABI).
> > > >> +                        *
> > > >> +                        * This was probably historically not intended,
> > > >> +                        * but R8..11 are clobbered (cleared to 0).
> > > >> +                        * IOW: they are the only registers which aren't
> > > >> +                        * preserved across INT80 syscall.
> > > >> +                        */
> > > >> +                       if (*r64 == 0 && num <= 11)
> > > >> +                               continue;
> > > >
> > > > Ugh.  I'll change my big entry patchset to preserve these and maybe to
> > > > preserve all of the 64-bit regs.
> > >
> > > If you do that, this won't change the ABI: we don't _promise_
> > > to save them. If we accidentally do, that means nothing.
> >
> > Argh, that's dangerous nonsense! You _still_ don't seem to understand what the
> > Linux ABI means and how to change code that implements it...
> 
> I think Denys might be taking about R8-R11 here.  If we change them
> from clobbered to saved, that's probably fine.  Certainly I have to
> save R12-R15 -- my v1 is just buggy there.  I was too deep in
> __kernel_vsyscall when I wrote that code, and I wasn't thinking about
> the raw int $0x80 entry variant.
> 
> I'd be rather surprised if anything broke if we started preserving
> R8-R11 instead of zeroing them.

Well, read the statement:

  " If you do that, this won't change the ABI: we don't _promise_
    to save them. If we accidentally do, that means nothing. "

of _course_ it means everything: if we preserve R8-R11 and any app learns to rely 
on it, it becomes an ABI.

What we 'promise' with our implementation and what our intentions are are entirely 
irrelevant, they simply don't matter to ABI compatibility: it's only what apps do 
with our released kernels that matters, in 99.998% of the cases..

And I'd really like that thinking to permeate everyone's mind who is regularly 
changing this type of code ...

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v4 RESEND] x86/asm/entry/32, selftests: Add 'test_syscall_vdso' test
  2015-09-07 15:56 [PATCH v4 RESEND] x86/asm/entry/32, selftests: Add 'test_syscall_vdso' test Denys Vlasenko
  2015-09-09 22:01 ` Andy Lutomirski
@ 2015-09-15  6:05 ` Ingo Molnar
  1 sibling, 0 replies; 12+ messages in thread
From: Ingo Molnar @ 2015-09-15  6:05 UTC (permalink / raw)
  To: Denys Vlasenko
  Cc: Linus Torvalds, Steven Rostedt, Borislav Petkov, H. Peter Anvin,
	Andy Lutomirski, Oleg Nesterov, Frederic Weisbecker,
	Alexei Starovoitov, Will Drewry, Kees Cook, x86, linux-kernel


* Denys Vlasenko <dvlasenk@redhat.com> wrote:

>  tools/testing/selftests/x86/Makefile            |   2 +-
>  tools/testing/selftests/x86/test_syscall_vdso.c | 401 ++++++++++++++++++++++++
>  tools/testing/selftests/x86/thunks_32.S         |  55 ++++
>  3 files changed, 457 insertions(+), 1 deletion(-)
>  create mode 100644 tools/testing/selftests/x86/test_syscall_vdso.c
>  create mode 100644 tools/testing/selftests/x86/thunks_32.S

So I wanted to apply this testcase (testcases are great!), but there's a build 
warning:

test_syscall_vdso.c: In function ‘print_flags’:
test_syscall_vdso.c:239:4: warning: format not a string literal and no format arguments [-Wformat-security]
    printf(bitstr[(r >> bit) & 1]);
    ^

That's with GCC 4.9.2.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v4 RESEND] x86/asm/entry/32, selftests: Add 'test_syscall_vdso' test
  2015-09-15  5:59         ` Ingo Molnar
@ 2015-09-15 18:14           ` Andy Lutomirski
  2015-09-16  9:51             ` Ingo Molnar
  0 siblings, 1 reply; 12+ messages in thread
From: Andy Lutomirski @ 2015-09-15 18:14 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Frederic Weisbecker, Denys Vlasenko, Kees Cook, Borislav Petkov,
	X86 ML, Oleg Nesterov, Alexei Starovoitov, Steven Rostedt,
	Will Drewry, linux-kernel@vger.kernel.org, H. Peter Anvin,
	Linus Torvalds

On Sep 14, 2015 11:00 PM, "Ingo Molnar" <mingo@kernel.org> wrote:
>
>
> * Andy Lutomirski <luto@amacapital.net> wrote:
>
> > On Sep 14, 2015 1:15 AM, "Ingo Molnar" <mingo@kernel.org> wrote:
> > >
> > >
> > > * Denys Vlasenko <dvlasenk@redhat.com> wrote:
> > >
> > > > >> +                       /* INT80 syscall entrypoint can be used by
> > > > >> +                        * 64-bit programs too, unlike SYSCALL/SYSENTER.
> > > > >> +                        * Therefore it must preserve R12+
> > > > >> +                        * (they are callee-saved registers in 64-bit C ABI).
> > > > >> +                        *
> > > > >> +                        * This was probably historically not intended,
> > > > >> +                        * but R8..11 are clobbered (cleared to 0).
> > > > >> +                        * IOW: they are the only registers which aren't
> > > > >> +                        * preserved across INT80 syscall.
> > > > >> +                        */
> > > > >> +                       if (*r64 == 0 && num <= 11)
> > > > >> +                               continue;
> > > > >
> > > > > Ugh.  I'll change my big entry patchset to preserve these and maybe to
> > > > > preserve all of the 64-bit regs.
> > > >
> > > > If you do that, this won't change the ABI: we don't _promise_
> > > > to save them. If we accidentally do, that means nothing.
> > >
> > > Argh, that's dangerous nonsense! You _still_ don't seem to understand what the
> > > Linux ABI means and how to change code that implements it...
> >
> > I think Denys might be taking about R8-R11 here.  If we change them
> > from clobbered to saved, that's probably fine.  Certainly I have to
> > save R12-R15 -- my v1 is just buggy there.  I was too deep in
> > __kernel_vsyscall when I wrote that code, and I wasn't thinking about
> > the raw int $0x80 entry variant.
> >
> > I'd be rather surprised if anything broke if we started preserving
> > R8-R11 instead of zeroing them.
>
> Well, read the statement:
>
>   " If you do that, this won't change the ABI: we don't _promise_
>     to save them. If we accidentally do, that means nothing. "
>
> of _course_ it means everything: if we preserve R8-R11 and any app learns to rely
> on it, it becomes an ABI.

Right, it changes the ABI in a way that we can't undo, but it probably
doesn't break the old ABI.

Certainly for v2, I'll try to preserve the old behavior exactly.  If
we change it later to preserve all high regs, that'll be a separate
patch.

--Andy

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v4 RESEND] x86/asm/entry/32, selftests: Add 'test_syscall_vdso' test
  2015-09-15 18:14           ` Andy Lutomirski
@ 2015-09-16  9:51             ` Ingo Molnar
  0 siblings, 0 replies; 12+ messages in thread
From: Ingo Molnar @ 2015-09-16  9:51 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Frederic Weisbecker, Denys Vlasenko, Kees Cook, Borislav Petkov,
	X86 ML, Oleg Nesterov, Alexei Starovoitov, Steven Rostedt,
	Will Drewry, linux-kernel@vger.kernel.org, H. Peter Anvin,
	Linus Torvalds


* Andy Lutomirski <luto@amacapital.net> wrote:

> On Sep 14, 2015 11:00 PM, "Ingo Molnar" <mingo@kernel.org> wrote:
> >
> >
> > * Andy Lutomirski <luto@amacapital.net> wrote:
> >
> > > On Sep 14, 2015 1:15 AM, "Ingo Molnar" <mingo@kernel.org> wrote:
> > > >
> > > >
> > > > * Denys Vlasenko <dvlasenk@redhat.com> wrote:
> > > >
> > > > > >> +                       /* INT80 syscall entrypoint can be used by
> > > > > >> +                        * 64-bit programs too, unlike SYSCALL/SYSENTER.
> > > > > >> +                        * Therefore it must preserve R12+
> > > > > >> +                        * (they are callee-saved registers in 64-bit C ABI).
> > > > > >> +                        *
> > > > > >> +                        * This was probably historically not intended,
> > > > > >> +                        * but R8..11 are clobbered (cleared to 0).
> > > > > >> +                        * IOW: they are the only registers which aren't
> > > > > >> +                        * preserved across INT80 syscall.
> > > > > >> +                        */
> > > > > >> +                       if (*r64 == 0 && num <= 11)
> > > > > >> +                               continue;
> > > > > >
> > > > > > Ugh.  I'll change my big entry patchset to preserve these and maybe to
> > > > > > preserve all of the 64-bit regs.
> > > > >
> > > > > If you do that, this won't change the ABI: we don't _promise_
> > > > > to save them. If we accidentally do, that means nothing.
> > > >
> > > > Argh, that's dangerous nonsense! You _still_ don't seem to understand what the
> > > > Linux ABI means and how to change code that implements it...
> > >
> > > I think Denys might be taking about R8-R11 here.  If we change them
> > > from clobbered to saved, that's probably fine.  Certainly I have to
> > > save R12-R15 -- my v1 is just buggy there.  I was too deep in
> > > __kernel_vsyscall when I wrote that code, and I wasn't thinking about
> > > the raw int $0x80 entry variant.
> > >
> > > I'd be rather surprised if anything broke if we started preserving
> > > R8-R11 instead of zeroing them.
> >
> > Well, read the statement:
> >
> >   " If you do that, this won't change the ABI: we don't _promise_
> >     to save them. If we accidentally do, that means nothing. "
> >
> > of _course_ it means everything: if we preserve R8-R11 and any app learns to rely
> > on it, it becomes an ABI.
> 
> Right, it changes the ABI in a way that we can't undo, but it probably
> doesn't break the old ABI.

it's unknown: user-space code might have (unknowingly) started relying on the 
zeroing behavior. If that happened (and let's hope it didn't - but there's no 
guarantee), then the zeroing behavior is an ABI too.

> Certainly for v2, I'll try to preserve the old behavior exactly.  If
> we change it later to preserve all high regs, that'll be a separate
> patch.

Yeah, cleanly separating ABI-invariant patches from ABI-impacting ones (no matter 
how innocious the effect on the ABI looks) is a must.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2015-09-16  9:51 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-09-07 15:56 [PATCH v4 RESEND] x86/asm/entry/32, selftests: Add 'test_syscall_vdso' test Denys Vlasenko
2015-09-09 22:01 ` Andy Lutomirski
2015-09-10 19:04   ` Denys Vlasenko
2015-09-10 19:17     ` Andy Lutomirski
2015-09-14  8:26       ` Ingo Molnar
2015-09-14 17:54         ` Andy Lutomirski
2015-09-14  8:15     ` Ingo Molnar
2015-09-14 17:55       ` Andy Lutomirski
2015-09-15  5:59         ` Ingo Molnar
2015-09-15 18:14           ` Andy Lutomirski
2015-09-16  9:51             ` Ingo Molnar
2015-09-15  6:05 ` Ingo Molnar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).