Linux userland API discussions
 help / color / mirror / Atom feed
* [PATCHv9 4/5] sparc: Hook up execveat system call.
From: David Drysdale @ 2014-11-19 17:27 UTC (permalink / raw)
  To: Eric W. Biederman, Andy Lutomirski, Alexander Viro, Meredydd Luff,
	linux-kernel, Andrew Morton, David Miller, Thomas Gleixner
  Cc: Oleg Nesterov, Michael Kerrisk, Ingo Molnar, H. Peter Anvin,
	Kees Cook, Arnd Bergmann, Rich Felker, Christoph Hellwig, x86,
	linux-arch, linux-api, sparclinux, David Drysdale
In-Reply-To: <1416418072-18639-1-git-send-email-drysdale@google.com>

Signed-off-by: David Drysdale <drysdale@google.com>
Acked-by: David S. Miller <davem@davemloft.net>
---
 arch/sparc/include/uapi/asm/unistd.h | 3 ++-
 arch/sparc/kernel/systbls_32.S       | 1 +
 arch/sparc/kernel/systbls_64.S       | 2 ++
 3 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/sparc/include/uapi/asm/unistd.h b/arch/sparc/include/uapi/asm/unistd.h
index 46d83842eddc..6f35f4df17f2 100644
--- a/arch/sparc/include/uapi/asm/unistd.h
+++ b/arch/sparc/include/uapi/asm/unistd.h
@@ -415,8 +415,9 @@
 #define __NR_getrandom		347
 #define __NR_memfd_create	348
 #define __NR_bpf		349
+#define __NR_execveat		350
 
-#define NR_syscalls		350
+#define NR_syscalls		351
 
 /* Bitmask values returned from kern_features system call.  */
 #define KERN_FEATURE_MIXED_MODE_STACK	0x00000001
diff --git a/arch/sparc/kernel/systbls_32.S b/arch/sparc/kernel/systbls_32.S
index ad0cdf497b78..e31a9056a303 100644
--- a/arch/sparc/kernel/systbls_32.S
+++ b/arch/sparc/kernel/systbls_32.S
@@ -87,3 +87,4 @@ sys_call_table:
 /*335*/	.long sys_syncfs, sys_sendmmsg, sys_setns, sys_process_vm_readv, sys_process_vm_writev
 /*340*/	.long sys_ni_syscall, sys_kcmp, sys_finit_module, sys_sched_setattr, sys_sched_getattr
 /*345*/	.long sys_renameat2, sys_seccomp, sys_getrandom, sys_memfd_create, sys_bpf
+/*350*/	.long sys_execveat
diff --git a/arch/sparc/kernel/systbls_64.S b/arch/sparc/kernel/systbls_64.S
index 580cde9370c9..15069cb35dac 100644
--- a/arch/sparc/kernel/systbls_64.S
+++ b/arch/sparc/kernel/systbls_64.S
@@ -88,6 +88,7 @@ sys_call_table32:
 	.word sys_syncfs, compat_sys_sendmmsg, sys_setns, compat_sys_process_vm_readv, compat_sys_process_vm_writev
 /*340*/	.word sys_kern_features, sys_kcmp, sys_finit_module, sys_sched_setattr, sys_sched_getattr
 	.word sys32_renameat2, sys_seccomp, sys_getrandom, sys_memfd_create, sys_bpf
+/*350*/	.word sys_execveat
 
 #endif /* CONFIG_COMPAT */
 
@@ -167,3 +168,4 @@ sys_call_table:
 	.word sys_syncfs, sys_sendmmsg, sys_setns, sys_process_vm_readv, sys_process_vm_writev
 /*340*/	.word sys_kern_features, sys_kcmp, sys_finit_module, sys_sched_setattr, sys_sched_getattr
 	.word sys_renameat2, sys_seccomp, sys_getrandom, sys_memfd_create, sys_bpf
+/*350*/	.word sys_execveat
-- 
2.1.0.rc2.206.gedb03e5

^ permalink raw reply related

* [PATCHv9 3/5] syscalls: add selftest for execveat(2)
From: David Drysdale @ 2014-11-19 17:27 UTC (permalink / raw)
  To: Eric W. Biederman, Andy Lutomirski, Alexander Viro, Meredydd Luff,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Andrew Morton, David Miller,
	Thomas Gleixner
  Cc: Oleg Nesterov, Michael Kerrisk, Ingo Molnar, H. Peter Anvin,
	Kees Cook, Arnd Bergmann, Rich Felker, Christoph Hellwig,
	x86-DgEjT+Ai2ygdnm+yROfE0A, linux-arch-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA,
	sparclinux-u79uwXL29TY76Z2rM5mHXA, David Drysdale
In-Reply-To: <1416418072-18639-1-git-send-email-drysdale-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>

Signed-off-by: David Drysdale <drysdale-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
---
 tools/testing/selftests/Makefile        |   1 +
 tools/testing/selftests/exec/.gitignore |   9 +
 tools/testing/selftests/exec/Makefile   |  25 ++
 tools/testing/selftests/exec/execveat.c | 397 ++++++++++++++++++++++++++++++++
 4 files changed, 432 insertions(+)
 create mode 100644 tools/testing/selftests/exec/.gitignore
 create mode 100644 tools/testing/selftests/exec/Makefile
 create mode 100644 tools/testing/selftests/exec/execveat.c

diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index 45f145c6f843..c14893b501a9 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -15,6 +15,7 @@ TARGETS += user
 TARGETS += sysctl
 TARGETS += firmware
 TARGETS += ftrace
+TARGETS += exec
 
 TARGETS_HOTPLUG = cpu-hotplug
 TARGETS_HOTPLUG += memory-hotplug
diff --git a/tools/testing/selftests/exec/.gitignore b/tools/testing/selftests/exec/.gitignore
new file mode 100644
index 000000000000..64073e050c6a
--- /dev/null
+++ b/tools/testing/selftests/exec/.gitignore
@@ -0,0 +1,9 @@
+subdir*
+script*
+execveat
+execveat.symlink
+execveat.moved
+execveat.path.ephemeral
+execveat.ephemeral
+execveat.denatured
+xxxxxxxx*
\ No newline at end of file
diff --git a/tools/testing/selftests/exec/Makefile b/tools/testing/selftests/exec/Makefile
new file mode 100644
index 000000000000..66dfc2ce1788
--- /dev/null
+++ b/tools/testing/selftests/exec/Makefile
@@ -0,0 +1,25 @@
+CC = $(CROSS_COMPILE)gcc
+CFLAGS = -Wall
+BINARIES = execveat
+DEPS = execveat.symlink execveat.denatured script subdir
+all: $(BINARIES) $(DEPS)
+
+subdir:
+	mkdir -p $@
+script:
+	echo '#!/bin/sh' > $@
+	echo 'exit $$*' >> $@
+	chmod +x $@
+execveat.symlink: execveat
+	ln -s -f $< $@
+execveat.denatured: execveat
+	cp $< $@
+	chmod -x $@
+%: %.c
+	$(CC) $(CFLAGS) -o $@ $^
+
+run_tests: all
+	./execveat
+
+clean:
+	rm -rf $(BINARIES) $(DEPS) subdir.moved execveat.moved xxxxx*
diff --git a/tools/testing/selftests/exec/execveat.c b/tools/testing/selftests/exec/execveat.c
new file mode 100644
index 000000000000..33a5c06d95ca
--- /dev/null
+++ b/tools/testing/selftests/exec/execveat.c
@@ -0,0 +1,397 @@
+/*
+ * Copyright (c) 2014 Google, Inc.
+ *
+ * Licensed under the terms of the GNU GPL License version 2
+ *
+ * Selftests for execveat(2).
+ */
+
+#define _GNU_SOURCE  /* to get O_PATH, AT_EMPTY_PATH */
+#include <sys/sendfile.h>
+#include <sys/stat.h>
+#include <sys/syscall.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <limits.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+
+static char longpath[2 * PATH_MAX] = "";
+static char *envp[] = { "IN_TEST=yes", NULL, NULL };
+static char *argv[] = { "execveat", "99", NULL };
+
+static int execveat_(int fd, const char *path, char **argv, char **envp,
+		     int flags)
+{
+#ifdef __NR_execveat
+	return syscall(__NR_execveat, fd, path, argv, envp, flags);
+#else
+	errno = -ENOSYS;
+	return -1;
+#endif
+}
+
+#define check_execveat_fail(fd, path, flags, errno)	\
+	_check_execveat_fail(fd, path, flags, errno, #errno)
+static int _check_execveat_fail(int fd, const char *path, int flags,
+				int expected_errno, const char *errno_str)
+{
+	int rc;
+
+	errno = 0;
+	printf("Check failure of execveat(%d, '%s', %d) with %s... ",
+		fd, path?:"(null)", flags, errno_str);
+	rc = execveat_(fd, path, argv, envp, flags);
+
+	if (rc > 0) {
+		printf("[FAIL] (unexpected success from execveat(2))\n");
+		return 1;
+	}
+	if (errno != expected_errno) {
+		printf("[FAIL] (expected errno %d (%s) not %d (%s)\n",
+			expected_errno, strerror(expected_errno),
+			errno, strerror(errno));
+		return 1;
+	}
+	printf("[OK]\n");
+	return 0;
+}
+
+static int check_execveat_invoked_rc(int fd, const char *path, int flags,
+				     int expected_rc)
+{
+	int status;
+	int rc;
+	pid_t child;
+	int pathlen = path ? strlen(path) : 0;
+
+	if (pathlen > 40)
+		printf("Check success of execveat(%d, '%.20s...%s', %d)... ",
+			fd, path, (path + pathlen - 20), flags);
+	else
+		printf("Check success of execveat(%d, '%s', %d)... ",
+			fd, path?:"(null)", flags);
+	child = fork();
+	if (child < 0) {
+		printf("[FAIL] (fork() failed)\n");
+		return 1;
+	}
+	if (child == 0) {
+		/* Child: do execveat(). */
+		rc = execveat_(fd, path, argv, envp, flags);
+		printf("[FAIL]: execveat() failed, rc=%d errno=%d (%s)\n",
+			rc, errno, strerror(errno));
+		exit(1);  /* should not reach here */
+	}
+	/* Parent: wait for & check child's exit status. */
+	rc = waitpid(child, &status, 0);
+	if (rc != child) {
+		printf("[FAIL] (waitpid(%d,...) returned %d)\n", child, rc);
+		return 1;
+	}
+	if (!WIFEXITED(status)) {
+		printf("[FAIL] (child %d did not exit cleanly, status=%08x)\n",
+			child, status);
+		return 1;
+	}
+	if (WEXITSTATUS(status) != expected_rc) {
+		printf("[FAIL] (child %d exited with %d not %d)\n",
+			child, WEXITSTATUS(status), expected_rc);
+		return 1;
+	}
+	printf("[OK]\n");
+	return 0;
+}
+
+static int check_execveat(int fd, const char *path, int flags)
+{
+	return check_execveat_invoked_rc(fd, path, flags, 99);
+}
+
+static char *concat(const char *left, const char *right)
+{
+	char *result = malloc(strlen(left) + strlen(right) + 1);
+
+	strcpy(result, left);
+	strcat(result, right);
+	return result;
+}
+
+static int open_or_die(const char *filename, int flags)
+{
+	int fd = open(filename, flags);
+
+	if (fd < 0) {
+		printf("Failed to open '%s'; "
+			"check prerequisites are available\n", filename);
+		exit(1);
+	}
+	return fd;
+}
+
+static void exe_cp(const char *src, const char *dest)
+{
+	int in_fd = open_or_die(src, O_RDONLY);
+	int out_fd = open(dest, O_RDWR|O_CREAT|O_TRUNC, 0755);
+	struct stat info;
+
+	fstat(in_fd, &info);
+	sendfile(out_fd, in_fd, NULL, info.st_size);
+	close(in_fd);
+	close(out_fd);
+}
+
+#define XX_DIR_LEN 200
+static int check_execveat_pathmax(int dot_dfd, const char *src, int is_script)
+{
+	int fail = 0;
+	int ii, count, len;
+	char longname[XX_DIR_LEN + 1];
+	int fd;
+
+	if (*longpath == '\0') {
+		/* Create a filename close to PATH_MAX in length */
+		memset(longname, 'x', XX_DIR_LEN - 1);
+		longname[XX_DIR_LEN - 1] = '/';
+		longname[XX_DIR_LEN] = '\0';
+		count = (PATH_MAX - 3) / XX_DIR_LEN;
+		for (ii = 0; ii < count; ii++) {
+			strcat(longpath, longname);
+			mkdir(longpath, 0755);
+		}
+		len = (PATH_MAX - 3) - (count * XX_DIR_LEN);
+		if (len <= 0)
+			len = 1;
+		memset(longname, 'y', len);
+		longname[len] = '\0';
+		strcat(longpath, longname);
+	}
+	exe_cp(src, longpath);
+
+	/*
+	 * Execute as a pre-opened file descriptor, which works whether this is
+	 * a script or not (because the interpreter sees a filename like
+	 * "/dev/fd/20").
+	 */
+	fd = open(longpath, O_RDONLY);
+	if (fd > 0) {
+		printf("Invoke copy of '%s' via filename of length %lu:\n",
+			src, strlen(longpath));
+		fail += check_execveat(fd, "", AT_EMPTY_PATH);
+	} else {
+		printf("Failed to open length %lu filename, errno=%d (%s)\n",
+			strlen(longpath), errno, strerror(errno));
+		fail++;
+	}
+
+	/*
+	 * Execute as a long pathname relative to ".".  If this is a script,
+	 * the interpreter will launch but fail to open the script because its
+	 * name ("/dev/fd/5/xxx....") is bigger than PATH_MAX.
+	 */
+	if (is_script)
+		fail += check_execveat_invoked_rc(dot_dfd, longpath, 0, 127);
+	else
+		fail += check_execveat(dot_dfd, longpath, 0);
+
+	return fail;
+}
+
+static int run_tests(void)
+{
+	int fail = 0;
+	char *fullname = realpath("execveat", NULL);
+	char *fullname_script = realpath("script", NULL);
+	char *fullname_symlink = concat(fullname, ".symlink");
+	int subdir_dfd = open_or_die("subdir", O_DIRECTORY|O_RDONLY);
+	int subdir_dfd_ephemeral = open_or_die("subdir.ephemeral",
+					       O_DIRECTORY|O_RDONLY);
+	int dot_dfd = open_or_die(".", O_DIRECTORY|O_RDONLY);
+	int dot_dfd_path = open_or_die(".", O_DIRECTORY|O_RDONLY|O_PATH);
+	int dot_dfd_cloexec = open_or_die(".", O_DIRECTORY|O_RDONLY|O_CLOEXEC);
+	int fd = open_or_die("execveat", O_RDONLY);
+	int fd_path = open_or_die("execveat", O_RDONLY|O_PATH);
+	int fd_symlink = open_or_die("execveat.symlink", O_RDONLY);
+	int fd_denatured = open_or_die("execveat.denatured", O_RDONLY);
+	int fd_denatured_path = open_or_die("execveat.denatured",
+					    O_RDONLY|O_PATH);
+	int fd_script = open_or_die("script", O_RDONLY);
+	int fd_ephemeral = open_or_die("execveat.ephemeral", O_RDONLY);
+	int fd_ephemeral_path = open_or_die("execveat.path.ephemeral",
+					    O_RDONLY|O_PATH);
+	int fd_script_ephemeral = open_or_die("script.ephemeral", O_RDONLY);
+	int fd_cloexec = open_or_die("execveat", O_RDONLY|O_CLOEXEC);
+	int fd_script_cloexec = open_or_die("script", O_RDONLY|O_CLOEXEC);
+
+	/* Change file position to confirm it doesn't affect anything */
+	lseek(fd, 10, SEEK_SET);
+
+	/* Normal executable file: */
+	/*   dfd + path */
+	fail += check_execveat(subdir_dfd, "../execveat", 0);
+	fail += check_execveat(dot_dfd, "execveat", 0);
+	fail += check_execveat(dot_dfd_path, "execveat", 0);
+	/*   absolute path */
+	fail += check_execveat(AT_FDCWD, fullname, 0);
+	/*   absolute path with nonsense dfd */
+	fail += check_execveat(99, fullname, 0);
+	/*   fd + no path */
+	fail += check_execveat(fd, "", AT_EMPTY_PATH);
+	/*   O_CLOEXEC fd + no path */
+	fail += check_execveat(fd_cloexec, "", AT_EMPTY_PATH);
+	/*   O_PATH fd */
+	fail += check_execveat(fd_path, "", AT_EMPTY_PATH);
+
+	/* Mess with executable file that's already open: */
+	/*   fd + no path to a file that's been renamed */
+	rename("execveat.ephemeral", "execveat.moved");
+	fail += check_execveat(fd_ephemeral, "", AT_EMPTY_PATH);
+	/*   fd + no path to a file that's been deleted */
+	unlink("execveat.moved"); /* remove the file now fd open */
+	fail += check_execveat(fd_ephemeral, "", AT_EMPTY_PATH);
+
+	/* Mess with executable file that's already open with O_PATH */
+	/*   fd + no path to a file that's been deleted */
+	unlink("execveat.path.ephemeral");
+	fail += check_execveat(fd_ephemeral_path, "", AT_EMPTY_PATH);
+
+	/* Invalid argument failures */
+	fail += check_execveat_fail(fd, "", 0, ENOENT);
+	fail += check_execveat_fail(fd, NULL, AT_EMPTY_PATH, EFAULT);
+
+	/* Symlink to executable file: */
+	/*   dfd + path */
+	fail += check_execveat(dot_dfd, "execveat.symlink", 0);
+	fail += check_execveat(dot_dfd_path, "execveat.symlink", 0);
+	/*   absolute path */
+	fail += check_execveat(AT_FDCWD, fullname_symlink, 0);
+	/*   fd + no path, even with AT_SYMLINK_NOFOLLOW (already followed) */
+	fail += check_execveat(fd_symlink, "", AT_EMPTY_PATH);
+	fail += check_execveat(fd_symlink, "",
+			       AT_EMPTY_PATH|AT_SYMLINK_NOFOLLOW);
+
+	/* Symlink fails when AT_SYMLINK_NOFOLLOW set: */
+	/*   dfd + path */
+	fail += check_execveat_fail(dot_dfd, "execveat.symlink",
+				    AT_SYMLINK_NOFOLLOW, ELOOP);
+	fail += check_execveat_fail(dot_dfd_path, "execveat.symlink",
+				    AT_SYMLINK_NOFOLLOW, ELOOP);
+	/*   absolute path */
+	fail += check_execveat_fail(AT_FDCWD, fullname_symlink,
+				    AT_SYMLINK_NOFOLLOW, ELOOP);
+
+	/* Shell script wrapping executable file: */
+	/*   dfd + path */
+	fail += check_execveat(subdir_dfd, "../script", 0);
+	fail += check_execveat(dot_dfd, "script", 0);
+	fail += check_execveat(dot_dfd_path, "script", 0);
+	/*   absolute path */
+	fail += check_execveat(AT_FDCWD, fullname_script, 0);
+	/*   fd + no path */
+	fail += check_execveat(fd_script, "", AT_EMPTY_PATH);
+	fail += check_execveat(fd_script, "",
+			       AT_EMPTY_PATH|AT_SYMLINK_NOFOLLOW);
+	/*   O_CLOEXEC fd fails for a script (as script file inaccessible) */
+	fail += check_execveat_fail(fd_script_cloexec, "", AT_EMPTY_PATH,
+				    ENOENT);
+	fail += check_execveat_fail(dot_dfd_cloexec, "script", 0, ENOENT);
+
+	/* Mess with script file that's already open: */
+	/*   fd + no path to a file that's been renamed */
+	rename("script.ephemeral", "script.moved");
+	fail += check_execveat(fd_script_ephemeral, "", AT_EMPTY_PATH);
+	/*   fd + no path to a file that's been deleted */
+	unlink("script.moved"); /* remove the file while fd open */
+	fail += check_execveat(fd_script_ephemeral, "", AT_EMPTY_PATH);
+
+	/* Rename a subdirectory in the path: */
+	rename("subdir.ephemeral", "subdir.moved");
+	fail += check_execveat(subdir_dfd_ephemeral, "../script", 0);
+	fail += check_execveat(subdir_dfd_ephemeral, "script", 0);
+	/* Remove the subdir and its contents */
+	unlink("subdir.moved/script");
+	unlink("subdir.moved");
+	/* Shell loads via deleted subdir OK because name starts with .. */
+	fail += check_execveat(subdir_dfd_ephemeral, "../script", 0);
+	fail += check_execveat_fail(subdir_dfd_ephemeral, "script", 0, ENOENT);
+
+	/* Flag values other than AT_SYMLINK_NOFOLLOW => EINVAL */
+	fail += check_execveat_fail(dot_dfd, "execveat", 0xFFFF, EINVAL);
+	/* Invalid path => ENOENT */
+	fail += check_execveat_fail(dot_dfd, "no-such-file", 0, ENOENT);
+	fail += check_execveat_fail(dot_dfd_path, "no-such-file", 0, ENOENT);
+	fail += check_execveat_fail(AT_FDCWD, "no-such-file", 0, ENOENT);
+	/* Attempt to execute directory => EACCES */
+	fail += check_execveat_fail(dot_dfd, "", AT_EMPTY_PATH, EACCES);
+	/* Attempt to execute non-executable => EACCES */
+	fail += check_execveat_fail(dot_dfd, "Makefile", 0, EACCES);
+	fail += check_execveat_fail(fd_denatured, "", AT_EMPTY_PATH, EACCES);
+	fail += check_execveat_fail(fd_denatured_path, "", AT_EMPTY_PATH,
+				    EACCES);
+	/* Attempt to execute nonsense FD => EBADF */
+	fail += check_execveat_fail(99, "", AT_EMPTY_PATH, EBADF);
+	fail += check_execveat_fail(99, "execveat", 0, EBADF);
+	/* Attempt to execute relative to non-directory => ENOTDIR */
+	fail += check_execveat_fail(fd, "execveat", 0, ENOTDIR);
+
+	fail += check_execveat_pathmax(dot_dfd, "execveat", 0);
+	fail += check_execveat_pathmax(dot_dfd, "script", 1);
+	return fail;
+}
+
+static void prerequisites(void)
+{
+	int fd;
+	const char *script = "#!/bin/sh\nexit $*\n";
+
+	/* Create ephemeral copies of files */
+	exe_cp("execveat", "execveat.ephemeral");
+	exe_cp("execveat", "execveat.path.ephemeral");
+	exe_cp("script", "script.ephemeral");
+	mkdir("subdir.ephemeral", 0755);
+
+	fd = open("subdir.ephemeral/script", O_RDWR|O_CREAT|O_TRUNC, 0755);
+	write(fd, script, strlen(script));
+	close(fd);
+}
+
+int main(int argc, char **argv)
+{
+	int ii;
+	int rc;
+	const char *verbose = getenv("VERBOSE");
+
+	if (argc >= 2) {
+		/* If we are invoked with an argument, don't run tests. */
+		const char *in_test = getenv("IN_TEST");
+
+		if (verbose) {
+			printf("  invoked with:");
+			for (ii = 0; ii < argc; ii++)
+				printf(" [%d]='%s'", ii, argv[ii]);
+			printf("\n");
+		}
+
+		/* Check expected environment transferred. */
+		if (!in_test || strcmp(in_test, "yes") != 0) {
+			printf("[FAIL] (no IN_TEST=yes in env)\n");
+			return 1;
+		}
+
+		/* Use the final argument as an exit code. */
+		rc = atoi(argv[argc - 1]);
+		fflush(stdout);
+	} else {
+		prerequisites();
+		if (verbose)
+			envp[1] = "VERBOSE=1";
+		rc = run_tests();
+		if (rc > 0)
+			printf("%d tests failed\n", rc);
+	}
+	return rc;
+}
-- 
2.1.0.rc2.206.gedb03e5

^ permalink raw reply related

* [PATCHv9 2/5] x86: Hook up execveat system call.
From: David Drysdale @ 2014-11-19 17:27 UTC (permalink / raw)
  To: Eric W. Biederman, Andy Lutomirski, Alexander Viro, Meredydd Luff,
	linux-kernel, Andrew Morton, David Miller, Thomas Gleixner
  Cc: Oleg Nesterov, Michael Kerrisk, Ingo Molnar, H. Peter Anvin,
	Kees Cook, Arnd Bergmann, Rich Felker, Christoph Hellwig, x86,
	linux-arch, linux-api, sparclinux, David Drysdale
In-Reply-To: <1416418072-18639-1-git-send-email-drysdale@google.com>

Hook up x86-64, i386 and x32 ABIs.

Signed-off-by: David Drysdale <drysdale@google.com>
---
 arch/x86/ia32/audit.c            |  1 +
 arch/x86/ia32/ia32entry.S        |  1 +
 arch/x86/kernel/audit_64.c       |  1 +
 arch/x86/kernel/entry_64.S       | 28 ++++++++++++++++++++++++++++
 arch/x86/syscalls/syscall_32.tbl |  1 +
 arch/x86/syscalls/syscall_64.tbl |  2 ++
 arch/x86/um/sys_call_table_64.c  |  1 +
 7 files changed, 35 insertions(+)

diff --git a/arch/x86/ia32/audit.c b/arch/x86/ia32/audit.c
index 5d7b381da692..2eccc8932ae6 100644
--- a/arch/x86/ia32/audit.c
+++ b/arch/x86/ia32/audit.c
@@ -35,6 +35,7 @@ int ia32_classify_syscall(unsigned syscall)
 	case __NR_socketcall:
 		return 4;
 	case __NR_execve:
+	case __NR_execveat:
 		return 5;
 	default:
 		return 1;
diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S
index ffe71228fc10..82e8a1d44658 100644
--- a/arch/x86/ia32/ia32entry.S
+++ b/arch/x86/ia32/ia32entry.S
@@ -480,6 +480,7 @@ GLOBAL(\label)
 	PTREGSCALL stub32_rt_sigreturn, sys32_rt_sigreturn
 	PTREGSCALL stub32_sigreturn, sys32_sigreturn
 	PTREGSCALL stub32_execve, compat_sys_execve
+	PTREGSCALL stub32_execveat, compat_sys_execveat
 	PTREGSCALL stub32_fork, sys_fork
 	PTREGSCALL stub32_vfork, sys_vfork
 
diff --git a/arch/x86/kernel/audit_64.c b/arch/x86/kernel/audit_64.c
index 06d3e5a14d9d..f3672508b249 100644
--- a/arch/x86/kernel/audit_64.c
+++ b/arch/x86/kernel/audit_64.c
@@ -50,6 +50,7 @@ int audit_classify_syscall(int abi, unsigned syscall)
 	case __NR_openat:
 		return 3;
 	case __NR_execve:
+	case __NR_execveat:
 		return 5;
 	default:
 		return 0;
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index df088bb03fb3..40d893c60fcc 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -652,6 +652,20 @@ ENTRY(stub_execve)
 	CFI_ENDPROC
 END(stub_execve)
 
+ENTRY(stub_execveat)
+	CFI_STARTPROC
+	addq $8, %rsp
+	PARTIAL_FRAME 0
+	SAVE_REST
+	FIXUP_TOP_OF_STACK %r11
+	call sys_execveat
+	RESTORE_TOP_OF_STACK %r11
+	movq %rax,RAX(%rsp)
+	RESTORE_REST
+	jmp int_ret_from_sys_call
+	CFI_ENDPROC
+END(stub_execveat)
+
 /*
  * sigreturn is special because it needs to restore all registers on return.
  * This cannot be done with SYSRET, so use the IRET return path instead.
@@ -697,6 +711,20 @@ ENTRY(stub_x32_execve)
 	CFI_ENDPROC
 END(stub_x32_execve)
 
+ENTRY(stub_x32_execveat)
+	CFI_STARTPROC
+	addq $8, %rsp
+	PARTIAL_FRAME 0
+	SAVE_REST
+	FIXUP_TOP_OF_STACK %r11
+	call compat_sys_execveat
+	RESTORE_TOP_OF_STACK %r11
+	movq %rax,RAX(%rsp)
+	RESTORE_REST
+	jmp int_ret_from_sys_call
+	CFI_ENDPROC
+END(stub_x32_execveat)
+
 #endif
 
 /*
diff --git a/arch/x86/syscalls/syscall_32.tbl b/arch/x86/syscalls/syscall_32.tbl
index 9fe1b5d002f0..b3560ece1c9f 100644
--- a/arch/x86/syscalls/syscall_32.tbl
+++ b/arch/x86/syscalls/syscall_32.tbl
@@ -364,3 +364,4 @@
 355	i386	getrandom		sys_getrandom
 356	i386	memfd_create		sys_memfd_create
 357	i386	bpf			sys_bpf
+358	i386	execveat		sys_execveat			stub32_execveat
diff --git a/arch/x86/syscalls/syscall_64.tbl b/arch/x86/syscalls/syscall_64.tbl
index 281150b539a2..8d656fbb57aa 100644
--- a/arch/x86/syscalls/syscall_64.tbl
+++ b/arch/x86/syscalls/syscall_64.tbl
@@ -328,6 +328,7 @@
 319	common	memfd_create		sys_memfd_create
 320	common	kexec_file_load		sys_kexec_file_load
 321	common	bpf			sys_bpf
+322	64	execveat		stub_execveat
 
 #
 # x32-specific system call numbers start at 512 to avoid cache impact
@@ -366,3 +367,4 @@
 542	x32	getsockopt		compat_sys_getsockopt
 543	x32	io_setup		compat_sys_io_setup
 544	x32	io_submit		compat_sys_io_submit
+545	x32	execveat		stub_x32_execveat
diff --git a/arch/x86/um/sys_call_table_64.c b/arch/x86/um/sys_call_table_64.c
index f2f0723070ca..20c3649d0691 100644
--- a/arch/x86/um/sys_call_table_64.c
+++ b/arch/x86/um/sys_call_table_64.c
@@ -31,6 +31,7 @@
 #define stub_fork sys_fork
 #define stub_vfork sys_vfork
 #define stub_execve sys_execve
+#define stub_execveat sys_execveat
 #define stub_rt_sigreturn sys_rt_sigreturn
 
 #define __SYSCALL_COMMON(nr, sym, compat) __SYSCALL_64(nr, sym, compat)
-- 
2.1.0.rc2.206.gedb03e5

^ permalink raw reply related

* [PATCHv9 1/5] syscalls: implement execveat() system call
From: David Drysdale @ 2014-11-19 17:27 UTC (permalink / raw)
  To: Eric W. Biederman, Andy Lutomirski, Alexander Viro, Meredydd Luff,
	linux-kernel, Andrew Morton, David Miller, Thomas Gleixner
  Cc: Oleg Nesterov, Michael Kerrisk, Ingo Molnar, H. Peter Anvin,
	Kees Cook, Arnd Bergmann, Rich Felker, Christoph Hellwig, x86,
	linux-arch, linux-api, sparclinux, David Drysdale
In-Reply-To: <1416418072-18639-1-git-send-email-drysdale@google.com>

Add a new execveat(2) system call. execveat() is to execve() as
openat() is to open(): it takes a file descriptor that refers to a
directory, and resolves the filename relative to that.

In addition, if the filename is empty and AT_EMPTY_PATH is specified,
execveat() executes the file to which the file descriptor refers. This
replicates the functionality of fexecve(), which is a system call in
other UNIXen, but in Linux glibc it depends on opening
"/proc/self/fd/<fd>" (and so relies on /proc being mounted).

The filename fed to the executed program as argv[0] (or the name of the
script fed to a script interpreter) will be of the form "/dev/fd/<fd>"
(for an empty filename) or "/dev/fd/<fd>/<filename>", effectively
reflecting how the executable was found.  This does however mean that
execution of a script in a /proc-less environment won't work; also,
script execution via an O_CLOEXEC file descriptor fails (as the file
will not be accessible after exec).

Based on patches by Meredydd Luff <meredydd@senatehouse.org>

Signed-off-by: David Drysdale <drysdale@google.com>
---
 fs/binfmt_em86.c                  |   4 ++
 fs/binfmt_misc.c                  |   4 ++
 fs/binfmt_script.c                |  10 ++++
 fs/exec.c                         | 113 +++++++++++++++++++++++++++++++++-----
 fs/namei.c                        |   2 +-
 include/linux/binfmts.h           |   4 ++
 include/linux/compat.h            |   3 +
 include/linux/fs.h                |   1 +
 include/linux/sched.h             |   4 ++
 include/linux/syscalls.h          |   5 ++
 include/uapi/asm-generic/unistd.h |   4 +-
 kernel/sys_ni.c                   |   3 +
 lib/audit.c                       |   3 +
 13 files changed, 145 insertions(+), 15 deletions(-)

diff --git a/fs/binfmt_em86.c b/fs/binfmt_em86.c
index f37b08cea1f7..490538536cb4 100644
--- a/fs/binfmt_em86.c
+++ b/fs/binfmt_em86.c
@@ -42,6 +42,10 @@ static int load_em86(struct linux_binprm *bprm)
 			return -ENOEXEC;
 	}
 
+	/* Need to be able to load the file after exec */
+	if (bprm->interp_flags & BINPRM_FLAGS_PATH_INACCESSIBLE)
+		return -ENOENT;
+
 	allow_write_access(bprm->file);
 	fput(bprm->file);
 	bprm->file = NULL;
diff --git a/fs/binfmt_misc.c b/fs/binfmt_misc.c
index fd8beb9657a2..85acb8c83a9a 100644
--- a/fs/binfmt_misc.c
+++ b/fs/binfmt_misc.c
@@ -142,6 +142,10 @@ static int load_misc_binary(struct linux_binprm *bprm)
 	if (!fmt)
 		goto _ret;
 
+	/* Need to be able to load the file after exec */
+	if (bprm->interp_flags & BINPRM_FLAGS_PATH_INACCESSIBLE)
+		return -ENOENT;
+
 	if (!(fmt->flags & MISC_FMT_PRESERVE_ARGV0)) {
 		retval = remove_arg_zero(bprm);
 		if (retval)
diff --git a/fs/binfmt_script.c b/fs/binfmt_script.c
index 5027a3e14922..afdf4e3cafc2 100644
--- a/fs/binfmt_script.c
+++ b/fs/binfmt_script.c
@@ -24,6 +24,16 @@ static int load_script(struct linux_binprm *bprm)
 
 	if ((bprm->buf[0] != '#') || (bprm->buf[1] != '!'))
 		return -ENOEXEC;
+
+	/*
+	 * If the script filename will be inaccessible after exec, typically
+	 * because it is a "/dev/fd/<fd>/.." path against an O_CLOEXEC fd, give
+	 * up now (on the assumption that the interpreter will want to load
+	 * this file).
+	 */
+	if (bprm->interp_flags & BINPRM_FLAGS_PATH_INACCESSIBLE)
+		return -ENOENT;
+
 	/*
 	 * This section does the #! interpretation.
 	 * Sorta complicated, but hopefully it will work.  -TYT
diff --git a/fs/exec.c b/fs/exec.c
index 7302b75a9820..6ce5cc47a201 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -747,18 +747,25 @@ EXPORT_SYMBOL(setup_arg_pages);
 
 #endif /* CONFIG_MMU */
 
-static struct file *do_open_exec(struct filename *name)
+static struct file *do_open_execat(int fd, struct filename *name, int flags)
 {
 	struct file *file;
 	int err;
-	static const struct open_flags open_exec_flags = {
+	struct open_flags open_exec_flags = {
 		.open_flag = O_LARGEFILE | O_RDONLY | __FMODE_EXEC,
 		.acc_mode = MAY_EXEC | MAY_OPEN,
 		.intent = LOOKUP_OPEN,
 		.lookup_flags = LOOKUP_FOLLOW,
 	};
 
-	file = do_filp_open(AT_FDCWD, name, &open_exec_flags);
+	if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH)) != 0)
+		return ERR_PTR(-EINVAL);
+	if (flags & AT_SYMLINK_NOFOLLOW)
+		open_exec_flags.lookup_flags &= ~LOOKUP_FOLLOW;
+	if (flags & AT_EMPTY_PATH)
+		open_exec_flags.lookup_flags |= LOOKUP_EMPTY;
+
+	file = do_filp_open(fd, name, &open_exec_flags);
 	if (IS_ERR(file))
 		goto out;
 
@@ -769,12 +776,13 @@ static struct file *do_open_exec(struct filename *name)
 	if (file->f_path.mnt->mnt_flags & MNT_NOEXEC)
 		goto exit;
 
-	fsnotify_open(file);
-
 	err = deny_write_access(file);
 	if (err)
 		goto exit;
 
+	if (name->name[0] != '\0')
+		fsnotify_open(file);
+
 out:
 	return file;
 
@@ -786,7 +794,7 @@ exit:
 struct file *open_exec(const char *name)
 {
 	struct filename tmp = { .name = name };
-	return do_open_exec(&tmp);
+	return do_open_execat(AT_FDCWD, &tmp, 0);
 }
 EXPORT_SYMBOL(open_exec);
 
@@ -1427,10 +1435,12 @@ static int exec_binprm(struct linux_binprm *bprm)
 /*
  * sys_execve() executes a new program.
  */
-static int do_execve_common(struct filename *filename,
-				struct user_arg_ptr argv,
-				struct user_arg_ptr envp)
+static int do_execveat_common(int fd, struct filename *filename,
+			      struct user_arg_ptr argv,
+			      struct user_arg_ptr envp,
+			      int flags)
 {
+	char *pathbuf = NULL;
 	struct linux_binprm *bprm;
 	struct file *file;
 	struct files_struct *displaced;
@@ -1471,7 +1481,7 @@ static int do_execve_common(struct filename *filename,
 	check_unsafe_exec(bprm);
 	current->in_execve = 1;
 
-	file = do_open_exec(filename);
+	file = do_open_execat(fd, filename, flags);
 	retval = PTR_ERR(file);
 	if (IS_ERR(file))
 		goto out_unmark;
@@ -1479,7 +1489,28 @@ static int do_execve_common(struct filename *filename,
 	sched_exec();
 
 	bprm->file = file;
-	bprm->filename = bprm->interp = filename->name;
+	if (fd == AT_FDCWD || filename->name[0] == '/') {
+		bprm->filename = filename->name;
+	} else {
+		if (filename->name[0] == '\0')
+			pathbuf = kasprintf(GFP_TEMPORARY, "/dev/fd/%d", fd);
+		else
+			pathbuf = kasprintf(GFP_TEMPORARY, "/dev/fd/%d/%s",
+					    fd, filename->name);
+		if (!pathbuf) {
+			retval = -ENOMEM;
+			goto out_unmark;
+		}
+		/*
+		 * Record that a name derived from an O_CLOEXEC fd will be
+		 * inaccessible after exec. Relies on having exclusive access to
+		 * current->files (due to unshare_files above).
+		 */
+		if (close_on_exec(fd, rcu_dereference_raw(current->files->fdt)))
+			bprm->interp_flags |= BINPRM_FLAGS_PATH_INACCESSIBLE;
+		bprm->filename = pathbuf;
+	}
+	bprm->interp = bprm->filename;
 
 	retval = bprm_mm_init(bprm);
 	if (retval)
@@ -1520,6 +1551,7 @@ static int do_execve_common(struct filename *filename,
 	acct_update_integrals(current);
 	task_numa_free(current);
 	free_bprm(bprm);
+	kfree(pathbuf);
 	putname(filename);
 	if (displaced)
 		put_files_struct(displaced);
@@ -1537,6 +1569,7 @@ out_unmark:
 
 out_free:
 	free_bprm(bprm);
+	kfree(pathbuf);
 
 out_files:
 	if (displaced)
@@ -1552,7 +1585,18 @@ int do_execve(struct filename *filename,
 {
 	struct user_arg_ptr argv = { .ptr.native = __argv };
 	struct user_arg_ptr envp = { .ptr.native = __envp };
-	return do_execve_common(filename, argv, envp);
+	return do_execveat_common(AT_FDCWD, filename, argv, envp, 0);
+}
+
+int do_execveat(int fd, struct filename *filename,
+		const char __user *const __user *__argv,
+		const char __user *const __user *__envp,
+		int flags)
+{
+	struct user_arg_ptr argv = { .ptr.native = __argv };
+	struct user_arg_ptr envp = { .ptr.native = __envp };
+
+	return do_execveat_common(fd, filename, argv, envp, flags);
 }
 
 #ifdef CONFIG_COMPAT
@@ -1568,7 +1612,23 @@ static int compat_do_execve(struct filename *filename,
 		.is_compat = true,
 		.ptr.compat = __envp,
 	};
-	return do_execve_common(filename, argv, envp);
+	return do_execveat_common(AT_FDCWD, filename, argv, envp, 0);
+}
+
+static int compat_do_execveat(int fd, struct filename *filename,
+			      const compat_uptr_t __user *__argv,
+			      const compat_uptr_t __user *__envp,
+			      int flags)
+{
+	struct user_arg_ptr argv = {
+		.is_compat = true,
+		.ptr.compat = __argv,
+	};
+	struct user_arg_ptr envp = {
+		.is_compat = true,
+		.ptr.compat = __envp,
+	};
+	return do_execveat_common(fd, filename, argv, envp, flags);
 }
 #endif
 
@@ -1608,6 +1668,20 @@ SYSCALL_DEFINE3(execve,
 {
 	return do_execve(getname(filename), argv, envp);
 }
+
+SYSCALL_DEFINE5(execveat,
+		int, fd, const char __user *, filename,
+		const char __user *const __user *, argv,
+		const char __user *const __user *, envp,
+		int, flags)
+{
+	int lookup_flags = (flags & AT_EMPTY_PATH) ? LOOKUP_EMPTY : 0;
+
+	return do_execveat(fd,
+			   getname_flags(filename, lookup_flags, NULL),
+			   argv, envp, flags);
+}
+
 #ifdef CONFIG_COMPAT
 COMPAT_SYSCALL_DEFINE3(execve, const char __user *, filename,
 	const compat_uptr_t __user *, argv,
@@ -1615,4 +1689,17 @@ COMPAT_SYSCALL_DEFINE3(execve, const char __user *, filename,
 {
 	return compat_do_execve(getname(filename), argv, envp);
 }
+
+COMPAT_SYSCALL_DEFINE5(execveat, int, fd,
+		       const char __user *, filename,
+		       const compat_uptr_t __user *, argv,
+		       const compat_uptr_t __user *, envp,
+		       int,  flags)
+{
+	int lookup_flags = (flags & AT_EMPTY_PATH) ? LOOKUP_EMPTY : 0;
+
+	return compat_do_execveat(fd,
+				  getname_flags(filename, lookup_flags, NULL),
+				  argv, envp, flags);
+}
 #endif
diff --git a/fs/namei.c b/fs/namei.c
index db5fe86319e6..ca814165d84c 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -130,7 +130,7 @@ void final_putname(struct filename *name)
 
 #define EMBEDDED_NAME_MAX	(PATH_MAX - sizeof(struct filename))
 
-static struct filename *
+struct filename *
 getname_flags(const char __user *filename, int flags, int *empty)
 {
 	struct filename *result, *err;
diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
index 61f29e5ea840..576e4639ca60 100644
--- a/include/linux/binfmts.h
+++ b/include/linux/binfmts.h
@@ -53,6 +53,10 @@ struct linux_binprm {
 #define BINPRM_FLAGS_EXECFD_BIT 1
 #define BINPRM_FLAGS_EXECFD (1 << BINPRM_FLAGS_EXECFD_BIT)
 
+/* filename of the binary will be inaccessible after exec */
+#define BINPRM_FLAGS_PATH_INACCESSIBLE_BIT 2
+#define BINPRM_FLAGS_PATH_INACCESSIBLE (1 << BINPRM_FLAGS_PATH_INACCESSIBLE_BIT)
+
 /* Function parameter for binfmt->coredump */
 struct coredump_params {
 	const siginfo_t *siginfo;
diff --git a/include/linux/compat.h b/include/linux/compat.h
index e6494261eaff..7450ca2ac1fc 100644
--- a/include/linux/compat.h
+++ b/include/linux/compat.h
@@ -357,6 +357,9 @@ asmlinkage long compat_sys_lseek(unsigned int, compat_off_t, unsigned int);
 
 asmlinkage long compat_sys_execve(const char __user *filename, const compat_uptr_t __user *argv,
 		     const compat_uptr_t __user *envp);
+asmlinkage long compat_sys_execveat(int dfd, const char __user *filename,
+		     const compat_uptr_t __user *argv,
+		     const compat_uptr_t __user *envp, int flags);
 
 asmlinkage long compat_sys_select(int n, compat_ulong_t __user *inp,
 		compat_ulong_t __user *outp, compat_ulong_t __user *exp,
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 9ab779e8a63c..133b60b1d4d0 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2072,6 +2072,7 @@ extern int vfs_open(const struct path *, struct file *, const struct cred *);
 extern struct file * dentry_open(const struct path *, int, const struct cred *);
 extern int filp_close(struct file *, fl_owner_t id);
 
+extern struct filename *getname_flags(const char __user *, int, int *);
 extern struct filename *getname(const char __user *);
 extern struct filename *getname_kernel(const char *);
 
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 5e344bbe63ec..344163d09efb 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2441,6 +2441,10 @@ extern void do_group_exit(int);
 extern int do_execve(struct filename *,
 		     const char __user * const __user *,
 		     const char __user * const __user *);
+extern int do_execveat(int, struct filename *,
+		       const char __user * const __user *,
+		       const char __user * const __user *,
+		       int);
 extern long do_fork(unsigned long, unsigned long, unsigned long, int __user *, int __user *);
 struct task_struct *fork_idle(int);
 extern pid_t kernel_thread(int (*fn)(void *), void *arg, unsigned long flags);
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index bda9b81357cc..1ff5a4d09693 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -877,4 +877,9 @@ asmlinkage long sys_seccomp(unsigned int op, unsigned int flags,
 asmlinkage long sys_getrandom(char __user *buf, size_t count,
 			      unsigned int flags);
 asmlinkage long sys_bpf(int cmd, union bpf_attr *attr, unsigned int size);
+
+asmlinkage long sys_execveat(int dfd, const char __user *filename,
+			const char __user *const __user *argv,
+			const char __user *const __user *envp, int flags);
+
 #endif
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index 22749c134117..e016bd9b1a04 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -707,9 +707,11 @@ __SYSCALL(__NR_getrandom, sys_getrandom)
 __SYSCALL(__NR_memfd_create, sys_memfd_create)
 #define __NR_bpf 280
 __SYSCALL(__NR_bpf, sys_bpf)
+#define __NR_execveat 281
+__SC_COMP(__NR_execveat, sys_execveat, compat_sys_execveat)
 
 #undef __NR_syscalls
-#define __NR_syscalls 281
+#define __NR_syscalls 282
 
 /*
  * All syscalls below here should go away really,
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index 02aa4185b17e..832fba6e2eb1 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -224,3 +224,6 @@ cond_syscall(sys_seccomp);
 
 /* access BPF programs and maps */
 cond_syscall(sys_bpf);
+
+/* execveat */
+cond_syscall(sys_execveat);
diff --git a/lib/audit.c b/lib/audit.c
index 1d726a22565b..b8fb5ee81e26 100644
--- a/lib/audit.c
+++ b/lib/audit.c
@@ -54,6 +54,9 @@ int audit_classify_syscall(int abi, unsigned syscall)
 	case __NR_socketcall:
 		return 4;
 #endif
+#ifdef __NR_execveat
+	case __NR_execveat:
+#endif
 	case __NR_execve:
 		return 5;
 	default:
-- 
2.1.0.rc2.206.gedb03e5

^ permalink raw reply related

* [PATCHv9 0/5] syscalls,x86,sparc: Add execveat() system call
From: David Drysdale @ 2014-11-19 17:27 UTC (permalink / raw)
  To: Eric W. Biederman, Andy Lutomirski, Alexander Viro, Meredydd Luff,
	linux-kernel, Andrew Morton, David Miller, Thomas Gleixner
  Cc: Oleg Nesterov, Michael Kerrisk, Ingo Molnar, H. Peter Anvin,
	Kees Cook, Arnd Bergmann, Rich Felker, Christoph Hellwig, x86,
	linux-arch, linux-api, sparclinux, David Drysdale

This patch set adds execveat(2) for x86, and is derived from Meredydd
Luff's patch from Sept 2012 (https://lkml.org/lkml/2012/9/11/528).

The primary aim of adding an execveat syscall is to allow an
implementation of fexecve(3) that does not rely on the /proc
filesystem, at least for executables (rather than scripts).  The
current glibc version of fexecve(3) is implemented via /proc, which
causes problems in sandboxed or otherwise restricted environments.

Given the desire for a /proc-free fexecve() implementation, HPA
suggested (https://lkml.org/lkml/2006/7/11/556) that an execveat(2)
syscall would be an appropriate generalization.

Also, having a new syscall means that it can take a flags argument
without back-compatibility concerns.  The current implementation just
defines the AT_EMPTY_PATH and AT_SYMLINK_NOFOLLOW flags, but other
flags could be added in future -- for example, flags for new namespaces
(as suggested at https://lkml.org/lkml/2006/7/11/474).

Related history:
 - https://lkml.org/lkml/2006/12/27/123 is an example of someone
   realizing that fexecve() is likely to fail in a chroot environment.
 - http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=514043 covered
   documenting the /proc requirement of fexecve(3) in its manpage, to
   "prevent other people from wasting their time".
 - https://bugzilla.redhat.com/show_bug.cgi?id=241609 described a
   problem where a process that did setuid() could not fexecve()
   because it no longer had access to /proc/self/fd; this has since
   been fixed.


Changes since v8:
 - Split core/fs changes from x86 changes [Thomas Gleixner]

Changes since v7:
 - Speculatively wire up sparc version of syscall (untested)
 - Fix leak of pathbuf in mainline arm [Oleg Nesterov]
 - Add rcu_dereference_raw() on fdt access [sparse kbuild robot]
 - Realigned comment [Andrew Morton]
 - Merged up to v3.18-rc4

Changes since v6:
 - Remove special case for O_PATH file descriptors [Andy Lutomirski]
 - Use kasprintf rather than error-prone arithmetic [Kees Cook]
 - Add test for long name [Kees Cook]
 - Add test for non-executable O_PATH fd [Andy Lutomirski]

Changes since v5:
 - Set new flag in bprm->interp_flags for O_CLOEXEC fds, so that binfmts
   that invoke an interpreter fail the exec (as they will not be able
   to access the invoked file). [Andy Lutomirski]
 - Don't truncate long paths. [Andy Lutomirski]
 - Commonize code to open the executed file. [Eric W. Biederman]
 - Mark O_PATH file descriptors so they cannot be fexecve()ed.
 - Make self-test more helpful, and add additional cases:
     - file offset non-zero
     - binary file without execute bit
     - O_CLOEXEC fds

Changes since v4, suggested by Eric W. Biederman:
 - Use empty filename with AT_EMPTY_PATH flag rather than NULL
   pathname to request fexecve-like behaviour.
 - Build pathname as "/dev/fd/<fd>/<filename>" (or "/dev/fd/<fd>")
   rather than using d_path().
 - Patch against v3.17 (bfe01a5ba249)

Changes since Meredydd's v3 patch:
 - Added a selftest.
 - Added a man page.
 - Left open_exec() signature untouched to reduce patch impact
   elsewhere (as suggested by Al Viro).
 - Filled in bprm->filename with d_path() into a buffer, to avoid use
   of potentially-ephemeral dentry->d_name.
 - Patch against v3.14 (455c6fdbd21916).


David Drysdale (4):
  syscalls: implement execveat() system call
  x86: Hook up execveat system call.
  syscalls: add selftest for execveat(2)
  sparc: Hook up execveat system call.

 arch/sparc/include/uapi/asm/unistd.h    |   3 +-
 arch/sparc/kernel/systbls_32.S          |   1 +
 arch/sparc/kernel/systbls_64.S          |   2 +
 arch/x86/ia32/audit.c                   |   1 +
 arch/x86/ia32/ia32entry.S               |   1 +
 arch/x86/kernel/audit_64.c              |   1 +
 arch/x86/kernel/entry_64.S              |  28 +++
 arch/x86/syscalls/syscall_32.tbl        |   1 +
 arch/x86/syscalls/syscall_64.tbl        |   2 +
 arch/x86/um/sys_call_table_64.c         |   1 +
 fs/binfmt_em86.c                        |   4 +
 fs/binfmt_misc.c                        |   4 +
 fs/binfmt_script.c                      |  10 +
 fs/exec.c                               | 113 +++++++--
 fs/namei.c                              |   2 +-
 include/linux/binfmts.h                 |   4 +
 include/linux/compat.h                  |   3 +
 include/linux/fs.h                      |   1 +
 include/linux/sched.h                   |   4 +
 include/linux/syscalls.h                |   5 +
 include/uapi/asm-generic/unistd.h       |   4 +-
 kernel/sys_ni.c                         |   3 +
 lib/audit.c                             |   3 +
 tools/testing/selftests/Makefile        |   1 +
 tools/testing/selftests/exec/.gitignore |   9 +
 tools/testing/selftests/exec/Makefile   |  25 ++
 tools/testing/selftests/exec/execveat.c | 397 ++++++++++++++++++++++++++++++++
 27 files changed, 617 insertions(+), 16 deletions(-)
 create mode 100644 tools/testing/selftests/exec/.gitignore
 create mode 100644 tools/testing/selftests/exec/Makefile
 create mode 100644 tools/testing/selftests/exec/execveat.c

-- 
2.1.0.rc2.206.gedb03e5

^ permalink raw reply

* Re: [PATCH 1/3] tun: move internal flag defines out of uapi
From: Dan Williams @ 2014-11-19 17:11 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: linux-kernel, rusty, davem, Jason Wang, Zhi Yong Wu, Tom Herbert,
	Ben Hutchings, Masatake YAMATO, Xi Wang, netdev, linux-api
In-Reply-To: <20141119170835.GA29928@redhat.com>

On Wed, 2014-11-19 at 19:08 +0200, Michael S. Tsirkin wrote:
> On Wed, Nov 19, 2014 at 06:50:17PM +0200, Michael S. Tsirkin wrote:
> > On Wed, Nov 19, 2014 at 10:47:14AM -0600, Dan Williams wrote:
> > > On Wed, 2014-11-19 at 18:18 +0200, Michael S. Tsirkin wrote:
> > > > TUN_ flags are internal and never exposed
> > > > to userspace. Any application using it is almost
> > > > certainly buggy.
> > > 
> > > Except for TUN_TUN_DEV and TUN_TAP_DEV and TUN_TYPE_MASK...  which we're
> > > using (for some reason) in NetworkManager, though I'll happily convert
> > > those to IFF_* instead.  It might be worth #defining those to their
> > > IFF_* equivalents since their usage is not technically broken.
> > > 
> > > Dan
> > 
> > Hmm you are right, they happen to have the same value.
> > I'll send v2 leaving these in place.
> > 
> 
> Though I do think userspace shouldn't depend on them generally,
> so it might be a good idea to stop using them, even though
> I'll fix up my patches to avoid breaking this usecase.

Yeah, I'm doing an NM patch right now to use IFF_*.

Dan

> 
> 
> > > > Move them out to tun.c, we'll remove them in follow-up patches.
> > > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > > > ---
> > > >  include/uapi/linux/if_tun.h | 14 --------------
> > > >  drivers/net/tun.c           | 14 ++++++++++++++
> > > >  2 files changed, 14 insertions(+), 14 deletions(-)
> > > > 
> > > > diff --git a/include/uapi/linux/if_tun.h b/include/uapi/linux/if_tun.h
> > > > index e9502dd..b82c276 100644
> > > > --- a/include/uapi/linux/if_tun.h
> > > > +++ b/include/uapi/linux/if_tun.h
> > > > @@ -23,20 +23,6 @@
> > > >  /* Read queue size */
> > > >  #define TUN_READQ_SIZE	500
> > > >  
> > > > -/* TUN device flags */
> > > > -#define TUN_TUN_DEV 	0x0001	
> > > > -#define TUN_TAP_DEV	0x0002
> > > > -#define TUN_TYPE_MASK   0x000f
> > > > -
> > > > -#define TUN_FASYNC	0x0010
> > > > -#define TUN_NOCHECKSUM	0x0020
> > > > -#define TUN_NO_PI	0x0040
> > > > -/* This flag has no real effect */
> > > > -#define TUN_ONE_QUEUE	0x0080
> > > > -#define TUN_PERSIST 	0x0100	
> > > > -#define TUN_VNET_HDR 	0x0200
> > > > -#define TUN_TAP_MQ      0x0400
> > > > -
> > > >  /* Ioctl defines */
> > > >  #define TUNSETNOCSUM  _IOW('T', 200, int) 
> > > >  #define TUNSETDEBUG   _IOW('T', 201, int) 
> > > > diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> > > > index 2e18ddd..81735f5 100644
> > > > --- a/drivers/net/tun.c
> > > > +++ b/drivers/net/tun.c
> > > > @@ -103,6 +103,20 @@ do {								\
> > > >  } while (0)
> > > >  #endif
> > > >  
> > > > +/* TUN device flags */
> > > > +#define TUN_TUN_DEV 	0x0001
> > > > +#define TUN_TAP_DEV	0x0002
> > > > +#define TUN_TYPE_MASK   0x000f
> > > > +
> > > > +#define TUN_FASYNC	0x0010
> > > > +#define TUN_NOCHECKSUM	0x0020
> > > > +#define TUN_NO_PI	0x0040
> > > > +/* This flag has no real effect */
> > > > +#define TUN_ONE_QUEUE	0x0080
> > > > +#define TUN_PERSIST 	0x0100
> > > > +#define TUN_VNET_HDR 	0x0200
> > > > +#define TUN_TAP_MQ      0x0400
> > > > +
> > > >  #define GOODCOPY_LEN 128
> > > >  
> > > >  #define FLT_EXACT_COUNT 8
> > > 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH 1/3] tun: move internal flag defines out of uapi
From: Michael S. Tsirkin @ 2014-11-19 17:08 UTC (permalink / raw)
  To: Dan Williams
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA, rusty-8n+1lVoiYb80n/F98K4Iww,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q, Jason Wang, Zhi Yong Wu,
	Tom Herbert, Ben Hutchings, Masatake YAMATO, Xi Wang,
	netdev-u79uwXL29TY76Z2rM5mHXA, linux-api-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20141119165017.GA29759-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

On Wed, Nov 19, 2014 at 06:50:17PM +0200, Michael S. Tsirkin wrote:
> On Wed, Nov 19, 2014 at 10:47:14AM -0600, Dan Williams wrote:
> > On Wed, 2014-11-19 at 18:18 +0200, Michael S. Tsirkin wrote:
> > > TUN_ flags are internal and never exposed
> > > to userspace. Any application using it is almost
> > > certainly buggy.
> > 
> > Except for TUN_TUN_DEV and TUN_TAP_DEV and TUN_TYPE_MASK...  which we're
> > using (for some reason) in NetworkManager, though I'll happily convert
> > those to IFF_* instead.  It might be worth #defining those to their
> > IFF_* equivalents since their usage is not technically broken.
> > 
> > Dan
> 
> Hmm you are right, they happen to have the same value.
> I'll send v2 leaving these in place.
> 

Though I do think userspace shouldn't depend on them generally,
so it might be a good idea to stop using them, even though
I'll fix up my patches to avoid breaking this usecase.



> > > Move them out to tun.c, we'll remove them in follow-up patches.
> > > Signed-off-by: Michael S. Tsirkin <mst-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> > > ---
> > >  include/uapi/linux/if_tun.h | 14 --------------
> > >  drivers/net/tun.c           | 14 ++++++++++++++
> > >  2 files changed, 14 insertions(+), 14 deletions(-)
> > > 
> > > diff --git a/include/uapi/linux/if_tun.h b/include/uapi/linux/if_tun.h
> > > index e9502dd..b82c276 100644
> > > --- a/include/uapi/linux/if_tun.h
> > > +++ b/include/uapi/linux/if_tun.h
> > > @@ -23,20 +23,6 @@
> > >  /* Read queue size */
> > >  #define TUN_READQ_SIZE	500
> > >  
> > > -/* TUN device flags */
> > > -#define TUN_TUN_DEV 	0x0001	
> > > -#define TUN_TAP_DEV	0x0002
> > > -#define TUN_TYPE_MASK   0x000f
> > > -
> > > -#define TUN_FASYNC	0x0010
> > > -#define TUN_NOCHECKSUM	0x0020
> > > -#define TUN_NO_PI	0x0040
> > > -/* This flag has no real effect */
> > > -#define TUN_ONE_QUEUE	0x0080
> > > -#define TUN_PERSIST 	0x0100	
> > > -#define TUN_VNET_HDR 	0x0200
> > > -#define TUN_TAP_MQ      0x0400
> > > -
> > >  /* Ioctl defines */
> > >  #define TUNSETNOCSUM  _IOW('T', 200, int) 
> > >  #define TUNSETDEBUG   _IOW('T', 201, int) 
> > > diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> > > index 2e18ddd..81735f5 100644
> > > --- a/drivers/net/tun.c
> > > +++ b/drivers/net/tun.c
> > > @@ -103,6 +103,20 @@ do {								\
> > >  } while (0)
> > >  #endif
> > >  
> > > +/* TUN device flags */
> > > +#define TUN_TUN_DEV 	0x0001
> > > +#define TUN_TAP_DEV	0x0002
> > > +#define TUN_TYPE_MASK   0x000f
> > > +
> > > +#define TUN_FASYNC	0x0010
> > > +#define TUN_NOCHECKSUM	0x0020
> > > +#define TUN_NO_PI	0x0040
> > > +/* This flag has no real effect */
> > > +#define TUN_ONE_QUEUE	0x0080
> > > +#define TUN_PERSIST 	0x0100
> > > +#define TUN_VNET_HDR 	0x0200
> > > +#define TUN_TAP_MQ      0x0400
> > > +
> > >  #define GOODCOPY_LEN 128
> > >  
> > >  #define FLT_EXACT_COUNT 8
> > 

^ permalink raw reply

* Re: [PATCH 1/3] tun: move internal flag defines out of uapi
From: Michael S. Tsirkin @ 2014-11-19 16:50 UTC (permalink / raw)
  To: Dan Williams
  Cc: linux-kernel, rusty, davem, Jason Wang, Zhi Yong Wu, Tom Herbert,
	Ben Hutchings, Masatake YAMATO, Xi Wang, netdev, linux-api
In-Reply-To: <1416415634.4763.9.camel@dcbw.local>

On Wed, Nov 19, 2014 at 10:47:14AM -0600, Dan Williams wrote:
> On Wed, 2014-11-19 at 18:18 +0200, Michael S. Tsirkin wrote:
> > TUN_ flags are internal and never exposed
> > to userspace. Any application using it is almost
> > certainly buggy.
> 
> Except for TUN_TUN_DEV and TUN_TAP_DEV and TUN_TYPE_MASK...  which we're
> using (for some reason) in NetworkManager, though I'll happily convert
> those to IFF_* instead.  It might be worth #defining those to their
> IFF_* equivalents since their usage is not technically broken.
> 
> Dan

Hmm you are right, they happen to have the same value.
I'll send v2 leaving these in place.


> > Move them out to tun.c, we'll remove them in follow-up patches.
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > ---
> >  include/uapi/linux/if_tun.h | 14 --------------
> >  drivers/net/tun.c           | 14 ++++++++++++++
> >  2 files changed, 14 insertions(+), 14 deletions(-)
> > 
> > diff --git a/include/uapi/linux/if_tun.h b/include/uapi/linux/if_tun.h
> > index e9502dd..b82c276 100644
> > --- a/include/uapi/linux/if_tun.h
> > +++ b/include/uapi/linux/if_tun.h
> > @@ -23,20 +23,6 @@
> >  /* Read queue size */
> >  #define TUN_READQ_SIZE	500
> >  
> > -/* TUN device flags */
> > -#define TUN_TUN_DEV 	0x0001	
> > -#define TUN_TAP_DEV	0x0002
> > -#define TUN_TYPE_MASK   0x000f
> > -
> > -#define TUN_FASYNC	0x0010
> > -#define TUN_NOCHECKSUM	0x0020
> > -#define TUN_NO_PI	0x0040
> > -/* This flag has no real effect */
> > -#define TUN_ONE_QUEUE	0x0080
> > -#define TUN_PERSIST 	0x0100	
> > -#define TUN_VNET_HDR 	0x0200
> > -#define TUN_TAP_MQ      0x0400
> > -
> >  /* Ioctl defines */
> >  #define TUNSETNOCSUM  _IOW('T', 200, int) 
> >  #define TUNSETDEBUG   _IOW('T', 201, int) 
> > diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> > index 2e18ddd..81735f5 100644
> > --- a/drivers/net/tun.c
> > +++ b/drivers/net/tun.c
> > @@ -103,6 +103,20 @@ do {								\
> >  } while (0)
> >  #endif
> >  
> > +/* TUN device flags */
> > +#define TUN_TUN_DEV 	0x0001
> > +#define TUN_TAP_DEV	0x0002
> > +#define TUN_TYPE_MASK   0x000f
> > +
> > +#define TUN_FASYNC	0x0010
> > +#define TUN_NOCHECKSUM	0x0020
> > +#define TUN_NO_PI	0x0040
> > +/* This flag has no real effect */
> > +#define TUN_ONE_QUEUE	0x0080
> > +#define TUN_PERSIST 	0x0100
> > +#define TUN_VNET_HDR 	0x0200
> > +#define TUN_TAP_MQ      0x0400
> > +
> >  #define GOODCOPY_LEN 128
> >  
> >  #define FLT_EXACT_COUNT 8
> 

^ permalink raw reply

* Re: [PATCH 1/3] tun: move internal flag defines out of uapi
From: Dan Williams @ 2014-11-19 16:47 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA, rusty-8n+1lVoiYb80n/F98K4Iww,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q, Jason Wang, Zhi Yong Wu,
	Tom Herbert, Ben Hutchings, Masatake YAMATO, Xi Wang,
	netdev-u79uwXL29TY76Z2rM5mHXA, linux-api-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1416413891-29562-2-git-send-email-mst-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

On Wed, 2014-11-19 at 18:18 +0200, Michael S. Tsirkin wrote:
> TUN_ flags are internal and never exposed
> to userspace. Any application using it is almost
> certainly buggy.

Except for TUN_TUN_DEV and TUN_TAP_DEV and TUN_TYPE_MASK...  which we're
using (for some reason) in NetworkManager, though I'll happily convert
those to IFF_* instead.  It might be worth #defining those to their
IFF_* equivalents since their usage is not technically broken.

Dan

> Move them out to tun.c, we'll remove them in follow-up patches.
> Signed-off-by: Michael S. Tsirkin <mst-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> ---
>  include/uapi/linux/if_tun.h | 14 --------------
>  drivers/net/tun.c           | 14 ++++++++++++++
>  2 files changed, 14 insertions(+), 14 deletions(-)
> 
> diff --git a/include/uapi/linux/if_tun.h b/include/uapi/linux/if_tun.h
> index e9502dd..b82c276 100644
> --- a/include/uapi/linux/if_tun.h
> +++ b/include/uapi/linux/if_tun.h
> @@ -23,20 +23,6 @@
>  /* Read queue size */
>  #define TUN_READQ_SIZE	500
>  
> -/* TUN device flags */
> -#define TUN_TUN_DEV 	0x0001	
> -#define TUN_TAP_DEV	0x0002
> -#define TUN_TYPE_MASK   0x000f
> -
> -#define TUN_FASYNC	0x0010
> -#define TUN_NOCHECKSUM	0x0020
> -#define TUN_NO_PI	0x0040
> -/* This flag has no real effect */
> -#define TUN_ONE_QUEUE	0x0080
> -#define TUN_PERSIST 	0x0100	
> -#define TUN_VNET_HDR 	0x0200
> -#define TUN_TAP_MQ      0x0400
> -
>  /* Ioctl defines */
>  #define TUNSETNOCSUM  _IOW('T', 200, int) 
>  #define TUNSETDEBUG   _IOW('T', 201, int) 
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index 2e18ddd..81735f5 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -103,6 +103,20 @@ do {								\
>  } while (0)
>  #endif
>  
> +/* TUN device flags */
> +#define TUN_TUN_DEV 	0x0001
> +#define TUN_TAP_DEV	0x0002
> +#define TUN_TYPE_MASK   0x000f
> +
> +#define TUN_FASYNC	0x0010
> +#define TUN_NOCHECKSUM	0x0020
> +#define TUN_NO_PI	0x0040
> +/* This flag has no real effect */
> +#define TUN_ONE_QUEUE	0x0080
> +#define TUN_PERSIST 	0x0100
> +#define TUN_VNET_HDR 	0x0200
> +#define TUN_TAP_MQ      0x0400
> +
>  #define GOODCOPY_LEN 128
>  
>  #define FLT_EXACT_COUNT 8

^ permalink raw reply

* [PATCH 1/3] tun: move internal flag defines out of uapi
From: Michael S. Tsirkin @ 2014-11-19 16:18 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: rusty-8n+1lVoiYb80n/F98K4Iww, davem-fT/PcQaiUtIeIZ0/mPfg9Q,
	Jason Wang, Zhi Yong Wu, Tom Herbert, Ben Hutchings,
	Masatake YAMATO, Xi Wang, netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1416413891-29562-1-git-send-email-mst-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

TUN_ flags are internal and never exposed
to userspace. Any application using it is almost
certainly buggy.

Move them out to tun.c, we'll remove them in follow-up patches.

Signed-off-by: Michael S. Tsirkin <mst-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
---
 include/uapi/linux/if_tun.h | 14 --------------
 drivers/net/tun.c           | 14 ++++++++++++++
 2 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/include/uapi/linux/if_tun.h b/include/uapi/linux/if_tun.h
index e9502dd..b82c276 100644
--- a/include/uapi/linux/if_tun.h
+++ b/include/uapi/linux/if_tun.h
@@ -23,20 +23,6 @@
 /* Read queue size */
 #define TUN_READQ_SIZE	500
 
-/* TUN device flags */
-#define TUN_TUN_DEV 	0x0001	
-#define TUN_TAP_DEV	0x0002
-#define TUN_TYPE_MASK   0x000f
-
-#define TUN_FASYNC	0x0010
-#define TUN_NOCHECKSUM	0x0020
-#define TUN_NO_PI	0x0040
-/* This flag has no real effect */
-#define TUN_ONE_QUEUE	0x0080
-#define TUN_PERSIST 	0x0100	
-#define TUN_VNET_HDR 	0x0200
-#define TUN_TAP_MQ      0x0400
-
 /* Ioctl defines */
 #define TUNSETNOCSUM  _IOW('T', 200, int) 
 #define TUNSETDEBUG   _IOW('T', 201, int) 
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 2e18ddd..81735f5 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -103,6 +103,20 @@ do {								\
 } while (0)
 #endif
 
+/* TUN device flags */
+#define TUN_TUN_DEV 	0x0001
+#define TUN_TAP_DEV	0x0002
+#define TUN_TYPE_MASK   0x000f
+
+#define TUN_FASYNC	0x0010
+#define TUN_NOCHECKSUM	0x0020
+#define TUN_NO_PI	0x0040
+/* This flag has no real effect */
+#define TUN_ONE_QUEUE	0x0080
+#define TUN_PERSIST 	0x0100
+#define TUN_VNET_HDR 	0x0200
+#define TUN_TAP_MQ      0x0400
+
 #define GOODCOPY_LEN 128
 
 #define FLT_EXACT_COUNT 8
-- 
MST

^ permalink raw reply related

* Re: [PATCH v2 07/19] selftests/firmware: add install target to enable installing test
From: Shuah Khan @ 2014-11-19 16:13 UTC (permalink / raw)
  To: Kees Cook
  Cc: Greg KH, Andrew Morton, Michal Marek, David S. Miller, Phong Tran,
	David Herrmann, Hugh Dickins, pranith kumar, Eric W. Biederman,
	Serge E. Hallyn, linux-kbuild, LKML, Linux API,
	Network Development
In-Reply-To: <5462B2A3.9000007-JPH+aEBZ4P+UEJcrhfAQsw@public.gmane.org>

On 11/11/2014 06:06 PM, Shuah Khan wrote:
> On 11/11/2014 02:29 PM, Kees Cook wrote:
>> Hi,
>>
>> Sorry, I still really don't like this approach. While it is all in one
>> place (thank you for that), I think it isn't a form that is very
>> workable for the people maintaining the self tests. How about this,
>> instead of per-Makefile customization, why not define an execution
>> framework for these tests instead.
> 
> If I understand correctly, sounds like you don't like the way
> install target is implemented in the individual test Makefiles
> and the changes I made to run_tests targets to address the code
> duplication concern.
> 
> At the moment there is no duplicate code in this patch series
> between install and run_tests targets. This is a  first step
> towards standardizing the framework and a definite improvement
> over what we have currently. As I mentioned earlier, my goal
> is to make it easier for developers to install and run the
> existing tests and evolve the framework as we go.
> 
> Assuming my understanding is correct that:
> 
> -- install and run_tests targets in individual tests can be
>    refined and automated with a common Makefile approach you
>    proposed.
> -- the rest of the user-interface kselftest_install and kselftest
>    are good.
> 
> I would like to propose that we get started with the current
> implementation and refine it based on the following ideas you
> suggested. The refinements you are recommending are confined
> to selftests and can be made after the kselftest_install
> gets added. Adding kselftest_install makes it easier to make
> the refinements as it defines overall UI.
> 

Hi Kees,

Are you ok with the above proposal? I understand this approach
might not be perfect, however it is a step in the right direction
to enable and make it easier to run them. I would like to get some
initial work in for 3.19 if at all possible.

I plan to evolve the back-end to make it easier to write and
maintain for developers writing tests going forward.

thanks,
-- Shuah


-- 
Shuah Khan
Sr. Linux Kernel Developer
Samsung Research America (Silicon Valley)
shuahkh-JPH+aEBZ4P+UEJcrhfAQsw@public.gmane.org | (970) 217-8978

^ permalink raw reply

* Re: [PATCH v2 02/19] kbuild: kselftest_install - add a new make target to install selftests
From: Shuah Khan @ 2014-11-19 15:59 UTC (permalink / raw)
  To: mmarek, ebiederm, serge.hallyn
  Cc: gregkh, akpm, davem, keescook, tranmanphong, dh.herrmann, hughd,
	bobby.prani, linux-kbuild, linux-kernel, linux-api, netdev
In-Reply-To: <a2344d4df903d673afe1631118f40917f773cc9a.1415735831.git.shuahkh@osg.samsung.com>

On 11/11/2014 01:27 PM, Shuah Khan wrote:
> Add a new make target to install to install kernel selftests.
> This new target will build and install selftests. kselftest
> target now depends on kselftest_install and runs the generated
> kselftest script to reduce duplicate work and for common look
> and feel when running tests.
> 
> Approach:
> 
> make kselftest_target:
> -- exports kselftest INSTALL_KSFT_PATH
>    default $(INSTALL_MOD_PATH)/lib/kselftest/$(KERNELRELEASE)
> -- exports path for ksefltest.sh
> -- runs selftests make install target:
> 
> selftests make install target
> -- creates kselftest.sh script in install install dir
> -- runs install targets for all INSTALL_TARGETS
>    (Note: ftrace and powerpc aren't included in INSTALL_TARGETS,
>           to not add more content to patch v1 series. This work
>           will happen soon. In this series these two targets are
>           run after running the generated kselftest script, without
>           any regression in the way these tests are run with
>           "make kselftest" prior to this work.)
> -- install target can be run only from top level source dir.
> 
> Individual test make install targets:
> -- install test programs and/or scripts in install dir
> -- append to the ksefltest.sh file to add commands to run test
> -- install target can be run only from top level source dir.
> 
> Adds the following new ways to initiate selftests:
> -- Installing and running kselftest from install directory
>    by running  "make kselftest"
> -- Running kselftest script from install directory
> 
> Maintains the following ways to run tests:
> -- make -C tools/testing/selftests run_tests
> -- make -C tools/testing/selftests TARGETS=target run_tests
>    Ability specify targets: e.g TARGETS=net
> -- make run_tests from tools/testing/selftests
> -- make run_tests from individual test directories:
>    e.g: make run_tests in tools/testing/selftests/breakpoints
> 
> Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com>
> ---
>  Makefile | 21 +++++++++++++++++++--
>  1 file changed, 19 insertions(+), 2 deletions(-)

Hi Marek,

Did you get a chance to take a look at this patch?

-- Shuah


-- 
Shuah Khan
Sr. Linux Kernel Developer
Samsung Research America (Silicon Valley)
shuahkh@osg.samsung.com | (970) 217-8978

^ permalink raw reply

* Re: [PATCH v12 1/3] drm: rockchip: Add basic drm driver
From: Daniel Vetter @ 2014-11-19 13:17 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: Mark Rutland, heiko, linux-doc, kever.yang, dri-devel,
	linux-kernel, xxm, linux-rockchip, Grant Likely, huangtao,
	devicetree, Pawel Moll, Ian Campbell, Kumar Gala, yxj,
	Rob Herring, marcheu, Mark yao, xw, linux-api, Randy Dunlap,
	dianders, Greg Kroah-Hartman, cf
In-Reply-To: <20141119100413.7ab613e9@bbrezillon>

On Wed, Nov 19, 2014 at 10:04:13AM +0100, Boris Brezillon wrote:
> AFAIU, the suggestion was to split drm_connector_init and
> drm_connector_register calls:
>  - drm_connector_init call should still be part of the load procedure
>    (this function adds the connector to the connector list which is used
>    by drm_mode_group_init_legacy_group)
>  - drm_connector_register should be called after the device has been
>    registered
> 
> Here what I've done and it seems to work:
> 
> static int atmel_hlcdc_dc_connector_plug_all(struct drm_device *dev)
> {
> 	struct drm_connector *connector, *failed;
> 	int ret;
> 
> 	mutex_lock(&dev->mode_config.mutex);
> 	list_for_each_entry(connector,
> 	&dev->mode_config.connector_list, head) { ret =
> 	drm_connector_register(connector); if (ret) {
> 			failed = connector;
> 			goto err;
> 		}
> 	}
> 	mutex_unlock(&dev->mode_config.mutex);
> 	return 0;
> 
> err:
> 	list_for_each_entry(connector, &dev->mode_config.connector_list, head) {
> 		if (failed == connector)
> 			break;
> 
> 		drm_connector_unregister(connector);
> 	}
> 	mutex_unlock(&dev->mode_config.mutex);
> 
> 	return ret;
> }
> 
> [...]
> 
> static int atmel_hlcdc_dc_drm_probe(struct platform_device *pdev)
> {
> 	struct drm_device *ddev;
> 	int ret;
> 
> 	ddev = drm_dev_alloc(&atmel_hlcdc_dc_driver, &pdev->dev);
> 	if (!ddev)
> 		return -ENOMEM;
> 
> 	ret = drm_dev_set_unique(ddev, dev_name(ddev->dev));
> 	if (ret)
> 		goto err_unref;
> 
> 	ret = atmel_hlcdc_dc_load(ddev);
> 	if (ret)
> 		goto err_unref;
> 
> 	ret = drm_dev_register(ddev, 0);
> 	if (ret)
> 		goto err_unload;
> 
> 	ret = atmel_hlcdc_dc_connector_plug_all(ddev);
> 	if (ret)
> 		goto err_unregister;
> 
> 	return 0;
> 
> err_unregister:
> 	drm_dev_unregister(ddev);
> 
> err_unload:
> 	atmel_hlcdc_dc_unload(ddev);
> 
> err_unref:
> 	drm_dev_unref(ddev);
> 
> 	return ret;
> }
> 
> Daniel, can you confirm that's what you had in mind ?

Yup. To be able to have race-free driver load we need to split object
into an _init step (allocates structs and links to kernel-internal lists)
and _register (makes the object userspace-visible through sysfs and
dev-node kms object lookup idr).

This entire mess is all still fallout from the dark ages of the drm
midlayer and we'll probably have fun with this for another few years ;-)
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply

* Re: [PATCHv8 1/4] syscalls,x86: implement execveat() system call
From: Thomas Gleixner @ 2014-11-19 12:40 UTC (permalink / raw)
  To: David Drysdale
  Cc: Eric W. Biederman, Andy Lutomirski, Alexander Viro, Meredydd Luff,
	linux-kernel, Andrew Morton, David Miller, Oleg Nesterov,
	Michael Kerrisk, Ingo Molnar, H. Peter Anvin, Kees Cook,
	Arnd Bergmann, Rich Felker, Christoph Hellwig, x86, linux-arch,
	linux-api, sparclinux
In-Reply-To: <1415982183-20525-2-git-send-email-drysdale@google.com>

On Fri, 14 Nov 2014, David Drysdale wrote:
> Only x86-64, i386 and x32 ABIs are supported in this patch.

Can you please split this into the core/fs changes and the actual x86
hookup?
 
Thanks,

	tglx

^ permalink raw reply

* Re: [PATCH v12 1/3] drm: rockchip: Add basic drm driver
From: Boris Brezillon @ 2014-11-19  9:04 UTC (permalink / raw)
  To: Mark yao
  Cc: heiko, David Airlie, Rob Clark, Rob Herring, Pawel Moll,
	Mark Rutland, Ian Campbell, Kumar Gala, Randy Dunlap,
	Grant Likely, Greg Kroah-Hartman, John Stultz, Rom Lemarchand,
	devicetree, linux-doc, linux-kernel, dri-devel, linux-api,
	linux-rockchip, dianders, marcheu, dbehr, olof, djkurtz, cf, xxm,
	huangtao, kever.yang, yxj, xw
In-Reply-To: <546BFA4D.3090700@rock-chips.com>

On Wed, 19 Nov 2014 10:02:53 +0800
Mark yao <mark.yao@rock-chips.com> wrote:

> On 2014年11月19日 09:09, Mark yao wrote:
> > On 2014年11月18日 22:24, Daniel Vetter wrote:
> >> On Tue, Nov 18, 2014 at 02:21:30PM +0100, Boris Brezillon wrote:
> >>> Hi Daniel,
> >>>
> >>> On Tue, 18 Nov 2014 09:32:34 +0100
> >>> Daniel Vetter <daniel@ffwll.ch> wrote:
> >>>
> >>>> On Tue, Nov 18, 2014 at 04:00:29PM +0800, Mark Yao wrote:
> >>>>> From: Mark yao <mark.yao@rock-chips.com>
> >>>>>
> >>>>> This patch adds the basic structure of a DRM Driver for Rockchip 
> >>>>> Socs.
> >>>>>
> >>>>> Signed-off-by: Mark Yao <mark.yao@rock-chips.com>
> >>>>> Signed-off-by: Daniel Kurtz <djkurtz@chromium.org>
> >>>>> Acked-by: Daniel Vetter <daniel@ffwll.ch>
> >>>>> Reviewed-by: Rob Clark <robdclark@gmail.com>
> >>>>> ---
> >>>>> Changes in v2:
> >>>>> - use the component framework to defer main drm driver probe
> >>>>>    until all VOP devices have been probed.
> >>>>> - use dma-mapping API with ARM_DMA_USE_IOMMU, create dma mapping by
> >>>>>    master device and each vop device can shared the drm dma mapping.
> >>>>> - use drm_crtc_init_with_planes and drm_universal_plane_init.
> >>>>> - remove unnecessary middle layers.
> >>>>> - add cursor set, move funcs to rockchip drm crtc.
> >>>>> - use vop reset at first init
> >>>>> - reference framebuffer when used and unreference when swap out vop
> >>>>>
> >>>>> Changes in v3:
> >>>>> - change "crtc->fb" to "crtc->primary-fb"
> >>>>> Adviced by Daniel Vetter
> >>>>> - init cursor plane with universal api, remove unnecessary cursor 
> >>>>> set,move
> >>>>>
> >>>>> Changes in v4:
> >>>>> Adviced by David Herrmann
> >>>>> - remove drm_platform_*() usage, use register drm device directly.
> >>>> Minor fixup for that part below.
> >>>>
> >>>> [snip]
> >>>>
> >>>>> +static int rockchip_drm_bind(struct device *dev)
> >>>>> +{
> >>>>> +    struct drm_device *drm;
> >>>>> +    int ret;
> >>>>> +
> >>>>> +    drm = drm_dev_alloc(&rockchip_drm_driver, dev);
> >>>>> +    if (!drm)
> >>>>> +        return -ENOMEM;
> >>>>> +
> >>>>> +    ret = drm_dev_set_unique(drm, "%s", dev_name(dev));
> >>>>> +    if (ret)
> >>>>> +        goto err_free;
> >>>> Please call rockchip_drm_load here directly and don't put it as the 
> >>>> ->load
> >>>> function into the driver vtable. The point of the alloc/register 
> >>>> split is
> >>>> that the driver can be completely set up _before_ we register 
> >>>> anything.
> >>>> But for backwards compat and historical reasons ->load is called 
> >>>> somewhere
> >>>> in the middle (so that you could access the minor nodes if needed, 
> >>>> since
> >>>> some drivers do that).
> >>> I tried to do the same in the atmel-hlcdc DRM driver, but I need the
> >>> primary drm_minor to register the connectors (see this kernel
> >>> backtrace [1]), which means I either initialize the connector in the
> >>> wrong place (currently part of the drm load process), or I just can't
> >>> call atmel_hlcdc_dc_load before registering the drm device...
> >> Hm right, wonder who that works with rockchip tbh.
> >>
> >> We did split up the drm_connector setup into _init and register, so 
> >> if you
> >> want to do this then you need to move the call to drm_connector_register
> >> below the call to drm_dev_register.
> >>
> >> We should probably have a drm_connectors_register_all helper which does
> >> this for all connectors on the connector list. And also grabs the
> >> appropriate lock.
> >>
> >> I guess it's somewhat obvious that no one yet actually tried this ;-)
> >> -Daniel
> > right, I re-test the driver with it, get the same problem with Boris.
> > I should move drm_connector_register below the drm_dev_register.
> I try move connector_register below drm_dev_register, but unfortunately, 
> drm_dev_register call
> drm_mode_group_init_legacy_group to save crtc, connector into list, it 
> need below  connector_register.
> so that problem is that now: connector_register need bewteen minor node 
> create and
> drm_mode_group_init_legacy_group.

AFAIU, the suggestion was to split drm_connector_init and
drm_connector_register calls:
 - drm_connector_init call should still be part of the load procedure
   (this function adds the connector to the connector list which is used
   by drm_mode_group_init_legacy_group)
 - drm_connector_register should be called after the device has been
   registered

Here what I've done and it seems to work:

static int atmel_hlcdc_dc_connector_plug_all(struct drm_device *dev)
{
	struct drm_connector *connector, *failed;
	int ret;

	mutex_lock(&dev->mode_config.mutex);
	list_for_each_entry(connector,
	&dev->mode_config.connector_list, head) { ret =
	drm_connector_register(connector); if (ret) {
			failed = connector;
			goto err;
		}
	}
	mutex_unlock(&dev->mode_config.mutex);
	return 0;

err:
	list_for_each_entry(connector, &dev->mode_config.connector_list, head) {
		if (failed == connector)
			break;

		drm_connector_unregister(connector);
	}
	mutex_unlock(&dev->mode_config.mutex);

	return ret;
}

[...]

static int atmel_hlcdc_dc_drm_probe(struct platform_device *pdev)
{
	struct drm_device *ddev;
	int ret;

	ddev = drm_dev_alloc(&atmel_hlcdc_dc_driver, &pdev->dev);
	if (!ddev)
		return -ENOMEM;

	ret = drm_dev_set_unique(ddev, dev_name(ddev->dev));
	if (ret)
		goto err_unref;

	ret = atmel_hlcdc_dc_load(ddev);
	if (ret)
		goto err_unref;

	ret = drm_dev_register(ddev, 0);
	if (ret)
		goto err_unload;

	ret = atmel_hlcdc_dc_connector_plug_all(ddev);
	if (ret)
		goto err_unregister;

	return 0;

err_unregister:
	drm_dev_unregister(ddev);

err_unload:
	atmel_hlcdc_dc_unload(ddev);

err_unref:
	drm_dev_unref(ddev);

	return ret;
}

Daniel, can you confirm that's what you had in mind ?

Regards,

Boris

-- 
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

^ permalink raw reply

* Re: [PATCH v2 01/10] crypto: AF_ALG: add user space interface for AEAD
From: Herbert Xu @ 2014-11-19  6:45 UTC (permalink / raw)
  To: Stephan Mueller
  Cc: Daniel Borkmann, quentin.gouchet-Re5JQEeQqe8AvxtiuMwx3w, LKML,
	linux-crypto-u79uwXL29TY76Z2rM5mHXA, ABI/API
In-Reply-To: <12318471.ucMNmAKX0e-PJstQz4BMNNP20K/wil9xYQuADTiUCJX@public.gmane.org>

On Wed, Nov 19, 2014 at 07:30:52AM +0100, Stephan Mueller wrote:
> 
> - these AD scatterlist chunks cannot be released after a normal encryption 
> operation. The associated data must be available for multiple operations. So, 
> while plaintext data is still flowing in, we need to keep operating with the 
> same AD.

We don't start an AEAD operation until the entire input has been
received.  Unlike ciphers you cannot process AEAD requests as you
go.

So there is no need to special-case AD chunks since you will have
everything at your disposal before you can feed the request to the
crypto API.

> Thus I am wondering how the rather static nature of the AD can fit with the 
> dynamic nature of the plaintext given the current implementation on how 
> plaintext is handled in the kernel.
> 
> To me, AD in league with an IV considering its rather static nature. Having 
> said that, the IV is also not transported via the plaintext interface, but via 
> a setsockopt. Shouldn't the AD be handled the same way?

AD is not like an IV at all.  An IV is a fixed-size (and small)
input while AD can be of any length.

Think about how this is used in real life.  For IPsec AD is the part
of the packet that we don't encrypt.  So there is nothing fundamentally
different between AD and the plain-text that we do encrypt except
that you don't encrypt it :)

Cheers,
-- 
Email: Herbert Xu <herbert-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH v2 01/10] crypto: AF_ALG: add user space interface for AEAD
From: Stephan Mueller @ 2014-11-19  6:30 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Daniel Borkmann, quentin.gouchet-Re5JQEeQqe8AvxtiuMwx3w, LKML,
	linux-crypto-u79uwXL29TY76Z2rM5mHXA, ABI/API
In-Reply-To: <20141119042704.GA19258-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org>

Am Mittwoch, 19. November 2014, 12:27:04 schrieb Herbert Xu:

Hi Herbert,

> On Wed, Nov 19, 2014 at 05:20:42AM +0100, Stephan Mueller wrote:
> > When looking deeper into skcipher_sendmsg, I see that the input data is
> > copied into the kernel using memcpy_fromiovec. The memory is allocated
> > before the memcpy call by skcipher_alloc_sgl.
> 
> Zero-copy is done through sendpage.

I am slightly at a loss here -- if you could give me a hint on how you think 
it can be implemented, I would be grateful.

Let us assume the AD || plaintext buffer is known to the kernel, either 
through sendpage or sendmsg. The entire buffer is split into chunks of 
scatterlists with ctx->tsgl. After processing one scatterlist from ctx->tsgl, 
that scatterlist is released via skcipher_pull_sgl.

Now, for AD, we need to consider:

- AD can span multiple ctx->tsgl chunks

- these AD scatterlist chunks cannot be released after a normal encryption 
operation. The associated data must be available for multiple operations. So, 
while plaintext data is still flowing in, we need to keep operating with the 
same AD.

Thus I am wondering how the rather static nature of the AD can fit with the 
dynamic nature of the plaintext given the current implementation on how 
plaintext is handled in the kernel.

To me, AD in league with an IV considering its rather static nature. Having 
said that, the IV is also not transported via the plaintext interface, but via 
a setsockopt. Shouldn't the AD be handled the same way?
> 
> Cheers,


-- 
Ciao
Stephan

^ permalink raw reply

* Re: [PATCH v2 01/10] crypto: AF_ALG: add user space interface for AEAD
From: Herbert Xu @ 2014-11-19  4:27 UTC (permalink / raw)
  To: Stephan Mueller
  Cc: Daniel Borkmann, quentin.gouchet-Re5JQEeQqe8AvxtiuMwx3w, LKML,
	linux-crypto-u79uwXL29TY76Z2rM5mHXA, ABI/API
In-Reply-To: <2398701.sGeMzIcHaz-PJstQz4BMNNP20K/wil9xYQuADTiUCJX@public.gmane.org>

On Wed, Nov 19, 2014 at 05:20:42AM +0100, Stephan Mueller wrote:
> 
> When looking deeper into skcipher_sendmsg, I see that the input data is copied 
> into the kernel using memcpy_fromiovec. The memory is allocated before the 
> memcpy call by skcipher_alloc_sgl.

Zero-copy is done through sendpage.

Cheers,
-- 
Email: Herbert Xu <herbert-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH v2 01/10] crypto: AF_ALG: add user space interface for AEAD
From: Stephan Mueller @ 2014-11-19  4:20 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Daniel Borkmann, quentin.gouchet, LKML, linux-crypto, ABI/API
In-Reply-To: <20141118140631.GA12100@gondor.apana.org.au>

Am Dienstag, 18. November 2014, 22:06:31 schrieb Herbert Xu:

Hi Herbert,

> On Sun, Nov 16, 2014 at 03:23:50AM +0100, Stephan Mueller wrote:
> > AEAD requires the following data in addition to normal symmetric
> > 
> > ciphers:
> > 	* Associated authentication data of arbitrary length
> > 	
> > 	* Authentication tag for decryption
> > 	
> > 	* Length of authentication tag for encryption
> > 
> > The authentication tag data is communicated as part of the actual
> > ciphertext as mandated by the kernel crypto API. Therefore we only need
> > to provide a user space interface for the associated authentication data
> > as well as for the authentication tag length.
> > 
> > This patch adds both as a setsockopt interface that is identical to the
> > AF_ALG interface for setting an IV and for selecting the cipher
> > operation type (encrypt or decrypt).
> > 
> > Signed-off-by: Stephan Mueller <smueller@chronox.de>
> 
> I don't like the fact that we're putting arbitrary limits on
> the AD, as well as the fact that the way you're doing it the
> AD has to be copied.
> 
> How about simply saying that the first X bytes of the input
> shall be the AD?

When looking deeper into skcipher_sendmsg, I see that the input data is copied 
into the kernel using memcpy_fromiovec. The memory is allocated before the 
memcpy call by skcipher_alloc_sgl.

The AD is input data and therefore needs to be copied into the kernel just 
like plaintext. Of course it is possible to require user space to concatenate 
the AD with the plaintext (or ciphertext in case of decryption). However, I 
see the following drawbacks when we do that:

- either user space has to do a very good memory allocation or it has to copy 
the data into the buffer before the plaintext. Also, if the plaintext buffer 
is not allocated in the right way, even the plaintext has to be copied to the 
newly created buffer that concatenates the AD with the plaintext. So, unless 
user space is very careful, at least some memcpy must be done in user space to 
accommodate the requirement that AD and plaintext is concatenated.

- The kernel function of skcipher_sendmsg would now become very complicated, 
because a similar logic currently applied to plaintext needs to be also 
implemented to copy and track the initial AD into the kernel.

However I see your point as the sendmsg approach to handle AD currently 
implements two memcpy() calls: one is the copy_from_user of the sendmsg system 
call framework to copy in msg and then the memcpy in skcipher_sendmsg.

To avoid the cluttering of the skcipher_sendmsg function (which already is 
complex), may I propose that a setsockopt call is used just like the SET_IV 
call? When using the setsockopt call, I see the following advantages:

- only one memcpy (the copy_from_user) is needed to move the data into kernel 
land -- this would be then en-par with the skcipher_sendmsg approach.

- the implementation of the setsockopt approach would be very clean and small 
as just one copy_from_user call is to be made followed by a 
aead_request_set_assoc call.

- user space memory handling is very clean as user space does not need a very 
specific memory setup to deliver AD. So, if the memory layout is not as 
specific as needed for the AEAD call, with the setsockopt approach, we do not 
need additional memcpy calls in user space.

In any case, we need to set some upper boundary for the maximum size of the 
AD. As I do not know of any standardized upper limit for the AD size, I would 
consider the standard CAVP testing requirements. These tests have the maximum 
AD size of 1<<16. When using the setsockopt call approach we can allocate the 
in-kernel AD memory at the setsockopt call time. IMHO it would be save now to 
limit the maximum AD size to 1<<16 at this point.
> 
> Cheers,


-- 
Ciao
Stephan

^ permalink raw reply

* Re: [PATCH v2 06/10] crypto: AF_ALG: make setkey optional
From: Stephan Mueller @ 2014-11-19  2:36 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Daniel Borkmann, quentin.gouchet, LKML, linux-crypto, ABI/API
In-Reply-To: <20141118141013.GC12100@gondor.apana.org.au>

Am Dienstag, 18. November 2014, 22:10:13 schrieb Herbert Xu:

Hi Herbert,

> On Sun, Nov 16, 2014 at 03:26:58AM +0100, Stephan Mueller wrote:
> > The current AF_ALG implementation requires that a userspace interface
> > implementation must provide a callback for setkey. Such a call is not
> > appliable to random number generators.
> > 
> > To prepare AF_ALG for the addition of a random number generator user
> > space interface, this function callback invocation is made optional.
> > 
> > Signed-off-by: Stephan Mueller <smueller@chronox.de>
> 
> Did you actually try this? AFAICS setkey is already optional.

You are correct. I tested the kernel without my patch and the setkey on the 
RNG handle is rejected. I now also see the check already present in the 
alg_setkey function.

This patch will be removed from a new patchset.
> 
> Cheers,


-- 
Ciao
Stephan

^ permalink raw reply

* Re: [PATCH v12 1/3] drm: rockchip: Add basic drm driver
From: Mark yao @ 2014-11-19  2:02 UTC (permalink / raw)
  To: Boris Brezillon, heiko, David Airlie, Rob Clark, Rob Herring,
	Pawel Moll, Mark Rutland, Ian Campbell, Kumar Gala, Randy Dunlap,
	Grant Likely, Greg Kroah-Hartman, John Stultz, Rom Lemarchand,
	devicetree, linux-doc, linux-kernel, dri-devel, linux-api,
	linux-rockchip, dianders, marcheu, dbehr, olof, djkurtz, cf, xxm,
	huangtao
In-Reply-To: <546BEDD9.1000205@rock-chips.com>


[-- Attachment #1.1: Type: text/plain, Size: 3918 bytes --]

On 2014年11月19日 09:09, Mark yao wrote:
> On 2014年11月18日 22:24, Daniel Vetter wrote:
>> On Tue, Nov 18, 2014 at 02:21:30PM +0100, Boris Brezillon wrote:
>>> Hi Daniel,
>>>
>>> On Tue, 18 Nov 2014 09:32:34 +0100
>>> Daniel Vetter <daniel@ffwll.ch> wrote:
>>>
>>>> On Tue, Nov 18, 2014 at 04:00:29PM +0800, Mark Yao wrote:
>>>>> From: Mark yao <mark.yao@rock-chips.com>
>>>>>
>>>>> This patch adds the basic structure of a DRM Driver for Rockchip 
>>>>> Socs.
>>>>>
>>>>> Signed-off-by: Mark Yao <mark.yao@rock-chips.com>
>>>>> Signed-off-by: Daniel Kurtz <djkurtz@chromium.org>
>>>>> Acked-by: Daniel Vetter <daniel@ffwll.ch>
>>>>> Reviewed-by: Rob Clark <robdclark@gmail.com>
>>>>> ---
>>>>> Changes in v2:
>>>>> - use the component framework to defer main drm driver probe
>>>>>    until all VOP devices have been probed.
>>>>> - use dma-mapping API with ARM_DMA_USE_IOMMU, create dma mapping by
>>>>>    master device and each vop device can shared the drm dma mapping.
>>>>> - use drm_crtc_init_with_planes and drm_universal_plane_init.
>>>>> - remove unnecessary middle layers.
>>>>> - add cursor set, move funcs to rockchip drm crtc.
>>>>> - use vop reset at first init
>>>>> - reference framebuffer when used and unreference when swap out vop
>>>>>
>>>>> Changes in v3:
>>>>> - change "crtc->fb" to "crtc->primary-fb"
>>>>> Adviced by Daniel Vetter
>>>>> - init cursor plane with universal api, remove unnecessary cursor 
>>>>> set,move
>>>>>
>>>>> Changes in v4:
>>>>> Adviced by David Herrmann
>>>>> - remove drm_platform_*() usage, use register drm device directly.
>>>> Minor fixup for that part below.
>>>>
>>>> [snip]
>>>>
>>>>> +static int rockchip_drm_bind(struct device *dev)
>>>>> +{
>>>>> +    struct drm_device *drm;
>>>>> +    int ret;
>>>>> +
>>>>> +    drm = drm_dev_alloc(&rockchip_drm_driver, dev);
>>>>> +    if (!drm)
>>>>> +        return -ENOMEM;
>>>>> +
>>>>> +    ret = drm_dev_set_unique(drm, "%s", dev_name(dev));
>>>>> +    if (ret)
>>>>> +        goto err_free;
>>>> Please call rockchip_drm_load here directly and don't put it as the 
>>>> ->load
>>>> function into the driver vtable. The point of the alloc/register 
>>>> split is
>>>> that the driver can be completely set up _before_ we register 
>>>> anything.
>>>> But for backwards compat and historical reasons ->load is called 
>>>> somewhere
>>>> in the middle (so that you could access the minor nodes if needed, 
>>>> since
>>>> some drivers do that).
>>> I tried to do the same in the atmel-hlcdc DRM driver, but I need the
>>> primary drm_minor to register the connectors (see this kernel
>>> backtrace [1]), which means I either initialize the connector in the
>>> wrong place (currently part of the drm load process), or I just can't
>>> call atmel_hlcdc_dc_load before registering the drm device...
>> Hm right, wonder who that works with rockchip tbh.
>>
>> We did split up the drm_connector setup into _init and register, so 
>> if you
>> want to do this then you need to move the call to drm_connector_register
>> below the call to drm_dev_register.
>>
>> We should probably have a drm_connectors_register_all helper which does
>> this for all connectors on the connector list. And also grabs the
>> appropriate lock.
>>
>> I guess it's somewhat obvious that no one yet actually tried this ;-)
>> -Daniel
> right, I re-test the driver with it, get the same problem with Boris.
> I should move drm_connector_register below the drm_dev_register.
I try move connector_register below drm_dev_register, but unfortunately, 
drm_dev_register call
drm_mode_group_init_legacy_group to save crtc, connector into list, it 
need below  connector_register.
so that problem is that now: connector_register need bewteen minor node 
create and
drm_mode_group_init_legacy_group.
I'm consider to revert the drm/rockchip v13.


[-- Attachment #1.2: Type: text/html, Size: 7546 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply

* Re: [PATCH v12 1/3] drm: rockchip: Add basic drm driver
From: Mark yao @ 2014-11-19  1:09 UTC (permalink / raw)
  To: Boris Brezillon, heiko, David Airlie, Rob Clark, Rob Herring,
	Pawel Moll, Mark Rutland, Ian Campbell, Kumar Gala, Randy Dunlap,
	Grant Likely, Greg Kroah-Hartman, John Stultz, Rom Lemarchand,
	devicetree, linux-doc, linux-kernel, dri-devel, linux-api,
	linux-rockchip, dianders, marcheu, dbehr, olof, djkurtz, cf, xxm,
	huangtao, kever.
In-Reply-To: <20141118142447.GC25711@phenom.ffwll.local>

On 2014年11月18日 22:24, Daniel Vetter wrote:
> On Tue, Nov 18, 2014 at 02:21:30PM +0100, Boris Brezillon wrote:
>> Hi Daniel,
>>
>> On Tue, 18 Nov 2014 09:32:34 +0100
>> Daniel Vetter <daniel@ffwll.ch> wrote:
>>
>>> On Tue, Nov 18, 2014 at 04:00:29PM +0800, Mark Yao wrote:
>>>> From: Mark yao <mark.yao@rock-chips.com>
>>>>
>>>> This patch adds the basic structure of a DRM Driver for Rockchip Socs.
>>>>
>>>> Signed-off-by: Mark Yao <mark.yao@rock-chips.com>
>>>> Signed-off-by: Daniel Kurtz <djkurtz@chromium.org>
>>>> Acked-by: Daniel Vetter <daniel@ffwll.ch>
>>>> Reviewed-by: Rob Clark <robdclark@gmail.com>
>>>> ---
>>>> Changes in v2:
>>>> - use the component framework to defer main drm driver probe
>>>>    until all VOP devices have been probed.
>>>> - use dma-mapping API with ARM_DMA_USE_IOMMU, create dma mapping by
>>>>    master device and each vop device can shared the drm dma mapping.
>>>> - use drm_crtc_init_with_planes and drm_universal_plane_init.
>>>> - remove unnecessary middle layers.
>>>> - add cursor set, move funcs to rockchip drm crtc.
>>>> - use vop reset at first init
>>>> - reference framebuffer when used and unreference when swap out vop
>>>>
>>>> Changes in v3:
>>>> - change "crtc->fb" to "crtc->primary-fb"
>>>> Adviced by Daniel Vetter
>>>> - init cursor plane with universal api, remove unnecessary cursor set,move
>>>>
>>>> Changes in v4:
>>>> Adviced by David Herrmann
>>>> - remove drm_platform_*() usage, use register drm device directly.
>>> Minor fixup for that part below.
>>>
>>> [snip]
>>>
>>>> +static int rockchip_drm_bind(struct device *dev)
>>>> +{
>>>> +	struct drm_device *drm;
>>>> +	int ret;
>>>> +
>>>> +	drm = drm_dev_alloc(&rockchip_drm_driver, dev);
>>>> +	if (!drm)
>>>> +		return -ENOMEM;
>>>> +
>>>> +	ret = drm_dev_set_unique(drm, "%s", dev_name(dev));
>>>> +	if (ret)
>>>> +		goto err_free;
>>> Please call rockchip_drm_load here directly and don't put it as the ->load
>>> function into the driver vtable. The point of the alloc/register split is
>>> that the driver can be completely set up _before_ we register anything.
>>> But for backwards compat and historical reasons ->load is called somewhere
>>> in the middle (so that you could access the minor nodes if needed, since
>>> some drivers do that).
>> I tried to do the same in the atmel-hlcdc DRM driver, but I need the
>> primary drm_minor to register the connectors (see this kernel
>> backtrace [1]), which means I either initialize the connector in the
>> wrong place (currently part of the drm load process), or I just can't
>> call atmel_hlcdc_dc_load before registering the drm device...
> Hm right, wonder who that works with rockchip tbh.
>
> We did split up the drm_connector setup into _init and register, so if you
> want to do this then you need to move the call to drm_connector_register
> below the call to drm_dev_register.
>
> We should probably have a drm_connectors_register_all helper which does
> this for all connectors on the connector list. And also grabs the
> appropriate lock.
>
> I guess it's somewhat obvious that no one yet actually tried this ;-)
> -Daniel
right, I re-test the driver with it, get the same problem with Boris.
I should move drm_connector_register below the drm_dev_register.


^ permalink raw reply

* Re: [PATCH v2 02/10] crypto: AF_ALG: user space interface for cipher info
From: Herbert Xu @ 2014-11-19  1:05 UTC (permalink / raw)
  To: Stephan Mueller
  Cc: Steffen Klassert, Daniel Borkmann, quentin.gouchet, LKML,
	linux-crypto, ABI/API
In-Reply-To: <34547475.R7WC8JFgoJ@tachyon.chronox.de>

On Wed, Nov 19, 2014 at 02:02:33AM +0100, Stephan Mueller wrote:
>
> I am wondering why cryptouser.h is in include/linux -- shouldn't it be in 
> include/uapi/linux? Aren't the definitions in that header file needed for 
> userspace to talk to the netlink socket? I guess that is also the reason why I 

It should be.  I think crypto_user predates the uapi split so
that's probably why it's in the wrong place.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH v2 02/10] crypto: AF_ALG: user space interface for cipher info
From: Stephan Mueller @ 2014-11-19  1:02 UTC (permalink / raw)
  To: Herbert Xu, Steffen Klassert
  Cc: Daniel Borkmann, quentin.gouchet, LKML, linux-crypto, ABI/API
In-Reply-To: <20141118140822.GB12100@gondor.apana.org.au>

Am Dienstag, 18. November 2014, 22:08:23 schrieb Herbert Xu:

Hi Herbert,

> On Sun, Nov 16, 2014 at 03:24:25AM +0100, Stephan Mueller wrote:
> > The AF_ALG interface allows normal cipher (hash, encrypt, decrypt).
> > However, it does not allow user space to obtain the following generic
> > 
> > information about the currently active cipher:
> > 	* block size of the cipher
> > 	
> > 	* IV size of the cipher
> > 	
> > 	* for AEAD, the maximum authentication tag size
> > 
> > The patch adds a getsockopt interface for the symmetric ciphers to
> > answer such information requests from user space.
> > 
> > The kernel crypto API function calls are used to obtain the real data.
> > As all data are simple integer values, the getsockopt handler function
> > uses put_user() to return the integer value to user space in the
> > *optval parameter of getsockopt.
> > 
> > Signed-off-by: Stephan Mueller <smueller@chronox.de>
> 
> We already have crypto_user so you should be extending that to
> cover what's missing.

Looking into that, I think nothing is missing -- thanks for the pointer. Thus 
I think I can withdraw that patch and just simply update libkcapi to use that 
user space netlink interface.

Though, I yet have to try using that interface ;-)

I am wondering why cryptouser.h is in include/linux -- shouldn't it be in 
include/uapi/linux? Aren't the definitions in that header file needed for 
userspace to talk to the netlink socket? I guess that is also the reason why I 
do not see the interface API details in /usr/linux on my F21 system.
> 
> PS These paramters should not vary depending on the implementation,
> if they do then one of the implementations must be buggy.
> 
> Cheers,


-- 
Ciao
Stephan

^ permalink raw reply

* Re: [PATCH v2 01/10] crypto: AF_ALG: add user space interface for AEAD
From: Stephan Mueller @ 2014-11-19  0:34 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Daniel Borkmann, quentin.gouchet-Re5JQEeQqe8AvxtiuMwx3w, LKML,
	linux-crypto-u79uwXL29TY76Z2rM5mHXA, ABI/API
In-Reply-To: <20141118140631.GA12100-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org>

Am Dienstag, 18. November 2014, 22:06:31 schrieb Herbert Xu:

Hi Herbert,

> On Sun, Nov 16, 2014 at 03:23:50AM +0100, Stephan Mueller wrote:
> > AEAD requires the following data in addition to normal symmetric
> > 
> > ciphers:
> > 	* Associated authentication data of arbitrary length
> > 	
> > 	* Authentication tag for decryption
> > 	
> > 	* Length of authentication tag for encryption
> > 
> > The authentication tag data is communicated as part of the actual
> > ciphertext as mandated by the kernel crypto API. Therefore we only need
> > to provide a user space interface for the associated authentication data
> > as well as for the authentication tag length.
> > 
> > This patch adds both as a setsockopt interface that is identical to the
> > AF_ALG interface for setting an IV and for selecting the cipher
> > operation type (encrypt or decrypt).
> > 
> > Signed-off-by: Stephan Mueller <smueller-T9tCv8IpfcWELgA04lAiVw@public.gmane.org>
> 
> I don't like the fact that we're putting arbitrary limits on
> the AD, as well as the fact that the way you're doing it the
> AD has to be copied.
> 
> How about simply saying that the first X bytes of the input
> shall be the AD?

That is a very good idea.

To cover that approach, the kernel needs to be informed about the length of 
the authentication data size to separate the ciphertext/plaintext from the 
authentication data.

To cover that, I would recommend to simply send a u32 value to the kernel for 
the AD size instead of the AD. The kernel then can adjust the pointers as 
necessary.

I will update the patch in the next days to cover that scenario.

Thanks

-- 
Ciao
Stephan

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox