* [PATCHv6 RFC 0/3] syscalls,x86: Add execveat() system call @ 2014-11-06 16:07 David Drysdale 2014-11-06 16:07 ` [PATCHv6 1/3] syscalls,x86: implement " David Drysdale ` (2 more replies) 0 siblings, 3 replies; 9+ messages in thread From: David Drysdale @ 2014-11-06 16:07 UTC (permalink / raw) To: Eric W. Biederman, Andy Lutomirski, Alexander Viro, Meredydd Luff, linux-kernel-u79uwXL29TY76Z2rM5mHXA Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton, Kees Cook, Arnd Bergmann, Rich Felker, Christoph Hellwig, x86-DgEjT+Ai2ygdnm+yROfE0A, linux-arch-u79uwXL29TY76Z2rM5mHXA, linux-api-u79uwXL29TY76Z2rM5mHXA, David Drysdale Here's another pass at this. Some things to discuss in particular: 1) The current approach for interpreted execs (i.e. mostly "#!" scripts) gives them an argv[1] filename like "/dev/fd/<fd>/<path>". This means that script execution in a /proc-less system isn't going to work, at least until interpreters get smart enough to spot and special-case the leading "/dev/fd/<fd>", or until there's something to use in place of /dev/fd -> /proc/self/fd (e.g. Al's dupfs suggestion, https://lkml.org/lkml/2014/10/19/141). So is an execveat(2) that (currently) only works for non-interpreted programs still useful? 2) I don't like having to add a new LOOKUP_EMPTY_NOPATH flag just to prevent O_PATH fds from being fexecve()ed -- alternative suggestions welcomed. (More generally, I don't have a great feel for what O_PATH is for; how bad would it be to just allow them to be fexecve()ed?) ......... This patch set adds execveat(2) for x86, and is derived from Meredydd Luff's patch from Sept 2012 (https://lkml.org/lkml/2012/9/11/528). The primary aim of adding an execveat syscall is to allow an implementation of fexecve(3) that does not rely on the /proc filesystem, at least for executables (rather than scripts). The current glibc version of fexecve(3) is implemented via /proc, which causes problems in sandboxed or otherwise restricted environments. Given the desire for a /proc-free fexecve() implementation, HPA suggested (https://lkml.org/lkml/2006/7/11/556) that an execveat(2) syscall would be an appropriate generalization. Also, having a new syscall means that it can take a flags argument without back-compatibility concerns. The current implementation just defines the AT_EMPTY_PATH and AT_SYMLINK_NOFOLLOW flags, but other flags could be added in future -- for example, flags for new namespaces (as suggested at https://lkml.org/lkml/2006/7/11/474). Related history: - https://lkml.org/lkml/2006/12/27/123 is an example of someone realizing that fexecve() is likely to fail in a chroot environment. - http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=514043 covered documenting the /proc requirement of fexecve(3) in its manpage, to "prevent other people from wasting their time". - https://bugzilla.kernel.org/show_bug.cgi?id=74481 documented that it's not possible to fexecve() a file descriptor for a script with close-on-exec set (which is possible with the implementation here). - https://bugzilla.redhat.com/show_bug.cgi?id=241609 described a problem where a process that did setuid() could not fexecve() because it no longer had access to /proc/self/fd; this has since been fixed. Changes since v5: - Set new flag in bprm->interp_flags for O_CLOEXEC fds, so that binfmts that invoke an interpreter fail the exec (as they will not be able to access the invoked file). [Andy Lutomirski] - Don't truncate long paths. [Andy Lutomirski] - Commonize code to open the executed file. [Eric W. Biederman] - Mark O_PATH file descriptors so they cannot be fexecve()ed. - Make self-test more helpful, and add additional cases: - file offset non-zero - binary file without execute bit - O_CLOEXEC fds Changes since v4, suggested by Eric W. Biederman: - Use empty filename with AT_EMPTY_PATH flag rather than NULL pathname to request fexecve-like behaviour. - Build pathname as "/dev/fd/<fd>/<filename>" (or "/dev/fd/<fd>") rather than using d_path(). - Patch against v3.17 (bfe01a5ba249) Changes since Meredydd's v3 patch: - Added a selftest. - Added a man page. - Left open_exec() signature untouched to reduce patch impact elsewhere (as suggested by Al Viro). - Filled in bprm->filename with d_path() into a buffer, to avoid use of potentially-ephemeral dentry->d_name. - Patch against v3.14 (455c6fdbd21916). David Drysdale (2): syscalls,x86: implement execveat() system call syscalls,x86: add selftest for execveat(2) arch/x86/ia32/audit.c | 1 + arch/x86/ia32/ia32entry.S | 1 + arch/x86/kernel/audit_64.c | 1 + arch/x86/kernel/entry_64.S | 28 +++ arch/x86/syscalls/syscall_32.tbl | 1 + arch/x86/syscalls/syscall_64.tbl | 2 + arch/x86/um/sys_call_table_64.c | 1 + fs/binfmt_em86.c | 4 + fs/binfmt_misc.c | 4 + fs/binfmt_script.c | 10 + fs/exec.c | 115 ++++++++++-- fs/namei.c | 8 +- include/linux/binfmts.h | 4 + include/linux/compat.h | 3 + include/linux/fs.h | 1 + include/linux/namei.h | 1 + include/linux/sched.h | 4 + include/linux/syscalls.h | 4 + include/uapi/asm-generic/unistd.h | 4 +- kernel/sys_ni.c | 3 + lib/audit.c | 3 + tools/testing/selftests/Makefile | 1 + tools/testing/selftests/exec/.gitignore | 7 + tools/testing/selftests/exec/Makefile | 25 +++ tools/testing/selftests/exec/execveat.c | 321 ++++++++++++++++++++++++++++++++ 25 files changed, 542 insertions(+), 15 deletions(-) create mode 100644 tools/testing/selftests/exec/.gitignore create mode 100644 tools/testing/selftests/exec/Makefile create mode 100644 tools/testing/selftests/exec/execveat.c -- 2.1.0.rc2.206.gedb03e5 ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCHv6 1/3] syscalls,x86: implement execveat() system call 2014-11-06 16:07 [PATCHv6 RFC 0/3] syscalls,x86: Add execveat() system call David Drysdale @ 2014-11-06 16:07 ` David Drysdale [not found] ` <1415290033-15771-2-git-send-email-drysdale-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> 2014-11-06 16:07 ` [PATCHv6 2/3] syscalls,x86: add selftest for execveat(2) David Drysdale [not found] ` <1415290033-15771-1-git-send-email-drysdale-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> 2 siblings, 1 reply; 9+ messages in thread From: David Drysdale @ 2014-11-06 16:07 UTC (permalink / raw) To: Eric W. Biederman, Andy Lutomirski, Alexander Viro, Meredydd Luff, linux-kernel Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton, Kees Cook, Arnd Bergmann, Rich Felker, Christoph Hellwig, x86, linux-arch, linux-api, David Drysdale Add a new execveat(2) system call. execveat() is to execve() as openat() is to open(): it takes a file descriptor that refers to a directory, and resolves the filename relative to that. In addition, if the filename is empty and AT_EMPTY_PATH is specified, execveat() executes the file to which the file descriptor refers. This replicates the functionality of fexecve(), which is a system call in other UNIXen, but in Linux glibc it depends on opening "/proc/self/fd/<fd>" (and so relies on /proc being mounted). The filename fed to the executed program as argv[0] (or the name of the script fed to a script interpreter) will be of the form "/dev/fd/<fd>" (for an empty filename) or "/dev/fd/<fd>/<filename>", effectively reflecting how the executable was found. This does however mean that execution of a script in a /proc-less environment won't work; also, script execution via an O_CLOEXEC file descriptor fails (as the file will not be accessible after exec). Only x86-64, i386 and x32 ABIs are supported in this patch. Based on patches by Meredydd Luff <meredydd@senatehouse.org> Signed-off-by: David Drysdale <drysdale@google.com> --- arch/x86/ia32/audit.c | 1 + arch/x86/ia32/ia32entry.S | 1 + arch/x86/kernel/audit_64.c | 1 + arch/x86/kernel/entry_64.S | 28 ++++++++++ arch/x86/syscalls/syscall_32.tbl | 1 + arch/x86/syscalls/syscall_64.tbl | 2 + arch/x86/um/sys_call_table_64.c | 1 + fs/binfmt_em86.c | 4 ++ fs/binfmt_misc.c | 4 ++ fs/binfmt_script.c | 10 ++++ fs/exec.c | 115 +++++++++++++++++++++++++++++++++----- fs/namei.c | 8 ++- include/linux/binfmts.h | 4 ++ include/linux/compat.h | 3 + include/linux/fs.h | 1 + include/linux/namei.h | 1 + include/linux/sched.h | 4 ++ include/linux/syscalls.h | 4 ++ include/uapi/asm-generic/unistd.h | 4 +- kernel/sys_ni.c | 3 + lib/audit.c | 3 + 21 files changed, 188 insertions(+), 15 deletions(-) diff --git a/arch/x86/ia32/audit.c b/arch/x86/ia32/audit.c index 5d7b381da692..2eccc8932ae6 100644 --- a/arch/x86/ia32/audit.c +++ b/arch/x86/ia32/audit.c @@ -35,6 +35,7 @@ int ia32_classify_syscall(unsigned syscall) case __NR_socketcall: return 4; case __NR_execve: + case __NR_execveat: return 5; default: return 1; diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S index 4299eb05023c..2516c09743e0 100644 --- a/arch/x86/ia32/ia32entry.S +++ b/arch/x86/ia32/ia32entry.S @@ -464,6 +464,7 @@ GLOBAL(\label) PTREGSCALL stub32_rt_sigreturn, sys32_rt_sigreturn PTREGSCALL stub32_sigreturn, sys32_sigreturn PTREGSCALL stub32_execve, compat_sys_execve + PTREGSCALL stub32_execveat, compat_sys_execveat PTREGSCALL stub32_fork, sys_fork PTREGSCALL stub32_vfork, sys_vfork diff --git a/arch/x86/kernel/audit_64.c b/arch/x86/kernel/audit_64.c index 06d3e5a14d9d..f3672508b249 100644 --- a/arch/x86/kernel/audit_64.c +++ b/arch/x86/kernel/audit_64.c @@ -50,6 +50,7 @@ int audit_classify_syscall(int abi, unsigned syscall) case __NR_openat: return 3; case __NR_execve: + case __NR_execveat: return 5; default: return 0; diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S index 2fac1343a90b..00c4526e6ffe 100644 --- a/arch/x86/kernel/entry_64.S +++ b/arch/x86/kernel/entry_64.S @@ -665,6 +665,20 @@ ENTRY(stub_execve) CFI_ENDPROC END(stub_execve) +ENTRY(stub_execveat) + CFI_STARTPROC + addq $8, %rsp + PARTIAL_FRAME 0 + SAVE_REST + FIXUP_TOP_OF_STACK %r11 + call sys_execveat + RESTORE_TOP_OF_STACK %r11 + movq %rax,RAX(%rsp) + RESTORE_REST + jmp int_ret_from_sys_call + CFI_ENDPROC +END(stub_execveat) + /* * sigreturn is special because it needs to restore all registers on return. * This cannot be done with SYSRET, so use the IRET return path instead. @@ -710,6 +724,20 @@ ENTRY(stub_x32_execve) CFI_ENDPROC END(stub_x32_execve) +ENTRY(stub_x32_execveat) + CFI_STARTPROC + addq $8, %rsp + PARTIAL_FRAME 0 + SAVE_REST + FIXUP_TOP_OF_STACK %r11 + call compat_sys_execveat + RESTORE_TOP_OF_STACK %r11 + movq %rax,RAX(%rsp) + RESTORE_REST + jmp int_ret_from_sys_call + CFI_ENDPROC +END(stub_x32_execveat) + #endif /* diff --git a/arch/x86/syscalls/syscall_32.tbl b/arch/x86/syscalls/syscall_32.tbl index 028b78168d85..2633e3195455 100644 --- a/arch/x86/syscalls/syscall_32.tbl +++ b/arch/x86/syscalls/syscall_32.tbl @@ -363,3 +363,4 @@ 354 i386 seccomp sys_seccomp 355 i386 getrandom sys_getrandom 356 i386 memfd_create sys_memfd_create +357 i386 execveat sys_execveat stub32_execveat diff --git a/arch/x86/syscalls/syscall_64.tbl b/arch/x86/syscalls/syscall_64.tbl index 35dd922727b9..1af5badd159c 100644 --- a/arch/x86/syscalls/syscall_64.tbl +++ b/arch/x86/syscalls/syscall_64.tbl @@ -327,6 +327,7 @@ 318 common getrandom sys_getrandom 319 common memfd_create sys_memfd_create 320 common kexec_file_load sys_kexec_file_load +321 64 execveat stub_execveat # # x32-specific system call numbers start at 512 to avoid cache impact @@ -365,3 +366,4 @@ 542 x32 getsockopt compat_sys_getsockopt 543 x32 io_setup compat_sys_io_setup 544 x32 io_submit compat_sys_io_submit +545 x32 execveat stub_x32_execveat diff --git a/arch/x86/um/sys_call_table_64.c b/arch/x86/um/sys_call_table_64.c index f2f0723070ca..20c3649d0691 100644 --- a/arch/x86/um/sys_call_table_64.c +++ b/arch/x86/um/sys_call_table_64.c @@ -31,6 +31,7 @@ #define stub_fork sys_fork #define stub_vfork sys_vfork #define stub_execve sys_execve +#define stub_execveat sys_execveat #define stub_rt_sigreturn sys_rt_sigreturn #define __SYSCALL_COMMON(nr, sym, compat) __SYSCALL_64(nr, sym, compat) diff --git a/fs/binfmt_em86.c b/fs/binfmt_em86.c index f37b08cea1f7..490538536cb4 100644 --- a/fs/binfmt_em86.c +++ b/fs/binfmt_em86.c @@ -42,6 +42,10 @@ static int load_em86(struct linux_binprm *bprm) return -ENOEXEC; } + /* Need to be able to load the file after exec */ + if (bprm->interp_flags & BINPRM_FLAGS_PATH_INACCESSIBLE) + return -ENOENT; + allow_write_access(bprm->file); fput(bprm->file); bprm->file = NULL; diff --git a/fs/binfmt_misc.c b/fs/binfmt_misc.c index b60500300dd7..e659f5562356 100644 --- a/fs/binfmt_misc.c +++ b/fs/binfmt_misc.c @@ -127,6 +127,10 @@ static int load_misc_binary(struct linux_binprm *bprm) if (!fmt) goto _ret; + /* Need to be able to load the file after exec */ + if (bprm->interp_flags & BINPRM_FLAGS_PATH_INACCESSIBLE) + return -ENOENT; + if (!(fmt->flags & MISC_FMT_PRESERVE_ARGV0)) { retval = remove_arg_zero(bprm); if (retval) diff --git a/fs/binfmt_script.c b/fs/binfmt_script.c index 5027a3e14922..afdf4e3cafc2 100644 --- a/fs/binfmt_script.c +++ b/fs/binfmt_script.c @@ -24,6 +24,16 @@ static int load_script(struct linux_binprm *bprm) if ((bprm->buf[0] != '#') || (bprm->buf[1] != '!')) return -ENOEXEC; + + /* + * If the script filename will be inaccessible after exec, typically + * because it is a "/dev/fd/<fd>/.." path against an O_CLOEXEC fd, give + * up now (on the assumption that the interpreter will want to load + * this file). + */ + if (bprm->interp_flags & BINPRM_FLAGS_PATH_INACCESSIBLE) + return -ENOENT; + /* * This section does the #! interpretation. * Sorta complicated, but hopefully it will work. -TYT diff --git a/fs/exec.c b/fs/exec.c index a2b42a98c743..800d232c17bb 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -747,18 +747,26 @@ EXPORT_SYMBOL(setup_arg_pages); #endif /* CONFIG_MMU */ -static struct file *do_open_exec(struct filename *name) +static struct file *do_open_execat(int fd, struct filename *name, int flags) { struct file *file; int err; - static const struct open_flags open_exec_flags = { + struct open_flags open_exec_flags = { .open_flag = O_LARGEFILE | O_RDONLY | __FMODE_EXEC, .acc_mode = MAY_EXEC | MAY_OPEN, .intent = LOOKUP_OPEN, .lookup_flags = LOOKUP_FOLLOW, }; - file = do_filp_open(AT_FDCWD, name, &open_exec_flags); + if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH)) != 0) + return ERR_PTR(-EINVAL); + if (flags & AT_SYMLINK_NOFOLLOW) + open_exec_flags.lookup_flags &= ~LOOKUP_FOLLOW; + if (flags & AT_EMPTY_PATH) + open_exec_flags.lookup_flags |= (LOOKUP_EMPTY | + LOOKUP_EMPTY_NOPATH); + + file = do_filp_open(fd, name, &open_exec_flags); if (IS_ERR(file)) goto out; @@ -769,12 +777,13 @@ static struct file *do_open_exec(struct filename *name) if (file->f_path.mnt->mnt_flags & MNT_NOEXEC) goto exit; - fsnotify_open(file); - err = deny_write_access(file); if (err) goto exit; + if (name->name[0] != '\0') + fsnotify_open(file); + out: return file; @@ -786,7 +795,7 @@ exit: struct file *open_exec(const char *name) { struct filename tmp = { .name = name }; - return do_open_exec(&tmp); + return do_open_execat(AT_FDCWD, &tmp, 0); } EXPORT_SYMBOL(open_exec); @@ -1422,10 +1431,12 @@ static int exec_binprm(struct linux_binprm *bprm) /* * sys_execve() executes a new program. */ -static int do_execve_common(struct filename *filename, - struct user_arg_ptr argv, - struct user_arg_ptr envp) +static int do_execveat_common(int fd, struct filename *filename, + struct user_arg_ptr argv, + struct user_arg_ptr envp, + int flags) { + char *pathbuf = NULL; struct linux_binprm *bprm; struct file *file; struct files_struct *displaced; @@ -1466,7 +1477,7 @@ static int do_execve_common(struct filename *filename, check_unsafe_exec(bprm); current->in_execve = 1; - file = do_open_exec(filename); + file = do_open_execat(fd, filename, flags); retval = PTR_ERR(file); if (IS_ERR(file)) goto out_unmark; @@ -1474,7 +1485,30 @@ static int do_execve_common(struct filename *filename, sched_exec(); bprm->file = file; - bprm->filename = bprm->interp = filename->name; + if (fd == AT_FDCWD || filename->name[0] == '/') { + bprm->filename = filename->name; + } else { + /* "/dev/fd/2147483647/" + filename->name */ + int maxlen = 19 + strlen(filename->name); + + pathbuf = kmalloc(maxlen, GFP_TEMPORARY); + if (!pathbuf) { + retval = -ENOMEM; + goto out_unmark; + } + if (filename->name[0] == '\0') + sprintf(pathbuf, "/dev/fd/%d", fd); + else + snprintf(pathbuf, maxlen, + "/dev/fd/%d/%s", fd, filename->name); + /* Record that a name derived from an O_CLOEXEC fd will be + * inaccessible after exec. Relies on having exclusive access to + * current->files (due to unshare_files above). */ + if (close_on_exec(fd, current->files->fdt)) + bprm->interp_flags |= BINPRM_FLAGS_PATH_INACCESSIBLE; + bprm->filename = pathbuf; + } + bprm->interp = bprm->filename; retval = bprm_mm_init(bprm); if (retval) @@ -1532,6 +1566,7 @@ out_unmark: out_free: free_bprm(bprm); + kfree(pathbuf); out_files: if (displaced) @@ -1547,7 +1582,18 @@ int do_execve(struct filename *filename, { struct user_arg_ptr argv = { .ptr.native = __argv }; struct user_arg_ptr envp = { .ptr.native = __envp }; - return do_execve_common(filename, argv, envp); + return do_execveat_common(AT_FDCWD, filename, argv, envp, 0); +} + +int do_execveat(int fd, struct filename *filename, + const char __user *const __user *__argv, + const char __user *const __user *__envp, + int flags) +{ + struct user_arg_ptr argv = { .ptr.native = __argv }; + struct user_arg_ptr envp = { .ptr.native = __envp }; + + return do_execveat_common(fd, filename, argv, envp, flags); } #ifdef CONFIG_COMPAT @@ -1563,7 +1609,23 @@ static int compat_do_execve(struct filename *filename, .is_compat = true, .ptr.compat = __envp, }; - return do_execve_common(filename, argv, envp); + return do_execveat_common(AT_FDCWD, filename, argv, envp, 0); +} + +static int compat_do_execveat(int fd, struct filename *filename, + const compat_uptr_t __user *__argv, + const compat_uptr_t __user *__envp, + int flags) +{ + struct user_arg_ptr argv = { + .is_compat = true, + .ptr.compat = __argv, + }; + struct user_arg_ptr envp = { + .is_compat = true, + .ptr.compat = __envp, + }; + return do_execveat_common(fd, filename, argv, envp, flags); } #endif @@ -1603,6 +1665,20 @@ SYSCALL_DEFINE3(execve, { return do_execve(getname(filename), argv, envp); } + +SYSCALL_DEFINE5(execveat, + int, fd, const char __user *, filename, + const char __user *const __user *, argv, + const char __user *const __user *, envp, + int, flags) +{ + int lookup_flags = (flags & AT_EMPTY_PATH) ? LOOKUP_EMPTY : 0; + + return do_execveat(fd, + getname_flags(filename, lookup_flags, NULL), + argv, envp, flags); +} + #ifdef CONFIG_COMPAT COMPAT_SYSCALL_DEFINE3(execve, const char __user *, filename, const compat_uptr_t __user *, argv, @@ -1610,4 +1686,17 @@ COMPAT_SYSCALL_DEFINE3(execve, const char __user *, filename, { return compat_do_execve(getname(filename), argv, envp); } + +COMPAT_SYSCALL_DEFINE5(execveat, int, fd, + const char __user *, filename, + const compat_uptr_t __user *, argv, + const compat_uptr_t __user *, envp, + int, flags) +{ + int lookup_flags = (flags & AT_EMPTY_PATH) ? LOOKUP_EMPTY : 0; + + return compat_do_execveat(fd, + getname_flags(filename, lookup_flags, NULL), + argv, envp, flags); +} #endif diff --git a/fs/namei.c b/fs/namei.c index a7b05bf82d31..757df6777ae5 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -130,7 +130,7 @@ void final_putname(struct filename *name) #define EMBEDDED_NAME_MAX (PATH_MAX - sizeof(struct filename)) -static struct filename * +struct filename * getname_flags(const char __user *filename, int flags, int *empty) { struct filename *result, *err; @@ -1891,6 +1891,12 @@ static int path_init(int dfd, const char *name, unsigned int flags, fdput(f); return -ENOTDIR; } + } else if (flags & LOOKUP_EMPTY_NOPATH) { + /* When using the fd alone, disallow O_PATH files */ + if (f.file->f_mode & FMODE_PATH) { + fdput(f); + return -EBADF; + } } nd->path = f.file->f_path; diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h index 61f29e5ea840..576e4639ca60 100644 --- a/include/linux/binfmts.h +++ b/include/linux/binfmts.h @@ -53,6 +53,10 @@ struct linux_binprm { #define BINPRM_FLAGS_EXECFD_BIT 1 #define BINPRM_FLAGS_EXECFD (1 << BINPRM_FLAGS_EXECFD_BIT) +/* filename of the binary will be inaccessible after exec */ +#define BINPRM_FLAGS_PATH_INACCESSIBLE_BIT 2 +#define BINPRM_FLAGS_PATH_INACCESSIBLE (1 << BINPRM_FLAGS_PATH_INACCESSIBLE_BIT) + /* Function parameter for binfmt->coredump */ struct coredump_params { const siginfo_t *siginfo; diff --git a/include/linux/compat.h b/include/linux/compat.h index e6494261eaff..7450ca2ac1fc 100644 --- a/include/linux/compat.h +++ b/include/linux/compat.h @@ -357,6 +357,9 @@ asmlinkage long compat_sys_lseek(unsigned int, compat_off_t, unsigned int); asmlinkage long compat_sys_execve(const char __user *filename, const compat_uptr_t __user *argv, const compat_uptr_t __user *envp); +asmlinkage long compat_sys_execveat(int dfd, const char __user *filename, + const compat_uptr_t __user *argv, + const compat_uptr_t __user *envp, int flags); asmlinkage long compat_sys_select(int n, compat_ulong_t __user *inp, compat_ulong_t __user *outp, compat_ulong_t __user *exp, diff --git a/include/linux/fs.h b/include/linux/fs.h index 94187721ad41..e9818574d738 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -2060,6 +2060,7 @@ extern struct file *file_open_root(struct dentry *, struct vfsmount *, extern struct file * dentry_open(const struct path *, int, const struct cred *); extern int filp_close(struct file *, fl_owner_t id); +extern struct filename *getname_flags(const char __user *, int, int *); extern struct filename *getname(const char __user *); extern struct filename *getname_kernel(const char *); diff --git a/include/linux/namei.h b/include/linux/namei.h index 492de72560fa..eaa25cc72213 100644 --- a/include/linux/namei.h +++ b/include/linux/namei.h @@ -55,6 +55,7 @@ enum {LAST_NORM, LAST_ROOT, LAST_DOT, LAST_DOTDOT, LAST_BIND}; #define LOOKUP_JUMPED 0x1000 #define LOOKUP_ROOT 0x2000 #define LOOKUP_EMPTY 0x4000 +#define LOOKUP_EMPTY_NOPATH 0x8000 extern int user_path_at(int, const char __user *, unsigned, struct path *); extern int user_path_at_empty(int, const char __user *, unsigned, struct path *, int *empty); diff --git a/include/linux/sched.h b/include/linux/sched.h index b867a4dab38a..33e056da7d33 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -2430,6 +2430,10 @@ extern void do_group_exit(int); extern int do_execve(struct filename *, const char __user * const __user *, const char __user * const __user *); +extern int do_execveat(int, struct filename *, + const char __user * const __user *, + const char __user * const __user *, + int); extern long do_fork(unsigned long, unsigned long, unsigned long, int __user *, int __user *); struct task_struct *fork_idle(int); extern pid_t kernel_thread(int (*fn)(void *), void *arg, unsigned long flags); diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h index 0f86d85a9ce4..df5422294deb 100644 --- a/include/linux/syscalls.h +++ b/include/linux/syscalls.h @@ -876,4 +876,8 @@ asmlinkage long sys_seccomp(unsigned int op, unsigned int flags, asmlinkage long sys_getrandom(char __user *buf, size_t count, unsigned int flags); +asmlinkage long sys_execveat(int dfd, const char __user *filename, + const char __user *const __user *argv, + const char __user *const __user *envp, int flags); + #endif diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h index 11d11bc5c78f..feef07d29663 100644 --- a/include/uapi/asm-generic/unistd.h +++ b/include/uapi/asm-generic/unistd.h @@ -705,9 +705,11 @@ __SYSCALL(__NR_seccomp, sys_seccomp) __SYSCALL(__NR_getrandom, sys_getrandom) #define __NR_memfd_create 279 __SYSCALL(__NR_memfd_create, sys_memfd_create) +#define __NR_execveat 280 +__SC_COMP(__NR_execveat, sys_execveat, compat_sys_execveat) #undef __NR_syscalls -#define __NR_syscalls 280 +#define __NR_syscalls 281 /* * All syscalls below here should go away really, diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c index 391d4ddb6f4b..efb06058ad3e 100644 --- a/kernel/sys_ni.c +++ b/kernel/sys_ni.c @@ -218,3 +218,6 @@ cond_syscall(sys_kcmp); /* operate on Secure Computing state */ cond_syscall(sys_seccomp); + +/* execveat */ +cond_syscall(sys_execveat); diff --git a/lib/audit.c b/lib/audit.c index 1d726a22565b..b8fb5ee81e26 100644 --- a/lib/audit.c +++ b/lib/audit.c @@ -54,6 +54,9 @@ int audit_classify_syscall(int abi, unsigned syscall) case __NR_socketcall: return 4; #endif +#ifdef __NR_execveat + case __NR_execveat: +#endif case __NR_execve: return 5; default: -- 2.1.0.rc2.206.gedb03e5 ^ permalink raw reply related [flat|nested] 9+ messages in thread
[parent not found: <1415290033-15771-2-git-send-email-drysdale-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>]
* Re: [PATCHv6 1/3] syscalls,x86: implement execveat() system call [not found] ` <1415290033-15771-2-git-send-email-drysdale-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> @ 2014-11-06 18:06 ` Kees Cook [not found] ` <CAGXu5jKp7wEKnoeOVGUGCeJ3Mv-kkOm+QpXiu4HdWuvQx99auw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 0 siblings, 1 reply; 9+ messages in thread From: Kees Cook @ 2014-11-06 18:06 UTC (permalink / raw) To: David Drysdale Cc: Eric W. Biederman, Andy Lutomirski, Alexander Viro, Meredydd Luff, LKML, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton, Arnd Bergmann, Rich Felker, Christoph Hellwig, x86-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org, linux-arch, Linux API On Thu, Nov 6, 2014 at 8:07 AM, David Drysdale <drysdale-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> wrote: > Add a new execveat(2) system call. execveat() is to execve() as > openat() is to open(): it takes a file descriptor that refers to a > directory, and resolves the filename relative to that. > > In addition, if the filename is empty and AT_EMPTY_PATH is specified, > execveat() executes the file to which the file descriptor refers. This > replicates the functionality of fexecve(), which is a system call in > other UNIXen, but in Linux glibc it depends on opening > "/proc/self/fd/<fd>" (and so relies on /proc being mounted). > > The filename fed to the executed program as argv[0] (or the name of the > script fed to a script interpreter) will be of the form "/dev/fd/<fd>" > (for an empty filename) or "/dev/fd/<fd>/<filename>", effectively > reflecting how the executable was found. This does however mean that > execution of a script in a /proc-less environment won't work; also, > script execution via an O_CLOEXEC file descriptor fails (as the file > will not be accessible after exec). > > Only x86-64, i386 and x32 ABIs are supported in this patch. > > Based on patches by Meredydd Luff <meredydd-zPN50pYk8eUaUu29zAJCuw@public.gmane.org> This'll be quite nice for doing launches into a tight sandbox. > > Signed-off-by: David Drysdale <drysdale-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> > --- > arch/x86/ia32/audit.c | 1 + > arch/x86/ia32/ia32entry.S | 1 + > arch/x86/kernel/audit_64.c | 1 + > arch/x86/kernel/entry_64.S | 28 ++++++++++ > arch/x86/syscalls/syscall_32.tbl | 1 + > arch/x86/syscalls/syscall_64.tbl | 2 + > arch/x86/um/sys_call_table_64.c | 1 + > fs/binfmt_em86.c | 4 ++ > fs/binfmt_misc.c | 4 ++ > fs/binfmt_script.c | 10 ++++ > fs/exec.c | 115 +++++++++++++++++++++++++++++++++----- > fs/namei.c | 8 ++- > include/linux/binfmts.h | 4 ++ > include/linux/compat.h | 3 + > include/linux/fs.h | 1 + > include/linux/namei.h | 1 + > include/linux/sched.h | 4 ++ > include/linux/syscalls.h | 4 ++ > include/uapi/asm-generic/unistd.h | 4 +- > kernel/sys_ni.c | 3 + > lib/audit.c | 3 + > 21 files changed, 188 insertions(+), 15 deletions(-) > > diff --git a/arch/x86/ia32/audit.c b/arch/x86/ia32/audit.c > index 5d7b381da692..2eccc8932ae6 100644 > --- a/arch/x86/ia32/audit.c > +++ b/arch/x86/ia32/audit.c > @@ -35,6 +35,7 @@ int ia32_classify_syscall(unsigned syscall) > case __NR_socketcall: > return 4; > case __NR_execve: > + case __NR_execveat: > return 5; > default: > return 1; > diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S > index 4299eb05023c..2516c09743e0 100644 > --- a/arch/x86/ia32/ia32entry.S > +++ b/arch/x86/ia32/ia32entry.S > @@ -464,6 +464,7 @@ GLOBAL(\label) > PTREGSCALL stub32_rt_sigreturn, sys32_rt_sigreturn > PTREGSCALL stub32_sigreturn, sys32_sigreturn > PTREGSCALL stub32_execve, compat_sys_execve > + PTREGSCALL stub32_execveat, compat_sys_execveat > PTREGSCALL stub32_fork, sys_fork > PTREGSCALL stub32_vfork, sys_vfork > > diff --git a/arch/x86/kernel/audit_64.c b/arch/x86/kernel/audit_64.c > index 06d3e5a14d9d..f3672508b249 100644 > --- a/arch/x86/kernel/audit_64.c > +++ b/arch/x86/kernel/audit_64.c > @@ -50,6 +50,7 @@ int audit_classify_syscall(int abi, unsigned syscall) > case __NR_openat: > return 3; > case __NR_execve: > + case __NR_execveat: > return 5; > default: > return 0; > diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S > index 2fac1343a90b..00c4526e6ffe 100644 > --- a/arch/x86/kernel/entry_64.S > +++ b/arch/x86/kernel/entry_64.S > @@ -665,6 +665,20 @@ ENTRY(stub_execve) > CFI_ENDPROC > END(stub_execve) > > +ENTRY(stub_execveat) > + CFI_STARTPROC > + addq $8, %rsp > + PARTIAL_FRAME 0 > + SAVE_REST > + FIXUP_TOP_OF_STACK %r11 > + call sys_execveat > + RESTORE_TOP_OF_STACK %r11 > + movq %rax,RAX(%rsp) > + RESTORE_REST > + jmp int_ret_from_sys_call > + CFI_ENDPROC > +END(stub_execveat) > + > /* > * sigreturn is special because it needs to restore all registers on return. > * This cannot be done with SYSRET, so use the IRET return path instead. > @@ -710,6 +724,20 @@ ENTRY(stub_x32_execve) > CFI_ENDPROC > END(stub_x32_execve) > > +ENTRY(stub_x32_execveat) > + CFI_STARTPROC > + addq $8, %rsp > + PARTIAL_FRAME 0 > + SAVE_REST > + FIXUP_TOP_OF_STACK %r11 > + call compat_sys_execveat > + RESTORE_TOP_OF_STACK %r11 > + movq %rax,RAX(%rsp) > + RESTORE_REST > + jmp int_ret_from_sys_call > + CFI_ENDPROC > +END(stub_x32_execveat) > + > #endif > > /* > diff --git a/arch/x86/syscalls/syscall_32.tbl b/arch/x86/syscalls/syscall_32.tbl > index 028b78168d85..2633e3195455 100644 > --- a/arch/x86/syscalls/syscall_32.tbl > +++ b/arch/x86/syscalls/syscall_32.tbl > @@ -363,3 +363,4 @@ > 354 i386 seccomp sys_seccomp > 355 i386 getrandom sys_getrandom > 356 i386 memfd_create sys_memfd_create > +357 i386 execveat sys_execveat stub32_execveat > diff --git a/arch/x86/syscalls/syscall_64.tbl b/arch/x86/syscalls/syscall_64.tbl > index 35dd922727b9..1af5badd159c 100644 > --- a/arch/x86/syscalls/syscall_64.tbl > +++ b/arch/x86/syscalls/syscall_64.tbl > @@ -327,6 +327,7 @@ > 318 common getrandom sys_getrandom > 319 common memfd_create sys_memfd_create > 320 common kexec_file_load sys_kexec_file_load > +321 64 execveat stub_execveat > > # > # x32-specific system call numbers start at 512 to avoid cache impact > @@ -365,3 +366,4 @@ > 542 x32 getsockopt compat_sys_getsockopt > 543 x32 io_setup compat_sys_io_setup > 544 x32 io_submit compat_sys_io_submit > +545 x32 execveat stub_x32_execveat > diff --git a/arch/x86/um/sys_call_table_64.c b/arch/x86/um/sys_call_table_64.c > index f2f0723070ca..20c3649d0691 100644 > --- a/arch/x86/um/sys_call_table_64.c > +++ b/arch/x86/um/sys_call_table_64.c > @@ -31,6 +31,7 @@ > #define stub_fork sys_fork > #define stub_vfork sys_vfork > #define stub_execve sys_execve > +#define stub_execveat sys_execveat > #define stub_rt_sigreturn sys_rt_sigreturn > > #define __SYSCALL_COMMON(nr, sym, compat) __SYSCALL_64(nr, sym, compat) > diff --git a/fs/binfmt_em86.c b/fs/binfmt_em86.c > index f37b08cea1f7..490538536cb4 100644 > --- a/fs/binfmt_em86.c > +++ b/fs/binfmt_em86.c > @@ -42,6 +42,10 @@ static int load_em86(struct linux_binprm *bprm) > return -ENOEXEC; > } > > + /* Need to be able to load the file after exec */ > + if (bprm->interp_flags & BINPRM_FLAGS_PATH_INACCESSIBLE) > + return -ENOENT; > + > allow_write_access(bprm->file); > fput(bprm->file); > bprm->file = NULL; > diff --git a/fs/binfmt_misc.c b/fs/binfmt_misc.c > index b60500300dd7..e659f5562356 100644 > --- a/fs/binfmt_misc.c > +++ b/fs/binfmt_misc.c > @@ -127,6 +127,10 @@ static int load_misc_binary(struct linux_binprm *bprm) > if (!fmt) > goto _ret; > > + /* Need to be able to load the file after exec */ > + if (bprm->interp_flags & BINPRM_FLAGS_PATH_INACCESSIBLE) > + return -ENOENT; > + > if (!(fmt->flags & MISC_FMT_PRESERVE_ARGV0)) { > retval = remove_arg_zero(bprm); > if (retval) > diff --git a/fs/binfmt_script.c b/fs/binfmt_script.c > index 5027a3e14922..afdf4e3cafc2 100644 > --- a/fs/binfmt_script.c > +++ b/fs/binfmt_script.c > @@ -24,6 +24,16 @@ static int load_script(struct linux_binprm *bprm) > > if ((bprm->buf[0] != '#') || (bprm->buf[1] != '!')) > return -ENOEXEC; > + > + /* > + * If the script filename will be inaccessible after exec, typically > + * because it is a "/dev/fd/<fd>/.." path against an O_CLOEXEC fd, give > + * up now (on the assumption that the interpreter will want to load > + * this file). > + */ > + if (bprm->interp_flags & BINPRM_FLAGS_PATH_INACCESSIBLE) > + return -ENOENT; > + > /* > * This section does the #! interpretation. > * Sorta complicated, but hopefully it will work. -TYT > diff --git a/fs/exec.c b/fs/exec.c > index a2b42a98c743..800d232c17bb 100644 > --- a/fs/exec.c > +++ b/fs/exec.c > @@ -747,18 +747,26 @@ EXPORT_SYMBOL(setup_arg_pages); > > #endif /* CONFIG_MMU */ > > -static struct file *do_open_exec(struct filename *name) > +static struct file *do_open_execat(int fd, struct filename *name, int flags) > { > struct file *file; > int err; > - static const struct open_flags open_exec_flags = { > + struct open_flags open_exec_flags = { > .open_flag = O_LARGEFILE | O_RDONLY | __FMODE_EXEC, > .acc_mode = MAY_EXEC | MAY_OPEN, > .intent = LOOKUP_OPEN, > .lookup_flags = LOOKUP_FOLLOW, > }; > > - file = do_filp_open(AT_FDCWD, name, &open_exec_flags); > + if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH)) != 0) > + return ERR_PTR(-EINVAL); > + if (flags & AT_SYMLINK_NOFOLLOW) > + open_exec_flags.lookup_flags &= ~LOOKUP_FOLLOW; > + if (flags & AT_EMPTY_PATH) > + open_exec_flags.lookup_flags |= (LOOKUP_EMPTY | > + LOOKUP_EMPTY_NOPATH); > + > + file = do_filp_open(fd, name, &open_exec_flags); > if (IS_ERR(file)) > goto out; > > @@ -769,12 +777,13 @@ static struct file *do_open_exec(struct filename *name) > if (file->f_path.mnt->mnt_flags & MNT_NOEXEC) > goto exit; > > - fsnotify_open(file); > - > err = deny_write_access(file); > if (err) > goto exit; > > + if (name->name[0] != '\0') > + fsnotify_open(file); > + > out: > return file; > > @@ -786,7 +795,7 @@ exit: > struct file *open_exec(const char *name) > { > struct filename tmp = { .name = name }; > - return do_open_exec(&tmp); > + return do_open_execat(AT_FDCWD, &tmp, 0); > } > EXPORT_SYMBOL(open_exec); > > @@ -1422,10 +1431,12 @@ static int exec_binprm(struct linux_binprm *bprm) > /* > * sys_execve() executes a new program. > */ > -static int do_execve_common(struct filename *filename, > - struct user_arg_ptr argv, > - struct user_arg_ptr envp) > +static int do_execveat_common(int fd, struct filename *filename, > + struct user_arg_ptr argv, > + struct user_arg_ptr envp, > + int flags) > { > + char *pathbuf = NULL; > struct linux_binprm *bprm; > struct file *file; > struct files_struct *displaced; > @@ -1466,7 +1477,7 @@ static int do_execve_common(struct filename *filename, > check_unsafe_exec(bprm); > current->in_execve = 1; > > - file = do_open_exec(filename); > + file = do_open_execat(fd, filename, flags); > retval = PTR_ERR(file); > if (IS_ERR(file)) > goto out_unmark; > @@ -1474,7 +1485,30 @@ static int do_execve_common(struct filename *filename, > sched_exec(); > > bprm->file = file; > - bprm->filename = bprm->interp = filename->name; > + if (fd == AT_FDCWD || filename->name[0] == '/') { > + bprm->filename = filename->name; > + } else { > + /* "/dev/fd/2147483647/" + filename->name */ > + int maxlen = 19 + strlen(filename->name); This should be 20 + strlen (to include the trailing NULL). However, I think this whole bit of code could be replaced with kasprintf... > + > + pathbuf = kmalloc(maxlen, GFP_TEMPORARY); > + if (!pathbuf) { > + retval = -ENOMEM; > + goto out_unmark; > + } > + if (filename->name[0] == '\0') > + sprintf(pathbuf, "/dev/fd/%d", fd); > + else > + snprintf(pathbuf, maxlen, > + "/dev/fd/%d/%s", fd, filename->name); Maybe something like this? A bit messy, so maybe your original if/else would be more readable. } else { pathbuf = kasprintf("/dev/fd/%d%s%s", fd, filename->name[0] == '\0' ? "" : "/", filename->name[0] == '\0' ? "" : filename->name); if (!pathbuf) { retval = -ENOMEM; goto out_unmark; } } > + /* Record that a name derived from an O_CLOEXEC fd will be > + * inaccessible after exec. Relies on having exclusive access to > + * current->files (due to unshare_files above). */ > + if (close_on_exec(fd, current->files->fdt)) > + bprm->interp_flags |= BINPRM_FLAGS_PATH_INACCESSIBLE; > + bprm->filename = pathbuf; > + } > + bprm->interp = bprm->filename; > > retval = bprm_mm_init(bprm); > if (retval) > @@ -1532,6 +1566,7 @@ out_unmark: > > out_free: > free_bprm(bprm); > + kfree(pathbuf); > > out_files: > if (displaced) > @@ -1547,7 +1582,18 @@ int do_execve(struct filename *filename, > { > struct user_arg_ptr argv = { .ptr.native = __argv }; > struct user_arg_ptr envp = { .ptr.native = __envp }; > - return do_execve_common(filename, argv, envp); > + return do_execveat_common(AT_FDCWD, filename, argv, envp, 0); > +} > + > +int do_execveat(int fd, struct filename *filename, > + const char __user *const __user *__argv, > + const char __user *const __user *__envp, > + int flags) > +{ > + struct user_arg_ptr argv = { .ptr.native = __argv }; > + struct user_arg_ptr envp = { .ptr.native = __envp }; > + > + return do_execveat_common(fd, filename, argv, envp, flags); > } > > #ifdef CONFIG_COMPAT > @@ -1563,7 +1609,23 @@ static int compat_do_execve(struct filename *filename, > .is_compat = true, > .ptr.compat = __envp, > }; > - return do_execve_common(filename, argv, envp); > + return do_execveat_common(AT_FDCWD, filename, argv, envp, 0); > +} > + > +static int compat_do_execveat(int fd, struct filename *filename, > + const compat_uptr_t __user *__argv, > + const compat_uptr_t __user *__envp, > + int flags) > +{ > + struct user_arg_ptr argv = { > + .is_compat = true, > + .ptr.compat = __argv, > + }; > + struct user_arg_ptr envp = { > + .is_compat = true, > + .ptr.compat = __envp, > + }; > + return do_execveat_common(fd, filename, argv, envp, flags); > } > #endif > > @@ -1603,6 +1665,20 @@ SYSCALL_DEFINE3(execve, > { > return do_execve(getname(filename), argv, envp); > } > + > +SYSCALL_DEFINE5(execveat, > + int, fd, const char __user *, filename, > + const char __user *const __user *, argv, > + const char __user *const __user *, envp, > + int, flags) > +{ > + int lookup_flags = (flags & AT_EMPTY_PATH) ? LOOKUP_EMPTY : 0; > + > + return do_execveat(fd, > + getname_flags(filename, lookup_flags, NULL), > + argv, envp, flags); > +} > + > #ifdef CONFIG_COMPAT > COMPAT_SYSCALL_DEFINE3(execve, const char __user *, filename, > const compat_uptr_t __user *, argv, > @@ -1610,4 +1686,17 @@ COMPAT_SYSCALL_DEFINE3(execve, const char __user *, filename, > { > return compat_do_execve(getname(filename), argv, envp); > } > + > +COMPAT_SYSCALL_DEFINE5(execveat, int, fd, > + const char __user *, filename, > + const compat_uptr_t __user *, argv, > + const compat_uptr_t __user *, envp, > + int, flags) > +{ > + int lookup_flags = (flags & AT_EMPTY_PATH) ? LOOKUP_EMPTY : 0; > + > + return compat_do_execveat(fd, > + getname_flags(filename, lookup_flags, NULL), > + argv, envp, flags); > +} > #endif > diff --git a/fs/namei.c b/fs/namei.c > index a7b05bf82d31..757df6777ae5 100644 > --- a/fs/namei.c > +++ b/fs/namei.c > @@ -130,7 +130,7 @@ void final_putname(struct filename *name) > > #define EMBEDDED_NAME_MAX (PATH_MAX - sizeof(struct filename)) > > -static struct filename * > +struct filename * > getname_flags(const char __user *filename, int flags, int *empty) > { > struct filename *result, *err; > @@ -1891,6 +1891,12 @@ static int path_init(int dfd, const char *name, unsigned int flags, > fdput(f); > return -ENOTDIR; > } > + } else if (flags & LOOKUP_EMPTY_NOPATH) { > + /* When using the fd alone, disallow O_PATH files */ > + if (f.file->f_mode & FMODE_PATH) { > + fdput(f); > + return -EBADF; > + } > } > > nd->path = f.file->f_path; > diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h > index 61f29e5ea840..576e4639ca60 100644 > --- a/include/linux/binfmts.h > +++ b/include/linux/binfmts.h > @@ -53,6 +53,10 @@ struct linux_binprm { > #define BINPRM_FLAGS_EXECFD_BIT 1 > #define BINPRM_FLAGS_EXECFD (1 << BINPRM_FLAGS_EXECFD_BIT) > > +/* filename of the binary will be inaccessible after exec */ > +#define BINPRM_FLAGS_PATH_INACCESSIBLE_BIT 2 > +#define BINPRM_FLAGS_PATH_INACCESSIBLE (1 << BINPRM_FLAGS_PATH_INACCESSIBLE_BIT) > + > /* Function parameter for binfmt->coredump */ > struct coredump_params { > const siginfo_t *siginfo; > diff --git a/include/linux/compat.h b/include/linux/compat.h > index e6494261eaff..7450ca2ac1fc 100644 > --- a/include/linux/compat.h > +++ b/include/linux/compat.h > @@ -357,6 +357,9 @@ asmlinkage long compat_sys_lseek(unsigned int, compat_off_t, unsigned int); > > asmlinkage long compat_sys_execve(const char __user *filename, const compat_uptr_t __user *argv, > const compat_uptr_t __user *envp); > +asmlinkage long compat_sys_execveat(int dfd, const char __user *filename, > + const compat_uptr_t __user *argv, > + const compat_uptr_t __user *envp, int flags); > > asmlinkage long compat_sys_select(int n, compat_ulong_t __user *inp, > compat_ulong_t __user *outp, compat_ulong_t __user *exp, > diff --git a/include/linux/fs.h b/include/linux/fs.h > index 94187721ad41..e9818574d738 100644 > --- a/include/linux/fs.h > +++ b/include/linux/fs.h > @@ -2060,6 +2060,7 @@ extern struct file *file_open_root(struct dentry *, struct vfsmount *, > extern struct file * dentry_open(const struct path *, int, const struct cred *); > extern int filp_close(struct file *, fl_owner_t id); > > +extern struct filename *getname_flags(const char __user *, int, int *); > extern struct filename *getname(const char __user *); > extern struct filename *getname_kernel(const char *); > > diff --git a/include/linux/namei.h b/include/linux/namei.h > index 492de72560fa..eaa25cc72213 100644 > --- a/include/linux/namei.h > +++ b/include/linux/namei.h > @@ -55,6 +55,7 @@ enum {LAST_NORM, LAST_ROOT, LAST_DOT, LAST_DOTDOT, LAST_BIND}; > #define LOOKUP_JUMPED 0x1000 > #define LOOKUP_ROOT 0x2000 > #define LOOKUP_EMPTY 0x4000 > +#define LOOKUP_EMPTY_NOPATH 0x8000 > > extern int user_path_at(int, const char __user *, unsigned, struct path *); > extern int user_path_at_empty(int, const char __user *, unsigned, struct path *, int *empty); > diff --git a/include/linux/sched.h b/include/linux/sched.h > index b867a4dab38a..33e056da7d33 100644 > --- a/include/linux/sched.h > +++ b/include/linux/sched.h > @@ -2430,6 +2430,10 @@ extern void do_group_exit(int); > extern int do_execve(struct filename *, > const char __user * const __user *, > const char __user * const __user *); > +extern int do_execveat(int, struct filename *, > + const char __user * const __user *, > + const char __user * const __user *, > + int); > extern long do_fork(unsigned long, unsigned long, unsigned long, int __user *, int __user *); > struct task_struct *fork_idle(int); > extern pid_t kernel_thread(int (*fn)(void *), void *arg, unsigned long flags); > diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h > index 0f86d85a9ce4..df5422294deb 100644 > --- a/include/linux/syscalls.h > +++ b/include/linux/syscalls.h > @@ -876,4 +876,8 @@ asmlinkage long sys_seccomp(unsigned int op, unsigned int flags, > asmlinkage long sys_getrandom(char __user *buf, size_t count, > unsigned int flags); > > +asmlinkage long sys_execveat(int dfd, const char __user *filename, > + const char __user *const __user *argv, > + const char __user *const __user *envp, int flags); > + > #endif > diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h > index 11d11bc5c78f..feef07d29663 100644 > --- a/include/uapi/asm-generic/unistd.h > +++ b/include/uapi/asm-generic/unistd.h > @@ -705,9 +705,11 @@ __SYSCALL(__NR_seccomp, sys_seccomp) > __SYSCALL(__NR_getrandom, sys_getrandom) > #define __NR_memfd_create 279 > __SYSCALL(__NR_memfd_create, sys_memfd_create) > +#define __NR_execveat 280 > +__SC_COMP(__NR_execveat, sys_execveat, compat_sys_execveat) > > #undef __NR_syscalls > -#define __NR_syscalls 280 > +#define __NR_syscalls 281 > > /* > * All syscalls below here should go away really, > diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c > index 391d4ddb6f4b..efb06058ad3e 100644 > --- a/kernel/sys_ni.c > +++ b/kernel/sys_ni.c > @@ -218,3 +218,6 @@ cond_syscall(sys_kcmp); > > /* operate on Secure Computing state */ > cond_syscall(sys_seccomp); > + > +/* execveat */ > +cond_syscall(sys_execveat); > diff --git a/lib/audit.c b/lib/audit.c > index 1d726a22565b..b8fb5ee81e26 100644 > --- a/lib/audit.c > +++ b/lib/audit.c > @@ -54,6 +54,9 @@ int audit_classify_syscall(int abi, unsigned syscall) > case __NR_socketcall: > return 4; > #endif > +#ifdef __NR_execveat > + case __NR_execveat: > +#endif > case __NR_execve: > return 5; > default: > -- > 2.1.0.rc2.206.gedb03e5 > Thanks for working on this! -Kees -- Kees Cook Chrome OS Security ^ permalink raw reply [flat|nested] 9+ messages in thread
[parent not found: <CAGXu5jKp7wEKnoeOVGUGCeJ3Mv-kkOm+QpXiu4HdWuvQx99auw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: [PATCHv6 1/3] syscalls,x86: implement execveat() system call [not found] ` <CAGXu5jKp7wEKnoeOVGUGCeJ3Mv-kkOm+QpXiu4HdWuvQx99auw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> @ 2014-11-07 13:19 ` David Drysdale 0 siblings, 0 replies; 9+ messages in thread From: David Drysdale @ 2014-11-07 13:19 UTC (permalink / raw) To: Kees Cook Cc: Eric W. Biederman, Andy Lutomirski, Alexander Viro, Meredydd Luff, LKML, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton, Arnd Bergmann, Rich Felker, Christoph Hellwig, x86-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org, linux-arch, Linux API On Thu, Nov 6, 2014 at 6:06 PM, Kees Cook <keescook-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org> wrote: > On Thu, Nov 6, 2014 at 8:07 AM, David Drysdale <drysdale-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> wrote: >> diff --git a/fs/exec.c b/fs/exec.c >> index a2b42a98c743..800d232c17bb 100644 >> --- a/fs/exec.c >> +++ b/fs/exec.c >> @@ -1422,10 +1431,12 @@ static int exec_binprm(struct linux_binprm *bprm) >> /* >> * sys_execve() executes a new program. >> */ >> -static int do_execve_common(struct filename *filename, >> - struct user_arg_ptr argv, >> - struct user_arg_ptr envp) >> +static int do_execveat_common(int fd, struct filename *filename, >> + struct user_arg_ptr argv, >> + struct user_arg_ptr envp, >> + int flags) >> { >> + char *pathbuf = NULL; >> struct linux_binprm *bprm; >> struct file *file; >> struct files_struct *displaced; >> @@ -1466,7 +1477,7 @@ static int do_execve_common(struct filename *filename, >> check_unsafe_exec(bprm); >> current->in_execve = 1; >> >> - file = do_open_exec(filename); >> + file = do_open_execat(fd, filename, flags); >> retval = PTR_ERR(file); >> if (IS_ERR(file)) >> goto out_unmark; >> @@ -1474,7 +1485,30 @@ static int do_execve_common(struct filename *filename, >> sched_exec(); >> >> bprm->file = file; >> - bprm->filename = bprm->interp = filename->name; >> + if (fd == AT_FDCWD || filename->name[0] == '/') { >> + bprm->filename = filename->name; >> + } else { >> + /* "/dev/fd/2147483647/" + filename->name */ >> + int maxlen = 19 + strlen(filename->name); > > This should be 20 + strlen (to include the trailing NULL). However, I > think this whole bit of code could be replaced with kasprintf... > >> + >> + pathbuf = kmalloc(maxlen, GFP_TEMPORARY); >> + if (!pathbuf) { >> + retval = -ENOMEM; >> + goto out_unmark; >> + } >> + if (filename->name[0] == '\0') >> + sprintf(pathbuf, "/dev/fd/%d", fd); >> + else >> + snprintf(pathbuf, maxlen, >> + "/dev/fd/%d/%s", fd, filename->name); > > Maybe something like this? A bit messy, so maybe your original if/else > would be more readable. > > } else { > pathbuf = kasprintf("/dev/fd/%d%s%s", fd, > filename->name[0] == '\0' ? "" : "/", > filename->name[0] == '\0' ? "" > : filename->name); > if (!pathbuf) { > retval = -ENOMEM; > goto out_unmark; > } > } Ah, I didn't know about kasprintf(), that will make things much easier -- thanks! ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCHv6 2/3] syscalls,x86: add selftest for execveat(2) 2014-11-06 16:07 [PATCHv6 RFC 0/3] syscalls,x86: Add execveat() system call David Drysdale 2014-11-06 16:07 ` [PATCHv6 1/3] syscalls,x86: implement " David Drysdale @ 2014-11-06 16:07 ` David Drysdale 2014-11-06 18:07 ` Kees Cook [not found] ` <1415290033-15771-1-git-send-email-drysdale-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> 2 siblings, 1 reply; 9+ messages in thread From: David Drysdale @ 2014-11-06 16:07 UTC (permalink / raw) To: Eric W. Biederman, Andy Lutomirski, Alexander Viro, Meredydd Luff, linux-kernel Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton, Kees Cook, Arnd Bergmann, Rich Felker, Christoph Hellwig, x86, linux-arch, linux-api, David Drysdale Signed-off-by: David Drysdale <drysdale@google.com> --- tools/testing/selftests/Makefile | 1 + tools/testing/selftests/exec/.gitignore | 7 + tools/testing/selftests/exec/Makefile | 25 +++ tools/testing/selftests/exec/execveat.c | 321 ++++++++++++++++++++++++++++++++ 4 files changed, 354 insertions(+) create mode 100644 tools/testing/selftests/exec/.gitignore create mode 100644 tools/testing/selftests/exec/Makefile create mode 100644 tools/testing/selftests/exec/execveat.c diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile index 36ff2e4c7b6f..210cf68b3511 100644 --- a/tools/testing/selftests/Makefile +++ b/tools/testing/selftests/Makefile @@ -14,6 +14,7 @@ TARGETS += powerpc TARGETS += user TARGETS += sysctl TARGETS += firmware +TARGETS += exec TARGETS_HOTPLUG = cpu-hotplug TARGETS_HOTPLUG += memory-hotplug diff --git a/tools/testing/selftests/exec/.gitignore b/tools/testing/selftests/exec/.gitignore new file mode 100644 index 000000000000..778147d01af9 --- /dev/null +++ b/tools/testing/selftests/exec/.gitignore @@ -0,0 +1,7 @@ +subdir* +script* +execveat +execveat.symlink +execveat.moved +execveat.ephemeral +execveat.denatured \ No newline at end of file diff --git a/tools/testing/selftests/exec/Makefile b/tools/testing/selftests/exec/Makefile new file mode 100644 index 000000000000..c97e0aaea02d --- /dev/null +++ b/tools/testing/selftests/exec/Makefile @@ -0,0 +1,25 @@ +CC = $(CROSS_COMPILE)gcc +CFLAGS = -Wall +BINARIES = execveat +DEPS = execveat.symlink execveat.denatured script subdir +all: $(BINARIES) $(DEPS) + +subdir: + mkdir -p $@ +script: + echo '#!/bin/sh' > $@ + echo 'exit $$*' >> $@ + chmod +x $@ +execveat.symlink: execveat + ln -s -f $< $@ +execveat.denatured: execveat + cp $< $@ + chmod -x $@ +%: %.c + $(CC) $(CFLAGS) -o $@ $^ + +run_tests: all + ./execveat + +clean: + rm -rf $(BINARIES) $(DEPS) subdir.moved execveat.moved diff --git a/tools/testing/selftests/exec/execveat.c b/tools/testing/selftests/exec/execveat.c new file mode 100644 index 000000000000..f6ea48176393 --- /dev/null +++ b/tools/testing/selftests/exec/execveat.c @@ -0,0 +1,321 @@ +/* + * Copyright (c) 2014 Google, Inc. + * + * Licensed under the terms of the GNU GPL License version 2 + * + * Selftests for execveat(2). + */ + +#define _GNU_SOURCE /* to get O_PATH, AT_EMPTY_PATH */ +#include <sys/sendfile.h> +#include <sys/stat.h> +#include <sys/syscall.h> +#include <sys/types.h> +#include <sys/wait.h> +#include <errno.h> +#include <fcntl.h> +#include <limits.h> +#include <stdio.h> +#include <stdlib.h> +#include <string.h> +#include <unistd.h> + +static char *envp[] = { "IN_TEST=yes", NULL, NULL }; +static char *argv[] = { "execveat", "99", NULL }; + +static int execveat_(int fd, const char *path, char **argv, char **envp, + int flags) +{ +#ifdef __NR_execveat + return syscall(__NR_execveat, fd, path, argv, envp, flags); +#else + errno = -ENOSYS; + return -1; +#endif +} + +#define check_execveat_fail(fd, path, flags, errno) \ + _check_execveat_fail(fd, path, flags, errno, #errno) +static int _check_execveat_fail(int fd, const char *path, int flags, + int expected_errno, const char *errno_str) +{ + int rc; + + errno = 0; + printf("Check failure of execveat(%d, '%s', %d) with %s... ", + fd, path?:"(null)", flags, errno_str); + rc = execveat_(fd, path, argv, envp, flags); + + if (rc > 0) { + printf("[FAIL] (unexpected success from execveat(2))\n"); + return 1; + } + if (errno != expected_errno) { + printf("[FAIL] (expected errno %d (%s) not %d (%s)\n", + expected_errno, strerror(expected_errno), + errno, strerror(errno)); + return 1; + } + printf("[OK]\n"); + return 0; +} + +static int check_execveat_invoked_rc(int fd, const char *path, int flags, + int expected_rc) +{ + int status; + int rc; + pid_t child; + + printf("Check success of execveat(%d, '%s', %d)... ", + fd, path?:"(null)", flags); + child = fork(); + if (child < 0) { + printf("[FAIL] (fork() failed)\n"); + return 1; + } + if (child == 0) { + /* Child: do execveat(). */ + rc = execveat_(fd, path, argv, envp, flags); + printf("[FAIL]: execveat() failed, rc=%d errno=%d (%s)\n", + rc, errno, strerror(errno)); + exit(1); /* should not reach here */ + } + /* Parent: wait for & check child's exit status. */ + rc = waitpid(child, &status, 0); + if (rc != child) { + printf("[FAIL] (waitpid(%d,...) returned %d)\n", child, rc); + return 1; + } + if (!WIFEXITED(status)) { + printf("[FAIL] (child %d did not exit cleanly, status=%08x)\n", + child, status); + return 1; + } + if (WEXITSTATUS(status) != expected_rc) { + printf("[FAIL] (child %d exited with %d not %d)\n", + child, WEXITSTATUS(status), expected_rc); + return 1; + } + printf("[OK]\n"); + return 0; +} + +static int check_execveat(int fd, const char *path, int flags) +{ + return check_execveat_invoked_rc(fd, path, flags, 99); +} + +static char *concat(const char *left, const char *right) +{ + char *result = malloc(strlen(left) + strlen(right) + 1); + + strcpy(result, left); + strcat(result, right); + return result; +} + +static int open_or_die(const char *filename, int flags) +{ + int fd = open(filename, flags); + + if (fd < 0) { + printf("Failed to open '%s'; " + "check prerequisites are available\n", filename); + exit(1); + } + return fd; +} + +static int run_tests(void) +{ + int fail = 0; + char *fullname = realpath("execveat", NULL); + char *fullname_script = realpath("script", NULL); + char *fullname_symlink = concat(fullname, ".symlink"); + int subdir_dfd = open_or_die("subdir", O_DIRECTORY|O_RDONLY); + int subdir_dfd_ephemeral = open_or_die("subdir.ephemeral", + O_DIRECTORY|O_RDONLY); + int dot_dfd = open_or_die(".", O_DIRECTORY|O_RDONLY); + int dot_dfd_path = open_or_die(".", O_DIRECTORY|O_RDONLY|O_PATH); + int dot_dfd_cloexec = open_or_die(".", O_DIRECTORY|O_RDONLY|O_CLOEXEC); + int fd = open_or_die("execveat", O_RDONLY); + int fd_path = open_or_die("execveat", O_RDONLY|O_PATH); + int fd_symlink = open_or_die("execveat.symlink", O_RDONLY); + int fd_denatured = open_or_die("execveat.denatured", O_RDONLY); + int fd_script = open_or_die("script", O_RDONLY); + int fd_ephemeral = open_or_die("execveat.ephemeral", O_RDONLY); + int fd_script_ephemeral = open_or_die("script.ephemeral", O_RDONLY); + int fd_cloexec = open_or_die("execveat", O_RDONLY|O_CLOEXEC); + int fd_script_cloexec = open_or_die("script", O_RDONLY|O_CLOEXEC); + + /* Change file position to confirm it doesn't affect anything */ + lseek(fd, 10, SEEK_SET); + + /* Normal executable file: */ + /* dfd + path */ + fail += check_execveat(subdir_dfd, "../execveat", 0); + fail += check_execveat(dot_dfd, "execveat", 0); + fail += check_execveat(dot_dfd_path, "execveat", 0); + /* absolute path */ + fail += check_execveat(AT_FDCWD, fullname, 0); + /* absolute path with nonsense dfd */ + fail += check_execveat(99, fullname, 0); + /* fd + no path */ + fail += check_execveat(fd, "", AT_EMPTY_PATH); + /* O_CLOEXEC fd + no path */ + fail += check_execveat(fd_cloexec, "", AT_EMPTY_PATH); + + /* Mess with executable file that's already open: */ + /* fd + no path to a file that's been renamed */ + rename("execveat.ephemeral", "execveat.moved"); + fail += check_execveat(fd_ephemeral, "", AT_EMPTY_PATH); + /* fd + no path to a file that's been deleted */ + unlink("execveat.moved"); /* remove the file now fd open */ + fail += check_execveat(fd_ephemeral, "", AT_EMPTY_PATH); + + /* Invalid argument failures */ + fail += check_execveat_fail(fd, "", 0, ENOENT); + fail += check_execveat_fail(fd, NULL, AT_EMPTY_PATH, EFAULT); + + /* Symlink to executable file: */ + /* dfd + path */ + fail += check_execveat(dot_dfd, "execveat.symlink", 0); + fail += check_execveat(dot_dfd_path, "execveat.symlink", 0); + /* absolute path */ + fail += check_execveat(AT_FDCWD, fullname_symlink, 0); + /* fd + no path, even with AT_SYMLINK_NOFOLLOW (already followed) */ + fail += check_execveat(fd_symlink, "", AT_EMPTY_PATH); + fail += check_execveat(fd_symlink, "", + AT_EMPTY_PATH|AT_SYMLINK_NOFOLLOW); + + /* Symlink fails when AT_SYMLINK_NOFOLLOW set: */ + /* dfd + path */ + fail += check_execveat_fail(dot_dfd, "execveat.symlink", + AT_SYMLINK_NOFOLLOW, ELOOP); + fail += check_execveat_fail(dot_dfd_path, "execveat.symlink", + AT_SYMLINK_NOFOLLOW, ELOOP); + /* absolute path */ + fail += check_execveat_fail(AT_FDCWD, fullname_symlink, + AT_SYMLINK_NOFOLLOW, ELOOP); + + /* Shell script wrapping executable file: */ + /* dfd + path */ + fail += check_execveat(subdir_dfd, "../script", 0); + fail += check_execveat(dot_dfd, "script", 0); + fail += check_execveat(dot_dfd_path, "script", 0); + /* absolute path */ + fail += check_execveat(AT_FDCWD, fullname_script, 0); + /* fd + no path */ + fail += check_execveat(fd_script, "", AT_EMPTY_PATH); + fail += check_execveat(fd_script, "", + AT_EMPTY_PATH|AT_SYMLINK_NOFOLLOW); + /* O_CLOEXEC fd fails for a script (as script file inaccessible) */ + fail += check_execveat_fail(fd_script_cloexec, "", AT_EMPTY_PATH, + ENOENT); + fail += check_execveat_fail(dot_dfd_cloexec, "script", 0, ENOENT); + + /* Mess with script file that's already open: */ + /* fd + no path to a file that's been renamed */ + rename("script.ephemeral", "script.moved"); + fail += check_execveat(fd_script_ephemeral, "", AT_EMPTY_PATH); + /* fd + no path to a file that's been deleted */ + unlink("script.moved"); /* remove the file while fd open */ + fail += check_execveat(fd_script_ephemeral, "", AT_EMPTY_PATH); + + /* Rename a subdirectory in the path: */ + rename("subdir.ephemeral", "subdir.moved"); + fail += check_execveat(subdir_dfd_ephemeral, "../script", 0); + fail += check_execveat(subdir_dfd_ephemeral, "script", 0); + /* Remove the subdir and its contents */ + unlink("subdir.moved/script"); + unlink("subdir.moved"); + /* Shell loads via deleted subdir OK because name starts with .. */ + fail += check_execveat(subdir_dfd_ephemeral, "../script", 0); + fail += check_execveat_fail(subdir_dfd_ephemeral, "script", 0, ENOENT); + + /* Flag values other than AT_SYMLINK_NOFOLLOW => EINVAL */ + fail += check_execveat_fail(dot_dfd, "execveat", 0xFFFF, EINVAL); + /* Invalid path => ENOENT */ + fail += check_execveat_fail(dot_dfd, "no-such-file", 0, ENOENT); + fail += check_execveat_fail(dot_dfd_path, "no-such-file", 0, ENOENT); + fail += check_execveat_fail(AT_FDCWD, "no-such-file", 0, ENOENT); + /* Attempt to execute directory => EACCES */ + fail += check_execveat_fail(dot_dfd, "", AT_EMPTY_PATH, EACCES); + /* Attempt to execute non-executable => EACCES */ + fail += check_execveat_fail(dot_dfd, "Makefile", 0, EACCES); + fail += check_execveat_fail(fd_denatured, "", AT_EMPTY_PATH, EACCES); + /* Attempt to execute file opened with O_PATH => EBADF */ + fail += check_execveat_fail(fd_path, "", AT_EMPTY_PATH, EBADF); + /* Attempt to execute nonsense FD => EBADF */ + fail += check_execveat_fail(99, "", AT_EMPTY_PATH, EBADF); + fail += check_execveat_fail(99, "execveat", 0, EBADF); + /* Attempt to execute relative to non-directory => ENOTDIR */ + fail += check_execveat_fail(fd, "execveat", 0, ENOTDIR); + + return fail; +} + +static void exe_cp(const char *src, const char *dest) +{ + int in_fd = open_or_die(src, O_RDONLY); + int out_fd = open(dest, O_RDWR|O_CREAT|O_TRUNC, 0755); + struct stat info; + + fstat(in_fd, &info); + sendfile(out_fd, in_fd, NULL, info.st_size); + close(in_fd); + close(out_fd); +} + +static void prerequisites(void) +{ + int fd; + const char *script = "#!/bin/sh\nexit $*\n"; + + /* Create ephemeral copies of files */ + exe_cp("execveat", "execveat.ephemeral"); + exe_cp("script", "script.ephemeral"); + mkdir("subdir.ephemeral", 0755); + + fd = open("subdir.ephemeral/script", O_RDWR|O_CREAT|O_TRUNC, 0755); + write(fd, script, strlen(script)); + close(fd); +} + +int main(int argc, char **argv) +{ + int ii; + int rc; + const char *verbose = getenv("VERBOSE"); + + if (argc >= 2) { + /* If we are invoked with an argument, don't run tests. */ + const char *in_test = getenv("IN_TEST"); + + if (verbose) { + printf(" invoked with:"); + for (ii = 0; ii < argc; ii++) + printf(" [%d]='%s'", ii, argv[ii]); + printf("\n"); + } + + /* Check expected environment transferred. */ + if (!in_test || strcmp(in_test, "yes") != 0) { + printf("[FAIL] (no IN_TEST=yes in env)\n"); + return 1; + } + + /* Use the final argument as an exit code. */ + rc = atoi(argv[argc - 1]); + fflush(stdout); + } else { + prerequisites(); + if (verbose) + envp[1] = "VERBOSE=1"; + rc = run_tests(); + if (rc > 0) + printf("%d tests failed\n", rc); + } + return rc; +} -- 2.1.0.rc2.206.gedb03e5 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCHv6 2/3] syscalls,x86: add selftest for execveat(2) 2014-11-06 16:07 ` [PATCHv6 2/3] syscalls,x86: add selftest for execveat(2) David Drysdale @ 2014-11-06 18:07 ` Kees Cook 0 siblings, 0 replies; 9+ messages in thread From: Kees Cook @ 2014-11-06 18:07 UTC (permalink / raw) To: David Drysdale Cc: Eric W. Biederman, Andy Lutomirski, Alexander Viro, Meredydd Luff, LKML, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton, Arnd Bergmann, Rich Felker, Christoph Hellwig, x86@kernel.org, linux-arch, Linux API On Thu, Nov 6, 2014 at 8:07 AM, David Drysdale <drysdale@google.com> wrote: > Signed-off-by: David Drysdale <drysdale@google.com> > --- > tools/testing/selftests/Makefile | 1 + > tools/testing/selftests/exec/.gitignore | 7 + > tools/testing/selftests/exec/Makefile | 25 +++ > tools/testing/selftests/exec/execveat.c | 321 ++++++++++++++++++++++++++++++++ > 4 files changed, 354 insertions(+) > create mode 100644 tools/testing/selftests/exec/.gitignore > create mode 100644 tools/testing/selftests/exec/Makefile > create mode 100644 tools/testing/selftests/exec/execveat.c > > diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile > index 36ff2e4c7b6f..210cf68b3511 100644 > --- a/tools/testing/selftests/Makefile > +++ b/tools/testing/selftests/Makefile > @@ -14,6 +14,7 @@ TARGETS += powerpc > TARGETS += user > TARGETS += sysctl > TARGETS += firmware > +TARGETS += exec > > TARGETS_HOTPLUG = cpu-hotplug > TARGETS_HOTPLUG += memory-hotplug > diff --git a/tools/testing/selftests/exec/.gitignore b/tools/testing/selftests/exec/.gitignore > new file mode 100644 > index 000000000000..778147d01af9 > --- /dev/null > +++ b/tools/testing/selftests/exec/.gitignore > @@ -0,0 +1,7 @@ > +subdir* > +script* > +execveat > +execveat.symlink > +execveat.moved > +execveat.ephemeral > +execveat.denatured > \ No newline at end of file > diff --git a/tools/testing/selftests/exec/Makefile b/tools/testing/selftests/exec/Makefile > new file mode 100644 > index 000000000000..c97e0aaea02d > --- /dev/null > +++ b/tools/testing/selftests/exec/Makefile > @@ -0,0 +1,25 @@ > +CC = $(CROSS_COMPILE)gcc > +CFLAGS = -Wall > +BINARIES = execveat > +DEPS = execveat.symlink execveat.denatured script subdir > +all: $(BINARIES) $(DEPS) > + > +subdir: > + mkdir -p $@ > +script: > + echo '#!/bin/sh' > $@ > + echo 'exit $$*' >> $@ > + chmod +x $@ > +execveat.symlink: execveat > + ln -s -f $< $@ > +execveat.denatured: execveat > + cp $< $@ > + chmod -x $@ > +%: %.c > + $(CC) $(CFLAGS) -o $@ $^ > + > +run_tests: all > + ./execveat > + > +clean: > + rm -rf $(BINARIES) $(DEPS) subdir.moved execveat.moved > diff --git a/tools/testing/selftests/exec/execveat.c b/tools/testing/selftests/exec/execveat.c > new file mode 100644 > index 000000000000..f6ea48176393 > --- /dev/null > +++ b/tools/testing/selftests/exec/execveat.c > @@ -0,0 +1,321 @@ > +/* > + * Copyright (c) 2014 Google, Inc. > + * > + * Licensed under the terms of the GNU GPL License version 2 > + * > + * Selftests for execveat(2). > + */ > + > +#define _GNU_SOURCE /* to get O_PATH, AT_EMPTY_PATH */ > +#include <sys/sendfile.h> > +#include <sys/stat.h> > +#include <sys/syscall.h> > +#include <sys/types.h> > +#include <sys/wait.h> > +#include <errno.h> > +#include <fcntl.h> > +#include <limits.h> > +#include <stdio.h> > +#include <stdlib.h> > +#include <string.h> > +#include <unistd.h> > + > +static char *envp[] = { "IN_TEST=yes", NULL, NULL }; > +static char *argv[] = { "execveat", "99", NULL }; > + > +static int execveat_(int fd, const char *path, char **argv, char **envp, > + int flags) > +{ > +#ifdef __NR_execveat > + return syscall(__NR_execveat, fd, path, argv, envp, flags); > +#else > + errno = -ENOSYS; > + return -1; > +#endif > +} > + > +#define check_execveat_fail(fd, path, flags, errno) \ > + _check_execveat_fail(fd, path, flags, errno, #errno) > +static int _check_execveat_fail(int fd, const char *path, int flags, > + int expected_errno, const char *errno_str) > +{ > + int rc; > + > + errno = 0; > + printf("Check failure of execveat(%d, '%s', %d) with %s... ", > + fd, path?:"(null)", flags, errno_str); > + rc = execveat_(fd, path, argv, envp, flags); > + > + if (rc > 0) { > + printf("[FAIL] (unexpected success from execveat(2))\n"); > + return 1; > + } > + if (errno != expected_errno) { > + printf("[FAIL] (expected errno %d (%s) not %d (%s)\n", > + expected_errno, strerror(expected_errno), > + errno, strerror(errno)); > + return 1; > + } > + printf("[OK]\n"); > + return 0; > +} > + > +static int check_execveat_invoked_rc(int fd, const char *path, int flags, > + int expected_rc) > +{ > + int status; > + int rc; > + pid_t child; > + > + printf("Check success of execveat(%d, '%s', %d)... ", > + fd, path?:"(null)", flags); > + child = fork(); > + if (child < 0) { > + printf("[FAIL] (fork() failed)\n"); > + return 1; > + } > + if (child == 0) { > + /* Child: do execveat(). */ > + rc = execveat_(fd, path, argv, envp, flags); > + printf("[FAIL]: execveat() failed, rc=%d errno=%d (%s)\n", > + rc, errno, strerror(errno)); > + exit(1); /* should not reach here */ > + } > + /* Parent: wait for & check child's exit status. */ > + rc = waitpid(child, &status, 0); > + if (rc != child) { > + printf("[FAIL] (waitpid(%d,...) returned %d)\n", child, rc); > + return 1; > + } > + if (!WIFEXITED(status)) { > + printf("[FAIL] (child %d did not exit cleanly, status=%08x)\n", > + child, status); > + return 1; > + } > + if (WEXITSTATUS(status) != expected_rc) { > + printf("[FAIL] (child %d exited with %d not %d)\n", > + child, WEXITSTATUS(status), expected_rc); > + return 1; > + } > + printf("[OK]\n"); > + return 0; > +} > + > +static int check_execveat(int fd, const char *path, int flags) > +{ > + return check_execveat_invoked_rc(fd, path, flags, 99); > +} > + > +static char *concat(const char *left, const char *right) > +{ > + char *result = malloc(strlen(left) + strlen(right) + 1); > + > + strcpy(result, left); > + strcat(result, right); > + return result; > +} > + > +static int open_or_die(const char *filename, int flags) > +{ > + int fd = open(filename, flags); > + > + if (fd < 0) { > + printf("Failed to open '%s'; " > + "check prerequisites are available\n", filename); > + exit(1); > + } > + return fd; > +} > + > +static int run_tests(void) > +{ > + int fail = 0; > + char *fullname = realpath("execveat", NULL); > + char *fullname_script = realpath("script", NULL); > + char *fullname_symlink = concat(fullname, ".symlink"); > + int subdir_dfd = open_or_die("subdir", O_DIRECTORY|O_RDONLY); > + int subdir_dfd_ephemeral = open_or_die("subdir.ephemeral", > + O_DIRECTORY|O_RDONLY); > + int dot_dfd = open_or_die(".", O_DIRECTORY|O_RDONLY); > + int dot_dfd_path = open_or_die(".", O_DIRECTORY|O_RDONLY|O_PATH); > + int dot_dfd_cloexec = open_or_die(".", O_DIRECTORY|O_RDONLY|O_CLOEXEC); > + int fd = open_or_die("execveat", O_RDONLY); > + int fd_path = open_or_die("execveat", O_RDONLY|O_PATH); > + int fd_symlink = open_or_die("execveat.symlink", O_RDONLY); > + int fd_denatured = open_or_die("execveat.denatured", O_RDONLY); > + int fd_script = open_or_die("script", O_RDONLY); > + int fd_ephemeral = open_or_die("execveat.ephemeral", O_RDONLY); > + int fd_script_ephemeral = open_or_die("script.ephemeral", O_RDONLY); > + int fd_cloexec = open_or_die("execveat", O_RDONLY|O_CLOEXEC); > + int fd_script_cloexec = open_or_die("script", O_RDONLY|O_CLOEXEC); > + > + /* Change file position to confirm it doesn't affect anything */ > + lseek(fd, 10, SEEK_SET); > + > + /* Normal executable file: */ > + /* dfd + path */ > + fail += check_execveat(subdir_dfd, "../execveat", 0); > + fail += check_execveat(dot_dfd, "execveat", 0); > + fail += check_execveat(dot_dfd_path, "execveat", 0); > + /* absolute path */ > + fail += check_execveat(AT_FDCWD, fullname, 0); > + /* absolute path with nonsense dfd */ > + fail += check_execveat(99, fullname, 0); > + /* fd + no path */ > + fail += check_execveat(fd, "", AT_EMPTY_PATH); > + /* O_CLOEXEC fd + no path */ > + fail += check_execveat(fd_cloexec, "", AT_EMPTY_PATH); > + > + /* Mess with executable file that's already open: */ > + /* fd + no path to a file that's been renamed */ > + rename("execveat.ephemeral", "execveat.moved"); > + fail += check_execveat(fd_ephemeral, "", AT_EMPTY_PATH); > + /* fd + no path to a file that's been deleted */ > + unlink("execveat.moved"); /* remove the file now fd open */ > + fail += check_execveat(fd_ephemeral, "", AT_EMPTY_PATH); > + > + /* Invalid argument failures */ > + fail += check_execveat_fail(fd, "", 0, ENOENT); > + fail += check_execveat_fail(fd, NULL, AT_EMPTY_PATH, EFAULT); > + > + /* Symlink to executable file: */ > + /* dfd + path */ > + fail += check_execveat(dot_dfd, "execveat.symlink", 0); > + fail += check_execveat(dot_dfd_path, "execveat.symlink", 0); > + /* absolute path */ > + fail += check_execveat(AT_FDCWD, fullname_symlink, 0); > + /* fd + no path, even with AT_SYMLINK_NOFOLLOW (already followed) */ > + fail += check_execveat(fd_symlink, "", AT_EMPTY_PATH); > + fail += check_execveat(fd_symlink, "", > + AT_EMPTY_PATH|AT_SYMLINK_NOFOLLOW); > + > + /* Symlink fails when AT_SYMLINK_NOFOLLOW set: */ > + /* dfd + path */ > + fail += check_execveat_fail(dot_dfd, "execveat.symlink", > + AT_SYMLINK_NOFOLLOW, ELOOP); > + fail += check_execveat_fail(dot_dfd_path, "execveat.symlink", > + AT_SYMLINK_NOFOLLOW, ELOOP); > + /* absolute path */ > + fail += check_execveat_fail(AT_FDCWD, fullname_symlink, > + AT_SYMLINK_NOFOLLOW, ELOOP); > + > + /* Shell script wrapping executable file: */ > + /* dfd + path */ > + fail += check_execveat(subdir_dfd, "../script", 0); > + fail += check_execveat(dot_dfd, "script", 0); > + fail += check_execveat(dot_dfd_path, "script", 0); > + /* absolute path */ > + fail += check_execveat(AT_FDCWD, fullname_script, 0); > + /* fd + no path */ > + fail += check_execveat(fd_script, "", AT_EMPTY_PATH); > + fail += check_execveat(fd_script, "", > + AT_EMPTY_PATH|AT_SYMLINK_NOFOLLOW); > + /* O_CLOEXEC fd fails for a script (as script file inaccessible) */ > + fail += check_execveat_fail(fd_script_cloexec, "", AT_EMPTY_PATH, > + ENOENT); > + fail += check_execveat_fail(dot_dfd_cloexec, "script", 0, ENOENT); > + > + /* Mess with script file that's already open: */ > + /* fd + no path to a file that's been renamed */ > + rename("script.ephemeral", "script.moved"); > + fail += check_execveat(fd_script_ephemeral, "", AT_EMPTY_PATH); > + /* fd + no path to a file that's been deleted */ > + unlink("script.moved"); /* remove the file while fd open */ > + fail += check_execveat(fd_script_ephemeral, "", AT_EMPTY_PATH); > + > + /* Rename a subdirectory in the path: */ > + rename("subdir.ephemeral", "subdir.moved"); > + fail += check_execveat(subdir_dfd_ephemeral, "../script", 0); > + fail += check_execveat(subdir_dfd_ephemeral, "script", 0); > + /* Remove the subdir and its contents */ > + unlink("subdir.moved/script"); > + unlink("subdir.moved"); > + /* Shell loads via deleted subdir OK because name starts with .. */ > + fail += check_execveat(subdir_dfd_ephemeral, "../script", 0); > + fail += check_execveat_fail(subdir_dfd_ephemeral, "script", 0, ENOENT); > + > + /* Flag values other than AT_SYMLINK_NOFOLLOW => EINVAL */ > + fail += check_execveat_fail(dot_dfd, "execveat", 0xFFFF, EINVAL); > + /* Invalid path => ENOENT */ > + fail += check_execveat_fail(dot_dfd, "no-such-file", 0, ENOENT); > + fail += check_execveat_fail(dot_dfd_path, "no-such-file", 0, ENOENT); > + fail += check_execveat_fail(AT_FDCWD, "no-such-file", 0, ENOENT); > + /* Attempt to execute directory => EACCES */ > + fail += check_execveat_fail(dot_dfd, "", AT_EMPTY_PATH, EACCES); > + /* Attempt to execute non-executable => EACCES */ > + fail += check_execveat_fail(dot_dfd, "Makefile", 0, EACCES); > + fail += check_execveat_fail(fd_denatured, "", AT_EMPTY_PATH, EACCES); > + /* Attempt to execute file opened with O_PATH => EBADF */ > + fail += check_execveat_fail(fd_path, "", AT_EMPTY_PATH, EBADF); > + /* Attempt to execute nonsense FD => EBADF */ > + fail += check_execveat_fail(99, "", AT_EMPTY_PATH, EBADF); > + fail += check_execveat_fail(99, "execveat", 0, EBADF); > + /* Attempt to execute relative to non-directory => ENOTDIR */ > + fail += check_execveat_fail(fd, "execveat", 0, ENOTDIR); > + I'd add some tests that check PATH_MAX with the /dev/fd/n/filename off-by-one I noticed. That could catch any regressions there. -Kees > + return fail; > +} > + > +static void exe_cp(const char *src, const char *dest) > +{ > + int in_fd = open_or_die(src, O_RDONLY); > + int out_fd = open(dest, O_RDWR|O_CREAT|O_TRUNC, 0755); > + struct stat info; > + > + fstat(in_fd, &info); > + sendfile(out_fd, in_fd, NULL, info.st_size); > + close(in_fd); > + close(out_fd); > +} > + > +static void prerequisites(void) > +{ > + int fd; > + const char *script = "#!/bin/sh\nexit $*\n"; > + > + /* Create ephemeral copies of files */ > + exe_cp("execveat", "execveat.ephemeral"); > + exe_cp("script", "script.ephemeral"); > + mkdir("subdir.ephemeral", 0755); > + > + fd = open("subdir.ephemeral/script", O_RDWR|O_CREAT|O_TRUNC, 0755); > + write(fd, script, strlen(script)); > + close(fd); > +} > + > +int main(int argc, char **argv) > +{ > + int ii; > + int rc; > + const char *verbose = getenv("VERBOSE"); > + > + if (argc >= 2) { > + /* If we are invoked with an argument, don't run tests. */ > + const char *in_test = getenv("IN_TEST"); > + > + if (verbose) { > + printf(" invoked with:"); > + for (ii = 0; ii < argc; ii++) > + printf(" [%d]='%s'", ii, argv[ii]); > + printf("\n"); > + } > + > + /* Check expected environment transferred. */ > + if (!in_test || strcmp(in_test, "yes") != 0) { > + printf("[FAIL] (no IN_TEST=yes in env)\n"); > + return 1; > + } > + > + /* Use the final argument as an exit code. */ > + rc = atoi(argv[argc - 1]); > + fflush(stdout); > + } else { > + prerequisites(); > + if (verbose) > + envp[1] = "VERBOSE=1"; > + rc = run_tests(); > + if (rc > 0) > + printf("%d tests failed\n", rc); > + } > + return rc; > +} > -- > 2.1.0.rc2.206.gedb03e5 > -- Kees Cook Chrome OS Security ^ permalink raw reply [flat|nested] 9+ messages in thread
[parent not found: <1415290033-15771-1-git-send-email-drysdale-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>]
* [PATCHv6 man-pages 3/3] execveat.2: initial man page for execveat(2) [not found] ` <1415290033-15771-1-git-send-email-drysdale-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> @ 2014-11-06 16:07 ` David Drysdale 2014-11-06 16:55 ` [PATCHv6 RFC 0/3] syscalls,x86: Add execveat() system call Andy Lutomirski 1 sibling, 0 replies; 9+ messages in thread From: David Drysdale @ 2014-11-06 16:07 UTC (permalink / raw) To: Eric W. Biederman, Andy Lutomirski, Alexander Viro, Meredydd Luff, linux-kernel-u79uwXL29TY76Z2rM5mHXA Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton, Kees Cook, Arnd Bergmann, Rich Felker, Christoph Hellwig, x86-DgEjT+Ai2ygdnm+yROfE0A, linux-arch-u79uwXL29TY76Z2rM5mHXA, linux-api-u79uwXL29TY76Z2rM5mHXA, David Drysdale Signed-off-by: David Drysdale <drysdale-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> --- man2/execveat.2 | 153 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 153 insertions(+) create mode 100644 man2/execveat.2 diff --git a/man2/execveat.2 b/man2/execveat.2 new file mode 100644 index 000000000000..937d79e4c4f0 --- /dev/null +++ b/man2/execveat.2 @@ -0,0 +1,153 @@ +.\" Copyright (c) 2014 Google, Inc. +.\" +.\" %%%LICENSE_START(VERBATIM) +.\" Permission is granted to make and distribute verbatim copies of this +.\" manual provided the copyright notice and this permission notice are +.\" preserved on all copies. +.\" +.\" Permission is granted to copy and distribute modified versions of this +.\" manual under the conditions for verbatim copying, provided that the +.\" entire resulting derived work is distributed under the terms of a +.\" permission notice identical to this one. +.\" +.\" Since the Linux kernel and libraries are constantly changing, this +.\" manual page may be incorrect or out-of-date. The author(s) assume no +.\" responsibility for errors or omissions, or for damages resulting from +.\" the use of the information contained herein. The author(s) may not +.\" have taken the same level of care in the production of this manual, +.\" which is licensed free of charge, as they might when working +.\" professionally. +.\" +.\" Formatted or processed versions of this manual, if unaccompanied by +.\" the source, must acknowledge the copyright and authors of this work. +.\" %%%LICENSE_END +.\" +.TH EXECVEAT 2 2014-04-02 "Linux" "Linux Programmer's Manual" +.SH NAME +execveat \- execute program relative to a directory file descriptor +.SH SYNOPSIS +.B #include <unistd.h> +.sp +.BI "int execveat(int " fd ", const char *" pathname "," +.br +.BI " char *const " argv "[], char *const " envp "[]," +.br +.BI " int " flags); +.SH DESCRIPTION +The +.BR execveat () +system call executes the program pointed to by the combination of \fIfd\fP and \fIpathname\fP. +The +.BR execveat () +system call operates in exactly the same way as +.BR execve (2), +except for the differences described in this manual page. + +If the pathname given in +.I pathname +is relative, then it is interpreted relative to the directory +referred to by the file descriptor +.I fd +(rather than relative to the current working directory of +the calling process, as is done by +.BR execve (2) +for a relative pathname). + +If +.I pathname +is relative and +.I fd +is the special value +.BR AT_FDCWD , +then +.I pathname +is interpreted relative to the current working +directory of the calling process (like +.BR execve (2)). + +If +.I pathname +is absolute, then +.I fd +is ignored. + +If +.I pathname +is an empty string and the +.BR AT_EMPTY_PATH +flag is specified, then the file descriptor +.I fd +specifies the file to be executed. + +.I flags +can either be 0, or include the following flags: +.TP +.BR AT_EMPTY_PATH +If +.I pathname +is an empty string, operate on the file referred to by +.IR fd +(which may have been obtained using the +.BR open (2) +.B O_PATH +flag). +.TP +.B AT_SYMLINK_NOFOLLOW +If the file identified by +.I fd +and a non-NULL +.I pathname +is a symbolic link, then the call fails with the error +.BR EINVAL . +.SH "RETURN VALUE" +On success, +.BR execveat () +does not return. On error \-1 is returned, and +.I errno +is set appropriately. +.SH ERRORS +The same errors that occur for +.BR execve (2) +can also occur for +.BR execveat (). +The following additional errors can occur for +.BR execveat (): +.TP +.B EBADF +.I fd +is not a valid file descriptor. +.TP +.B ENOENT +The program identified by \fIfd\fP and \fIpathname\fP requires the +use of an interpreter program (such as a script starting with +"#!") but the file descriptor +.I fd +was opened with the +.B O_CLOEXEC +flag and so the program file is inaccessible to the launched interpreter. +.TP +.B EINVAL +Invalid flag specified in +.IR flags . +.TP +.B ENOTDIR +.I pathname +is relative and +.I fd +is a file descriptor referring to a file other than a directory. +.SH VERSIONS +.BR execveat () +was added to Linux in kernel 3.???. +.SH NOTES +In addition to the reasons explained in +.BR openat (2), +the +.BR execveat () +system call is also needed to allow +.BR fexecve (3) +to be implemented on systems that do not have the +.I /proc +filesystem mounted. +.SH SEE ALSO +.BR execve (2), +.BR fexecve (3) -- 2.1.0.rc2.206.gedb03e5 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCHv6 RFC 0/3] syscalls,x86: Add execveat() system call [not found] ` <1415290033-15771-1-git-send-email-drysdale-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> 2014-11-06 16:07 ` [PATCHv6 man-pages 3/3] execveat.2: initial man page " David Drysdale @ 2014-11-06 16:55 ` Andy Lutomirski [not found] ` <CALCETrXPVpiPp27Q08K2D=Vt66ZhUZn3GBNEcTvejVUZjy0kyQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 1 sibling, 1 reply; 9+ messages in thread From: Andy Lutomirski @ 2014-11-06 16:55 UTC (permalink / raw) To: David Drysdale Cc: Eric W. Biederman, Alexander Viro, Meredydd Luff, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton, Kees Cook, Arnd Bergmann, Rich Felker, Christoph Hellwig, X86 ML, linux-arch, Linux API On Thu, Nov 6, 2014 at 8:07 AM, David Drysdale <drysdale-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> wrote: > Here's another pass at this. Some things to discuss in particular: > > 1) The current approach for interpreted execs (i.e. mostly "#!" scripts) > gives them an argv[1] filename like "/dev/fd/<fd>/<path>". This > means that script execution in a /proc-less system isn't going to > work, at least until interpreters get smart enough to spot and > special-case the leading "/dev/fd/<fd>", or until there's something > to use in place of /dev/fd -> /proc/self/fd (e.g. Al's dupfs > suggestion, https://lkml.org/lkml/2014/10/19/141). > > So is an execveat(2) that (currently) only works for non-interpreted > programs still useful? I think it is. I would make sure to return a distinguishable error code in the event that the failure happens because of one of the unsupported cases. > > 2) I don't like having to add a new LOOKUP_EMPTY_NOPATH flag > just to prevent O_PATH fds from being fexecve()ed -- alternative > suggestions welcomed. (More generally, I don't have a great > feel for what O_PATH is for; how bad would it be to just allow > them to be fexecve()ed?) If you fexecve an O_PATH fd, does it at least check that you have execute permission on the inode? If so, it seems okay to allow it. --Andy > > ......... > > This patch set adds execveat(2) for x86, and is derived from Meredydd > Luff's patch from Sept 2012 (https://lkml.org/lkml/2012/9/11/528). > > The primary aim of adding an execveat syscall is to allow an > implementation of fexecve(3) that does not rely on the /proc > filesystem, at least for executables (rather than scripts). The > current glibc version of fexecve(3) is implemented via /proc, which > causes problems in sandboxed or otherwise restricted environments. > > Given the desire for a /proc-free fexecve() implementation, HPA > suggested (https://lkml.org/lkml/2006/7/11/556) that an execveat(2) > syscall would be an appropriate generalization. > > Also, having a new syscall means that it can take a flags argument > without back-compatibility concerns. The current implementation just > defines the AT_EMPTY_PATH and AT_SYMLINK_NOFOLLOW flags, but other > flags could be added in future -- for example, flags for new namespaces > (as suggested at https://lkml.org/lkml/2006/7/11/474). > > Related history: > - https://lkml.org/lkml/2006/12/27/123 is an example of someone > realizing that fexecve() is likely to fail in a chroot environment. > - http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=514043 covered > documenting the /proc requirement of fexecve(3) in its manpage, to > "prevent other people from wasting their time". > - https://bugzilla.kernel.org/show_bug.cgi?id=74481 documented that > it's not possible to fexecve() a file descriptor for a script with > close-on-exec set (which is possible with the implementation here). Confused. How does it work for a close-on-exec script? I understand how it works for a close-on-exec ELF binary. --Andy > - https://bugzilla.redhat.com/show_bug.cgi?id=241609 described a > problem where a process that did setuid() could not fexecve() > because it no longer had access to /proc/self/fd; this has since > been fixed. > > > Changes since v5: > - Set new flag in bprm->interp_flags for O_CLOEXEC fds, so that binfmts > that invoke an interpreter fail the exec (as they will not be able > to access the invoked file). [Andy Lutomirski] > - Don't truncate long paths. [Andy Lutomirski] > - Commonize code to open the executed file. [Eric W. Biederman] > - Mark O_PATH file descriptors so they cannot be fexecve()ed. > - Make self-test more helpful, and add additional cases: > - file offset non-zero > - binary file without execute bit > - O_CLOEXEC fds > > Changes since v4, suggested by Eric W. Biederman: > - Use empty filename with AT_EMPTY_PATH flag rather than NULL > pathname to request fexecve-like behaviour. > - Build pathname as "/dev/fd/<fd>/<filename>" (or "/dev/fd/<fd>") > rather than using d_path(). > - Patch against v3.17 (bfe01a5ba249) > > Changes since Meredydd's v3 patch: > - Added a selftest. > - Added a man page. > - Left open_exec() signature untouched to reduce patch impact > elsewhere (as suggested by Al Viro). > - Filled in bprm->filename with d_path() into a buffer, to avoid use > of potentially-ephemeral dentry->d_name. > - Patch against v3.14 (455c6fdbd21916). > > > David Drysdale (2): > syscalls,x86: implement execveat() system call > syscalls,x86: add selftest for execveat(2) > > arch/x86/ia32/audit.c | 1 + > arch/x86/ia32/ia32entry.S | 1 + > arch/x86/kernel/audit_64.c | 1 + > arch/x86/kernel/entry_64.S | 28 +++ > arch/x86/syscalls/syscall_32.tbl | 1 + > arch/x86/syscalls/syscall_64.tbl | 2 + > arch/x86/um/sys_call_table_64.c | 1 + > fs/binfmt_em86.c | 4 + > fs/binfmt_misc.c | 4 + > fs/binfmt_script.c | 10 + > fs/exec.c | 115 ++++++++++-- > fs/namei.c | 8 +- > include/linux/binfmts.h | 4 + > include/linux/compat.h | 3 + > include/linux/fs.h | 1 + > include/linux/namei.h | 1 + > include/linux/sched.h | 4 + > include/linux/syscalls.h | 4 + > include/uapi/asm-generic/unistd.h | 4 +- > kernel/sys_ni.c | 3 + > lib/audit.c | 3 + > tools/testing/selftests/Makefile | 1 + > tools/testing/selftests/exec/.gitignore | 7 + > tools/testing/selftests/exec/Makefile | 25 +++ > tools/testing/selftests/exec/execveat.c | 321 ++++++++++++++++++++++++++++++++ > 25 files changed, 542 insertions(+), 15 deletions(-) > create mode 100644 tools/testing/selftests/exec/.gitignore > create mode 100644 tools/testing/selftests/exec/Makefile > create mode 100644 tools/testing/selftests/exec/execveat.c > > -- > 2.1.0.rc2.206.gedb03e5 -- Andy Lutomirski AMA Capital Management, LLC ^ permalink raw reply [flat|nested] 9+ messages in thread
[parent not found: <CALCETrXPVpiPp27Q08K2D=Vt66ZhUZn3GBNEcTvejVUZjy0kyQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: [PATCHv6 RFC 0/3] syscalls,x86: Add execveat() system call [not found] ` <CALCETrXPVpiPp27Q08K2D=Vt66ZhUZn3GBNEcTvejVUZjy0kyQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> @ 2014-11-07 13:20 ` David Drysdale 0 siblings, 0 replies; 9+ messages in thread From: David Drysdale @ 2014-11-07 13:20 UTC (permalink / raw) To: Andy Lutomirski Cc: Eric W. Biederman, Alexander Viro, Meredydd Luff, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton, Kees Cook, Arnd Bergmann, Rich Felker, Christoph Hellwig, X86 ML, linux-arch, Linux API On Thu, Nov 6, 2014 at 4:55 PM, Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org> wrote: > On Thu, Nov 6, 2014 at 8:07 AM, David Drysdale <drysdale-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> wrote: >> Here's another pass at this. Some things to discuss in particular: >> >> 1) The current approach for interpreted execs (i.e. mostly "#!" scripts) >> gives them an argv[1] filename like "/dev/fd/<fd>/<path>". This >> means that script execution in a /proc-less system isn't going to >> work, at least until interpreters get smart enough to spot and >> special-case the leading "/dev/fd/<fd>", or until there's something >> to use in place of /dev/fd -> /proc/self/fd (e.g. Al's dupfs >> suggestion, https://lkml.org/lkml/2014/10/19/141). >> >> So is an execveat(2) that (currently) only works for non-interpreted >> programs still useful? > > I think it is. I would make sure to return a distinguishable error > code in the event that the failure happens because of one of the > unsupported cases. > >> >> 2) I don't like having to add a new LOOKUP_EMPTY_NOPATH flag >> just to prevent O_PATH fds from being fexecve()ed -- alternative >> suggestions welcomed. (More generally, I don't have a great >> feel for what O_PATH is for; how bad would it be to just allow >> them to be fexecve()ed?) > > If you fexecve an O_PATH fd, does it at least check that you have > execute permission on the inode? If so, it seems okay to allow it. Yes, the same checks will happen for an O_PATH fd as for a normal fd. I'll add an explicit test case for that too. > --Andy > >> >> ......... >> >> This patch set adds execveat(2) for x86, and is derived from Meredydd >> Luff's patch from Sept 2012 (https://lkml.org/lkml/2012/9/11/528). >> >> The primary aim of adding an execveat syscall is to allow an >> implementation of fexecve(3) that does not rely on the /proc >> filesystem, at least for executables (rather than scripts). The >> current glibc version of fexecve(3) is implemented via /proc, which >> causes problems in sandboxed or otherwise restricted environments. >> >> Given the desire for a /proc-free fexecve() implementation, HPA >> suggested (https://lkml.org/lkml/2006/7/11/556) that an execveat(2) >> syscall would be an appropriate generalization. >> >> Also, having a new syscall means that it can take a flags argument >> without back-compatibility concerns. The current implementation just >> defines the AT_EMPTY_PATH and AT_SYMLINK_NOFOLLOW flags, but other >> flags could be added in future -- for example, flags for new namespaces >> (as suggested at https://lkml.org/lkml/2006/7/11/474). >> >> Related history: >> - https://lkml.org/lkml/2006/12/27/123 is an example of someone >> realizing that fexecve() is likely to fail in a chroot environment. >> - http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=514043 covered >> documenting the /proc requirement of fexecve(3) in its manpage, to >> "prevent other people from wasting their time". >> - https://bugzilla.kernel.org/show_bug.cgi?id=74481 documented that >> it's not possible to fexecve() a file descriptor for a script with >> close-on-exec set (which is possible with the implementation here). > > Confused. How does it work for a close-on-exec script? I understand > how it works for a close-on-exec ELF binary. My bad, it doesn't -- I forgot to update that part of the commit description, it's left over from the version that used d_path(). > --Andy > >> - https://bugzilla.redhat.com/show_bug.cgi?id=241609 described a >> problem where a process that did setuid() could not fexecve() >> because it no longer had access to /proc/self/fd; this has since >> been fixed. >> >> >> Changes since v5: >> - Set new flag in bprm->interp_flags for O_CLOEXEC fds, so that binfmts >> that invoke an interpreter fail the exec (as they will not be able >> to access the invoked file). [Andy Lutomirski] >> - Don't truncate long paths. [Andy Lutomirski] >> - Commonize code to open the executed file. [Eric W. Biederman] >> - Mark O_PATH file descriptors so they cannot be fexecve()ed. >> - Make self-test more helpful, and add additional cases: >> - file offset non-zero >> - binary file without execute bit >> - O_CLOEXEC fds >> >> Changes since v4, suggested by Eric W. Biederman: >> - Use empty filename with AT_EMPTY_PATH flag rather than NULL >> pathname to request fexecve-like behaviour. >> - Build pathname as "/dev/fd/<fd>/<filename>" (or "/dev/fd/<fd>") >> rather than using d_path(). >> - Patch against v3.17 (bfe01a5ba249) >> >> Changes since Meredydd's v3 patch: >> - Added a selftest. >> - Added a man page. >> - Left open_exec() signature untouched to reduce patch impact >> elsewhere (as suggested by Al Viro). >> - Filled in bprm->filename with d_path() into a buffer, to avoid use >> of potentially-ephemeral dentry->d_name. >> - Patch against v3.14 (455c6fdbd21916). >> >> >> David Drysdale (2): >> syscalls,x86: implement execveat() system call >> syscalls,x86: add selftest for execveat(2) >> >> arch/x86/ia32/audit.c | 1 + >> arch/x86/ia32/ia32entry.S | 1 + >> arch/x86/kernel/audit_64.c | 1 + >> arch/x86/kernel/entry_64.S | 28 +++ >> arch/x86/syscalls/syscall_32.tbl | 1 + >> arch/x86/syscalls/syscall_64.tbl | 2 + >> arch/x86/um/sys_call_table_64.c | 1 + >> fs/binfmt_em86.c | 4 + >> fs/binfmt_misc.c | 4 + >> fs/binfmt_script.c | 10 + >> fs/exec.c | 115 ++++++++++-- >> fs/namei.c | 8 +- >> include/linux/binfmts.h | 4 + >> include/linux/compat.h | 3 + >> include/linux/fs.h | 1 + >> include/linux/namei.h | 1 + >> include/linux/sched.h | 4 + >> include/linux/syscalls.h | 4 + >> include/uapi/asm-generic/unistd.h | 4 +- >> kernel/sys_ni.c | 3 + >> lib/audit.c | 3 + >> tools/testing/selftests/Makefile | 1 + >> tools/testing/selftests/exec/.gitignore | 7 + >> tools/testing/selftests/exec/Makefile | 25 +++ >> tools/testing/selftests/exec/execveat.c | 321 ++++++++++++++++++++++++++++++++ >> 25 files changed, 542 insertions(+), 15 deletions(-) >> create mode 100644 tools/testing/selftests/exec/.gitignore >> create mode 100644 tools/testing/selftests/exec/Makefile >> create mode 100644 tools/testing/selftests/exec/execveat.c >> >> -- >> 2.1.0.rc2.206.gedb03e5 > > > > -- > Andy Lutomirski > AMA Capital Management, LLC ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2014-11-07 13:20 UTC | newest] Thread overview: 9+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2014-11-06 16:07 [PATCHv6 RFC 0/3] syscalls,x86: Add execveat() system call David Drysdale 2014-11-06 16:07 ` [PATCHv6 1/3] syscalls,x86: implement " David Drysdale [not found] ` <1415290033-15771-2-git-send-email-drysdale-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> 2014-11-06 18:06 ` Kees Cook [not found] ` <CAGXu5jKp7wEKnoeOVGUGCeJ3Mv-kkOm+QpXiu4HdWuvQx99auw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 2014-11-07 13:19 ` David Drysdale 2014-11-06 16:07 ` [PATCHv6 2/3] syscalls,x86: add selftest for execveat(2) David Drysdale 2014-11-06 18:07 ` Kees Cook [not found] ` <1415290033-15771-1-git-send-email-drysdale-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> 2014-11-06 16:07 ` [PATCHv6 man-pages 3/3] execveat.2: initial man page " David Drysdale 2014-11-06 16:55 ` [PATCHv6 RFC 0/3] syscalls,x86: Add execveat() system call Andy Lutomirski [not found] ` <CALCETrXPVpiPp27Q08K2D=Vt66ZhUZn3GBNEcTvejVUZjy0kyQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 2014-11-07 13:20 ` David Drysdale
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).