* [PATCHv5 0/3] syscalls,x86: Add execveat() system call @ 2014-10-22 11:44 David Drysdale 2014-10-22 11:44 ` [PATCHv5 1/3] syscalls,x86: implement " David Drysdale [not found] ` <1413978269-17274-1-git-send-email-drysdale-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> 0 siblings, 2 replies; 13+ messages in thread From: David Drysdale @ 2014-10-22 11:44 UTC (permalink / raw) To: Eric W. Biederman, Andy Lutomirski, Alexander Viro, Meredydd Luff, linux-kernel Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton, Kees Cook, Arnd Bergmann, x86, linux-arch, linux-api, David Drysdale This patch set adds execveat(2) for x86, and is derived from Meredydd Luff's patch from Sept 2012 (https://lkml.org/lkml/2012/9/11/528). The primary aim of adding an execveat syscall is to allow an implementation of fexecve(3) that does not rely on the /proc filesystem. The current glibc version of fexecve(3) is implemented via /proc, which causes problems in sandboxed or otherwise restricted environments. Given the desire for a /proc-free fexecve() implementation, HPA suggested (https://lkml.org/lkml/2006/7/11/556) that an execveat(2) syscall would be an appropriate generalization. Also, having a new syscall means that it can take a flags argument without back-compatibility concerns. The current implementation just defines the AT_EMPTY_PATH and AT_SYMLINK_NOFOLLOW flags, but other flags could be added in future -- for example, flags for new namespaces (as suggested at https://lkml.org/lkml/2006/7/11/474). Related history: - https://lkml.org/lkml/2006/12/27/123 is an example of someone realizing that fexecve() is likely to fail in a chroot environment. - http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=514043 covered documenting the /proc requirement of fexecve(3) in its manpage, to "prevent other people from wasting their time". - https://bugzilla.kernel.org/show_bug.cgi?id=74481 documented that it's not possible to fexecve() a file descriptor for a script with close-on-exec set (which is possible with the implementation here). - https://bugzilla.redhat.com/show_bug.cgi?id=241609 described a problem where a process that did setuid() could not fexecve() because it no longer had access to /proc/self/fd; this has since been fixed. Changes since v4, suggested by Eric W. Biederman: - Use empty filename with AT_EMPTY_PATH flag rather than NULL pathname to request fexecve-like behaviour. - Build pathname as "/dev/fd/<fd>/<filename>" (or "/dev/fd/<fd>") rather than using d_path(). - Patch against v3.17 (bfe01a5ba249) Changes since Meredydd's v3 patch: - Added a selftest. - Added a man page. - Left open_exec() signature untouched to reduce patch impact elsewhere (as suggested by Al Viro). - Filled in bprm->filename with d_path() into a buffer, to avoid use of potentially-ephemeral dentry->d_name. - Patch against v3.14 (455c6fdbd21916). David Drysdale (2): syscalls,x86: implement execveat() system call syscalls,x86: add selftest for execveat(2) arch/x86/ia32/audit.c | 1 + arch/x86/ia32/ia32entry.S | 1 + arch/x86/kernel/audit_64.c | 1 + arch/x86/kernel/entry_64.S | 28 +++ arch/x86/syscalls/syscall_32.tbl | 1 + arch/x86/syscalls/syscall_64.tbl | 2 + arch/x86/um/sys_call_table_64.c | 1 + fs/exec.c | 130 ++++++++++++-- fs/namei.c | 2 +- include/linux/compat.h | 3 + include/linux/fs.h | 1 + include/linux/sched.h | 4 + include/linux/syscalls.h | 4 + include/uapi/asm-generic/unistd.h | 4 +- kernel/sys_ni.c | 3 + lib/audit.c | 3 + tools/testing/selftests/Makefile | 1 + tools/testing/selftests/exec/.gitignore | 6 + tools/testing/selftests/exec/Makefile | 22 +++ tools/testing/selftests/exec/execveat.c | 295 ++++++++++++++++++++++++++++++++ 20 files changed, 497 insertions(+), 16 deletions(-) create mode 100644 tools/testing/selftests/exec/.gitignore create mode 100644 tools/testing/selftests/exec/Makefile create mode 100644 tools/testing/selftests/exec/execveat.c -- 2.1.0.rc2.206.gedb03e5 ^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCHv5 1/3] syscalls,x86: implement execveat() system call 2014-10-22 11:44 [PATCHv5 0/3] syscalls,x86: Add execveat() system call David Drysdale @ 2014-10-22 11:44 ` David Drysdale 2014-10-22 18:07 ` Eric W. Biederman [not found] ` <1413978269-17274-2-git-send-email-drysdale-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> [not found] ` <1413978269-17274-1-git-send-email-drysdale-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> 1 sibling, 2 replies; 13+ messages in thread From: David Drysdale @ 2014-10-22 11:44 UTC (permalink / raw) To: Eric W. Biederman, Andy Lutomirski, Alexander Viro, Meredydd Luff, linux-kernel Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton, Kees Cook, Arnd Bergmann, x86, linux-arch, linux-api, David Drysdale Add a new system execveat(2) syscall. execveat() is to execve() as openat() is to open(): it takes a file descriptor that refers to a directory, and resolves the filename relative to that. In addition, if the filename is empty and AT_EMPTY_PATH is specified, execveat() executes the file to which the file descriptor refers. This replicates the functionality of fexecve(), which is a system call in other UNIXen, but in Linux glibc it depends on opening "/proc/self/fd/<fd>" (and so relies on /proc being mounted). The filename fed to the executed program as argv[0] (or the name of the script fed to a script interpreter) will be of the form "/dev/fd/<fd>" (for an empty filename) or "/dev/fd/<fd>/<filename>", effectively reflecting how the executable was found. This does however mean that execution of a script in a /proc-less environment won't work. Only x86-64, i386 and x32 ABIs are supported in this patch. Based on patches by Meredydd Luff <meredydd@senatehouse.org> Signed-off-by: David Drysdale <drysdale@google.com> --- arch/x86/ia32/audit.c | 1 + arch/x86/ia32/ia32entry.S | 1 + arch/x86/kernel/audit_64.c | 1 + arch/x86/kernel/entry_64.S | 28 ++++++++ arch/x86/syscalls/syscall_32.tbl | 1 + arch/x86/syscalls/syscall_64.tbl | 2 + arch/x86/um/sys_call_table_64.c | 1 + fs/exec.c | 130 ++++++++++++++++++++++++++++++++++---- fs/namei.c | 2 +- include/linux/compat.h | 3 + include/linux/fs.h | 1 + include/linux/sched.h | 4 ++ include/linux/syscalls.h | 4 ++ include/uapi/asm-generic/unistd.h | 4 +- kernel/sys_ni.c | 3 + lib/audit.c | 3 + 16 files changed, 173 insertions(+), 16 deletions(-) diff --git a/arch/x86/ia32/audit.c b/arch/x86/ia32/audit.c index 5d7b381da692..2eccc8932ae6 100644 --- a/arch/x86/ia32/audit.c +++ b/arch/x86/ia32/audit.c @@ -35,6 +35,7 @@ int ia32_classify_syscall(unsigned syscall) case __NR_socketcall: return 4; case __NR_execve: + case __NR_execveat: return 5; default: return 1; diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S index 4299eb05023c..2516c09743e0 100644 --- a/arch/x86/ia32/ia32entry.S +++ b/arch/x86/ia32/ia32entry.S @@ -464,6 +464,7 @@ GLOBAL(\label) PTREGSCALL stub32_rt_sigreturn, sys32_rt_sigreturn PTREGSCALL stub32_sigreturn, sys32_sigreturn PTREGSCALL stub32_execve, compat_sys_execve + PTREGSCALL stub32_execveat, compat_sys_execveat PTREGSCALL stub32_fork, sys_fork PTREGSCALL stub32_vfork, sys_vfork diff --git a/arch/x86/kernel/audit_64.c b/arch/x86/kernel/audit_64.c index 06d3e5a14d9d..f3672508b249 100644 --- a/arch/x86/kernel/audit_64.c +++ b/arch/x86/kernel/audit_64.c @@ -50,6 +50,7 @@ int audit_classify_syscall(int abi, unsigned syscall) case __NR_openat: return 3; case __NR_execve: + case __NR_execveat: return 5; default: return 0; diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S index 2fac1343a90b..00c4526e6ffe 100644 --- a/arch/x86/kernel/entry_64.S +++ b/arch/x86/kernel/entry_64.S @@ -665,6 +665,20 @@ ENTRY(stub_execve) CFI_ENDPROC END(stub_execve) +ENTRY(stub_execveat) + CFI_STARTPROC + addq $8, %rsp + PARTIAL_FRAME 0 + SAVE_REST + FIXUP_TOP_OF_STACK %r11 + call sys_execveat + RESTORE_TOP_OF_STACK %r11 + movq %rax,RAX(%rsp) + RESTORE_REST + jmp int_ret_from_sys_call + CFI_ENDPROC +END(stub_execveat) + /* * sigreturn is special because it needs to restore all registers on return. * This cannot be done with SYSRET, so use the IRET return path instead. @@ -710,6 +724,20 @@ ENTRY(stub_x32_execve) CFI_ENDPROC END(stub_x32_execve) +ENTRY(stub_x32_execveat) + CFI_STARTPROC + addq $8, %rsp + PARTIAL_FRAME 0 + SAVE_REST + FIXUP_TOP_OF_STACK %r11 + call compat_sys_execveat + RESTORE_TOP_OF_STACK %r11 + movq %rax,RAX(%rsp) + RESTORE_REST + jmp int_ret_from_sys_call + CFI_ENDPROC +END(stub_x32_execveat) + #endif /* diff --git a/arch/x86/syscalls/syscall_32.tbl b/arch/x86/syscalls/syscall_32.tbl index 028b78168d85..2633e3195455 100644 --- a/arch/x86/syscalls/syscall_32.tbl +++ b/arch/x86/syscalls/syscall_32.tbl @@ -363,3 +363,4 @@ 354 i386 seccomp sys_seccomp 355 i386 getrandom sys_getrandom 356 i386 memfd_create sys_memfd_create +357 i386 execveat sys_execveat stub32_execveat diff --git a/arch/x86/syscalls/syscall_64.tbl b/arch/x86/syscalls/syscall_64.tbl index 35dd922727b9..1af5badd159c 100644 --- a/arch/x86/syscalls/syscall_64.tbl +++ b/arch/x86/syscalls/syscall_64.tbl @@ -327,6 +327,7 @@ 318 common getrandom sys_getrandom 319 common memfd_create sys_memfd_create 320 common kexec_file_load sys_kexec_file_load +321 64 execveat stub_execveat # # x32-specific system call numbers start at 512 to avoid cache impact @@ -365,3 +366,4 @@ 542 x32 getsockopt compat_sys_getsockopt 543 x32 io_setup compat_sys_io_setup 544 x32 io_submit compat_sys_io_submit +545 x32 execveat stub_x32_execveat diff --git a/arch/x86/um/sys_call_table_64.c b/arch/x86/um/sys_call_table_64.c index f2f0723070ca..20c3649d0691 100644 --- a/arch/x86/um/sys_call_table_64.c +++ b/arch/x86/um/sys_call_table_64.c @@ -31,6 +31,7 @@ #define stub_fork sys_fork #define stub_vfork sys_vfork #define stub_execve sys_execve +#define stub_execveat sys_execveat #define stub_rt_sigreturn sys_rt_sigreturn #define __SYSCALL_COMMON(nr, sym, compat) __SYSCALL_64(nr, sym, compat) diff --git a/fs/exec.c b/fs/exec.c index a2b42a98c743..92a6e14f096a 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -747,7 +747,7 @@ EXPORT_SYMBOL(setup_arg_pages); #endif /* CONFIG_MMU */ -static struct file *do_open_exec(struct filename *name) +static struct file *do_open_execat(int fd, struct filename *name, int flags) { struct file *file; int err; @@ -757,10 +757,34 @@ static struct file *do_open_exec(struct filename *name) .intent = LOOKUP_OPEN, .lookup_flags = LOOKUP_FOLLOW, }; + static const struct open_flags open_exec_nofollow_flags = { + .open_flag = O_LARGEFILE | O_RDONLY | __FMODE_EXEC, + .acc_mode = MAY_EXEC | MAY_OPEN, + .intent = LOOKUP_OPEN, + .lookup_flags = 0, + }; - file = do_filp_open(AT_FDCWD, name, &open_exec_flags); - if (IS_ERR(file)) - goto out; + if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH)) != 0) + return ERR_PTR(-EINVAL); + + if (name->name[0] != '\0') { + const struct open_flags *oflags = ((flags & AT_SYMLINK_NOFOLLOW) + ? &open_exec_nofollow_flags + : &open_exec_flags); + + file = do_filp_open(fd, name, oflags); + if (IS_ERR(file)) + goto out; + } else { + file = fget(fd); + if (!file) + return ERR_PTR(-EBADF); + + err = inode_permission(file->f_path.dentry->d_inode, + open_exec_flags.acc_mode); + if (err) + goto exit; + } err = -EACCES; if (!S_ISREG(file_inode(file)->i_mode)) @@ -769,12 +793,13 @@ static struct file *do_open_exec(struct filename *name) if (file->f_path.mnt->mnt_flags & MNT_NOEXEC) goto exit; - fsnotify_open(file); - err = deny_write_access(file); if (err) goto exit; + if (name->name[0] != '\0') + fsnotify_open(file); + out: return file; @@ -786,7 +811,7 @@ exit: struct file *open_exec(const char *name) { struct filename tmp = { .name = name }; - return do_open_exec(&tmp); + return do_open_execat(AT_FDCWD, &tmp, 0); } EXPORT_SYMBOL(open_exec); @@ -1422,10 +1447,12 @@ static int exec_binprm(struct linux_binprm *bprm) /* * sys_execve() executes a new program. */ -static int do_execve_common(struct filename *filename, - struct user_arg_ptr argv, - struct user_arg_ptr envp) +static int do_execveat_common(int fd, struct filename *filename, + struct user_arg_ptr argv, + struct user_arg_ptr envp, + int flags) { + char *pathbuf = NULL; struct linux_binprm *bprm; struct file *file; struct files_struct *displaced; @@ -1466,7 +1493,7 @@ static int do_execve_common(struct filename *filename, check_unsafe_exec(bprm); current->in_execve = 1; - file = do_open_exec(filename); + file = do_open_execat(fd, filename, flags); retval = PTR_ERR(file); if (IS_ERR(file)) goto out_unmark; @@ -1474,7 +1501,27 @@ static int do_execve_common(struct filename *filename, sched_exec(); bprm->file = file; - bprm->filename = bprm->interp = filename->name; + if (fd == AT_FDCWD || filename->name[0] == '/') { + bprm->filename = filename->name; + } else { + /* + * Build a pathname that reflects how we got to the file, + * either "/dev/fd/<fd>" (for an empty filename) or + * "/dev/fd/<fd>/<filename>". + */ + pathbuf = kmalloc(PATH_MAX, GFP_TEMPORARY); + if (!pathbuf) { + retval = -ENOMEM; + goto out_unmark; + } + bprm->filename = pathbuf; + if (filename->name[0] == '\0') + sprintf(pathbuf, "/dev/fd/%d", fd); + else + snprintf(pathbuf, PATH_MAX, + "/dev/fd/%d/%s", fd, filename->name); + } + bprm->interp = bprm->filename; retval = bprm_mm_init(bprm); if (retval) @@ -1532,6 +1579,7 @@ out_unmark: out_free: free_bprm(bprm); + kfree(pathbuf); out_files: if (displaced) @@ -1547,7 +1595,18 @@ int do_execve(struct filename *filename, { struct user_arg_ptr argv = { .ptr.native = __argv }; struct user_arg_ptr envp = { .ptr.native = __envp }; - return do_execve_common(filename, argv, envp); + return do_execveat_common(AT_FDCWD, filename, argv, envp, 0); +} + +int do_execveat(int fd, struct filename *filename, + const char __user *const __user *__argv, + const char __user *const __user *__envp, + int flags) +{ + struct user_arg_ptr argv = { .ptr.native = __argv }; + struct user_arg_ptr envp = { .ptr.native = __envp }; + + return do_execveat_common(fd, filename, argv, envp, flags); } #ifdef CONFIG_COMPAT @@ -1563,7 +1622,23 @@ static int compat_do_execve(struct filename *filename, .is_compat = true, .ptr.compat = __envp, }; - return do_execve_common(filename, argv, envp); + return do_execveat_common(AT_FDCWD, filename, argv, envp, 0); +} + +static int compat_do_execveat(int fd, struct filename *filename, + const compat_uptr_t __user *__argv, + const compat_uptr_t __user *__envp, + int flags) +{ + struct user_arg_ptr argv = { + .is_compat = true, + .ptr.compat = __argv, + }; + struct user_arg_ptr envp = { + .is_compat = true, + .ptr.compat = __envp, + }; + return do_execveat_common(fd, filename, argv, envp, flags); } #endif @@ -1603,6 +1678,20 @@ SYSCALL_DEFINE3(execve, { return do_execve(getname(filename), argv, envp); } + +SYSCALL_DEFINE5(execveat, + int, fd, const char __user *, filename, + const char __user *const __user *, argv, + const char __user *const __user *, envp, + int, flags) +{ + int lookup_flags = (flags & AT_EMPTY_PATH) ? LOOKUP_EMPTY : 0; + + return do_execveat(fd, + getname_flags(filename, lookup_flags, NULL), + argv, envp, flags); +} + #ifdef CONFIG_COMPAT COMPAT_SYSCALL_DEFINE3(execve, const char __user *, filename, const compat_uptr_t __user *, argv, @@ -1610,4 +1699,17 @@ COMPAT_SYSCALL_DEFINE3(execve, const char __user *, filename, { return compat_do_execve(getname(filename), argv, envp); } + +COMPAT_SYSCALL_DEFINE5(execveat, int, fd, + const char __user *, filename, + const compat_uptr_t __user *, argv, + const compat_uptr_t __user *, envp, + int, flags) +{ + int lookup_flags = (flags & AT_EMPTY_PATH) ? LOOKUP_EMPTY : 0; + + return compat_do_execveat(fd, + getname_flags(filename, lookup_flags, NULL), + argv, envp, flags); +} #endif diff --git a/fs/namei.c b/fs/namei.c index a7b05bf82d31..553c84d3e0cc 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -130,7 +130,7 @@ void final_putname(struct filename *name) #define EMBEDDED_NAME_MAX (PATH_MAX - sizeof(struct filename)) -static struct filename * +struct filename * getname_flags(const char __user *filename, int flags, int *empty) { struct filename *result, *err; diff --git a/include/linux/compat.h b/include/linux/compat.h index e6494261eaff..7450ca2ac1fc 100644 --- a/include/linux/compat.h +++ b/include/linux/compat.h @@ -357,6 +357,9 @@ asmlinkage long compat_sys_lseek(unsigned int, compat_off_t, unsigned int); asmlinkage long compat_sys_execve(const char __user *filename, const compat_uptr_t __user *argv, const compat_uptr_t __user *envp); +asmlinkage long compat_sys_execveat(int dfd, const char __user *filename, + const compat_uptr_t __user *argv, + const compat_uptr_t __user *envp, int flags); asmlinkage long compat_sys_select(int n, compat_ulong_t __user *inp, compat_ulong_t __user *outp, compat_ulong_t __user *exp, diff --git a/include/linux/fs.h b/include/linux/fs.h index 94187721ad41..e9818574d738 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -2060,6 +2060,7 @@ extern struct file *file_open_root(struct dentry *, struct vfsmount *, extern struct file * dentry_open(const struct path *, int, const struct cred *); extern int filp_close(struct file *, fl_owner_t id); +extern struct filename *getname_flags(const char __user *, int, int *); extern struct filename *getname(const char __user *); extern struct filename *getname_kernel(const char *); diff --git a/include/linux/sched.h b/include/linux/sched.h index b867a4dab38a..33e056da7d33 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -2430,6 +2430,10 @@ extern void do_group_exit(int); extern int do_execve(struct filename *, const char __user * const __user *, const char __user * const __user *); +extern int do_execveat(int, struct filename *, + const char __user * const __user *, + const char __user * const __user *, + int); extern long do_fork(unsigned long, unsigned long, unsigned long, int __user *, int __user *); struct task_struct *fork_idle(int); extern pid_t kernel_thread(int (*fn)(void *), void *arg, unsigned long flags); diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h index 0f86d85a9ce4..df5422294deb 100644 --- a/include/linux/syscalls.h +++ b/include/linux/syscalls.h @@ -876,4 +876,8 @@ asmlinkage long sys_seccomp(unsigned int op, unsigned int flags, asmlinkage long sys_getrandom(char __user *buf, size_t count, unsigned int flags); +asmlinkage long sys_execveat(int dfd, const char __user *filename, + const char __user *const __user *argv, + const char __user *const __user *envp, int flags); + #endif diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h index 11d11bc5c78f..feef07d29663 100644 --- a/include/uapi/asm-generic/unistd.h +++ b/include/uapi/asm-generic/unistd.h @@ -705,9 +705,11 @@ __SYSCALL(__NR_seccomp, sys_seccomp) __SYSCALL(__NR_getrandom, sys_getrandom) #define __NR_memfd_create 279 __SYSCALL(__NR_memfd_create, sys_memfd_create) +#define __NR_execveat 280 +__SC_COMP(__NR_execveat, sys_execveat, compat_sys_execveat) #undef __NR_syscalls -#define __NR_syscalls 280 +#define __NR_syscalls 281 /* * All syscalls below here should go away really, diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c index 391d4ddb6f4b..efb06058ad3e 100644 --- a/kernel/sys_ni.c +++ b/kernel/sys_ni.c @@ -218,3 +218,6 @@ cond_syscall(sys_kcmp); /* operate on Secure Computing state */ cond_syscall(sys_seccomp); + +/* execveat */ +cond_syscall(sys_execveat); diff --git a/lib/audit.c b/lib/audit.c index 1d726a22565b..b8fb5ee81e26 100644 --- a/lib/audit.c +++ b/lib/audit.c @@ -54,6 +54,9 @@ int audit_classify_syscall(int abi, unsigned syscall) case __NR_socketcall: return 4; #endif +#ifdef __NR_execveat + case __NR_execveat: +#endif case __NR_execve: return 5; default: -- 2.1.0.rc2.206.gedb03e5 ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCHv5 1/3] syscalls,x86: implement execveat() system call 2014-10-22 11:44 ` [PATCHv5 1/3] syscalls,x86: implement " David Drysdale @ 2014-10-22 18:07 ` Eric W. Biederman 2014-10-23 6:40 ` David Drysdale [not found] ` <1413978269-17274-2-git-send-email-drysdale-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> 1 sibling, 1 reply; 13+ messages in thread From: Eric W. Biederman @ 2014-10-22 18:07 UTC (permalink / raw) To: David Drysdale Cc: Andy Lutomirski, Alexander Viro, Meredydd Luff, linux-kernel, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton, Kees Cook, Arnd Bergmann, x86, linux-arch, linux-api David Drysdale <drysdale@google.com> writes: > Add a new system execveat(2) syscall. execveat() is to execve() as > openat() is to open(): it takes a file descriptor that refers to a > directory, and resolves the filename relative to that. > > In addition, if the filename is empty and AT_EMPTY_PATH is specified, > execveat() executes the file to which the file descriptor refers. This > replicates the functionality of fexecve(), which is a system call in > other UNIXen, but in Linux glibc it depends on opening > "/proc/self/fd/<fd>" (and so relies on /proc being mounted). > > The filename fed to the executed program as argv[0] (or the name of the > script fed to a script interpreter) will be of the form "/dev/fd/<fd>" > (for an empty filename) or "/dev/fd/<fd>/<filename>", effectively > reflecting how the executable was found. This does however mean that > execution of a script in a /proc-less environment won't work. > > Only x86-64, i386 and x32 ABIs are supported in this patch. > > Based on patches by Meredydd Luff <meredydd@senatehouse.org> > > Signed-off-by: David Drysdale <drysdale@google.com> > --- > arch/x86/ia32/audit.c | 1 + > arch/x86/ia32/ia32entry.S | 1 + > arch/x86/kernel/audit_64.c | 1 + > arch/x86/kernel/entry_64.S | 28 ++++++++ > arch/x86/syscalls/syscall_32.tbl | 1 + > arch/x86/syscalls/syscall_64.tbl | 2 + > arch/x86/um/sys_call_table_64.c | 1 + > fs/exec.c | 130 ++++++++++++++++++++++++++++++++++---- > fs/namei.c | 2 +- > include/linux/compat.h | 3 + > include/linux/fs.h | 1 + > include/linux/sched.h | 4 ++ > include/linux/syscalls.h | 4 ++ > include/uapi/asm-generic/unistd.h | 4 +- > kernel/sys_ni.c | 3 + > lib/audit.c | 3 + > 16 files changed, 173 insertions(+), 16 deletions(-) > > diff --git a/arch/x86/ia32/audit.c b/arch/x86/ia32/audit.c > index 5d7b381da692..2eccc8932ae6 100644 > --- a/arch/x86/ia32/audit.c > +++ b/arch/x86/ia32/audit.c > @@ -35,6 +35,7 @@ int ia32_classify_syscall(unsigned syscall) > case __NR_socketcall: > return 4; > case __NR_execve: > + case __NR_execveat: > return 5; > default: > return 1; > diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S > index 4299eb05023c..2516c09743e0 100644 > --- a/arch/x86/ia32/ia32entry.S > +++ b/arch/x86/ia32/ia32entry.S > @@ -464,6 +464,7 @@ GLOBAL(\label) > PTREGSCALL stub32_rt_sigreturn, sys32_rt_sigreturn > PTREGSCALL stub32_sigreturn, sys32_sigreturn > PTREGSCALL stub32_execve, compat_sys_execve > + PTREGSCALL stub32_execveat, compat_sys_execveat > PTREGSCALL stub32_fork, sys_fork > PTREGSCALL stub32_vfork, sys_vfork > > diff --git a/arch/x86/kernel/audit_64.c b/arch/x86/kernel/audit_64.c > index 06d3e5a14d9d..f3672508b249 100644 > --- a/arch/x86/kernel/audit_64.c > +++ b/arch/x86/kernel/audit_64.c > @@ -50,6 +50,7 @@ int audit_classify_syscall(int abi, unsigned syscall) > case __NR_openat: > return 3; > case __NR_execve: > + case __NR_execveat: > return 5; > default: > return 0; > diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S > index 2fac1343a90b..00c4526e6ffe 100644 > --- a/arch/x86/kernel/entry_64.S > +++ b/arch/x86/kernel/entry_64.S > @@ -665,6 +665,20 @@ ENTRY(stub_execve) > CFI_ENDPROC > END(stub_execve) > > +ENTRY(stub_execveat) > + CFI_STARTPROC > + addq $8, %rsp > + PARTIAL_FRAME 0 > + SAVE_REST > + FIXUP_TOP_OF_STACK %r11 > + call sys_execveat > + RESTORE_TOP_OF_STACK %r11 > + movq %rax,RAX(%rsp) > + RESTORE_REST > + jmp int_ret_from_sys_call > + CFI_ENDPROC > +END(stub_execveat) > + > /* > * sigreturn is special because it needs to restore all registers on return. > * This cannot be done with SYSRET, so use the IRET return path instead. > @@ -710,6 +724,20 @@ ENTRY(stub_x32_execve) > CFI_ENDPROC > END(stub_x32_execve) > > +ENTRY(stub_x32_execveat) > + CFI_STARTPROC > + addq $8, %rsp > + PARTIAL_FRAME 0 > + SAVE_REST > + FIXUP_TOP_OF_STACK %r11 > + call compat_sys_execveat > + RESTORE_TOP_OF_STACK %r11 > + movq %rax,RAX(%rsp) > + RESTORE_REST > + jmp int_ret_from_sys_call > + CFI_ENDPROC > +END(stub_x32_execveat) > + > #endif > > /* > diff --git a/arch/x86/syscalls/syscall_32.tbl b/arch/x86/syscalls/syscall_32.tbl > index 028b78168d85..2633e3195455 100644 > --- a/arch/x86/syscalls/syscall_32.tbl > +++ b/arch/x86/syscalls/syscall_32.tbl > @@ -363,3 +363,4 @@ > 354 i386 seccomp sys_seccomp > 355 i386 getrandom sys_getrandom > 356 i386 memfd_create sys_memfd_create > +357 i386 execveat sys_execveat stub32_execveat > diff --git a/arch/x86/syscalls/syscall_64.tbl b/arch/x86/syscalls/syscall_64.tbl > index 35dd922727b9..1af5badd159c 100644 > --- a/arch/x86/syscalls/syscall_64.tbl > +++ b/arch/x86/syscalls/syscall_64.tbl > @@ -327,6 +327,7 @@ > 318 common getrandom sys_getrandom > 319 common memfd_create sys_memfd_create > 320 common kexec_file_load sys_kexec_file_load > +321 64 execveat stub_execveat > > # > # x32-specific system call numbers start at 512 to avoid cache impact > @@ -365,3 +366,4 @@ > 542 x32 getsockopt compat_sys_getsockopt > 543 x32 io_setup compat_sys_io_setup > 544 x32 io_submit compat_sys_io_submit > +545 x32 execveat stub_x32_execveat > diff --git a/arch/x86/um/sys_call_table_64.c b/arch/x86/um/sys_call_table_64.c > index f2f0723070ca..20c3649d0691 100644 > --- a/arch/x86/um/sys_call_table_64.c > +++ b/arch/x86/um/sys_call_table_64.c > @@ -31,6 +31,7 @@ > #define stub_fork sys_fork > #define stub_vfork sys_vfork > #define stub_execve sys_execve > +#define stub_execveat sys_execveat > #define stub_rt_sigreturn sys_rt_sigreturn > > #define __SYSCALL_COMMON(nr, sym, compat) __SYSCALL_64(nr, sym, compat) > diff --git a/fs/exec.c b/fs/exec.c > index a2b42a98c743..92a6e14f096a 100644 > --- a/fs/exec.c > +++ b/fs/exec.c > @@ -747,7 +747,7 @@ EXPORT_SYMBOL(setup_arg_pages); > > #endif /* CONFIG_MMU */ > > -static struct file *do_open_exec(struct filename *name) > +static struct file *do_open_execat(int fd, struct filename *name, int flags) > { > struct file *file; > int err; > @@ -757,10 +757,34 @@ static struct file *do_open_exec(struct filename *name) > .intent = LOOKUP_OPEN, > .lookup_flags = LOOKUP_FOLLOW, > }; > + static const struct open_flags open_exec_nofollow_flags = { > + .open_flag = O_LARGEFILE | O_RDONLY | __FMODE_EXEC, > + .acc_mode = MAY_EXEC | MAY_OPEN, > + .intent = LOOKUP_OPEN, > + .lookup_flags = 0, > + }; > > - file = do_filp_open(AT_FDCWD, name, &open_exec_flags); > - if (IS_ERR(file)) > - goto out; > + if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH)) != 0) > + return ERR_PTR(-EINVAL); > + > + if (name->name[0] != '\0') { Is it really necessary to special case AT_EMPTY_PATH here. I would have thought the existing logic in namei.c would have been fine assuning we passed LOOKUP_EMPTY. > + const struct open_flags *oflags = ((flags & AT_SYMLINK_NOFOLLOW) > + ? &open_exec_nofollow_flags > + : &open_exec_flags); > + > + file = do_filp_open(fd, name, oflags); > + if (IS_ERR(file)) > + goto out; > + } else { > + file = fget(fd); > + if (!file) > + return ERR_PTR(-EBADF); > + > + err = inode_permission(file->f_path.dentry->d_inode, > + open_exec_flags.acc_mode); > + if (err) > + goto exit; > + } > > err = -EACCES; > if (!S_ISREG(file_inode(file)->i_mode)) > @@ -769,12 +793,13 @@ static struct file *do_open_exec(struct filename *name) > if (file->f_path.mnt->mnt_flags & MNT_NOEXEC) > goto exit; > > - fsnotify_open(file); > - > err = deny_write_access(file); > if (err) > goto exit; > > + if (name->name[0] != '\0') > + fsnotify_open(file); > + > out: > return file; > > @@ -786,7 +811,7 @@ exit: > struct file *open_exec(const char *name) > { > struct filename tmp = { .name = name }; > - return do_open_exec(&tmp); > + return do_open_execat(AT_FDCWD, &tmp, 0); > } > EXPORT_SYMBOL(open_exec); > > @@ -1422,10 +1447,12 @@ static int exec_binprm(struct linux_binprm *bprm) > /* > * sys_execve() executes a new program. > */ > -static int do_execve_common(struct filename *filename, > - struct user_arg_ptr argv, > - struct user_arg_ptr envp) > +static int do_execveat_common(int fd, struct filename *filename, > + struct user_arg_ptr argv, > + struct user_arg_ptr envp, > + int flags) > { > + char *pathbuf = NULL; > struct linux_binprm *bprm; > struct file *file; > struct files_struct *displaced; > @@ -1466,7 +1493,7 @@ static int do_execve_common(struct filename *filename, > check_unsafe_exec(bprm); > current->in_execve = 1; > > - file = do_open_exec(filename); > + file = do_open_execat(fd, filename, flags); > retval = PTR_ERR(file); > if (IS_ERR(file)) > goto out_unmark; > @@ -1474,7 +1501,27 @@ static int do_execve_common(struct filename *filename, > sched_exec(); > > bprm->file = file; > - bprm->filename = bprm->interp = filename->name; > + if (fd == AT_FDCWD || filename->name[0] == '/') { > + bprm->filename = filename->name; > + } else { > + /* > + * Build a pathname that reflects how we got to the file, > + * either "/dev/fd/<fd>" (for an empty filename) or > + * "/dev/fd/<fd>/<filename>". > + */ > + pathbuf = kmalloc(PATH_MAX, GFP_TEMPORARY); > + if (!pathbuf) { > + retval = -ENOMEM; > + goto out_unmark; > + } > + bprm->filename = pathbuf; > + if (filename->name[0] == '\0') > + sprintf(pathbuf, "/dev/fd/%d", fd); > + else > + snprintf(pathbuf, PATH_MAX, > + "/dev/fd/%d/%s", fd, filename->name); > + } > + bprm->interp = bprm->filename; > > retval = bprm_mm_init(bprm); > if (retval) > @@ -1532,6 +1579,7 @@ out_unmark: > > out_free: > free_bprm(bprm); > + kfree(pathbuf); > > out_files: > if (displaced) > @@ -1547,7 +1595,18 @@ int do_execve(struct filename *filename, > { > struct user_arg_ptr argv = { .ptr.native = __argv }; > struct user_arg_ptr envp = { .ptr.native = __envp }; > - return do_execve_common(filename, argv, envp); > + return do_execveat_common(AT_FDCWD, filename, argv, envp, 0); > +} > + > +int do_execveat(int fd, struct filename *filename, > + const char __user *const __user *__argv, > + const char __user *const __user *__envp, > + int flags) > +{ > + struct user_arg_ptr argv = { .ptr.native = __argv }; > + struct user_arg_ptr envp = { .ptr.native = __envp }; > + > + return do_execveat_common(fd, filename, argv, envp, flags); > } > > #ifdef CONFIG_COMPAT > @@ -1563,7 +1622,23 @@ static int compat_do_execve(struct filename *filename, > .is_compat = true, > .ptr.compat = __envp, > }; > - return do_execve_common(filename, argv, envp); > + return do_execveat_common(AT_FDCWD, filename, argv, envp, 0); > +} > + > +static int compat_do_execveat(int fd, struct filename *filename, > + const compat_uptr_t __user *__argv, > + const compat_uptr_t __user *__envp, > + int flags) > +{ > + struct user_arg_ptr argv = { > + .is_compat = true, > + .ptr.compat = __argv, > + }; > + struct user_arg_ptr envp = { > + .is_compat = true, > + .ptr.compat = __envp, > + }; > + return do_execveat_common(fd, filename, argv, envp, flags); > } > #endif > > @@ -1603,6 +1678,20 @@ SYSCALL_DEFINE3(execve, > { > return do_execve(getname(filename), argv, envp); > } > + > +SYSCALL_DEFINE5(execveat, > + int, fd, const char __user *, filename, > + const char __user *const __user *, argv, > + const char __user *const __user *, envp, > + int, flags) > +{ > + int lookup_flags = (flags & AT_EMPTY_PATH) ? LOOKUP_EMPTY : 0; > + > + return do_execveat(fd, > + getname_flags(filename, lookup_flags, NULL), > + argv, envp, flags); > +} > + > #ifdef CONFIG_COMPAT > COMPAT_SYSCALL_DEFINE3(execve, const char __user *, filename, > const compat_uptr_t __user *, argv, > @@ -1610,4 +1699,17 @@ COMPAT_SYSCALL_DEFINE3(execve, const char __user *, filename, > { > return compat_do_execve(getname(filename), argv, envp); > } > + > +COMPAT_SYSCALL_DEFINE5(execveat, int, fd, > + const char __user *, filename, > + const compat_uptr_t __user *, argv, > + const compat_uptr_t __user *, envp, > + int, flags) > +{ > + int lookup_flags = (flags & AT_EMPTY_PATH) ? LOOKUP_EMPTY : 0; > + > + return compat_do_execveat(fd, > + getname_flags(filename, lookup_flags, NULL), > + argv, envp, flags); > +} > #endif > diff --git a/fs/namei.c b/fs/namei.c > index a7b05bf82d31..553c84d3e0cc 100644 > --- a/fs/namei.c > +++ b/fs/namei.c > @@ -130,7 +130,7 @@ void final_putname(struct filename *name) > > #define EMBEDDED_NAME_MAX (PATH_MAX - sizeof(struct filename)) > > -static struct filename * > +struct filename * > getname_flags(const char __user *filename, int flags, int *empty) > { > struct filename *result, *err; > diff --git a/include/linux/compat.h b/include/linux/compat.h > index e6494261eaff..7450ca2ac1fc 100644 > --- a/include/linux/compat.h > +++ b/include/linux/compat.h > @@ -357,6 +357,9 @@ asmlinkage long compat_sys_lseek(unsigned int, compat_off_t, unsigned int); > > asmlinkage long compat_sys_execve(const char __user *filename, const compat_uptr_t __user *argv, > const compat_uptr_t __user *envp); > +asmlinkage long compat_sys_execveat(int dfd, const char __user *filename, > + const compat_uptr_t __user *argv, > + const compat_uptr_t __user *envp, int flags); > > asmlinkage long compat_sys_select(int n, compat_ulong_t __user *inp, > compat_ulong_t __user *outp, compat_ulong_t __user *exp, > diff --git a/include/linux/fs.h b/include/linux/fs.h > index 94187721ad41..e9818574d738 100644 > --- a/include/linux/fs.h > +++ b/include/linux/fs.h > @@ -2060,6 +2060,7 @@ extern struct file *file_open_root(struct dentry *, struct vfsmount *, > extern struct file * dentry_open(const struct path *, int, const struct cred *); > extern int filp_close(struct file *, fl_owner_t id); > > +extern struct filename *getname_flags(const char __user *, int, int *); > extern struct filename *getname(const char __user *); > extern struct filename *getname_kernel(const char *); > > diff --git a/include/linux/sched.h b/include/linux/sched.h > index b867a4dab38a..33e056da7d33 100644 > --- a/include/linux/sched.h > +++ b/include/linux/sched.h > @@ -2430,6 +2430,10 @@ extern void do_group_exit(int); > extern int do_execve(struct filename *, > const char __user * const __user *, > const char __user * const __user *); > +extern int do_execveat(int, struct filename *, > + const char __user * const __user *, > + const char __user * const __user *, > + int); > extern long do_fork(unsigned long, unsigned long, unsigned long, int __user *, int __user *); > struct task_struct *fork_idle(int); > extern pid_t kernel_thread(int (*fn)(void *), void *arg, unsigned long flags); > diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h > index 0f86d85a9ce4..df5422294deb 100644 > --- a/include/linux/syscalls.h > +++ b/include/linux/syscalls.h > @@ -876,4 +876,8 @@ asmlinkage long sys_seccomp(unsigned int op, unsigned int flags, > asmlinkage long sys_getrandom(char __user *buf, size_t count, > unsigned int flags); > > +asmlinkage long sys_execveat(int dfd, const char __user *filename, > + const char __user *const __user *argv, > + const char __user *const __user *envp, int flags); > + > #endif > diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h > index 11d11bc5c78f..feef07d29663 100644 > --- a/include/uapi/asm-generic/unistd.h > +++ b/include/uapi/asm-generic/unistd.h > @@ -705,9 +705,11 @@ __SYSCALL(__NR_seccomp, sys_seccomp) > __SYSCALL(__NR_getrandom, sys_getrandom) > #define __NR_memfd_create 279 > __SYSCALL(__NR_memfd_create, sys_memfd_create) > +#define __NR_execveat 280 > +__SC_COMP(__NR_execveat, sys_execveat, compat_sys_execveat) > > #undef __NR_syscalls > -#define __NR_syscalls 280 > +#define __NR_syscalls 281 > > /* > * All syscalls below here should go away really, > diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c > index 391d4ddb6f4b..efb06058ad3e 100644 > --- a/kernel/sys_ni.c > +++ b/kernel/sys_ni.c > @@ -218,3 +218,6 @@ cond_syscall(sys_kcmp); > > /* operate on Secure Computing state */ > cond_syscall(sys_seccomp); > + > +/* execveat */ > +cond_syscall(sys_execveat); > diff --git a/lib/audit.c b/lib/audit.c > index 1d726a22565b..b8fb5ee81e26 100644 > --- a/lib/audit.c > +++ b/lib/audit.c > @@ -54,6 +54,9 @@ int audit_classify_syscall(int abi, unsigned syscall) > case __NR_socketcall: > return 4; > #endif > +#ifdef __NR_execveat > + case __NR_execveat: > +#endif > case __NR_execve: > return 5; > default: ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCHv5 1/3] syscalls,x86: implement execveat() system call 2014-10-22 18:07 ` Eric W. Biederman @ 2014-10-23 6:40 ` David Drysdale 0 siblings, 0 replies; 13+ messages in thread From: David Drysdale @ 2014-10-23 6:40 UTC (permalink / raw) To: Eric W. Biederman Cc: Andy Lutomirski, Alexander Viro, Meredydd Luff, linux-kernel@vger.kernel.org, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton, Kees Cook, Arnd Bergmann, X86 ML, linux-arch, Linux API On Wed, Oct 22, 2014 at 7:07 PM, Eric W. Biederman <ebiederm@xmission.com> wrote: > David Drysdale <drysdale@google.com> writes: > >> Add a new system execveat(2) syscall. execveat() is to execve() as >> openat() is to open(): it takes a file descriptor that refers to a >> directory, and resolves the filename relative to that. >> >> In addition, if the filename is empty and AT_EMPTY_PATH is specified, >> execveat() executes the file to which the file descriptor refers. This >> replicates the functionality of fexecve(), which is a system call in >> other UNIXen, but in Linux glibc it depends on opening >> "/proc/self/fd/<fd>" (and so relies on /proc being mounted). >> >> The filename fed to the executed program as argv[0] (or the name of the >> script fed to a script interpreter) will be of the form "/dev/fd/<fd>" >> (for an empty filename) or "/dev/fd/<fd>/<filename>", effectively >> reflecting how the executable was found. This does however mean that >> execution of a script in a /proc-less environment won't work. >> >> Only x86-64, i386 and x32 ABIs are supported in this patch. >> >> Based on patches by Meredydd Luff <meredydd@senatehouse.org> >> >> Signed-off-by: David Drysdale <drysdale@google.com> >> --- >> arch/x86/ia32/audit.c | 1 + >> arch/x86/ia32/ia32entry.S | 1 + >> arch/x86/kernel/audit_64.c | 1 + >> arch/x86/kernel/entry_64.S | 28 ++++++++ >> arch/x86/syscalls/syscall_32.tbl | 1 + >> arch/x86/syscalls/syscall_64.tbl | 2 + >> arch/x86/um/sys_call_table_64.c | 1 + >> fs/exec.c | 130 ++++++++++++++++++++++++++++++++++---- >> fs/namei.c | 2 +- >> include/linux/compat.h | 3 + >> include/linux/fs.h | 1 + >> include/linux/sched.h | 4 ++ >> include/linux/syscalls.h | 4 ++ >> include/uapi/asm-generic/unistd.h | 4 +- >> kernel/sys_ni.c | 3 + >> lib/audit.c | 3 + >> 16 files changed, 173 insertions(+), 16 deletions(-) >> >> diff --git a/arch/x86/ia32/audit.c b/arch/x86/ia32/audit.c >> index 5d7b381da692..2eccc8932ae6 100644 >> --- a/arch/x86/ia32/audit.c >> +++ b/arch/x86/ia32/audit.c >> @@ -35,6 +35,7 @@ int ia32_classify_syscall(unsigned syscall) >> case __NR_socketcall: >> return 4; >> case __NR_execve: >> + case __NR_execveat: >> return 5; >> default: >> return 1; >> diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S >> index 4299eb05023c..2516c09743e0 100644 >> --- a/arch/x86/ia32/ia32entry.S >> +++ b/arch/x86/ia32/ia32entry.S >> @@ -464,6 +464,7 @@ GLOBAL(\label) >> PTREGSCALL stub32_rt_sigreturn, sys32_rt_sigreturn >> PTREGSCALL stub32_sigreturn, sys32_sigreturn >> PTREGSCALL stub32_execve, compat_sys_execve >> + PTREGSCALL stub32_execveat, compat_sys_execveat >> PTREGSCALL stub32_fork, sys_fork >> PTREGSCALL stub32_vfork, sys_vfork >> >> diff --git a/arch/x86/kernel/audit_64.c b/arch/x86/kernel/audit_64.c >> index 06d3e5a14d9d..f3672508b249 100644 >> --- a/arch/x86/kernel/audit_64.c >> +++ b/arch/x86/kernel/audit_64.c >> @@ -50,6 +50,7 @@ int audit_classify_syscall(int abi, unsigned syscall) >> case __NR_openat: >> return 3; >> case __NR_execve: >> + case __NR_execveat: >> return 5; >> default: >> return 0; >> diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S >> index 2fac1343a90b..00c4526e6ffe 100644 >> --- a/arch/x86/kernel/entry_64.S >> +++ b/arch/x86/kernel/entry_64.S >> @@ -665,6 +665,20 @@ ENTRY(stub_execve) >> CFI_ENDPROC >> END(stub_execve) >> >> +ENTRY(stub_execveat) >> + CFI_STARTPROC >> + addq $8, %rsp >> + PARTIAL_FRAME 0 >> + SAVE_REST >> + FIXUP_TOP_OF_STACK %r11 >> + call sys_execveat >> + RESTORE_TOP_OF_STACK %r11 >> + movq %rax,RAX(%rsp) >> + RESTORE_REST >> + jmp int_ret_from_sys_call >> + CFI_ENDPROC >> +END(stub_execveat) >> + >> /* >> * sigreturn is special because it needs to restore all registers on return. >> * This cannot be done with SYSRET, so use the IRET return path instead. >> @@ -710,6 +724,20 @@ ENTRY(stub_x32_execve) >> CFI_ENDPROC >> END(stub_x32_execve) >> >> +ENTRY(stub_x32_execveat) >> + CFI_STARTPROC >> + addq $8, %rsp >> + PARTIAL_FRAME 0 >> + SAVE_REST >> + FIXUP_TOP_OF_STACK %r11 >> + call compat_sys_execveat >> + RESTORE_TOP_OF_STACK %r11 >> + movq %rax,RAX(%rsp) >> + RESTORE_REST >> + jmp int_ret_from_sys_call >> + CFI_ENDPROC >> +END(stub_x32_execveat) >> + >> #endif >> >> /* >> diff --git a/arch/x86/syscalls/syscall_32.tbl b/arch/x86/syscalls/syscall_32.tbl >> index 028b78168d85..2633e3195455 100644 >> --- a/arch/x86/syscalls/syscall_32.tbl >> +++ b/arch/x86/syscalls/syscall_32.tbl >> @@ -363,3 +363,4 @@ >> 354 i386 seccomp sys_seccomp >> 355 i386 getrandom sys_getrandom >> 356 i386 memfd_create sys_memfd_create >> +357 i386 execveat sys_execveat stub32_execveat >> diff --git a/arch/x86/syscalls/syscall_64.tbl b/arch/x86/syscalls/syscall_64.tbl >> index 35dd922727b9..1af5badd159c 100644 >> --- a/arch/x86/syscalls/syscall_64.tbl >> +++ b/arch/x86/syscalls/syscall_64.tbl >> @@ -327,6 +327,7 @@ >> 318 common getrandom sys_getrandom >> 319 common memfd_create sys_memfd_create >> 320 common kexec_file_load sys_kexec_file_load >> +321 64 execveat stub_execveat >> >> # >> # x32-specific system call numbers start at 512 to avoid cache impact >> @@ -365,3 +366,4 @@ >> 542 x32 getsockopt compat_sys_getsockopt >> 543 x32 io_setup compat_sys_io_setup >> 544 x32 io_submit compat_sys_io_submit >> +545 x32 execveat stub_x32_execveat >> diff --git a/arch/x86/um/sys_call_table_64.c b/arch/x86/um/sys_call_table_64.c >> index f2f0723070ca..20c3649d0691 100644 >> --- a/arch/x86/um/sys_call_table_64.c >> +++ b/arch/x86/um/sys_call_table_64.c >> @@ -31,6 +31,7 @@ >> #define stub_fork sys_fork >> #define stub_vfork sys_vfork >> #define stub_execve sys_execve >> +#define stub_execveat sys_execveat >> #define stub_rt_sigreturn sys_rt_sigreturn >> >> #define __SYSCALL_COMMON(nr, sym, compat) __SYSCALL_64(nr, sym, compat) >> diff --git a/fs/exec.c b/fs/exec.c >> index a2b42a98c743..92a6e14f096a 100644 >> --- a/fs/exec.c >> +++ b/fs/exec.c >> @@ -747,7 +747,7 @@ EXPORT_SYMBOL(setup_arg_pages); >> >> #endif /* CONFIG_MMU */ >> >> -static struct file *do_open_exec(struct filename *name) >> +static struct file *do_open_execat(int fd, struct filename *name, int flags) >> { >> struct file *file; >> int err; >> @@ -757,10 +757,34 @@ static struct file *do_open_exec(struct filename *name) >> .intent = LOOKUP_OPEN, >> .lookup_flags = LOOKUP_FOLLOW, >> }; >> + static const struct open_flags open_exec_nofollow_flags = { >> + .open_flag = O_LARGEFILE | O_RDONLY | __FMODE_EXEC, >> + .acc_mode = MAY_EXEC | MAY_OPEN, >> + .intent = LOOKUP_OPEN, >> + .lookup_flags = 0, >> + }; >> >> - file = do_filp_open(AT_FDCWD, name, &open_exec_flags); >> - if (IS_ERR(file)) >> - goto out; >> + if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH)) != 0) >> + return ERR_PTR(-EINVAL); >> + >> + if (name->name[0] != '\0') { > > Is it really necessary to special case AT_EMPTY_PATH here. I would > have thought the existing logic in namei.c would have been fine > assuning we passed LOOKUP_EMPTY. Just using do_filp_open() throughout looks mostly plausible on a quick experiment, but my initial version appears to make O_PATH fds unexpectedly fexecve()-able (I'm glad I had a test case for that). I'll look for a way around that, hopefully without an explicit special case. >> + const struct open_flags *oflags = ((flags & AT_SYMLINK_NOFOLLOW) >> + ? &open_exec_nofollow_flags >> + : &open_exec_flags); >> + >> + file = do_filp_open(fd, name, oflags); >> + if (IS_ERR(file)) >> + goto out; >> + } else { >> + file = fget(fd); >> + if (!file) >> + return ERR_PTR(-EBADF); >> + >> + err = inode_permission(file->f_path.dentry->d_inode, >> + open_exec_flags.acc_mode); >> + if (err) >> + goto exit; >> + } >> >> err = -EACCES; >> if (!S_ISREG(file_inode(file)->i_mode)) >> @@ -769,12 +793,13 @@ static struct file *do_open_exec(struct filename *name) >> if (file->f_path.mnt->mnt_flags & MNT_NOEXEC) >> goto exit; >> >> - fsnotify_open(file); >> - >> err = deny_write_access(file); >> if (err) >> goto exit; >> >> + if (name->name[0] != '\0') >> + fsnotify_open(file); >> + >> out: >> return file; >> >> @@ -786,7 +811,7 @@ exit: >> struct file *open_exec(const char *name) >> { >> struct filename tmp = { .name = name }; >> - return do_open_exec(&tmp); >> + return do_open_execat(AT_FDCWD, &tmp, 0); >> } >> EXPORT_SYMBOL(open_exec); >> >> @@ -1422,10 +1447,12 @@ static int exec_binprm(struct linux_binprm *bprm) >> /* >> * sys_execve() executes a new program. >> */ >> -static int do_execve_common(struct filename *filename, >> - struct user_arg_ptr argv, >> - struct user_arg_ptr envp) >> +static int do_execveat_common(int fd, struct filename *filename, >> + struct user_arg_ptr argv, >> + struct user_arg_ptr envp, >> + int flags) >> { >> + char *pathbuf = NULL; >> struct linux_binprm *bprm; >> struct file *file; >> struct files_struct *displaced; >> @@ -1466,7 +1493,7 @@ static int do_execve_common(struct filename *filename, >> check_unsafe_exec(bprm); >> current->in_execve = 1; >> >> - file = do_open_exec(filename); >> + file = do_open_execat(fd, filename, flags); >> retval = PTR_ERR(file); >> if (IS_ERR(file)) >> goto out_unmark; >> @@ -1474,7 +1501,27 @@ static int do_execve_common(struct filename *filename, >> sched_exec(); >> >> bprm->file = file; >> - bprm->filename = bprm->interp = filename->name; >> + if (fd == AT_FDCWD || filename->name[0] == '/') { >> + bprm->filename = filename->name; >> + } else { >> + /* >> + * Build a pathname that reflects how we got to the file, >> + * either "/dev/fd/<fd>" (for an empty filename) or >> + * "/dev/fd/<fd>/<filename>". >> + */ >> + pathbuf = kmalloc(PATH_MAX, GFP_TEMPORARY); >> + if (!pathbuf) { >> + retval = -ENOMEM; >> + goto out_unmark; >> + } >> + bprm->filename = pathbuf; >> + if (filename->name[0] == '\0') >> + sprintf(pathbuf, "/dev/fd/%d", fd); >> + else >> + snprintf(pathbuf, PATH_MAX, >> + "/dev/fd/%d/%s", fd, filename->name); >> + } >> + bprm->interp = bprm->filename; >> >> retval = bprm_mm_init(bprm); >> if (retval) >> @@ -1532,6 +1579,7 @@ out_unmark: >> >> out_free: >> free_bprm(bprm); >> + kfree(pathbuf); >> >> out_files: >> if (displaced) >> @@ -1547,7 +1595,18 @@ int do_execve(struct filename *filename, >> { >> struct user_arg_ptr argv = { .ptr.native = __argv }; >> struct user_arg_ptr envp = { .ptr.native = __envp }; >> - return do_execve_common(filename, argv, envp); >> + return do_execveat_common(AT_FDCWD, filename, argv, envp, 0); >> +} >> + >> +int do_execveat(int fd, struct filename *filename, >> + const char __user *const __user *__argv, >> + const char __user *const __user *__envp, >> + int flags) >> +{ >> + struct user_arg_ptr argv = { .ptr.native = __argv }; >> + struct user_arg_ptr envp = { .ptr.native = __envp }; >> + >> + return do_execveat_common(fd, filename, argv, envp, flags); >> } >> >> #ifdef CONFIG_COMPAT >> @@ -1563,7 +1622,23 @@ static int compat_do_execve(struct filename *filename, >> .is_compat = true, >> .ptr.compat = __envp, >> }; >> - return do_execve_common(filename, argv, envp); >> + return do_execveat_common(AT_FDCWD, filename, argv, envp, 0); >> +} >> + >> +static int compat_do_execveat(int fd, struct filename *filename, >> + const compat_uptr_t __user *__argv, >> + const compat_uptr_t __user *__envp, >> + int flags) >> +{ >> + struct user_arg_ptr argv = { >> + .is_compat = true, >> + .ptr.compat = __argv, >> + }; >> + struct user_arg_ptr envp = { >> + .is_compat = true, >> + .ptr.compat = __envp, >> + }; >> + return do_execveat_common(fd, filename, argv, envp, flags); >> } >> #endif >> >> @@ -1603,6 +1678,20 @@ SYSCALL_DEFINE3(execve, >> { >> return do_execve(getname(filename), argv, envp); >> } >> + >> +SYSCALL_DEFINE5(execveat, >> + int, fd, const char __user *, filename, >> + const char __user *const __user *, argv, >> + const char __user *const __user *, envp, >> + int, flags) >> +{ >> + int lookup_flags = (flags & AT_EMPTY_PATH) ? LOOKUP_EMPTY : 0; >> + >> + return do_execveat(fd, >> + getname_flags(filename, lookup_flags, NULL), >> + argv, envp, flags); >> +} >> + >> #ifdef CONFIG_COMPAT >> COMPAT_SYSCALL_DEFINE3(execve, const char __user *, filename, >> const compat_uptr_t __user *, argv, >> @@ -1610,4 +1699,17 @@ COMPAT_SYSCALL_DEFINE3(execve, const char __user *, filename, >> { >> return compat_do_execve(getname(filename), argv, envp); >> } >> + >> +COMPAT_SYSCALL_DEFINE5(execveat, int, fd, >> + const char __user *, filename, >> + const compat_uptr_t __user *, argv, >> + const compat_uptr_t __user *, envp, >> + int, flags) >> +{ >> + int lookup_flags = (flags & AT_EMPTY_PATH) ? LOOKUP_EMPTY : 0; >> + >> + return compat_do_execveat(fd, >> + getname_flags(filename, lookup_flags, NULL), >> + argv, envp, flags); >> +} >> #endif >> diff --git a/fs/namei.c b/fs/namei.c >> index a7b05bf82d31..553c84d3e0cc 100644 >> --- a/fs/namei.c >> +++ b/fs/namei.c >> @@ -130,7 +130,7 @@ void final_putname(struct filename *name) >> >> #define EMBEDDED_NAME_MAX (PATH_MAX - sizeof(struct filename)) >> >> -static struct filename * >> +struct filename * >> getname_flags(const char __user *filename, int flags, int *empty) >> { >> struct filename *result, *err; >> diff --git a/include/linux/compat.h b/include/linux/compat.h >> index e6494261eaff..7450ca2ac1fc 100644 >> --- a/include/linux/compat.h >> +++ b/include/linux/compat.h >> @@ -357,6 +357,9 @@ asmlinkage long compat_sys_lseek(unsigned int, compat_off_t, unsigned int); >> >> asmlinkage long compat_sys_execve(const char __user *filename, const compat_uptr_t __user *argv, >> const compat_uptr_t __user *envp); >> +asmlinkage long compat_sys_execveat(int dfd, const char __user *filename, >> + const compat_uptr_t __user *argv, >> + const compat_uptr_t __user *envp, int flags); >> >> asmlinkage long compat_sys_select(int n, compat_ulong_t __user *inp, >> compat_ulong_t __user *outp, compat_ulong_t __user *exp, >> diff --git a/include/linux/fs.h b/include/linux/fs.h >> index 94187721ad41..e9818574d738 100644 >> --- a/include/linux/fs.h >> +++ b/include/linux/fs.h >> @@ -2060,6 +2060,7 @@ extern struct file *file_open_root(struct dentry *, struct vfsmount *, >> extern struct file * dentry_open(const struct path *, int, const struct cred *); >> extern int filp_close(struct file *, fl_owner_t id); >> >> +extern struct filename *getname_flags(const char __user *, int, int *); >> extern struct filename *getname(const char __user *); >> extern struct filename *getname_kernel(const char *); >> >> diff --git a/include/linux/sched.h b/include/linux/sched.h >> index b867a4dab38a..33e056da7d33 100644 >> --- a/include/linux/sched.h >> +++ b/include/linux/sched.h >> @@ -2430,6 +2430,10 @@ extern void do_group_exit(int); >> extern int do_execve(struct filename *, >> const char __user * const __user *, >> const char __user * const __user *); >> +extern int do_execveat(int, struct filename *, >> + const char __user * const __user *, >> + const char __user * const __user *, >> + int); >> extern long do_fork(unsigned long, unsigned long, unsigned long, int __user *, int __user *); >> struct task_struct *fork_idle(int); >> extern pid_t kernel_thread(int (*fn)(void *), void *arg, unsigned long flags); >> diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h >> index 0f86d85a9ce4..df5422294deb 100644 >> --- a/include/linux/syscalls.h >> +++ b/include/linux/syscalls.h >> @@ -876,4 +876,8 @@ asmlinkage long sys_seccomp(unsigned int op, unsigned int flags, >> asmlinkage long sys_getrandom(char __user *buf, size_t count, >> unsigned int flags); >> >> +asmlinkage long sys_execveat(int dfd, const char __user *filename, >> + const char __user *const __user *argv, >> + const char __user *const __user *envp, int flags); >> + >> #endif >> diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h >> index 11d11bc5c78f..feef07d29663 100644 >> --- a/include/uapi/asm-generic/unistd.h >> +++ b/include/uapi/asm-generic/unistd.h >> @@ -705,9 +705,11 @@ __SYSCALL(__NR_seccomp, sys_seccomp) >> __SYSCALL(__NR_getrandom, sys_getrandom) >> #define __NR_memfd_create 279 >> __SYSCALL(__NR_memfd_create, sys_memfd_create) >> +#define __NR_execveat 280 >> +__SC_COMP(__NR_execveat, sys_execveat, compat_sys_execveat) >> >> #undef __NR_syscalls >> -#define __NR_syscalls 280 >> +#define __NR_syscalls 281 >> >> /* >> * All syscalls below here should go away really, >> diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c >> index 391d4ddb6f4b..efb06058ad3e 100644 >> --- a/kernel/sys_ni.c >> +++ b/kernel/sys_ni.c >> @@ -218,3 +218,6 @@ cond_syscall(sys_kcmp); >> >> /* operate on Secure Computing state */ >> cond_syscall(sys_seccomp); >> + >> +/* execveat */ >> +cond_syscall(sys_execveat); >> diff --git a/lib/audit.c b/lib/audit.c >> index 1d726a22565b..b8fb5ee81e26 100644 >> --- a/lib/audit.c >> +++ b/lib/audit.c >> @@ -54,6 +54,9 @@ int audit_classify_syscall(int abi, unsigned syscall) >> case __NR_socketcall: >> return 4; >> #endif >> +#ifdef __NR_execveat >> + case __NR_execveat: >> +#endif >> case __NR_execve: >> return 5; >> default: ^ permalink raw reply [flat|nested] 13+ messages in thread
[parent not found: <1413978269-17274-2-git-send-email-drysdale-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>]
* Re: [PATCHv5 1/3] syscalls,x86: implement execveat() system call [not found] ` <1413978269-17274-2-git-send-email-drysdale-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> @ 2014-10-22 18:44 ` Andy Lutomirski 2014-10-27 18:03 ` David Drysdale 0 siblings, 1 reply; 13+ messages in thread From: Andy Lutomirski @ 2014-10-22 18:44 UTC (permalink / raw) To: David Drysdale Cc: Eric W. Biederman, Alexander Viro, Meredydd Luff, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton, Kees Cook, Arnd Bergmann, X86 ML, linux-arch, Linux API On Wed, Oct 22, 2014 at 4:44 AM, David Drysdale <drysdale-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> wrote: > Add a new system execveat(2) syscall. execveat() is to execve() as > openat() is to open(): it takes a file descriptor that refers to a > directory, and resolves the filename relative to that. > > bprm->file = file; > - bprm->filename = bprm->interp = filename->name; > + if (fd == AT_FDCWD || filename->name[0] == '/') { > + bprm->filename = filename->name; > + } else { > + /* > + * Build a pathname that reflects how we got to the file, > + * either "/dev/fd/<fd>" (for an empty filename) or > + * "/dev/fd/<fd>/<filename>". > + */ > + pathbuf = kmalloc(PATH_MAX, GFP_TEMPORARY); > + if (!pathbuf) { > + retval = -ENOMEM; > + goto out_unmark; > + } > + bprm->filename = pathbuf; > + if (filename->name[0] == '\0') > + sprintf(pathbuf, "/dev/fd/%d", fd); If the fd is O_CLOEXEC, then this will result in a confused child process. Should we fail exec attempts like that for non-static programs? (E.g. set filename to "" or something and fix up the binfmt drivers to handle that?) > + else > + snprintf(pathbuf, PATH_MAX, > + "/dev/fd/%d/%s", fd, filename->name); Does this need to handle the case where the result exceeds PATH_MAX? --Andy ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCHv5 1/3] syscalls,x86: implement execveat() system call 2014-10-22 18:44 ` Andy Lutomirski @ 2014-10-27 18:03 ` David Drysdale 2014-10-27 18:47 ` Andy Lutomirski 0 siblings, 1 reply; 13+ messages in thread From: David Drysdale @ 2014-10-27 18:03 UTC (permalink / raw) To: Andy Lutomirski Cc: Eric W. Biederman, Alexander Viro, Meredydd Luff, linux-kernel@vger.kernel.org, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton, Kees Cook, Arnd Bergmann, X86 ML, linux-arch, Linux API On Wed, Oct 22, 2014 at 7:44 PM, Andy Lutomirski <luto@amacapital.net> wrote: > On Wed, Oct 22, 2014 at 4:44 AM, David Drysdale <drysdale@google.com> wrote: >> Add a new system execveat(2) syscall. execveat() is to execve() as >> openat() is to open(): it takes a file descriptor that refers to a >> directory, and resolves the filename relative to that. >> > >> bprm->file = file; >> - bprm->filename = bprm->interp = filename->name; >> + if (fd == AT_FDCWD || filename->name[0] == '/') { >> + bprm->filename = filename->name; >> + } else { >> + /* >> + * Build a pathname that reflects how we got to the file, >> + * either "/dev/fd/<fd>" (for an empty filename) or >> + * "/dev/fd/<fd>/<filename>". >> + */ >> + pathbuf = kmalloc(PATH_MAX, GFP_TEMPORARY); >> + if (!pathbuf) { >> + retval = -ENOMEM; >> + goto out_unmark; >> + } >> + bprm->filename = pathbuf; >> + if (filename->name[0] == '\0') >> + sprintf(pathbuf, "/dev/fd/%d", fd); > > If the fd is O_CLOEXEC, then this will result in a confused child > process. Should we fail exec attempts like that for non-static > programs? (E.g. set filename to "" or something and fix up the binfmt > drivers to handle that?) Isn't it just scripts that get confused here (as normal executables don't get to see brpm->filename)? Given that we don't know which we have at this point, I'd suggest carrying on regardless. Or we could fall back to use the previous best-effort d_path() code for O_CLOEXEC fds. Thoughts? >> + else >> + snprintf(pathbuf, PATH_MAX, >> + "/dev/fd/%d/%s", fd, filename->name); > > Does this need to handle the case where the result exceeds PATH_MAX? I guess we could kmalloc(strlen(filename->name) + 19) to avoid the possibility of failure, but that just defers the inevitable -- the interpreter won't be able to open the script file anyway. But it would at least then generate the appropriate error (ENAMETOOLONG rather than ENOENT). ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCHv5 1/3] syscalls,x86: implement execveat() system call 2014-10-27 18:03 ` David Drysdale @ 2014-10-27 18:47 ` Andy Lutomirski 2014-10-28 17:30 ` David Drysdale 0 siblings, 1 reply; 13+ messages in thread From: Andy Lutomirski @ 2014-10-27 18:47 UTC (permalink / raw) To: David Drysdale Cc: Eric W. Biederman, Alexander Viro, Meredydd Luff, linux-kernel@vger.kernel.org, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton, Kees Cook, Arnd Bergmann, X86 ML, linux-arch, Linux API On Mon, Oct 27, 2014 at 11:03 AM, David Drysdale <drysdale@google.com> wrote: > On Wed, Oct 22, 2014 at 7:44 PM, Andy Lutomirski <luto@amacapital.net> wrote: >> On Wed, Oct 22, 2014 at 4:44 AM, David Drysdale <drysdale@google.com> wrote: >>> Add a new system execveat(2) syscall. execveat() is to execve() as >>> openat() is to open(): it takes a file descriptor that refers to a >>> directory, and resolves the filename relative to that. >>> >> >>> bprm->file = file; >>> - bprm->filename = bprm->interp = filename->name; >>> + if (fd == AT_FDCWD || filename->name[0] == '/') { >>> + bprm->filename = filename->name; >>> + } else { >>> + /* >>> + * Build a pathname that reflects how we got to the file, >>> + * either "/dev/fd/<fd>" (for an empty filename) or >>> + * "/dev/fd/<fd>/<filename>". >>> + */ >>> + pathbuf = kmalloc(PATH_MAX, GFP_TEMPORARY); >>> + if (!pathbuf) { >>> + retval = -ENOMEM; >>> + goto out_unmark; >>> + } >>> + bprm->filename = pathbuf; >>> + if (filename->name[0] == '\0') >>> + sprintf(pathbuf, "/dev/fd/%d", fd); >> >> If the fd is O_CLOEXEC, then this will result in a confused child >> process. Should we fail exec attempts like that for non-static >> programs? (E.g. set filename to "" or something and fix up the binfmt >> drivers to handle that?) > > Isn't it just scripts that get confused here (as normal executables don't > get to see brpm->filename)? > > Given that we don't know which we have at this point, I'd suggest > carrying on regardless. Or we could fall back to use the previous > best-effort d_path() code for O_CLOEXEC fds. Thoughts? How hard would it be to mark the bprm as not having a path for the binary? Then we could fail later on if and when we actually need the path. I don't really have a strong opinion here, though. I do prefer actually failing the execveat call over succeeding but invoking a script interpreter than can't possibly work. > >>> + else >>> + snprintf(pathbuf, PATH_MAX, >>> + "/dev/fd/%d/%s", fd, filename->name); >> >> Does this need to handle the case where the result exceeds PATH_MAX? > > I guess we could kmalloc(strlen(filename->name) + 19) to avoid the > possibility of failure, but that just defers the inevitable -- the interpreter > won't be able to open the script file anyway. But it would at least then > generate the appropriate error (ENAMETOOLONG rather than ENOENT). Depends whether anyone cares about bprm->filename. But I think the code should either return an error or allocate enough space. -- Andy Lutomirski AMA Capital Management, LLC ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCHv5 1/3] syscalls,x86: implement execveat() system call 2014-10-27 18:47 ` Andy Lutomirski @ 2014-10-28 17:30 ` David Drysdale 2014-10-28 17:48 ` Andy Lutomirski 0 siblings, 1 reply; 13+ messages in thread From: David Drysdale @ 2014-10-28 17:30 UTC (permalink / raw) To: Andy Lutomirski Cc: Eric W. Biederman, Alexander Viro, Meredydd Luff, linux-kernel@vger.kernel.org, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton, Kees Cook, Arnd Bergmann, X86 ML, linux-arch, Linux API [Oops, re-send remembering to turn on plaintext mode -- sorry] On Mon, Oct 27, 2014 at 6:47 PM, Andy Lutomirski <luto@amacapital.net> wrote: > On Mon, Oct 27, 2014 at 11:03 AM, David Drysdale <drysdale@google.com> wrote: >> On Wed, Oct 22, 2014 at 7:44 PM, Andy Lutomirski <luto@amacapital.net> wrote: >>> On Wed, Oct 22, 2014 at 4:44 AM, David Drysdale <drysdale@google.com> wrote: >>>> Add a new system execveat(2) syscall. execveat() is to execve() as >>>> openat() is to open(): it takes a file descriptor that refers to a >>>> directory, and resolves the filename relative to that. >>>> >>> >>>> bprm->file = file; >>>> - bprm->filename = bprm->interp = filename->name; >>>> + if (fd == AT_FDCWD || filename->name[0] == '/') { >>>> + bprm->filename = filename->name; >>>> + } else { >>>> + /* >>>> + * Build a pathname that reflects how we got to the file, >>>> + * either "/dev/fd/<fd>" (for an empty filename) or >>>> + * "/dev/fd/<fd>/<filename>". >>>> + */ >>>> + pathbuf = kmalloc(PATH_MAX, GFP_TEMPORARY); >>>> + if (!pathbuf) { >>>> + retval = -ENOMEM; >>>> + goto out_unmark; >>>> + } >>>> + bprm->filename = pathbuf; >>>> + if (filename->name[0] == '\0') >>>> + sprintf(pathbuf, "/dev/fd/%d", fd); >>> >>> If the fd is O_CLOEXEC, then this will result in a confused child >>> process. Should we fail exec attempts like that for non-static >>> programs? (E.g. set filename to "" or something and fix up the binfmt >>> drivers to handle that?) >> >> Isn't it just scripts that get confused here (as normal executables don't >> get to see brpm->filename)? >> >> Given that we don't know which we have at this point, I'd suggest >> carrying on regardless. Or we could fall back to use the previous >> best-effort d_path() code for O_CLOEXEC fds. Thoughts? > > How hard would it be to mark the bprm as not having a path for the > binary? Then we could fail later on if and when we actually need the > path. Adding a flag to bprm->interp_flags to indicate that the bprm->filename will be inaccessible after exec is straightforward. But I'm not sure who should/could make use of the flag... > I don't really have a strong opinion here, though. I do prefer > actually failing the execveat call over succeeding but invoking a > script interpreter than can't possibly work. Yeah, but that involves the kernel code (e.g. fs/binfmt_script.c) making an assumption about what the interpreter is going to do -- specifically that it's going to try to open its argv[1]. Admittedly, that's a very likely assumption, but I'm not sure it's one the kernel should make -- a script like "#!/bin/echo" wouldn't be very useful, but fexecve()ing it would still work OK on a name like "/dev/fd/7" after fd 7 is closed. (Also, we need some kind of non-empty name in bprm->filename, even if it's going to be inaccessible later, so that any LSM processing off of the bprm_set_creds()/bprm_check_security() hooks has something to work with; those hooks are pre-exec so the "/dev/fd/<fd>" part should still be OK at that point.) So I guess I lean towards keeping "/dev/fd/<fd>/<path>" regardless. >> >>>> + else >>>> + snprintf(pathbuf, PATH_MAX, >>>> + "/dev/fd/%d/%s", fd, filename->name); >>> >>> Does this need to handle the case where the result exceeds PATH_MAX? >> >> I guess we could kmalloc(strlen(filename->name) + 19) to avoid the >> possibility of failure, but that just defers the inevitable -- the interpreter >> won't be able to open the script file anyway. But it would at least then >> generate the appropriate error (ENAMETOOLONG rather than ENOENT). > > Depends whether anyone cares about bprm->filename. But I think the > code should either return an error or allocate enough space. I'll allocate enough space. > > -- > Andy Lutomirski > AMA Capital Management, LLC ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCHv5 1/3] syscalls,x86: implement execveat() system call 2014-10-28 17:30 ` David Drysdale @ 2014-10-28 17:48 ` Andy Lutomirski 0 siblings, 0 replies; 13+ messages in thread From: Andy Lutomirski @ 2014-10-28 17:48 UTC (permalink / raw) To: David Drysdale Cc: Eric W. Biederman, Alexander Viro, Meredydd Luff, linux-kernel@vger.kernel.org, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton, Kees Cook, Arnd Bergmann, X86 ML, linux-arch, Linux API On Tue, Oct 28, 2014 at 10:30 AM, David Drysdale <drysdale@google.com> wrote: > [Oops, re-send remembering to turn on plaintext mode -- sorry] > > On Mon, Oct 27, 2014 at 6:47 PM, Andy Lutomirski <luto@amacapital.net> wrote: >> On Mon, Oct 27, 2014 at 11:03 AM, David Drysdale <drysdale@google.com> wrote: >>> On Wed, Oct 22, 2014 at 7:44 PM, Andy Lutomirski <luto@amacapital.net> wrote: >>>> On Wed, Oct 22, 2014 at 4:44 AM, David Drysdale <drysdale@google.com> wrote: >>>>> Add a new system execveat(2) syscall. execveat() is to execve() as >>>>> openat() is to open(): it takes a file descriptor that refers to a >>>>> directory, and resolves the filename relative to that. >>>>> >>>> >>>>> bprm->file = file; >>>>> - bprm->filename = bprm->interp = filename->name; >>>>> + if (fd == AT_FDCWD || filename->name[0] == '/') { >>>>> + bprm->filename = filename->name; >>>>> + } else { >>>>> + /* >>>>> + * Build a pathname that reflects how we got to the file, >>>>> + * either "/dev/fd/<fd>" (for an empty filename) or >>>>> + * "/dev/fd/<fd>/<filename>". >>>>> + */ >>>>> + pathbuf = kmalloc(PATH_MAX, GFP_TEMPORARY); >>>>> + if (!pathbuf) { >>>>> + retval = -ENOMEM; >>>>> + goto out_unmark; >>>>> + } >>>>> + bprm->filename = pathbuf; >>>>> + if (filename->name[0] == '\0') >>>>> + sprintf(pathbuf, "/dev/fd/%d", fd); >>>> >>>> If the fd is O_CLOEXEC, then this will result in a confused child >>>> process. Should we fail exec attempts like that for non-static >>>> programs? (E.g. set filename to "" or something and fix up the binfmt >>>> drivers to handle that?) >>> >>> Isn't it just scripts that get confused here (as normal executables don't >>> get to see brpm->filename)? >>> >>> Given that we don't know which we have at this point, I'd suggest >>> carrying on regardless. Or we could fall back to use the previous >>> best-effort d_path() code for O_CLOEXEC fds. Thoughts? >> >> How hard would it be to mark the bprm as not having a path for the >> binary? Then we could fail later on if and when we actually need the >> path. > > Adding a flag to bprm->interp_flags to indicate that the bprm->filename > will be inaccessible after exec is straightforward. But I'm not sure who > should/could make use of the flag... > >> I don't really have a strong opinion here, though. I do prefer >> actually failing the execveat call over succeeding but invoking a >> script interpreter than can't possibly work. > > Yeah, but that involves the kernel code (e.g. fs/binfmt_script.c) making > an assumption about what the interpreter is going to do -- specifically > that it's going to try to open its argv[1]. Admittedly, that's a very likely > assumption, but I'm not sure it's one the kernel should make -- a script > like "#!/bin/echo" wouldn't be very useful, but fexecve()ing it would still > work OK on a name like "/dev/fd/7" after fd 7 is closed. Hmm. I'm unconvinced. If an important part of executing the script is passing it an argv[0] that can be opened, then I think we shouldn't allow a known-bad argv[0]. > > (Also, we need some kind of non-empty name in bprm->filename, > even if it's going to be inaccessible later, so that any LSM processing > off of the bprm_set_creds()/bprm_check_security() hooks has something > to work with; those hooks are pre-exec so the "/dev/fd/<fd>" part should > still be OK at that point.) > > So I guess I lean towards keeping "/dev/fd/<fd>/<path>" regardless. > >>> >>>>> + else >>>>> + snprintf(pathbuf, PATH_MAX, >>>>> + "/dev/fd/%d/%s", fd, filename->name); >>>> >>>> Does this need to handle the case where the result exceeds PATH_MAX? >>> >>> I guess we could kmalloc(strlen(filename->name) + 19) to avoid the >>> possibility of failure, but that just defers the inevitable -- the interpreter >>> won't be able to open the script file anyway. But it would at least then >>> generate the appropriate error (ENAMETOOLONG rather than ENOENT). >> >> Depends whether anyone cares about bprm->filename. But I think the >> code should either return an error or allocate enough space. > > I'll allocate enough space. > >> >> -- >> Andy Lutomirski >> AMA Capital Management, LLC -- Andy Lutomirski AMA Capital Management, LLC ^ permalink raw reply [flat|nested] 13+ messages in thread
[parent not found: <1413978269-17274-1-git-send-email-drysdale-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>]
* [PATCHv5 2/3] syscalls,x86: add selftest for execveat(2) [not found] ` <1413978269-17274-1-git-send-email-drysdale-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> @ 2014-10-22 11:44 ` David Drysdale 2014-10-22 11:44 ` [PATCHv5 man-pages 3/3] execveat.2: initial man page " David Drysdale 1 sibling, 0 replies; 13+ messages in thread From: David Drysdale @ 2014-10-22 11:44 UTC (permalink / raw) To: Eric W. Biederman, Andy Lutomirski, Alexander Viro, Meredydd Luff, linux-kernel-u79uwXL29TY76Z2rM5mHXA Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton, Kees Cook, Arnd Bergmann, x86-DgEjT+Ai2ygdnm+yROfE0A, linux-arch-u79uwXL29TY76Z2rM5mHXA, linux-api-u79uwXL29TY76Z2rM5mHXA, David Drysdale Signed-off-by: David Drysdale <drysdale-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> --- tools/testing/selftests/Makefile | 1 + tools/testing/selftests/exec/.gitignore | 6 + tools/testing/selftests/exec/Makefile | 22 +++ tools/testing/selftests/exec/execveat.c | 295 ++++++++++++++++++++++++++++++++ 4 files changed, 324 insertions(+) create mode 100644 tools/testing/selftests/exec/.gitignore create mode 100644 tools/testing/selftests/exec/Makefile create mode 100644 tools/testing/selftests/exec/execveat.c diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile index 36ff2e4c7b6f..210cf68b3511 100644 --- a/tools/testing/selftests/Makefile +++ b/tools/testing/selftests/Makefile @@ -14,6 +14,7 @@ TARGETS += powerpc TARGETS += user TARGETS += sysctl TARGETS += firmware +TARGETS += exec TARGETS_HOTPLUG = cpu-hotplug TARGETS_HOTPLUG += memory-hotplug diff --git a/tools/testing/selftests/exec/.gitignore b/tools/testing/selftests/exec/.gitignore new file mode 100644 index 000000000000..349a786899a7 --- /dev/null +++ b/tools/testing/selftests/exec/.gitignore @@ -0,0 +1,6 @@ +subdir* +script* +execveat +execveat.symlink +execveat.moved +execveat.ephemeral diff --git a/tools/testing/selftests/exec/Makefile b/tools/testing/selftests/exec/Makefile new file mode 100644 index 000000000000..428cb1d2a06a --- /dev/null +++ b/tools/testing/selftests/exec/Makefile @@ -0,0 +1,22 @@ +CC = $(CROSS_COMPILE)gcc +CFLAGS = -Wall +BINARIES = execveat +DEPS = execveat.symlink script subdir +all: $(BINARIES) $(DEPS) + +subdir: + mkdir -p $@ +script: + echo '#!/bin/sh' > $@ + echo 'exit $$*' >> $@ + chmod +x $@ +execveat.symlink: execveat + ln -s -f $< $@ +%: %.c + $(CC) $(CFLAGS) -o $@ $^ + +run_tests: all + ./execveat + +clean: + rm -rf $(BINARIES) $(DEPS) subdir.moved execveat.moved diff --git a/tools/testing/selftests/exec/execveat.c b/tools/testing/selftests/exec/execveat.c new file mode 100644 index 000000000000..1f2722ed3847 --- /dev/null +++ b/tools/testing/selftests/exec/execveat.c @@ -0,0 +1,295 @@ +/* + * Copyright (c) 2014 Google, Inc. + * + * Licensed under the terms of the GNU GPL License version 2 + * + * Selftests for execveat(2). + */ + +#define _GNU_SOURCE /* to get O_PATH, AT_EMPTY_PATH */ +#include <sys/sendfile.h> +#include <sys/stat.h> +#include <sys/syscall.h> +#include <sys/types.h> +#include <sys/wait.h> +#include <errno.h> +#include <fcntl.h> +#include <limits.h> +#include <stdio.h> +#include <stdlib.h> +#include <string.h> +#include <unistd.h> + +static char *envp[] = { "IN_TEST=yes", NULL }; +static char *argv[] = { "execveat", "99", NULL }; + +static int execveat_(int fd, const char *path, char **argv, char **envp, + int flags) +{ +#ifdef __NR_execveat + return syscall(__NR_execveat, fd, path, argv, envp, flags); +#else + errno = -ENOSYS; + return -1; +#endif +} + +#define check_execveat_fail(fd, path, flags, errno) \ + _check_execveat_fail(fd, path, flags, errno, #errno) +static int _check_execveat_fail(int fd, const char *path, int flags, + int expected_errno, const char *errno_str) +{ + int rc; + + errno = 0; + printf("Check failure of execveat(%d, '%s', %d) with %s... ", + fd, path?:"(null)", flags, errno_str); + rc = execveat_(fd, path, argv, envp, flags); + + if (rc > 0) { + printf("[FAIL] (unexpected success from execveat(2))\n"); + return 1; + } + if (errno != expected_errno) { + printf("[FAIL] (expected errno %d (%s) not %d (%s)\n", + expected_errno, strerror(expected_errno), + errno, strerror(errno)); + return 1; + } + printf("[OK]\n"); + return 0; +} + +static int check_execveat_invoked_rc(int fd, const char *path, int flags, + int expected_rc) +{ + int status; + int rc; + pid_t child; + + printf("Check success of execveat(%d, '%s', %d)... ", + fd, path?:"(null)", flags); + child = fork(); + if (child < 0) { + printf("[FAIL] (fork() failed)\n"); + return 1; + } + if (child == 0) { + /* Child: do execveat(). */ + rc = execveat_(fd, path, argv, envp, flags); + printf("[FAIL]: execveat() failed, rc=%d errno=%d (%s)\n", + rc, errno, strerror(errno)); + exit(1); /* should not reach here */ + } + /* Parent: wait for & check child's exit status. */ + rc = waitpid(child, &status, 0); + if (rc != child) { + printf("[FAIL] (waitpid(%d,...) returned %d)\n", child, rc); + return 1; + } + if (!WIFEXITED(status)) { + printf("[FAIL] (child %d did not exit cleanly, status=%08x)\n", + child, status); + return 1; + } + if (WEXITSTATUS(status) != expected_rc) { + printf("[FAIL] (child %d exited with %d not %d)\n", + child, WEXITSTATUS(status), expected_rc); + return 1; + } + printf("[OK]\n"); + return 0; +} + +static int check_execveat(int fd, const char *path, int flags) +{ + return check_execveat_invoked_rc(fd, path, flags, 99); +} + +static char *concat(const char *left, const char *right) +{ + char *result = malloc(strlen(left) + strlen(right) + 1); + + strcpy(result, left); + strcat(result, right); + return result; +} + +static int open_or_die(const char *filename, int flags) +{ + int fd = open(filename, flags); + + if (fd < 0) { + printf("Failed to open '%s'; " + "check prerequisites are available\n", filename); + exit(1); + } + return fd; +} + +static int run_tests(void) +{ + int fail = 0; + char *fullname = realpath("execveat", NULL); + char *fullname_script = realpath("script", NULL); + char *fullname_symlink = concat(fullname, ".symlink"); + int subdir_dfd = open_or_die("subdir", O_DIRECTORY|O_RDONLY); + int subdir_dfd_ephemeral = open_or_die("subdir.ephemeral", + O_DIRECTORY|O_RDONLY); + int dot_dfd = open_or_die(".", O_DIRECTORY|O_RDONLY); + int dot_dfd_path = open_or_die(".", O_DIRECTORY|O_RDONLY|O_PATH); + int fd = open_or_die("execveat", O_RDONLY); + int fd_path = open_or_die("execveat", O_RDONLY|O_PATH); + int fd_symlink = open_or_die("execveat.symlink", O_RDONLY); + int fd_script = open_or_die("script", O_RDONLY); + int fd_ephemeral = open_or_die("execveat.ephemeral", O_RDONLY); + int fd_script_ephemeral = open_or_die("script.ephemeral", O_RDONLY); + + /* Normal executable file: */ + /* dfd + path */ + fail |= check_execveat(subdir_dfd, "../execveat", 0); + fail |= check_execveat(dot_dfd, "execveat", 0); + fail |= check_execveat(dot_dfd_path, "execveat", 0); + /* absolute path */ + fail |= check_execveat(AT_FDCWD, fullname, 0); + /* absolute path with nonsense dfd */ + fail |= check_execveat(99, fullname, 0); + /* fd + no path */ + fail |= check_execveat(fd, "", AT_EMPTY_PATH); + + /* Mess with executable file that's already open: */ + /* fd + no path to a file that's been renamed */ + rename("execveat.ephemeral", "execveat.moved"); + fail |= check_execveat(fd_ephemeral, "", AT_EMPTY_PATH); + /* fd + no path to a file that's been deleted */ + unlink("execveat.moved"); /* remove the file now fd open */ + fail |= check_execveat(fd_ephemeral, "", AT_EMPTY_PATH); + + /* Invalid argument failures */ + fail |= check_execveat_fail(fd, "", 0, ENOENT); + fail |= check_execveat_fail(fd, NULL, AT_EMPTY_PATH, EFAULT); + + /* Symlink to executable file: */ + /* dfd + path */ + fail |= check_execveat(dot_dfd, "execveat.symlink", 0); + fail |= check_execveat(dot_dfd_path, "execveat.symlink", 0); + /* absolute path */ + fail |= check_execveat(AT_FDCWD, fullname_symlink, 0); + /* fd + no path, even with AT_SYMLINK_NOFOLLOW (already followed) */ + fail |= check_execveat(fd_symlink, "", AT_EMPTY_PATH); + fail |= check_execveat(fd_symlink, "", + AT_EMPTY_PATH|AT_SYMLINK_NOFOLLOW); + + /* Symlink fails when AT_SYMLINK_NOFOLLOW set: */ + /* dfd + path */ + fail |= check_execveat_fail(dot_dfd, "execveat.symlink", + AT_SYMLINK_NOFOLLOW, ELOOP); + fail |= check_execveat_fail(dot_dfd_path, "execveat.symlink", + AT_SYMLINK_NOFOLLOW, ELOOP); + /* absolute path */ + fail |= check_execveat_fail(AT_FDCWD, fullname_symlink, + AT_SYMLINK_NOFOLLOW, ELOOP); + + /* Shell script wrapping executable file: */ + /* dfd + path */ + fail |= check_execveat(subdir_dfd, "../script", 0); + fail |= check_execveat(dot_dfd, "script", 0); + fail |= check_execveat(dot_dfd_path, "script", 0); + /* absolute path */ + fail |= check_execveat(AT_FDCWD, fullname_script, 0); + /* fd + no path */ + fail |= check_execveat(fd_script, "", AT_EMPTY_PATH); + fail |= check_execveat(fd_script, "", + AT_EMPTY_PATH|AT_SYMLINK_NOFOLLOW); + + /* Mess with script file that's already open: */ + /* fd + no path to a file that's been renamed */ + rename("script.ephemeral", "script.moved"); + fail |= check_execveat(fd_script_ephemeral, "", AT_EMPTY_PATH); + /* fd + no path to a file that's been deleted */ + unlink("script.moved"); /* remove the file while fd open */ + fail |= check_execveat(fd_script_ephemeral, "", AT_EMPTY_PATH); + + /* Rename a subdirectory in the path: */ + rename("subdir.ephemeral", "subdir.moved"); + fail |= check_execveat(subdir_dfd_ephemeral, "../script", 0); + fail |= check_execveat(subdir_dfd_ephemeral, "script", 0); + /* Remove the subdir and its contents */ + unlink("subdir.moved/script"); + unlink("subdir.moved"); + /* Shell loads via deleted subdir OK because name starts with .. */ + fail |= check_execveat(subdir_dfd_ephemeral, "../script", 0); + fail |= check_execveat_fail(subdir_dfd_ephemeral, "script", 0, ENOENT); + + /* Flag values other than AT_SYMLINK_NOFOLLOW => EINVAL */ + fail |= check_execveat_fail(dot_dfd, "execveat", 0xFFFF, EINVAL); + /* Invalid path => ENOENT */ + fail |= check_execveat_fail(dot_dfd, "no-such-file", 0, ENOENT); + fail |= check_execveat_fail(dot_dfd_path, "no-such-file", 0, ENOENT); + fail |= check_execveat_fail(AT_FDCWD, "no-such-file", 0, ENOENT); + /* Attempt to execute directory => EACCES */ + fail |= check_execveat_fail(dot_dfd, "", AT_EMPTY_PATH, EACCES); + /* Attempt to execute non-executable => EACCES */ + fail |= check_execveat_fail(dot_dfd, "Makefile", 0, EACCES); + /* Attempt to execute file opened with O_PATH => EBADF */ + fail |= check_execveat_fail(dot_dfd_path, "", AT_EMPTY_PATH, EBADF); + fail |= check_execveat_fail(fd_path, "", AT_EMPTY_PATH, EBADF); + /* Attempt to execute nonsense FD => EBADF */ + fail |= check_execveat_fail(99, "", AT_EMPTY_PATH, EBADF); + fail |= check_execveat_fail(99, "execveat", 0, EBADF); + /* Attempt to execute relative to non-directory => ENOTDIR */ + fail |= check_execveat_fail(fd, "execveat", 0, ENOTDIR); + + return fail ? -1 : 0; +} + +void exe_cp(const char *src, const char *dest) +{ + int in_fd = open_or_die(src, O_RDONLY); + int out_fd = open(dest, O_RDWR|O_CREAT|O_TRUNC, 0755); + struct stat info; + + fstat(in_fd, &info); + sendfile(out_fd, in_fd, NULL, info.st_size); + close(in_fd); + close(out_fd); +} + +void prerequisites(void) +{ + int fd; + const char *script = "#!/bin/sh\nexit $*\n"; + + /* Create ephemeral copies of files */ + exe_cp("execveat", "execveat.ephemeral"); + exe_cp("script", "script.ephemeral"); + mkdir("subdir.ephemeral", 0755); + + fd = open("subdir.ephemeral/script", O_RDWR|O_CREAT|O_TRUNC, 0755); + write(fd, script, strlen(script)); + close(fd); +} + +int main(int argc, char **argv) +{ + int rc; + + if (argc >= 2) { + /* If we are invoked with an argument, exit immediately. */ + /* Check expected environment transferred. */ + const char *in_test = getenv("IN_TEST"); + + if (!in_test || strcmp(in_test, "yes") != 0) { + printf("[FAIL] (no IN_TEST=yes in env)\n"); + return 1; + } + + /* Use the final argument as an exit code. */ + rc = atoi(argv[argc - 1]); + fflush(stdout); + } else { + prerequisites(); + rc = run_tests(); + } + return rc; +} -- 2.1.0.rc2.206.gedb03e5 ^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCHv5 man-pages 3/3] execveat.2: initial man page for execveat(2) [not found] ` <1413978269-17274-1-git-send-email-drysdale-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> 2014-10-22 11:44 ` [PATCHv5 2/3] syscalls,x86: add selftest for execveat(2) David Drysdale @ 2014-10-22 11:44 ` David Drysdale 2014-10-22 17:55 ` Eric W. Biederman 1 sibling, 1 reply; 13+ messages in thread From: David Drysdale @ 2014-10-22 11:44 UTC (permalink / raw) To: Eric W. Biederman, Andy Lutomirski, Alexander Viro, Meredydd Luff, linux-kernel-u79uwXL29TY76Z2rM5mHXA Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton, Kees Cook, Arnd Bergmann, x86-DgEjT+Ai2ygdnm+yROfE0A, linux-arch-u79uwXL29TY76Z2rM5mHXA, linux-api-u79uwXL29TY76Z2rM5mHXA, David Drysdale Signed-off-by: David Drysdale <drysdale-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> --- man2/execveat.2 | 144 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 144 insertions(+) create mode 100644 man2/execveat.2 diff --git a/man2/execveat.2 b/man2/execveat.2 new file mode 100644 index 000000000000..d19571a3eb9d --- /dev/null +++ b/man2/execveat.2 @@ -0,0 +1,144 @@ +.\" Copyright (c) 2014 Google, Inc. +.\" +.\" %%%LICENSE_START(VERBATIM) +.\" Permission is granted to make and distribute verbatim copies of this +.\" manual provided the copyright notice and this permission notice are +.\" preserved on all copies. +.\" +.\" Permission is granted to copy and distribute modified versions of this +.\" manual under the conditions for verbatim copying, provided that the +.\" entire resulting derived work is distributed under the terms of a +.\" permission notice identical to this one. +.\" +.\" Since the Linux kernel and libraries are constantly changing, this +.\" manual page may be incorrect or out-of-date. The author(s) assume no +.\" responsibility for errors or omissions, or for damages resulting from +.\" the use of the information contained herein. The author(s) may not +.\" have taken the same level of care in the production of this manual, +.\" which is licensed free of charge, as they might when working +.\" professionally. +.\" +.\" Formatted or processed versions of this manual, if unaccompanied by +.\" the source, must acknowledge the copyright and authors of this work. +.\" %%%LICENSE_END +.\" +.TH EXECVEAT 2 2014-04-02 "Linux" "Linux Programmer's Manual" +.SH NAME +execveat \- execute program relative to a directory file descriptor +.SH SYNOPSIS +.B #include <unistd.h> +.sp +.BI "int execve(int " fd ", const char *" pathname "," +.br +.BI " char *const " argv "[], char *const " envp "[]," +.br +.BI " int " flags); +.SH DESCRIPTION +The +.BR execveat () +system call executes the program pointed to by the combination of \fIfd\fP and \fIpathname\fP. +The +.BR execveat () +system call operates in exactly the same way as +.BR execve (2), +except for the differences described in this manual page. + +If the pathname given in +.I pathname +is relative, then it is interpreted relative to the directory +referred to by the file descriptor +.I fd +(rather than relative to the current working directory of +the calling process, as is done by +.BR execve (2) +for a relative pathname). + +If +.I pathname +is relative and +.I fd +is the special value +.BR AT_FDCWD , +then +.I pathname +is interpreted relative to the current working +directory of the calling process (like +.BR execve (2)). + +If +.I pathname +is absolute, then +.I fd +is ignored. + +If +.I pathname +is an empty string and the +.BR AT_EMPTY_PATH +flag is specified, then the file descriptor +.I fd +specifies the file to be executed. + +.I flags +can either be 0, or include the following flags: +.TP +.BR AT_EMPTY_PATH +If +.I pathname +is an empty string, operate on the file referred to by +.IR fd +(which may have been obtained using the +.BR open (2) +.B O_PATH +flag). +.TP +.B AT_SYMLINK_NOFOLLOW +If the file identified by +.I fd +and a non-NULL +.I pathname +is a symbolic link, then the call fails with the error +.BR EINVAL . +.SH "RETURN VALUE" +On success, +.BR execveat () +does not return. On error \-1 is returned, and +.I errno +is set appropriately. +.SH ERRORS +The same errors that occur for +.BR execve (2) +can also occur for +.BR execveat (). +The following additional errors can occur for +.BR execveat (): +.TP +.B EBADF +.I fd +is not a valid file descriptor. +.TP +.B EINVAL +Invalid flag specified in +.IR flags . +.TP +.B ENOTDIR +.I pathname +is relative and +.I fd +is a file descriptor referring to a file other than a directory. +.SH VERSIONS +.BR execveat () +was added to Linux in kernel 3.???. +.SH NOTES +In addition to the reasons explained in +.BR openat (2), +the +.BR execveat () +system call is also needed to allow +.BR fexecve (3) +to be implemented on systems that do not have the +.I /proc +filesystem mounted. +.SH SEE ALSO +.BR execve (2), +.BR fexecve (3) -- 2.1.0.rc2.206.gedb03e5 ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCHv5 man-pages 3/3] execveat.2: initial man page for execveat(2) 2014-10-22 11:44 ` [PATCHv5 man-pages 3/3] execveat.2: initial man page " David Drysdale @ 2014-10-22 17:55 ` Eric W. Biederman [not found] ` <87wq7sgfah.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org> 0 siblings, 1 reply; 13+ messages in thread From: Eric W. Biederman @ 2014-10-22 17:55 UTC (permalink / raw) To: David Drysdale Cc: Andy Lutomirski, Alexander Viro, Meredydd Luff, linux-kernel, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton, Kees Cook, Arnd Bergmann, x86, linux-arch, linux-api David Drysdale <drysdale@google.com> writes: > Signed-off-by: David Drysdale <drysdale@google.com> > --- > man2/execveat.2 | 144 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++ > 1 file changed, 144 insertions(+) > create mode 100644 man2/execveat.2 > > diff --git a/man2/execveat.2 b/man2/execveat.2 > new file mode 100644 > index 000000000000..d19571a3eb9d > --- /dev/null > +++ b/man2/execveat.2 > @@ -0,0 +1,144 @@ > +.\" Copyright (c) 2014 Google, Inc. > +.\" > +.\" %%%LICENSE_START(VERBATIM) > +.\" Permission is granted to make and distribute verbatim copies of this > +.\" manual provided the copyright notice and this permission notice are > +.\" preserved on all copies. > +.\" > +.\" Permission is granted to copy and distribute modified versions of this > +.\" manual under the conditions for verbatim copying, provided that the > +.\" entire resulting derived work is distributed under the terms of a > +.\" permission notice identical to this one. > +.\" > +.\" Since the Linux kernel and libraries are constantly changing, this > +.\" manual page may be incorrect or out-of-date. The author(s) assume no > +.\" responsibility for errors or omissions, or for damages resulting from > +.\" the use of the information contained herein. The author(s) may not > +.\" have taken the same level of care in the production of this manual, > +.\" which is licensed free of charge, as they might when working > +.\" professionally. > +.\" > +.\" Formatted or processed versions of this manual, if unaccompanied by > +.\" the source, must acknowledge the copyright and authors of this work. > +.\" %%%LICENSE_END > +.\" > +.TH EXECVEAT 2 2014-04-02 "Linux" "Linux Programmer's Manual" > +.SH NAME > +execveat \- execute program relative to a directory file descriptor > +.SH SYNOPSIS > +.B #include <unistd.h> > +.sp > +.BI "int execve(int " fd ", const char *" pathname "," ^^^^^^ That needs to be execveat > +.br > +.BI " char *const " argv "[], char *const " envp "[]," > +.br > +.BI " int " flags); > +.SH DESCRIPTION > +The > +.BR execveat () > +system call executes the program pointed to by the combination of \fIfd\fP and \fIpathname\fP. > +The > +.BR execveat () > +system call operates in exactly the same way as > +.BR execve (2), > +except for the differences described in this manual page. > + > +If the pathname given in > +.I pathname > +is relative, then it is interpreted relative to the directory > +referred to by the file descriptor > +.I fd > +(rather than relative to the current working directory of > +the calling process, as is done by > +.BR execve (2) > +for a relative pathname). > + > +If > +.I pathname > +is relative and > +.I fd > +is the special value > +.BR AT_FDCWD , > +then > +.I pathname > +is interpreted relative to the current working > +directory of the calling process (like > +.BR execve (2)). > + > +If > +.I pathname > +is absolute, then > +.I fd > +is ignored. > + > +If > +.I pathname > +is an empty string and the > +.BR AT_EMPTY_PATH > +flag is specified, then the file descriptor > +.I fd > +specifies the file to be executed. > + > +.I flags > +can either be 0, or include the following flags: > +.TP > +.BR AT_EMPTY_PATH > +If > +.I pathname > +is an empty string, operate on the file referred to by > +.IR fd > +(which may have been obtained using the > +.BR open (2) > +.B O_PATH > +flag). > +.TP > +.B AT_SYMLINK_NOFOLLOW > +If the file identified by > +.I fd > +and a non-NULL > +.I pathname > +is a symbolic link, then the call fails with the error > +.BR EINVAL . > +.SH "RETURN VALUE" > +On success, > +.BR execveat () > +does not return. On error \-1 is returned, and > +.I errno > +is set appropriately. > +.SH ERRORS > +The same errors that occur for > +.BR execve (2) > +can also occur for > +.BR execveat (). > +The following additional errors can occur for > +.BR execveat (): > +.TP > +.B EBADF > +.I fd > +is not a valid file descriptor. > +.TP > +.B EINVAL > +Invalid flag specified in > +.IR flags . > +.TP > +.B ENOTDIR > +.I pathname > +is relative and > +.I fd > +is a file descriptor referring to a file other than a directory. > +.SH VERSIONS > +.BR execveat () > +was added to Linux in kernel 3.???. > +.SH NOTES > +In addition to the reasons explained in > +.BR openat (2), > +the > +.BR execveat () > +system call is also needed to allow > +.BR fexecve (3) > +to be implemented on systems that do not have the > +.I /proc > +filesystem mounted. > +.SH SEE ALSO > +.BR execve (2), > +.BR fexecve (3) ^ permalink raw reply [flat|nested] 13+ messages in thread
[parent not found: <87wq7sgfah.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org>]
* Re: [PATCHv5 man-pages 3/3] execveat.2: initial man page for execveat(2) [not found] ` <87wq7sgfah.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org> @ 2014-10-22 20:13 ` David Drysdale 0 siblings, 0 replies; 13+ messages in thread From: David Drysdale @ 2014-10-22 20:13 UTC (permalink / raw) To: Eric W. Biederman Cc: Andy Lutomirski, Alexander Viro, Meredydd Luff, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Andrew Morton, Kees Cook, Arnd Bergmann, X86 ML, linux-arch, Linux API On Wed, Oct 22, 2014 at 6:55 PM, Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote: > David Drysdale <drysdale-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> writes: >> +.TH EXECVEAT 2 2014-04-02 "Linux" "Linux Programmer's Manual" >> +.SH NAME >> +execveat \- execute program relative to a directory file descriptor >> +.SH SYNOPSIS >> +.B #include <unistd.h> >> +.sp >> +.BI "int execve(int " fd ", const char *" pathname "," > ^^^^^^ That needs to be execveat Ah yes, so it does -- thanks for spotting. ^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2014-10-28 17:48 UTC | newest] Thread overview: 13+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2014-10-22 11:44 [PATCHv5 0/3] syscalls,x86: Add execveat() system call David Drysdale 2014-10-22 11:44 ` [PATCHv5 1/3] syscalls,x86: implement " David Drysdale 2014-10-22 18:07 ` Eric W. Biederman 2014-10-23 6:40 ` David Drysdale [not found] ` <1413978269-17274-2-git-send-email-drysdale-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> 2014-10-22 18:44 ` Andy Lutomirski 2014-10-27 18:03 ` David Drysdale 2014-10-27 18:47 ` Andy Lutomirski 2014-10-28 17:30 ` David Drysdale 2014-10-28 17:48 ` Andy Lutomirski [not found] ` <1413978269-17274-1-git-send-email-drysdale-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> 2014-10-22 11:44 ` [PATCHv5 2/3] syscalls,x86: add selftest for execveat(2) David Drysdale 2014-10-22 11:44 ` [PATCHv5 man-pages 3/3] execveat.2: initial man page " David Drysdale 2014-10-22 17:55 ` Eric W. Biederman [not found] ` <87wq7sgfah.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org> 2014-10-22 20:13 ` David Drysdale
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).