* [RFC] new open flag O_NOSTD @ 2009-08-24 12:22 Eric Blake 2009-08-25 12:16 ` [PATCH] open: introduce O_NOSTD Eric Blake 2009-08-30 11:12 ` [RFC] new open flag O_NOSTD James Youngman 0 siblings, 2 replies; 15+ messages in thread From: Eric Blake @ 2009-08-24 12:22 UTC (permalink / raw) To: linux-kernel; +Cc: bug-coreutils Proposal ======== Add a new flag, O_NOSTD, to at least open and pipe2 (and an alternate spelling SOCK_NOSTD for socket, socketpair, accept4), with the following semantics: If the flag is specified and the function is successful, the returned fd (both fds for the pipe2 case) will be at least 3, regardless of whether the standard file descriptors 0, 1, or 2 are currently closed. Rationale ========= GNU Coreutils tries hard to protect itself from whatever weird environment may be thrown at it. One example is if the user runs: cp a b 2>&- If cp encounters an error, it prints a message to stderr, then regardless of whether the message was successfully printed, cp guarantees a non-zero exit status. In the case where fd 2 starts life closed, however, a naive implementation could end up opening a destination file for writing as fd 2, then encounter an error, such that the first use of stderr to print an error message will incorrectly modify the contents of a completely unrelated file. Therefore, the best approach for cp to take is to ensure that command-line arguments never occupy fd 0, 1, or 2, no matter what the cp process inherited from its parent. Of course, if cp were installed set-user-ID Or set-group-ID, then the OS could guarantee that cp would never start life with fd 0, 1, or 2 closed; but cp should not normally be installed with these permissions, and POSIX does not permit the OS to arbitrarily open these fds if these permissions are not present. One option is for cp to manually guarantee that fd 0, 1, and 2 are opened prior to parsing command line options. At one point, coreutils even used this approach, via a function stdopen: http://git.savannah.gnu.org/cgit/coreutils.git/diff/lib/stdopen.c?id=875cae47 However, this has a couple of drawbacks. It costs several syscalls at startup, even in the common case of all three std descriptors being provided by the parent process. It also ties up otherwise unused open file descriptors (perhaps the user intentionally closed some of the std fds in order to provide room for allowing more simultaneously open files without hitting EMFILE limits). Another option is what cp currently uses, which guarantees that any function call that creates a new fd is wrapped by a *_safer variant, which guarantees that the result will never collide with the standard descriptors. In the common case, the original open() returns 3 or larger, so the wrapper has no additional work to perform. But if the user started cp with fd 0, 1, or 2 closed, then the current implementation of the open_safer wrapper notices that the underlying open() call is in the wrong range, and provides a followup call to fcntl(fd,F_DUPFD,3) and close(fd), such that the overall result is again safely out of the std fd range: http://git.savannah.gnu.org/cgit/gnulib.git/tree/lib/open-safer.c http://git.savannah.gnu.org/cgit/gnulib.git/tree/lib/fd-safer.c http://git.savannah.gnu.org/cgit/gnulib.git/tree/lib/dup-safer.c Notice that with coreutils' current approach, the common case (all std descriptors provided by the parent) uses the minimal number of syscalls. However, in the corner case of starting life with a standard descriptor closed, the number of additional fcntl(F_DUPFD)/close() calls cause noticeable slowdown when copying large hierarchies (especially when compared with the stdopen approach of only suffering an up-front syscall penalty). And while coreutils does not keep fd 0, 1, or 2 tied open on a useless file all the time, it is still putting pressure on these descriptors during the window of the open_safer wrapper, so it has not completely eliminated the EMFILE avoidance. Also, the coreutils' approach works well for a single-threaded application, but it needs modifications to use the recently added POSIX 2008 open(O_CLOEXEC) and fcntl(F_DUPFD_CLOEXEC) flags if it is to avoid leaking a temporary fd 0, 1, or 2 into child process created by a fork/exec in another thread during the time that the first thread is calling open_safer. Therefore it makes sense to move this functionality into the kernel, via the addition of a new open() flag that informs the kernel that a successful fd-creation syscall must behave as if fd 0, 1, and 2 were already open. The idea is not new, since fcntl(fd, F_DUPFD, 3) already does just this. Then, on kernels where this is available, coreutils can alter its open_safer function to pass the new flag to the underlying open() syscall, and avoid having to use fcntl/close to sanitize any returned fd, with the result of no difference in the number of syscalls regardless of whether the parent process started cp with stderr open or closed. It also solves the EMFILE and multithreading fd leak issue, since a temporary fd 0, 1, or 2 is never opened in the first place. The name proposed in this mail is O_NOSTD (implying that a successful result will not be any of the standard file descriptors); other ideas mentioned on the bug-gnulib list were O_SAFER, O_NONSTD, O_NOSTDFD. http://lists.gnu.org/archive/html/bug-gnulib/2009-08/msg00358.html Thoughts? -- Don't work too hard, make some time for fun as well! Eric Blake ebb9@byu.net ^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH] open: introduce O_NOSTD 2009-08-24 12:22 [RFC] new open flag O_NOSTD Eric Blake @ 2009-08-25 12:16 ` Eric Blake 2009-08-25 21:53 ` Davide Libenzi 2009-08-30 11:12 ` [RFC] new open flag O_NOSTD James Youngman 1 sibling, 1 reply; 15+ messages in thread From: Eric Blake @ 2009-08-25 12:16 UTC (permalink / raw) To: linux-kernel Cc: bug-coreutils, bug-gnulib, Ulrich Drepper, Ingo Molnar, Davide Libenzi, Eric Blake Many applications have subtle bugs if started with one or more of the STD*_FILENO file descriptors closed; although this is an uncommon case, it should be considered during security audits. For example, an attempt to write a message to stderr during 'cp a b >&- 2>&-' in a naive implementation of 'cp' could end up corrupting the contents of file 'b'. In general, if any of the standard streams are used by an application (including via its libraries), then the safest course of action is to ensure that all other fds created by the process will not interfere with the three standard fds. One solution is for the application to manually ensure that all three std fds are open (either inherited, or to an innocuous replacement like /dev/null) prior to opening fds during normal behavior; but this costs extra syscalls up front for all invocations, even though a closed standard fd is an uncommon case. Unfortunately, opening a dummy file interferes with the ability to conditionally warn on improper use of inheriting a closed standard fd; GNU coreutils intentionally distinguishes between the successful 'cp a b <&-' and the fatal 'cp -i a b <&-'. Another solution is for the application to sanitize all newly-created fds: GNU coreutils provides a wrapper open_safer, which does nothing extra in the common case that open() returned 3 or larger, but calls fcntl(n,F_DUPFD,3)/close(n) before returning if n was less than 3. However, this leads to triple the syscall cost for every open() call if the process starts life with a std fd closed; and if O_CLOEXEC is not used, still leaves a window of time where the fd can be leaked through another thread's use of fork/exec. With the recent addition of O_CLOEXEC support, all fd creation sites now take a flags parameter that can be used to provide this sanitization in the kernel, rather than via an application wrapper that requires additional syscalls. The new flag O_NOSTD guarantees that a created fd will be 3 or larger, regardless of whether any of the standard fds are currently closed. Adding this flag in the kernel allows for the absolute minimum in syscalls, while still guaranteeing safety that unrelated fds will never be confused with the standard streams. This patch implements O_NOSTD for open, pipe2, socket, socketpair, and accept4. It does not add support for Linux-only fd creation, such as inotify_init, although that can be done later. It intentionally does not support O_NOSTD for dup3, since that syscall does not return the next available fd. Signed-off-by: Eric Blake <ebb9@byu.net> --- arch/alpha/include/asm/fcntl.h | 1 + arch/parisc/include/asm/fcntl.h | 1 + arch/sparc/include/asm/fcntl.h | 1 + fs/file.c | 2 ++ fs/pipe.c | 2 +- include/asm-generic/fcntl.h | 3 +++ include/linux/net.h | 1 + net/socket.c | 12 +++++++----- 8 files changed, 17 insertions(+), 6 deletions(-) diff --git a/arch/alpha/include/asm/fcntl.h b/arch/alpha/include/asm/fcntl.h index 25da001..5af6265 100644 --- a/arch/alpha/include/asm/fcntl.h +++ b/arch/alpha/include/asm/fcntl.h @@ -17,6 +17,7 @@ #define O_DIRECT 02000000 /* direct disk access - should check with OSF/1 */ #define O_NOATIME 04000000 #define O_CLOEXEC 010000000 /* set close_on_exec */ +#define O_NOSTD 020000000 /* avoid fd 0, 1, 2 */ #define F_GETLK 7 #define F_SETLK 8 diff --git a/arch/parisc/include/asm/fcntl.h b/arch/parisc/include/asm/fcntl.h index 1e1c824..688c5d2 100644 --- a/arch/parisc/include/asm/fcntl.h +++ b/arch/parisc/include/asm/fcntl.h @@ -15,6 +15,7 @@ #define O_RSYNC 002000000 /* HPUX only */ #define O_NOATIME 004000000 #define O_CLOEXEC 010000000 /* set close_on_exec */ +#define O_NOSTD 020000000 /* avoid fd 0, 1, 2 */ #define O_DIRECTORY 000010000 /* must be a directory */ #define O_NOFOLLOW 000000200 /* don't follow links */ diff --git a/arch/sparc/include/asm/fcntl.h b/arch/sparc/include/asm/fcntl.h index d4d9c9d..a7dae19 100644 --- a/arch/sparc/include/asm/fcntl.h +++ b/arch/sparc/include/asm/fcntl.h @@ -20,6 +20,7 @@ #define O_DIRECT 0x100000 /* direct disk access hint */ #define O_NOATIME 0x200000 #define O_CLOEXEC 0x400000 +#define O_NOSTD 0x800000 /* avoid fd 0, 1, 2 */ #define F_GETOWN 5 /* for sockets. */ #define F_SETOWN 6 /* for sockets. */ diff --git a/fs/file.c b/fs/file.c index f313314..94e1f67 100644 --- a/fs/file.c +++ b/fs/file.c @@ -444,6 +444,8 @@ int alloc_fd(unsigned start, unsigned flags) int error; struct fdtable *fdt; + if (start < 3 && (flags & O_NOSTD)) + start = 3; spin_lock(&files->file_lock); repeat: fdt = files_fdtable(files); diff --git a/fs/pipe.c b/fs/pipe.c index 52c4151..d5f52bb 100644 --- a/fs/pipe.c +++ b/fs/pipe.c @@ -1033,7 +1033,7 @@ int do_pipe_flags(int *fd, int flags) int error; int fdw, fdr; - if (flags & ~(O_CLOEXEC | O_NONBLOCK)) + if (flags & ~(O_CLOEXEC | O_NONBLOCK | O_NOSTD)) return -EINVAL; fw = create_write_pipe(flags); diff --git a/include/asm-generic/fcntl.h b/include/asm-generic/fcntl.h index 4d3e483..b9b4290 100644 --- a/include/asm-generic/fcntl.h +++ b/include/asm-generic/fcntl.h @@ -51,6 +51,9 @@ #ifndef O_CLOEXEC #define O_CLOEXEC 02000000 /* set close_on_exec */ #endif +#ifndef O_NOSTD +#define O_NOSTD 04000000 /* avoid fd 0, 1, 2 */ +#endif #ifndef O_NDELAY #define O_NDELAY O_NONBLOCK #endif diff --git a/include/linux/net.h b/include/linux/net.h index 4fc2ffd..a489122 100644 --- a/include/linux/net.h +++ b/include/linux/net.h @@ -105,6 +105,7 @@ enum sock_type { #ifndef SOCK_NONBLOCK #define SOCK_NONBLOCK O_NONBLOCK #endif +#define SOCK_NOSTD O_NOSTD #endif /* ARCH_HAS_SOCKET_TYPES */ diff --git a/net/socket.c b/net/socket.c index 6d47165..177bfb2 100644 --- a/net/socket.c +++ b/net/socket.c @@ -1271,12 +1271,14 @@ SYSCALL_DEFINE3(socket, int, family, int, type, int, protocol) /* Check the SOCK_* constants for consistency. */ BUILD_BUG_ON(SOCK_CLOEXEC != O_CLOEXEC); + BUILD_BUG_ON(SOCK_NOSTD != O_NOSTD); BUILD_BUG_ON((SOCK_MAX | SOCK_TYPE_MASK) != SOCK_TYPE_MASK); BUILD_BUG_ON(SOCK_CLOEXEC & SOCK_TYPE_MASK); BUILD_BUG_ON(SOCK_NONBLOCK & SOCK_TYPE_MASK); + BUILD_BUG_ON(SOCK_NOSTD & SOCK_TYPE_MASK); flags = type & ~SOCK_TYPE_MASK; - if (flags & ~(SOCK_CLOEXEC | SOCK_NONBLOCK)) + if (flags & ~(SOCK_CLOEXEC | SOCK_NONBLOCK | SOCK_NOSTD)) return -EINVAL; type &= SOCK_TYPE_MASK; @@ -1287,7 +1289,7 @@ SYSCALL_DEFINE3(socket, int, family, int, type, int, protocol) if (retval < 0) goto out; - retval = sock_map_fd(sock, flags & (O_CLOEXEC | O_NONBLOCK)); + retval = sock_map_fd(sock, flags & (O_CLOEXEC | O_NONBLOCK | O_NOSTD)); if (retval < 0) goto out_release; @@ -1337,13 +1339,13 @@ SYSCALL_DEFINE4(socketpair, int, family, int, type, int, protocol, if (err < 0) goto out_release_both; - fd1 = sock_alloc_fd(&newfile1, flags & O_CLOEXEC); + fd1 = sock_alloc_fd(&newfile1, flags & (O_CLOEXEC | O_NOSTD)); if (unlikely(fd1 < 0)) { err = fd1; goto out_release_both; } - fd2 = sock_alloc_fd(&newfile2, flags & O_CLOEXEC); + fd2 = sock_alloc_fd(&newfile2, flags & (O_CLOEXEC | O_NOSTD)); if (unlikely(fd2 < 0)) { err = fd2; put_filp(newfile1); @@ -1498,7 +1500,7 @@ SYSCALL_DEFINE4(accept4, int, fd, struct sockaddr __user *, upeer_sockaddr, */ __module_get(newsock->ops->owner); - newfd = sock_alloc_fd(&newfile, flags & O_CLOEXEC); + newfd = sock_alloc_fd(&newfile, flags & (O_CLOEXEC | O_NOSTD)); if (unlikely(newfd < 0)) { err = newfd; sock_release(newsock); -- 1.6.3.3.334.g916e1 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH] open: introduce O_NOSTD 2009-08-25 12:16 ` [PATCH] open: introduce O_NOSTD Eric Blake @ 2009-08-25 21:53 ` Davide Libenzi 2009-08-27 13:54 ` Eric Blake 0 siblings, 1 reply; 15+ messages in thread From: Davide Libenzi @ 2009-08-25 21:53 UTC (permalink / raw) To: Eric Blake Cc: Linux Kernel Mailing List, bug-coreutils, bug-gnulib, Ulrich Drepper, Ingo Molnar On Tue, 25 Aug 2009, Eric Blake wrote: > Another solution is for the application to sanitize all newly-created > fds: GNU coreutils provides a wrapper open_safer, which does nothing > extra in the common case that open() returned 3 or larger, but calls > fcntl(n,F_DUPFD,3)/close(n) before returning if n was less than 3. > However, this leads to triple the syscall cost for every open() call > if the process starts life with a std fd closed; and if O_CLOEXEC is > not used, still leaves a window of time where the fd can be leaked > through another thread's use of fork/exec. I think we can say that the vast majority of the software is not going to notice the proposed open_safer(), performance-wise, since the first three fds are always filled. So IMO the performance impact argument is a weak one. If CLOEXEC semantics are needed in the open operation, F_DUPFD_CLOEXEC can be used to match it. While the patch is simple, IMO this is something that can be easily taken care in glibc layers w/out huge drawbacks. - Davide ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] open: introduce O_NOSTD 2009-08-25 21:53 ` Davide Libenzi @ 2009-08-27 13:54 ` Eric Blake 2009-08-27 14:22 ` Ulrich Drepper ` (2 more replies) 0 siblings, 3 replies; 15+ messages in thread From: Eric Blake @ 2009-08-27 13:54 UTC (permalink / raw) To: Davide Libenzi Cc: Linux Kernel Mailing List, bug-coreutils, bug-gnulib, Ulrich Drepper, Ingo Molnar -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 According to Davide Libenzi on 8/25/2009 3:53 PM: >> Another solution is for the application to sanitize all newly-created >> fds: GNU coreutils provides a wrapper open_safer, which does nothing >> extra in the common case that open() returned 3 or larger, but calls >> fcntl(n,F_DUPFD,3)/close(n) before returning if n was less than 3. >> However, this leads to triple the syscall cost for every open() call >> if the process starts life with a std fd closed; and if O_CLOEXEC is >> not used, still leaves a window of time where the fd can be leaked >> through another thread's use of fork/exec. > > I think we can say that the vast majority of the software is not going to > notice the proposed open_safer(), performance-wise, since the first three > fds are always filled. So IMO the performance impact argument is a weak one. > If CLOEXEC semantics are needed in the open operation, F_DUPFD_CLOEXEC can > be used to match it. The current gnulib implementation of open_safer (pre-O_CLOEXEC support) is (roughly): /* Wrap open, to guarantee that a successful return value is >= 3. */ int open_safer (const char *name, int flags, int mode) { int fd = open (name, flags, mode); if (0 <= fd && fd <= 2) { int dup = fcntl (fd, F_DUPFD, 3); int saved_errno = errno; close (fd); errno = saved_errno; fd = dup; } return fd; } which has the desired property of no overhead in the common case of all standard fds open. But it obviously mishandles the O_CLOEXEC flag. Here's a first cut at supporting it: int open_safer (const char *name, int flags, int mode) { int fd = open (name, flags, mode); if (0 <= fd && fd <= 2) { int dup = fcntl (fd, ((flags & O_CLOEXEC) ? F_DUPFD_CLOEXEC : F_DUPFD), 3); int saved_errno = errno; close (fd); errno = saved_errno; fd = dup; } return fd; } If the user requested open_safer(O_CLOEXEC), then we still have the desired property of no syscall overhead and no fd leak. But if the user intentionally does not pass O_CLOEXEC because they wanted to create an inheritable fd, but without stomping on standard fds, then this version still has a window for an fd leak. So let's try this version, which guarantees no fd leak, while still keeping the semantics of giving the user an inheritable fd outside the std fd range: int open_safer (const char *name, int flags, int mode) { int fd = open (name, flags | O_CLOEXEC, mode); if (0 <= fd && fd <= 2) { int dup = fcntl (fd, ((flags & O_CLOEXEC) ? F_DUPFD_CLOEXEC : F_DUPFD), 3); int saved_errno = errno; close (fd); errno = saved_errno; fd = dup; } else if (!(flags & O_CLOEXEC)) { if ((flags = fcntl (fd, F_GETFD)) < 0 || fcntl (fd, F_SETFD, flags & ~FD_CLOEXEC) == -1) { int saved_errno = errno; close (fd); fd = -1; errno = saved_errno; } } return fd; } This solves the fd leak, and open_safer(O_CLOEXEC) is still cheap in the common case. But now the case of open_safer without O_CLOEXEC costs 3 syscalls, regardless of whether the standard fds were already open (if we assumed there were no possibility of new FD_* flags, we could cut the common-case penalty from three to two by skipping the fcntl(fd,F_GETFD) and just using fcntl(fd,F_SETFD,0), but that's asking for problems). > While the patch is simple, IMO this is something that can be easily taken > care in glibc layers w/out huge drawbacks. I hope that my example shows why doing it in the kernel is desirable - there is no safe way to keep the pre-O_CLOEXEC efficiency using just the library, but there IS a way to do it with kernel support: int open_safer (const char *name, int flags, int mode) { return open (name, flags | O_NOSTD, mode); } Also, remember this statement from Ulrich in the series that first introduced O_CLOEXEC/O_NONBLOCK support across all the fd creation APIs: "In future there will be other uses. Like a O_* flag to enable the use of non-sequential descriptors." http://marc.info/?l=linux-kernel&m=121010907127190&w=2 - -- Don't work too hard, make some time for fun as well! Eric Blake ebb9@byu.net -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (Cygwin) Comment: Public key at home.comcast.net/~ericblake/eblake.gpg Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAkqWj/gACgkQ84KuGfSFAYClIgCdEw6/7+PiFhR7aKGyuLUd5RdR lbAAmgKLqCC5zlhkOm/x4K+Om7nqeD0i =Ibq4 -----END PGP SIGNATURE----- ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] open: introduce O_NOSTD 2009-08-27 13:54 ` Eric Blake @ 2009-08-27 14:22 ` Ulrich Drepper 2009-08-28 12:28 ` Eric Blake 2009-08-27 14:35 ` Florian Weimer 2009-08-27 22:55 ` Davide Libenzi 2 siblings, 1 reply; 15+ messages in thread From: Ulrich Drepper @ 2009-08-27 14:22 UTC (permalink / raw) To: Eric Blake Cc: Davide Libenzi, Linux Kernel Mailing List, bug-coreutils, bug-gnulib, Ingo Molnar On 08/27/2009 06:54 AM, Eric Blake wrote: > I hope that my example shows why doing it in the kernel is desirable - > there is no safe way to keep the pre-O_CLOEXEC efficiency using just the > library, but there IS a way to do it with kernel support: You're describing a very special case where the performance implications are really minimal and try to argue that is a good enough reason? I don't think so. If a program really has to do thousands of these safe open calls then it can invest time into opening /dev/null for any of the unallocated descriptors < 3. You can even embed this logic in the safer_open function. -- ➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖ ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] open: introduce O_NOSTD 2009-08-27 14:22 ` Ulrich Drepper @ 2009-08-28 12:28 ` Eric Blake 0 siblings, 0 replies; 15+ messages in thread From: Eric Blake @ 2009-08-28 12:28 UTC (permalink / raw) To: Ulrich Drepper Cc: Davide Libenzi, Linux Kernel Mailing List, bug-coreutils, bug-gnulib, Ingo Molnar -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 According to Ulrich Drepper on 8/27/2009 8:22 AM: >> I hope that my example shows why doing it in the kernel is desirable - >> there is no safe way to keep the pre-O_CLOEXEC efficiency using just the >> library, but there IS a way to do it with kernel support: > > You're describing a very special case where the performance implications > are really minimal and try to argue that is a good enough reason? I > don't think so. > > If a program really has to do thousands of these safe open calls then it > can invest time into opening /dev/null for any of the unallocated > descriptors < 3. You can even embed this logic in the safer_open function. But that's what I've been trying to explain. There are cases where opening a dummy /dev/null descriptor interferes with behavior, and where you NEED to leave the std descriptors closed. Again, let's look at coreutils, which already is using a version of open_safer in many of its utilities. Consider 'cp -i a b <&-'. If b does not exist, then there is no reason for cp to prompt, so it never reads from stdin, and the fact that stdin was closed must not interfere with execution. Which means cp cannot just blindly warn if it detects that stdin was closed when it started; it has to wait until conditions warrant reading from stdin. On the other hand, if b exists, coreutils' cp correctly warns the user about the inability to read from stdin: $ cp -i a b <&- cp: overwrite `b'? cp: error closing file: Bad file descriptor $ echo $? 1 If coreutils were to tie fd 0 to /dev/null, then it can no longer distinguish a read failure when reading the prompt from stdin, and would no longer be able to alert the user to the failed interactive copy. Next, consider 'cp -uv a b >&-'. Coreutils uses printf() to print status messages, but only if an update warranted a message. Again, that means coreutils wants to know whether stdout is a valid file, and tying fd 1 to /dev/null can get in the way: $ cp -uv a b >&- $ rm b $ cp -uv a b >&- cp: write error: Bad file descriptor Now, couple this with the fact that 'cp -r' can visit thousands of files during its course of operation. There is no sane way to tie dummy fds to the standard descriptors, and still be able to distinguish failure to use a standard descriptor but only on the execution paths that actually used that standard descriptor. Hence, there is a definite use case for keeping the std fds closed, rather than putting dummies in their place; but to preserve this setup without kernel support, every single open() (and dup(), pipe(), socket(), ...) must be wrapped to use fcntl/close to move new fds back out of this range. And cp is an example of an existing program that really DOES end up paying a penalty of thousands of fcntl/close calls if started with a standard fd closed, in order to preserve its semantics. Furthermore, while the coreutils' implementation of open_safer operates correctly in a single-threaded environment, it cannot close the race in a multi-threaded context if the application started life with fd 2 closed, since there is a window of time during open_safer where another thread can attempt to use stderr and see thread 1's newly opened file before it can be moved safely out of the way: thread 1 thread 2 open_safer... open -> 2 fcntl(2,F_DUPFD,3) fputc(stderr) fflush(stderr) -> clobbers thread 1's file close(2) open_safer -> 3 - -- Don't work too hard, make some time for fun as well! Eric Blake ebb9@byu.net -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (Cygwin) Comment: Public key at home.comcast.net/~ericblake/eblake.gpg Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAkqXzXgACgkQ84KuGfSFAYA+ZwCgvuQ3PpDDLhLozqCUm3qyhIVj aqQAnR+MgdLg5apNEdHdIn9nlr4yRISI =0tcQ -----END PGP SIGNATURE----- ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] open: introduce O_NOSTD 2009-08-27 13:54 ` Eric Blake 2009-08-27 14:22 ` Ulrich Drepper @ 2009-08-27 14:35 ` Florian Weimer 2009-08-28 12:45 ` Eric Blake 2009-08-27 22:55 ` Davide Libenzi 2 siblings, 1 reply; 15+ messages in thread From: Florian Weimer @ 2009-08-27 14:35 UTC (permalink / raw) To: Eric Blake Cc: Davide Libenzi, Linux Kernel Mailing List, bug-coreutils, bug-gnulib, Ulrich Drepper, Ingo Molnar * Eric Blake: > int open_safer (const char *name, int flags, int mode) > { > int fd = open (name, flags | O_CLOEXEC, mode); > if (0 <= fd && fd <= 2) > { > int dup = fcntl (fd, ((flags & O_CLOEXEC) > ? F_DUPFD_CLOEXEC : F_DUPFD), 3); > int saved_errno = errno; > close (fd); > errno = saved_errno; > fd = dup; > } > else if (!(flags & O_CLOEXEC)) > { > if ((flags = fcntl (fd, F_GETFD)) < 0 > || fcntl (fd, F_SETFD, flags & ~FD_CLOEXEC) == -1) > { > int saved_errno = errno; > close (fd); > fd = -1; > errno = saved_errno; > } > } > return fd; > } > This solves the fd leak, It's still buggy. You need something like this: int open_safer(const char *name, int flags, int mode) { int opened_fd[3] = {0, 0, 0}; int fd, i, errno_saved; while (1) { fd = open(name, flags | O_CLOEXEC, mode); if (fd < 0 || fd > 2) { break; } opened_fd[fd] = 1; } for (int i = 0; i <= 2; ++i) { if (opened_fd[i]) { errno_saved = errno; close(i); errno = errno_saved; } } return fd; } It's untested, so it's probably still buggy. (O_CLOEXEC should have been a thread attribute, like the base path in the *_at functions. *sigh*) -- Florian Weimer <fweimer@bfk.de> BFK edv-consulting GmbH http://www.bfk.de/ Kriegsstraße 100 tel: +49-721-96201-1 D-76133 Karlsruhe fax: +49-721-96201-99 ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] open: introduce O_NOSTD 2009-08-27 14:35 ` Florian Weimer @ 2009-08-28 12:45 ` Eric Blake 2009-08-28 12:52 ` Florian Weimer 0 siblings, 1 reply; 15+ messages in thread From: Eric Blake @ 2009-08-28 12:45 UTC (permalink / raw) To: Florian Weimer Cc: Davide Libenzi, Linux Kernel Mailing List, bug-coreutils, bug-gnulib, Ulrich Drepper, Ingo Molnar -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 According to Florian Weimer on 8/27/2009 8:35 AM: > * Eric Blake: > >> int open_safer (const char *name, int flags, int mode) >> { >> int fd = open (name, flags | O_CLOEXEC, mode); >> if (0 <= fd && fd <= 2) >> { >> int dup = fcntl (fd, ((flags & O_CLOEXEC) >> ? F_DUPFD_CLOEXEC : F_DUPFD), 3); >> int saved_errno = errno; >> close (fd); >> errno = saved_errno; >> fd = dup; >> } >> else if (!(flags & O_CLOEXEC)) >> { >> if ((flags = fcntl (fd, F_GETFD)) < 0 >> || fcntl (fd, F_SETFD, flags & ~FD_CLOEXEC) == -1) >> { >> int saved_errno = errno; >> close (fd); >> fd = -1; >> errno = saved_errno; >> } >> } >> return fd; >> } > >> This solves the fd leak, > > It's still buggy. In what manner? Are you stating that F_DUPFD_CLOEXEC is not atomic? > You need something like this: > > int open_safer(const char *name, int flags, int mode) > { > int opened_fd[3] = {0, 0, 0}; > int fd, i, errno_saved; > while (1) { > fd = open(name, flags | O_CLOEXEC, mode); > if (fd < 0 || fd > 2) { > break; > } > opened_fd[fd] = 1; > } > for (int i = 0; i <= 2; ++i) { > if (opened_fd[i]) { > errno_saved = errno; > close(i); > errno = errno_saved; > } > } > return fd; > } > > It's untested, so it's probably still buggy. Your version fails to clear the cloexec bit of the final fd if the original caller didn't request O_CLOEXEC. If the caller requested O_CLOEXEC, then your version takes 3, 5, or 7 syscalls depending on how many std fds were closed, while my version takes 3 syscalls regardless of how many std fds were closed. Also, your suggestion has a definite race in that you are calling open() multiple times rather than cloning an existing fd after the first open(), such that another process could alter which file is visited between your first and last open(). - -- Don't work too hard, make some time for fun as well! Eric Blake ebb9@byu.net -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (Cygwin) Comment: Public key at home.comcast.net/~ericblake/eblake.gpg Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAkqX0W0ACgkQ84KuGfSFAYDYdwCeOB8dt0Rx2QYJkfIsfP452AYj V7QAoL1FACwnRPhhQ2aTh2EB38i+d34o =ouPs -----END PGP SIGNATURE----- ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] open: introduce O_NOSTD 2009-08-28 12:45 ` Eric Blake @ 2009-08-28 12:52 ` Florian Weimer 2009-08-28 12:58 ` Eric Blake 2009-08-28 13:04 ` Eric Blake 0 siblings, 2 replies; 15+ messages in thread From: Florian Weimer @ 2009-08-28 12:52 UTC (permalink / raw) To: Eric Blake Cc: Davide Libenzi, Linux Kernel Mailing List, bug-coreutils, bug-gnulib, Ulrich Drepper, Ingo Molnar * Eric Blake: > Your version fails to clear the cloexec bit of the final fd if the > original caller didn't request O_CLOEXEC. Okay, but you can fix that in a race-free manner (but I thought that this was implied by open_safer). > If the caller requested O_CLOEXEC, then your version takes 3, 5, or > 7 syscalls depending on how many std fds were closed, while my > version takes 3 syscalls regardless of how many std fds were closed. I really don't see a way around that. You can't pick a descriptor and hope that it's unused. > Also, your suggestion has a definite race in that you are calling > open() multiple times rather than cloning an existing fd after the > first open(), such that another process could alter which file is > visited between your first and last open(). Sure, but this is an unobservable differen.ce -- Florian Weimer <fweimer@bfk.de> BFK edv-consulting GmbH http://www.bfk.de/ Kriegsstraße 100 tel: +49-721-96201-1 D-76133 Karlsruhe fax: +49-721-96201-99 ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] open: introduce O_NOSTD 2009-08-28 12:52 ` Florian Weimer @ 2009-08-28 12:58 ` Eric Blake 2009-08-28 13:04 ` Eric Blake 1 sibling, 0 replies; 15+ messages in thread From: Eric Blake @ 2009-08-28 12:58 UTC (permalink / raw) To: Florian Weimer Cc: Davide Libenzi, Linux Kernel Mailing List, bug-coreutils, bug-gnulib, Ulrich Drepper, Ingo Molnar -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 According to Florian Weimer on 8/28/2009 6:52 AM: >> If the caller requested O_CLOEXEC, then your version takes 3, 5, or >> 7 syscalls depending on how many std fds were closed, while my >> version takes 3 syscalls regardless of how many std fds were closed. > > I really don't see a way around that. You can't pick a descriptor and > hope that it's unused. fcntl(,F_DUPFD,3) is NOT like dup2. It picks the next available descriptor that is at least 3. - -- Don't work too hard, make some time for fun as well! Eric Blake ebb9@byu.net -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (Cygwin) Comment: Public key at home.comcast.net/~ericblake/eblake.gpg Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAkqX1HEACgkQ84KuGfSFAYAzYQCgnwad5k39bZJqw3Wg1vMED/Fd zjYAmwenJlImBUE33geRdHC6aoVSx7JG =omNI -----END PGP SIGNATURE----- ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] open: introduce O_NOSTD 2009-08-28 12:52 ` Florian Weimer 2009-08-28 12:58 ` Eric Blake @ 2009-08-28 13:04 ` Eric Blake 1 sibling, 0 replies; 15+ messages in thread From: Eric Blake @ 2009-08-28 13:04 UTC (permalink / raw) To: Florian Weimer Cc: Davide Libenzi, Linux Kernel Mailing List, bug-coreutils, bug-gnulib, Ulrich Drepper, Ingo Molnar -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 According to Florian Weimer on 8/28/2009 6:52 AM: > * Eric Blake: > >> Your version fails to clear the cloexec bit of the final fd if the >> original caller didn't request O_CLOEXEC. > > Okay, but you can fix that in a race-free manner (but I thought that > this was implied by open_safer). The current semantics of gnulib's open_safer is that the result is guaranteed to be 3 or larger. It would require an audit of all gnulib clients of the open_safer method to see whether it also makes sense to change the semantics of open_safer to also guarantee that fds start life with the cloexec bit set. But maybe that is a change worth making in gnulib, with applications intending to give an fd to a child process being required to explicitly clear the cloexec bit. >> Also, your suggestion has a definite race in that you are calling >> open() multiple times rather than cloning an existing fd after the >> first open(), such that another process could alter which file is >> visited between your first and last open(). > > Sure, but this is an unobservable differen.ce It is absolutely observable - if the user passed O_CREAT|O_EXCL as part of their flags, then the second open() will inappropriately fail. - -- Don't work too hard, make some time for fun as well! Eric Blake ebb9@byu.net -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (Cygwin) Comment: Public key at home.comcast.net/~ericblake/eblake.gpg Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAkqX1egACgkQ84KuGfSFAYDKWACeMM4spqCsmgVVwME9+C/1tdpU g7wAnR9FetGPGr7acXLfLIVvzYZ7tpz3 =VjUY -----END PGP SIGNATURE----- ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] open: introduce O_NOSTD 2009-08-27 13:54 ` Eric Blake 2009-08-27 14:22 ` Ulrich Drepper 2009-08-27 14:35 ` Florian Weimer @ 2009-08-27 22:55 ` Davide Libenzi 2009-08-27 23:11 ` Ulrich Drepper 2 siblings, 1 reply; 15+ messages in thread From: Davide Libenzi @ 2009-08-27 22:55 UTC (permalink / raw) To: Eric Blake Cc: Linux Kernel Mailing List, bug-coreutils, bug-gnulib, Ulrich Drepper, Ingo Molnar On Thu, 27 Aug 2009, Eric Blake wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > According to Davide Libenzi on 8/25/2009 3:53 PM: > >> Another solution is for the application to sanitize all newly-created > >> fds: GNU coreutils provides a wrapper open_safer, which does nothing > >> extra in the common case that open() returned 3 or larger, but calls > >> fcntl(n,F_DUPFD,3)/close(n) before returning if n was less than 3. > >> However, this leads to triple the syscall cost for every open() call > >> if the process starts life with a std fd closed; and if O_CLOEXEC is > >> not used, still leaves a window of time where the fd can be leaked > >> through another thread's use of fork/exec. > > > > I think we can say that the vast majority of the software is not going to > > notice the proposed open_safer(), performance-wise, since the first three > > fds are always filled. So IMO the performance impact argument is a weak one. > > If CLOEXEC semantics are needed in the open operation, F_DUPFD_CLOEXEC can > > be used to match it. > > The current gnulib implementation of open_safer (pre-O_CLOEXEC support) is > (roughly): > > /* Wrap open, to guarantee that a successful return value is >= 3. */ > int open_safer (const char *name, int flags, int mode) > { > int fd = open (name, flags, mode); > if (0 <= fd && fd <= 2) > { > int dup = fcntl (fd, F_DUPFD, 3); > int saved_errno = errno; > close (fd); > errno = saved_errno; > fd = dup; > } > return fd; > } > > which has the desired property of no overhead in the common case of all > standard fds open. But it obviously mishandles the O_CLOEXEC flag. > Here's a first cut at supporting it: > > int open_safer (const char *name, int flags, int mode) > { > int fd = open (name, flags, mode); > if (0 <= fd && fd <= 2) > { > int dup = fcntl (fd, ((flags & O_CLOEXEC) > ? F_DUPFD_CLOEXEC : F_DUPFD), 3); > int saved_errno = errno; > close (fd); > errno = saved_errno; > fd = dup; > } > return fd; > } > > If the user requested open_safer(O_CLOEXEC), then we still have the > desired property of no syscall overhead and no fd leak. But if the user > intentionally does not pass O_CLOEXEC because they wanted to create an > inheritable fd, but without stomping on standard fds, then this version > still has a window for an fd leak. So let's try this version, which > guarantees no fd leak, while still keeping the semantics of giving the > user an inheritable fd outside the std fd range: > > int open_safer (const char *name, int flags, int mode) > { > int fd = open (name, flags | O_CLOEXEC, mode); > if (0 <= fd && fd <= 2) > { > int dup = fcntl (fd, ((flags & O_CLOEXEC) > ? F_DUPFD_CLOEXEC : F_DUPFD), 3); > int saved_errno = errno; > close (fd); > errno = saved_errno; > fd = dup; > } > else if (!(flags & O_CLOEXEC)) > { > if ((flags = fcntl (fd, F_GETFD)) < 0 > || fcntl (fd, F_SETFD, flags & ~FD_CLOEXEC) == -1) > { > int saved_errno = errno; > close (fd); > fd = -1; > errno = saved_errno; > } > } > return fd; > } > > This solves the fd leak, and open_safer(O_CLOEXEC) is still cheap in the > common case. But now the case of open_safer without O_CLOEXEC costs 3 > syscalls, regardless of whether the standard fds were already open (if we > assumed there were no possibility of new FD_* flags, we could cut the > common-case penalty from three to two by skipping the fcntl(fd,F_GETFD) > and just using fcntl(fd,F_SETFD,0), but that's asking for problems). > > > While the patch is simple, IMO this is something that can be easily taken > > care in glibc layers w/out huge drawbacks. > > I hope that my example shows why doing it in the kernel is desirable - > there is no safe way to keep the pre-O_CLOEXEC efficiency using just the > library, but there IS a way to do it with kernel support: > > int open_safer (const char *name, int flags, int mode) > { > return open (name, flags | O_NOSTD, mode); > } Can't the handling be done on close(), like (modulo some errno save/restore): int safer_close(int fd) { int error = close(fd); if (fd < 3 && fd >= 0) { if ((fd = open("/dev/null", O_RDWR)) > 2) close(fd); } return error; } - Davide ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] open: introduce O_NOSTD 2009-08-27 22:55 ` Davide Libenzi @ 2009-08-27 23:11 ` Ulrich Drepper 0 siblings, 0 replies; 15+ messages in thread From: Ulrich Drepper @ 2009-08-27 23:11 UTC (permalink / raw) To: Davide Libenzi Cc: Eric Blake, Linux Kernel Mailing List, bug-coreutils, bug-gnulib, Ulrich Drepper, Ingo Molnar On Thu, Aug 27, 2009 at 15:55, Davide Libenzi<davidel@xmailserver.org> wrote: > Can't the handling be done on close(), like (modulo some errno save/restore): No. You can have any file descriptor closed when the process is started. No close in the process with the special close. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RFC] new open flag O_NOSTD 2009-08-24 12:22 [RFC] new open flag O_NOSTD Eric Blake 2009-08-25 12:16 ` [PATCH] open: introduce O_NOSTD Eric Blake @ 2009-08-30 11:12 ` James Youngman 2009-08-30 11:18 ` Jim Meyering 1 sibling, 1 reply; 15+ messages in thread From: James Youngman @ 2009-08-30 11:12 UTC (permalink / raw) To: Eric Blake; +Cc: linux-kernel, bug-coreutils On Mon, Aug 24, 2009 at 1:22 PM, Eric Blake<ebb9@byu.net> wrote: > The name proposed in this mail is O_NOSTD (implying that a successful > result will not be any of the standard file descriptors); other ideas > mentioned on the bug-gnulib list were O_SAFER, O_NONSTD, O_NOSTDFD. > http://lists.gnu.org/archive/html/bug-gnulib/2009-08/msg00358.html > > Thoughts? Mostly yes, I like it. I'd like to explicitly vote against O_SAFER, because there may be (for some systems or in the future) some other way of making open(2) safer. I'd vote positively for O_NOSTDFD or O_NOSTD. James. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RFC] new open flag O_NOSTD 2009-08-30 11:12 ` [RFC] new open flag O_NOSTD James Youngman @ 2009-08-30 11:18 ` Jim Meyering 0 siblings, 0 replies; 15+ messages in thread From: Jim Meyering @ 2009-08-30 11:18 UTC (permalink / raw) To: Eric Blake; +Cc: bug-coreutils, linux-kernel James Youngman wrote: > On Mon, Aug 24, 2009 at 1:22 PM, Eric Blake<ebb9@byu.net> wrote: >> The name proposed in this mail is O_NOSTD (implying that a successful >> result will not be any of the standard file descriptors); other ideas >> mentioned on the bug-gnulib list were O_SAFER, O_NONSTD, O_NOSTDFD. >> http://lists.gnu.org/archive/html/bug-gnulib/2009-08/msg00358.html >> >> Thoughts? > > Mostly yes, I like it. > > I'd like to explicitly vote against O_SAFER, because there may be (for > some systems or in the future) some other way of making open(2) safer. > I'd vote positively for O_NOSTDFD or O_NOSTD. I like this addition, too. However, I would prefer O_NOSTDFD, since O_NOSTD evokes "non-standard". ^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2009-08-30 11:19 UTC | newest] Thread overview: 15+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2009-08-24 12:22 [RFC] new open flag O_NOSTD Eric Blake 2009-08-25 12:16 ` [PATCH] open: introduce O_NOSTD Eric Blake 2009-08-25 21:53 ` Davide Libenzi 2009-08-27 13:54 ` Eric Blake 2009-08-27 14:22 ` Ulrich Drepper 2009-08-28 12:28 ` Eric Blake 2009-08-27 14:35 ` Florian Weimer 2009-08-28 12:45 ` Eric Blake 2009-08-28 12:52 ` Florian Weimer 2009-08-28 12:58 ` Eric Blake 2009-08-28 13:04 ` Eric Blake 2009-08-27 22:55 ` Davide Libenzi 2009-08-27 23:11 ` Ulrich Drepper 2009-08-30 11:12 ` [RFC] new open flag O_NOSTD James Youngman 2009-08-30 11:18 ` Jim Meyering
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox