* New system call wanted: fdreopen
@ 2012-12-09 15:03 Tristan Wibberley
2012-12-09 16:27 ` Theodore Ts'o
` (2 more replies)
0 siblings, 3 replies; 6+ messages in thread
From: Tristan Wibberley @ 2012-12-09 15:03 UTC (permalink / raw)
To: linux-kernel
Hello,
I'd like to propose a system call called "fdreopen":
int fdreopen(int src_fd, int dst_fd, int flags);
I am willing to try implementing this system call given some suggestions
on where to start and what locking to watch out for. I have given a brief
outline of the behaviour below, and a description of the class of problem
that it solves at the end.
Does anybody know any reasons why this system call would be impossible/
impractical or otherwise unacceptable?
Any improvements I should consider before trying to implement it?
Behaviour
=========
This system call would be like dup3 except for these things:
 - if dst_fd is -1 then the lowest available file descriptor is allocated
   rather than returning EBADF as dup3 does.
 - the new file descriptor points to a *new* entry in the file table much
   as if the original file had been opened again via open or openat. This
   means that two large independent libraries can seek and read without
   synchronising even when they cannot open a file by its path.
 - O_RDWR access can be reduced to O_RDONLY or O_WRONLY:

       int src_fd = open("/file", O_RDWR | O_CLOEXEC);
       int new_fd = fdreopen(src_fd, -1, O_CLOEXEC | O_RDONLY);

 - it would be async-signal-safe.
Why
===
A common idiom on Linux is to open a file and keep the fd open so that
the underlying file can be unlinked from its directory. But if the file
needs to be read from several different parts of the codebase then due to
the file descriptor having exactly one read pointer those different parts
must be synchronised which is a relatively difficult task.
I think that this new system call is required to achieve that neatly and
simply:
 - dup does not solve this problem because it only allows the new file
   descriptor to have its own flags (eg O_CLOEXEC).
 - /proc/self/fd/* does not solve this problem because the file might no
   longer be available at the same place in the filesystem. In some
   otherwise simple message passing or ReSTful IPC a different file will
   be available at that path.
I suspect that user space has been working around this problem either
with otherwise unnecessary synchronisation or at the cost of occasional,
difficult-to-reproduce bugs.
--
Tristan Wibberley

* Re: New system call wanted: fdreopen
  2012-12-09 15:03 New system call wanted: fdreopen Tristan Wibberley
@ 2012-12-09 16:27 ` Theodore Ts'o
  2012-12-09 17:18   ` Tristan Wibberley
  2012-12-09 19:37 ` Chris Adams
  2012-12-10 10:18 ` Kevin Easton
  2 siblings, 1 reply; 6+ messages in thread
From: Theodore Ts'o @ 2012-12-09 16:27 UTC (permalink / raw)
  To: Tristan Wibberley; +Cc: linux-kernel

On Sun, Dec 09, 2012 at 03:03:30PM +0000, Tristan Wibberley wrote:
>
> - /proc/self/fd/* does not solve this problem because the file might no
>   longer be available at the same place in the filesystem. In some
>   otherwise simple message passing or ReSTful IPC a different file will
>   be available at that path.

Actually, /proc/self/fd/* _will_ work. When you do a ls -l, it looks
like a symlink, but the files in /proc/self/fd (and /proc/<pid>/fd more
generally) are magic. If you open files in /proc/<pid>/fd/*, it will do
what you want.

See for yourself:

    % cat > /tmp/foo.test
    foo bar
    ^Z
    % jobs -l   # (and note the pid, hereafter <pid>)
    % ls -l /proc/<pid>/fd
    % mv /tmp/foo.test /tmp/foo2.test
    % ls -l /proc/<pid>/fd   # note that the symlink now points at /tmp/foo2.test
    % cat /proc/<pid>/fd/1   # note that it works!
    % rm /tmp/foo2.test
    % ls -l /proc/<pid>/fd   # note that the symlink now has "(deleted)" at the end
    % cat /proc/<pid>/fd/1   # note that it works!

This is of course horribly Linux-specific, but so would be a new system
call like your proposed fdreopen. Better yet, using /proc/self/fd/* will
work *now*. You don't have to wait for a new system call in a future
version of the kernel to start shipping in new distributions.

Regards,

- Ted
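
The /proc/self/fd approach described in this message can be sketched in C roughly as follows. This is a minimal sketch, not a definitive implementation: the helper name reopen_fd is an assumption made for illustration (it is not an existing API), and it assumes Linux with /proc mounted and a regular file behind the descriptor.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * reopen_fd() is a hypothetical helper name, not an existing API.
 * It opens /proc/self/fd/<fd>, which on Linux yields a new opened file
 * for the same underlying file, with its own offset and access mode,
 * even after the file has been renamed or unlinked.
 * Returns the new descriptor, or -1 with errno set by open(2).
 */
int reopen_fd(int fd, int flags)
{
    char path[32];
    int n = snprintf(path, sizeof(path), "/proc/self/fd/%d", fd);

    /* "/proc/self/fd/" plus any int always fits in 32 bytes. */
    assert(n > 0 && (size_t)n < sizeof(path));
    return open(path, flags);
}
```

A call such as reopen_fd(src_fd, O_RDONLY | O_CLOEXEC) would then play roughly the role of the proposed fdreopen(src_fd, -1, O_CLOEXEC | O_RDONLY): the lowest free descriptor is allocated, and reads or seeks on it do not move the offset seen through src_fd.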

* Re: New system call wanted: fdreopen
  2012-12-09 16:27 ` Theodore Ts'o
@ 2012-12-09 17:18   ` Tristan Wibberley
  0 siblings, 0 replies; 6+ messages in thread
From: Tristan Wibberley @ 2012-12-09 17:18 UTC (permalink / raw)
  To: linux-kernel

On Sun, 09 Dec 2012 11:27:46 -0500, Theodore Ts'o wrote:
> On Sun, Dec 09, 2012 at 03:03:30PM +0000, Tristan Wibberley wrote:
>>
>> - /proc/self/fd/* does not solve this problem because the file might no
...
> Actually, /proc/self/fd/* _will_ work. When you do a ls -l, it looks
> like a symlink, but the files in /proc/self/fd (and /proc/<pid>/fd more
> generally) are magic. If you open files in /proc/<pid>/fd/*, it will do
> what you want.

Oh! Cool!

By reading how to solve this class of problem with google search results
I couldn't find any indication that /proc/pid/fd/* worked differently
than a symlink.

Thanks for your quick reply.

--
Tristan

* Re: New system call wanted: fdreopen
  2012-12-09 15:03 New system call wanted: fdreopen Tristan Wibberley
  2012-12-09 16:27 ` Theodore Ts'o
@ 2012-12-09 19:37 ` Chris Adams
  2012-12-09 20:47   ` Al Viro
  2012-12-10 10:18 ` Kevin Easton
  2 siblings, 1 reply; 6+ messages in thread
From: Chris Adams @ 2012-12-09 19:37 UTC (permalink / raw)
  To: linux-kernel

Once upon a time, Tristan Wibberley <tristan.wibberley@gmail.com> said:
>A common idiom on Linux is to open a file and keep the fd open so that
>the underlying file can be unlinked from its directory. But if the file
>needs to be read from several different parts of the codebase then due to
>the file descriptor having exactly one read pointer those different parts
>must be synchronised which is a relatively difficult task.

I think you can get similar behavior entirely in user space and in a
fashion portable to at least BSD systems. You could fork() (which would
create a separate FD in the child), pass the FD back to the parent over
a socket, and then have the child exit.

--
Chris Adams <cmadams@hiwaay.net>
Systems and Network Administrator - HiWAAY Internet Services
I don't speak for anybody but myself - that's enough trouble.

* Re: New system call wanted: fdreopen
  2012-12-09 19:37 ` Chris Adams
@ 2012-12-09 20:47   ` Al Viro
  0 siblings, 0 replies; 6+ messages in thread
From: Al Viro @ 2012-12-09 20:47 UTC (permalink / raw)
  To: Chris Adams; +Cc: linux-kernel

On Sun, Dec 09, 2012 at 01:37:33PM -0600, Chris Adams wrote:
> Once upon a time, Tristan Wibberley <tristan.wibberley@gmail.com> said:
> >A common idiom on Linux is to open a file and keep the fd open so that
> >the underlying file can be unlinked from its directory. But if the file
> >needs to be read from several different parts of the codebase then due to
> >the file descriptor having exactly one read pointer those different parts
> >must be synchronised which is a relatively difficult task.
>
> I think you can get similar behavior entirely in user space and in a
> fashion portable to at least BSD systems. You could fork() (which would
> create a separate FD in the child), pass the FD back to the parent over
> a socket, and then have the child exit.

... and here's your well-earned F for UNIX101. fork() will *NOT* do
anything of that sort. Not on anything starting at v1 - the bug you want
to rely upon had been killed very early, possibly even before the
migration from PDP-7 to PDP-11.

Think how something like

    (ls;ls) >/tmp/a

would work if you didn't have the current file offset shared across
fork(2). Parent doesn't do any IO here; both children inherit stdout
from it when they are forked and write to said stdout. We want the
output of the second not at the offset 0, obviously. That's *the* reason
why we (every Unix out there) have a distinction between file
descriptors and opened files.

open() yields a new IO channel (aka opened file). It also creates a new
descriptor and associates that channel with it. fork() does *not* create
new opened files. It creates a new descriptor table, populating it with
additional references to opened files the parent had descriptors for.

It's the same difference as between doing open() of the same file twice
and doing open() + dup(). In fact, what you've described is a very
obfuscated way to do dup(2) - SCM_RIGHTS descriptor passing will take a
descriptor in sender, acquire an extra reference to opened file
corresponding to it and store that reference in datagram. Recipient will
allocate a new descriptor and associate it with the reference to opened
file it has found in datagram.

Seriously, this is as basic as it gets - understanding how redirects
work and what file descriptors are really ought to be covered by any
introductory course on Unix, let alone anything that touches
descriptor-passing. Current IO offset is a property of opened file, not
of a descriptor.

What the original poster has described is clearly a new opened file
instance associated with the same filesystem object. dup() or fork()
will do nothing of that sort; not on Linux, not on *BSD, not on any
Unix. open() on /proc/<pid>/fd/<n> will, but it's Linux-specific.
Whether it's a good idea or not, depends on the program in question,
obviously. In any case, you don't need a new syscall for that.
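
The distinction drawn in this message (the offset belongs to the opened file, not to the descriptor) can be checked with a small experiment. The following is only an illustrative sketch: both helper names are assumptions introduced here, and it assumes the descriptors refer to a seekable regular file.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Seek `moved` to absolute offset n, then report the offset of `observed`. */
off_t offset_seen_after_seek(int moved, int observed, off_t n)
{
    off_t pos = lseek(moved, n, SEEK_SET);

    assert(pos == n);   /* `moved` must refer to a seekable file */
    return lseek(observed, 0, SEEK_CUR);
}

/* Seek fd to offset n in a forked child, then report the parent's offset. */
off_t offset_seen_after_child_seek(int fd, off_t n)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return (off_t)-1;
    if (pid == 0)   /* child: move the offset and leave without flushing stdio */
        _exit(lseek(fd, n, SEEK_SET) == n ? 0 : 1);
    if (waitpid(pid, &status, 0) != pid)
        return (off_t)-1;
    return lseek(fd, 0, SEEK_CUR);
}
```

With a = open(path, ...), d = dup(a) and b = a second open() of the same path, offset_seen_after_seek(a, d, 5) reports 5 because a and d share one opened file, while offset_seen_after_seek(a, b, 7) still reports 0 for b. Likewise offset_seen_after_child_seek(a, 9) reports 9 in the parent, since fork() only adds references to the same opened file.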

* Re: New system call wanted: fdreopen
  2012-12-09 15:03 New system call wanted: fdreopen Tristan Wibberley
  2012-12-09 16:27 ` Theodore Ts'o
  2012-12-09 19:37 ` Chris Adams
@ 2012-12-10 10:18 ` Kevin Easton
  2 siblings, 0 replies; 6+ messages in thread
From: Kevin Easton @ 2012-12-10 10:18 UTC (permalink / raw)
  To: Tristan Wibberley; +Cc: linux-kernel

Quoting Tristan Wibberley <tristan.wibberley@gmail.com>:
> Why
> ===
>
> A common idiom on Linux is to open a file and keep the fd open so that
> the underlying file can be unlinked from its directory. But if the file
> needs to be read from several different parts of the codebase then due to
> the file descriptor having exactly one read pointer those different parts
> must be synchronised which is a relatively difficult task.

Another alternative is to use pwrite() / pread(), which do not affect
the file pointer. They're in POSIX.

- Kevin
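
As a rough illustration of the pread() suggestion, each part of a codebase could keep its own position in a variable and read at explicit offsets. The wrapper name read_at below is an assumption made for this sketch, and the fcntl.h include is only there for callers that open the file themselves.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * read_at() is an illustrative name, not an existing API.
 * Reads up to len bytes starting at absolute offset off using pread(2),
 * which leaves the descriptor's shared file offset untouched.
 * Returns the number of bytes read (short only at end of file),
 * or -1 on error with errno set by pread().
 */
ssize_t read_at(int fd, void *buf, size_t len, off_t off)
{
    size_t done = 0;

    assert(buf != NULL || len == 0);
    while (done < len) {
        ssize_t n = pread(fd, (char *)buf + done, len - done,
                          off + (off_t)done);

        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted by a signal: retry */
            return -1;
        }
        if (n == 0)
            break;          /* end of file */
        done += (size_t)n;
    }
    return (ssize_t)done;
}
```

Because no call here moves the file offset, two callers sharing one descriptor do not need to coordinate their positions; each one tracks its own off value instead.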