public inbox for linux-kernel@vger.kernel.org
* New system call wanted: fdreopen
@ 2012-12-09 15:03 Tristan Wibberley
  2012-12-09 16:27 ` Theodore Ts'o
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Tristan Wibberley @ 2012-12-09 15:03 UTC
  To: linux-kernel

Hello,

I'd like to propose a system call called "fdreopen":

  int fdreopen(int src_fd, int dst_fd, int flags);


I am willing to try implementing this system call given some suggestions 
on where to start and what locking to watch out for. I have outlined the 
behaviour below and described the class of problem that it solves at the 
end.

Does anybody know any reasons why this system call would be impossible/
impractical or otherwise unacceptable?

Any improvements I should consider before trying to implement it?


Behaviour
=========

This system call would be like dup3 except for these things:

 - if dst_fd is -1 then the lowest available file descriptor is allocated
   rather than returning EBADF as dup3 does.

 - the new file descriptor points to a *new* entry in the file table much
   as if the original file had been opened again via open or openat. This
   means that two large independent libraries can seek and read without
   synchronising even when they cannot open a file by its path.

 - O_RDWR access can be reduced to O_RDONLY or O_WRONLY:
    int src_fd = open("/file", O_RDWR | O_CLOEXEC);
    int new_fd = fdreopen(src_fd, -1, O_CLOEXEC | O_RDONLY);

 - it would be async-signal-safe.
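The proposed call does not exist, but the behaviour listed above can be roughly approximated in user space with the Linux-specific /proc/self/fd links discussed in this thread. The sketch below is only that: `fdreopen_emul()` is a hypothetical name, it assumes `src_fd` refers to something that can be reopened by path (a regular file, say; sockets cannot be), and it is neither atomic nor async-signal-safe.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical user-space approximation of the proposed fdreopen().
 * NOT a real system call: it opens the Linux /proc/self/fd magic link,
 * which yields a new opened file (own offset, own access mode) for the
 * object behind src_fd.
 *   dst_fd == -1 : the lowest available descriptor is used
 *   otherwise    : the result is moved onto dst_fd, as dup3() would
 * flags are open(2) flags: an access mode plus, optionally, O_CLOEXEC.
 * Caveat: snprintf() is not async-signal-safe and the dst_fd case takes
 * two steps, so this falls short of the proposal on both counts.
 */
int fdreopen_emul(int src_fd, int dst_fd, int flags)
{
    char path[64];
    snprintf(path, sizeof path, "/proc/self/fd/%d", src_fd);

    int fd = open(path, flags);
    if (fd < 0 || dst_fd == -1 || fd == dst_fd)
        return fd;

    /* dup3() accepts only O_CLOEXEC in its flags argument. */
    if (dup3(fd, dst_fd, flags & O_CLOEXEC) < 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    close(fd);
    return dst_fd;
}
```

Called as `fdreopen_emul(src_fd, -1, O_CLOEXEC | O_RDONLY)`, this mirrors the access-reduction example above: the result starts reading at offset 0 and cannot be written through, while `src_fd` keeps its own offset.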


Why
===

A common idiom on Linux is to open a file and keep the fd open so that 
the underlying file can be unlinked from its directory and still be 
read. But if the file needs to be read from several different parts of 
the codebase, those parts must be synchronised with one another, because 
the file descriptor has exactly one read pointer; that is a relatively 
difficult task.
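The single read pointer can be made concrete with a small sketch (the helper name and file handling are illustrative assumptions). It reads some bytes through one descriptor and reports the offset seen through a second one, which is either a dup() of the first or an independent open() of the same path; the second variant is only possible while the path still exists.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

/*
 * Read up to n bytes (at most 64) through one descriptor, then report the
 * file offset seen through a second descriptor for the same file.
 *   use_dup != 0 : the second fd is a dup() of the first; both refer to
 *                  one opened file with one offset, so the result is the
 *                  number of bytes read
 *   use_dup == 0 : the second fd comes from another open() of the path;
 *                  it is a separate opened file whose offset is still 0
 * Returns -1 on error.
 */
off_t offset_seen_by_second_fd(const char *path, int use_dup, size_t n)
{
    char buf[64];
    off_t result = -1;

    if (n > sizeof buf)
        n = sizeof buf;

    int a = open(path, O_RDONLY | O_CLOEXEC);
    if (a < 0)
        return -1;

    int b = use_dup ? dup(a) : open(path, O_RDONLY | O_CLOEXEC);
    if (b >= 0) {
        /* Advances the offset of the opened file behind a. */
        if (read(a, buf, n) >= 0)
            result = lseek(b, 0, SEEK_CUR);
        close(b);
    }
    close(a);
    return result;
}
```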

I think that this new system call is required to achieve that neatly and 
simply:

- dup does not solve this problem because the new file descriptor
  shares the original's read pointer; only its descriptor flags (e.g.
  O_CLOEXEC) are its own.

- /proc/self/fd/* does not solve this problem because the file might no
  longer be available at the same place in the filesystem. In some
  otherwise simple message passing or ReSTful IPC a different file will
  be available at that path.

I suspect that user space has been working around this problem either 
with otherwise unnecessary synchronisation or by living with occasional, 
hard-to-reproduce bugs.

-- 
Tristan Wibberley



* Re: New system call wanted: fdreopen
  2012-12-09 15:03 New system call wanted: fdreopen Tristan Wibberley
@ 2012-12-09 16:27 ` Theodore Ts'o
  2012-12-09 17:18   ` Tristan Wibberley
  2012-12-09 19:37 ` Chris Adams
  2012-12-10 10:18 ` Kevin Easton
  2 siblings, 1 reply; 6+ messages in thread
From: Theodore Ts'o @ 2012-12-09 16:27 UTC
  To: Tristan Wibberley; +Cc: linux-kernel

On Sun, Dec 09, 2012 at 03:03:30PM +0000, Tristan Wibberley wrote:
> 
> - /proc/self/fd/* does not solve this problem because the file might no
>   longer be available at the same place in the filesystem. In some
>   otherwise simple message passing or ReSTful IPC a different file will
>   be available at that path.

Actually, /proc/self/fd/* _will_ work.  When you do a ls -l, it looks
like a symlink, but the files in /proc/self/fd (and /proc/<pid>/fd
more generally) are magic.  If you open files in /proc/<pid>/fd/*, it
will do what you want.

See for yourself:

% cat > /tmp/foo.test
foo
bar
^Z
% jobs -l			# (and note the pid, hereafter <pid>)
% ls -l /proc/<pid>/fd 
% mv /tmp/foo.test /tmp/foo2.test
% ls -l /proc/<pid>/fd	# note that the symlink now points at /tmp/foo2.test
% cat /proc/<pid>/fd/1	# note that it works!
% rm /tmp/foo2.test
% ls -l /proc/<pid>/fd	# note that the symlink now has "(deleted)" at the end
% cat /proc/<pid>/fd/1	# note that it works!
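The same experiment can be sketched in C for the current process, using /proc/self/fd instead of a stopped cat's /proc/<pid>/fd. The function name is hypothetical and the /tmp paths simply mirror the session above; the point is that open() on the magic link still succeeds after rename and unlink, and yields a fresh opened file starting at offset 0.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Create a file, rename it, unlink it, then reopen it through
 * /proc/self/fd.  Returns the number of bytes read through the reopened
 * descriptor (or -1 on error); `target` receives what readlink()
 * reports for the magic link.
 */
ssize_t reopen_after_unlink(char *data, size_t datalen,
                            char *target, size_t targetlen)
{
    const char *p1 = "/tmp/foo.test", *p2 = "/tmp/foo2.test";
    char proc[64];
    ssize_t n = -1, len;
    int fd2;

    int fd = open(p1, O_RDWR | O_CREAT | O_TRUNC | O_CLOEXEC, 0600);
    if (fd < 0)
        return -1;

    if (write(fd, "foo\nbar\n", 8) == 8 && rename(p1, p2) == 0 &&
        unlink(p2) == 0) {
        snprintf(proc, sizeof proc, "/proc/self/fd/%d", fd);

        /* ls -l shows this as a symlink to "/tmp/foo2.test (deleted)" */
        len = readlink(proc, target, targetlen - 1);
        target[len < 0 ? 0 : len] = '\0';

        /* ...but open() follows the magic link to the still-open inode,
           creating a new opened file whose offset starts at 0. */
        fd2 = open(proc, O_RDONLY | O_CLOEXEC);
        if (fd2 >= 0) {
            n = read(fd2, data, datalen);
            close(fd2);
        }
    }
    close(fd);
    return n;
}
```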

This is of course horribly Linux-specific, but so would be a new
system call like your proposed fdreopen.  Better yet, using
/proc/self/fd/* will work *now*.  You don't have to wait for a new
system call in a future version of the kernel to start shipping in new
distributions.

Regards,

					- Ted


* Re: New system call wanted: fdreopen
  2012-12-09 16:27 ` Theodore Ts'o
@ 2012-12-09 17:18   ` Tristan Wibberley
  0 siblings, 0 replies; 6+ messages in thread
From: Tristan Wibberley @ 2012-12-09 17:18 UTC
  To: linux-kernel

On Sun, 09 Dec 2012 11:27:46 -0500, Theodore Ts'o wrote:

> On Sun, Dec 09, 2012 at 03:03:30PM +0000, Tristan Wibberley wrote:
>> 
>> - /proc/self/fd/* does not solve this problem because the file might no

...

> Actually, /proc/self/fd/* _will_ work.  When you do a ls -l, it looks
> like a symlink, but the files in /proc/self/fd (and /proc/<pid>/fd more
> generally) are magic.  If you open files in /proc/<pid>/fd/*, it will do
> what you want.

Oh! Cool! Searching Google for how to solve this class of problem, I 
couldn't find any indication that /proc/pid/fd/* worked differently 
from a symlink.

Thanks for your quick reply.

-- 
Tristan



* Re: New system call wanted: fdreopen
  2012-12-09 15:03 New system call wanted: fdreopen Tristan Wibberley
  2012-12-09 16:27 ` Theodore Ts'o
@ 2012-12-09 19:37 ` Chris Adams
  2012-12-09 20:47   ` Al Viro
  2012-12-10 10:18 ` Kevin Easton
  2 siblings, 1 reply; 6+ messages in thread
From: Chris Adams @ 2012-12-09 19:37 UTC
  To: linux-kernel

Once upon a time, Tristan Wibberley  <tristan.wibberley@gmail.com> said:
>A common idiom on Linux is to open a file and keep the fd open so that 
>the underlying file can be unlinked from its directory. But if the file 
>needs to be read from several different parts of the codebase then due to 
>the file descriptor having exactly one read pointer those different parts 
>must be synchronised which is a relatively difficult task.

I think you can get similar behavior entirely in user space and in a
fashion portable to at least BSD systems.  You could fork() (which would
create a separate FD in the child), pass the FD back to the parent over
a socket, and then have the child exit.

-- 
Chris Adams <cmadams@hiwaay.net>
Systems and Network Administrator - HiWAAY Internet Services
I don't speak for anybody but myself - that's enough trouble.


* Re: New system call wanted: fdreopen
  2012-12-09 19:37 ` Chris Adams
@ 2012-12-09 20:47   ` Al Viro
  0 siblings, 0 replies; 6+ messages in thread
From: Al Viro @ 2012-12-09 20:47 UTC
  To: Chris Adams; +Cc: linux-kernel

On Sun, Dec 09, 2012 at 01:37:33PM -0600, Chris Adams wrote:
> Once upon a time, Tristan Wibberley  <tristan.wibberley@gmail.com> said:
> >A common idiom on Linux is to open a file and keep the fd open so that 
> >the underlying file can be unlinked from its directory. But if the file 
> >needs to be read from several different parts of the codebase then due to 
> >the file descriptor having exactly one read pointer those different parts 
> >must be synchronised which is a relatively difficult task.
> 
> I think you can get similar behavior entirely in user space and in a
> fashion portable to at least BSD systems.  You could fork() (which would
> create a separate FD in the child), pass the FD back to the parent over
> a socket, and then have the child exit.

... and here's your well-earned F for UNIX101.  fork() will *NOT* do anything
of that sort.  Not on anything starting at v1 - the bug you want to rely upon
had been killed very early, possibly even before the migration from PDP-7 to
PDP-11.

Think how would something like (ls;ls) >/tmp/a work if you didn't have
current file offset shared across fork(2).  Parent doesn't do any IO
here; both children inherit stdout from it when they are forked and
write to said stdout.  We want the output of the second not at the
offset 0, obviously.
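The (ls;ls) argument can be checked with a small C sketch (function name and path are illustrative). The parent opens the file and does no I/O itself; two children forked in turn each write one line through the inherited descriptor, and the second line lands after the first because all three processes share one opened file and therefore one offset.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Rough analogue of (ls;ls) >/tmp/a.  fork() copies the descriptor
 * table, not the opened file, so the children share the parent's file
 * offset.  Returns the offset the parent sees afterwards, or -1.
 */
off_t offset_after_two_children(const char *path)
{
    const char *lines[2] = { "first\n", "second\n" };
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    for (int i = 0; i < 2; i++) {
        pid_t pid = fork();
        if (pid < 0) {
            close(fd);
            return -1;
        }
        if (pid == 0) {
            /* Child: write at the shared offset, then exit. */
            size_t len = strlen(lines[i]);
            _exit(write(fd, lines[i], len) == (ssize_t)len ? 0 : 1);
        }
        int status;
        if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status) ||
            WEXITSTATUS(status) != 0) {
            close(fd);
            return -1;
        }
    }

    /* The parent wrote nothing, yet its offset is 6 + 7 = 13. */
    off_t off = lseek(fd, 0, SEEK_CUR);
    close(fd);
    return off;
}
```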

That's *the* reason why we (every Unix out there) have a distinction between
file descriptors and opened files.  open() yields a new IO channel (aka
opened file).  It also creates a new descriptor and associates that channel
with it.  fork() does *not* create new opened files.  It creates new
descriptor table, populating it with additional references to opened files
the parent had descriptors for.

It's the same difference as between doing open() of the same file twice and
doing open() + dup().  In fact, what you've described is a very obfuscated
way to do dup(2) - SCM_RIGHTS descriptor passing will take a descriptor in
sender, acquire an extra reference to opened file corresponding to it and
store that reference in datagram.  Recipient will allocate a new descriptor
and associate it with the reference to opened file it has found in datagram.
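The SCM_RIGHTS point can be demonstrated with a sketch that passes a descriptor to itself over a socketpair instead of from a forked child; per the explanation above, the fork makes no difference. All function names here are illustrative. After the descriptor arrives under a new number, seeking through it moves the offset seen through the original, exactly as with dup(2).

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send descriptor fd over the UNIX socket sock as SCM_RIGHTS data. */
static int send_fd(int sock, int fd)
{
    char byte = 0;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union { struct cmsghdr h; char buf[CMSG_SPACE(sizeof(int))]; } u;
    struct msghdr msg = { 0 };

    memset(&u, 0, sizeof u);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof u.buf;

    struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
    c->cmsg_level = SOL_SOCKET;
    c->cmsg_type = SCM_RIGHTS;
    c->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(c), &fd, sizeof(int));
    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor; the kernel installs it under a new number. */
static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union { struct cmsghdr h; char buf[CMSG_SPACE(sizeof(int))]; } u;
    struct msghdr msg = { 0 };
    int fd;

    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof u.buf;

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
    if (!c || c->cmsg_level != SOL_SOCKET || c->cmsg_type != SCM_RIGHTS)
        return -1;
    memcpy(&fd, CMSG_DATA(c), sizeof(int));
    return fd;
}

/*
 * Pass fd to ourselves, seek through the received copy, and report the
 * offset seen through the original.  Both refer to one opened file.
 */
off_t offset_after_passing(int fd, off_t seek_to)
{
    int sv[2];
    off_t result = -1;

    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
        return -1;
    if (send_fd(sv[0], fd) == 0) {
        int got = recv_fd(sv[1]);
        if (got >= 0) {
            if (lseek(got, seek_to, SEEK_SET) == seek_to)
                result = lseek(fd, 0, SEEK_CUR);
            close(got);
        }
    }
    close(sv[0]);
    close(sv[1]);
    return result;
}
```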

Seriously, this is as basic as it gets - understanding how redirects work
and what file descriptors are really ought to be covered by any introductory
course on Unix, let alone anything that touches descriptor-passing.  Current
IO offset is a property of opened file, not of a descriptor.  What the original
poster has described is clearly a new opened file instance associated with
the same filesystem object.  dup() or fork() will do nothing of that sort;
not on Linux, not on *BSD, not on any Unix.

open() on /proc/<pid>/fd/<n> will, but it's Linux-specific.  Whether it's
a good idea or not, depends on the program in question, obviously.  In any
case, you don't need a new syscall for that.


* Re: New system call wanted: fdreopen
  2012-12-09 15:03 New system call wanted: fdreopen Tristan Wibberley
  2012-12-09 16:27 ` Theodore Ts'o
  2012-12-09 19:37 ` Chris Adams
@ 2012-12-10 10:18 ` Kevin Easton
  2 siblings, 0 replies; 6+ messages in thread
From: Kevin Easton @ 2012-12-10 10:18 UTC
  To: Tristan Wibberley; +Cc: linux-kernel

Quoting Tristan Wibberley <tristan.wibberley@gmail.com>:

> Why
> ===
>
> A common idiom on Linux is to open a file and keep the fd open so that
> the underlying file can be unlinked from its directory. But if the file
> needs to be read from several different parts of the codebase then due to
> the file descriptor having exactly one read pointer those different parts
> must be synchronised which is a relatively difficult task.

Another alternative is to use pwrite() / pread(), which do not affect the file
pointer.  They're in POSIX.
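A minimal sketch of the pread() approach, assuming each part of the program can keep its own position (the `struct reader` type and `reader_read()` are hypothetical names). Every reader passes an explicit offset, so the descriptor's shared offset is never touched and the readers need not coordinate over it. Note that pread() only works on seekable files; pipes and sockets fail with ESPIPE.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

/* One independent reader over a shared (possibly unlinked) file. */
struct reader {
    int fd;       /* descriptor shared by all readers */
    off_t pos;    /* this reader's private position */
};

/* Read at this reader's position; the fd's own offset is left alone. */
ssize_t reader_read(struct reader *r, void *buf, size_t len)
{
    ssize_t n = pread(r->fd, buf, len, r->pos);
    if (n > 0)
        r->pos += n;
    return n;
}
```

For example, two `struct reader` values initialised with the same descriptor from open() can each read the file from the start, independently of each other.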

     - Kevin




end of thread

Thread overview: 6+ messages
2012-12-09 15:03 New system call wanted: fdreopen Tristan Wibberley
2012-12-09 16:27 ` Theodore Ts'o
2012-12-09 17:18   ` Tristan Wibberley
2012-12-09 19:37 ` Chris Adams
2012-12-09 20:47   ` Al Viro
2012-12-10 10:18 ` Kevin Easton
