F_DUPFD_CLOEXEC implementation

* F_DUPFD_CLOEXEC implementation
@ 2007-09-28 17:34 Ulrich Drepper
  2007-09-28 18:19 ` Davide Libenzi
  2007-09-30  0:31 ` Denys Vlasenko
  0 siblings, 2 replies; 15+ messages in thread
From: Ulrich Drepper @ 2007-09-28 17:34 UTC (permalink / raw)
  To: linux-kernel; +Cc: akpm

One more small change to extend the availability of creation of
file descriptors with FD_CLOEXEC set.  Adding a new command to
fcntl() requires no new system call and the overall impact on
code size is minimal.

If this patch gets accepted we will also add this change to the
next revision of the POSIX spec.

To test the patch, use the following little program.  Adjust the
value of F_DUPFD_CLOEXEC appropriately.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#ifndef F_DUPFD_CLOEXEC
# define F_DUPFD_CLOEXEC 12
#endif

int
main (int argc, char *argv[])
{
  if (argc > 1)
    {
      /* Re-executed child: descriptor 3 must have been closed by exec.  */
      if (fcntl (3, F_GETFD) != -1)
	{
	  puts ("descriptor not closed");
	  exit (1);
	}
      if (errno != EBADF)
	{
	  puts ("error not EBADF");
	  exit (1);
	}

      exit (0);
    }

  /* Parent: duplicate stdout with close-on-exec set atomically.  */
  int fd = fcntl (STDOUT_FILENO, F_DUPFD_CLOEXEC, 0);
  if (fd == -1 && errno == EINVAL)
    {
      puts ("F_DUPFD_CLOEXEC not supported");
      return 0;
    }
  if (fd != 3)
    {
      puts ("program called with descriptors other than 0,1,2");
      return 1;
    }

  /* Re-exec ourselves with an argument so the child branch above runs.  */
  execl ("/proc/self/exe", "/proc/self/exe", "1", (char *) NULL);
  puts ("execl failed");
  return 1;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Signed-off-by: Ulrich Drepper <drepper@redhat.com>

diff --git a/fs/fcntl.c b/fs/fcntl.c
index 78b2ff0..c9db73f 100644
--- a/fs/fcntl.c
+++ b/fs/fcntl.c
@@ -110,7 +110,7 @@ out:
 	return error;
 }
 
-static int dupfd(struct file *file, unsigned int start)
+static int dupfd(struct file *file, unsigned int start, int cloexec)
 {
 	struct files_struct * files = current->files;
 	struct fdtable *fdt;
@@ -122,7 +122,10 @@ static int dupfd(struct file *file, unsigned int start)
 		/* locate_fd() may have expanded fdtable, load the ptr */
 		fdt = files_fdtable(files);
 		FD_SET(fd, fdt->open_fds);
-		FD_CLR(fd, fdt->close_on_exec);
+		if (cloexec)
+			FD_SET(fd, fdt->close_on_exec);
+		else
+			FD_CLR(fd, fdt->close_on_exec);
 		spin_unlock(&files->file_lock);
 		fd_install(fd, file);
 	} else {
@@ -195,7 +198,7 @@ asmlinkage long sys_dup(unsigned int fildes)
 	struct file * file = fget(fildes);
 
 	if (file)
-		ret = dupfd(file, 0);
+		ret = dupfd(file, 0, 0);
 	return ret;
 }
 
@@ -319,8 +322,9 @@ static long do_fcntl(int fd, unsigned int cmd, unsigned long arg,
 
 	switch (cmd) {
 	case F_DUPFD:
+	case F_DUPFD_CLOEXEC:
 		get_file(filp);
-		err = dupfd(filp, arg);
+		err = dupfd(filp, arg, cmd == F_DUPFD_CLOEXEC);
 		break;
 	case F_GETFD:
 		err = get_close_on_exec(fd) ? FD_CLOEXEC : 0;
diff --git a/include/asm-generic/fcntl.h b/include/asm-generic/fcntl.h
index b847741..b01408a 100644
--- a/include/asm-generic/fcntl.h
+++ b/include/asm-generic/fcntl.h
@@ -73,6 +73,9 @@
 #define F_SETSIG	10	/* for sockets. */
 #define F_GETSIG	11	/* for sockets. */
 #endif
+#ifndef F_DUPFD_CLOEXEC
+#define F_DUPFD_CLOEXEC	12
+#endif
 
 /* for F_[GET|SET]FL */
 #define FD_CLOEXEC	1	/* actually anything with low bit set goes */

* Re: F_DUPFD_CLOEXEC implementation
  2007-09-28 17:34 F_DUPFD_CLOEXEC implementation Ulrich Drepper
@ 2007-09-28 18:19 ` Davide Libenzi
  2007-09-28 18:23   ` Ulrich Drepper
  2007-09-30  0:31 ` Denys Vlasenko
  1 sibling, 1 reply; 15+ messages in thread
From: Davide Libenzi @ 2007-09-28 18:19 UTC (permalink / raw)
  To: Ulrich Drepper; +Cc: Linux Kernel Mailing List, Andrew Morton

On Fri, 28 Sep 2007, Ulrich Drepper wrote:

> One more small change to extend the availability of creation of
> file descriptors with FD_CLOEXEC set.  Adding a new command to
> fcntl() requires no new system call and the overall impact on
> code size is minimal.
> 
> If this patch gets accepted we will also add this change to the
> next revision of the POSIX spec.
> 
> To test the patch, use the following little program.  Adjust the
> value of F_DUPFD_CLOEXEC appropriately.

I think new system calls would have been a cleaner way to accomplish this. 
The "small pill at a time" approach may have a better chance of going in, 
but will likely result in an uglier userspace interface.
In any case, this is better than *nothing*, if it makes it easier to use 
fds inside system libraries.



- Davide



* Re: F_DUPFD_CLOEXEC implementation
  2007-09-28 18:19 ` Davide Libenzi
@ 2007-09-28 18:23   ` Ulrich Drepper
  0 siblings, 0 replies; 15+ messages in thread
From: Ulrich Drepper @ 2007-09-28 18:23 UTC (permalink / raw)
  To: Davide Libenzi; +Cc: Linux Kernel Mailing List, Andrew Morton

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Davide Libenzi wrote:
> I think new system calls would have been a cleaner way to accomplish this. 
> The "small pill at a time" approach may have a better chance of going in, 
> but will likely result in an uglier userspace interface.

We'd need this call anyway since neither dup nor dup2 provides the
functionality of F_DUPFD (but F_DUPFD can be used to implement dup).

For dup2() I will wait until we have a sys_indirect implementation.
I'll try to get this soon.
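
A small sketch of that relationship, with made-up helper names: dup() is just F_DUPFD with a lower bound of 0, while the "lowest free descriptor at or above N" behaviour has no dup()/dup2() equivalent.

```c
#include <fcntl.h>

/* dup(fd) expressed through fcntl(): lowest free descriptor >= 0.
   (Hypothetical helper name.)  */
int
dup_via_fcntl (int fd)
{
  return fcntl (fd, F_DUPFD, 0);
}

/* What neither dup() nor dup2() can express: the lowest free
   descriptor greater than or equal to minfd.  */
int
dup_at_least (int fd, int minfd)
{
  return fcntl (fd, F_DUPFD, minfd);
}
```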

- --
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (GNU/Linux)

iD8DBQFG/Ua02ijCOnn/RHQRAgOQAKCfQ9H4VYau6nVGuVXyJ7IfBXK+QgCfYQxv
k4esG379v8VBceFIECDybk0=
=dvhX
-----END PGP SIGNATURE-----

* Re: F_DUPFD_CLOEXEC implementation
  2007-09-28 17:34 F_DUPFD_CLOEXEC implementation Ulrich Drepper
  2007-09-28 18:19 ` Davide Libenzi
@ 2007-09-30  0:31 ` Denys Vlasenko
  2007-09-30 23:11   ` Davide Libenzi
  2007-10-01  0:59   ` Miquel van Smoorenburg
  1 sibling, 2 replies; 15+ messages in thread
From: Denys Vlasenko @ 2007-09-30  0:31 UTC (permalink / raw)
  To: Ulrich Drepper; +Cc: linux-kernel, akpm

Hi Ulrich,

On Friday 28 September 2007 18:34, Ulrich Drepper wrote:
> One more small change to extend the availability of creation of
> file descriptors with FD_CLOEXEC set.  Adding a new command to
> fcntl() requires no new system call and the overall impact on
> code size is minimal.

Tangential question: do you have any idea how userspace can
safely do nonblocking read or write on a potentially-shared fd?

IIUC, currently it cannot be done without races:

old_flags = fcntl(fd, F_GETFL);
...other process may change flags!...
fcntl(fd, F_SETFL, old_flags | O_NONBLOCK);
read(fd, ...)
...other process may see flags changed under its feet!...
fcntl(fd, F_SETFL, old_flags);

Can this be fixed?
--
vda

* Re: F_DUPFD_CLOEXEC implementation
  2007-09-30  0:31 ` Denys Vlasenko
@ 2007-09-30 23:11   ` Davide Libenzi
  2007-09-30 23:58     ` Denys Vlasenko
  2007-10-01  0:59   ` Miquel van Smoorenburg
  1 sibling, 1 reply; 15+ messages in thread
From: Davide Libenzi @ 2007-09-30 23:11 UTC (permalink / raw)
  To: Denys Vlasenko; +Cc: Ulrich Drepper, linux-kernel, akpm

On Sun, 30 Sep 2007, Denys Vlasenko wrote:

> Hi Ulrich,
> 
> On Friday 28 September 2007 18:34, Ulrich Drepper wrote:
> > One more small change to extend the availability of creation of
> > file descriptors with FD_CLOEXEC set.  Adding a new command to
> > fcntl() requires no new system call and the overall impact on
> > code size is minimal.
> 
> Tangential question: do you have any idea how userspace can
> safely do nonblocking read or write on a potentially-shared fd?
> 
> IIUC, currently it cannot be done without races:
> 
> old_flags = fcntl(fd, F_GETFL);
> ...other process may change flags!...
> fcntl(fd, F_SETFL, old_flags | O_NONBLOCK);
> read(fd, ...)
> ...other process may see flags changed under its feet!...
> fcntl(fd, F_SETFL, old_flags);
> 
> Can this be fixed?

I'm not sure I understood your use case correctly. But if you have two 
processes/threads randomly switching O_NONBLOCK on and off, your problems 
arise not only at F_SETFL time.
If one of the tasks is not expecting an fd to be O_NONBLOCK, it will 
likely end up not handling read/write-miss situations correctly.
In that case it'd be better to keep the fd as O_NONBLOCK, and manually 
create blocking behaviour (when needed) with poll+read/write.



- Davide



* Re: F_DUPFD_CLOEXEC implementation
  2007-09-30 23:11   ` Davide Libenzi
@ 2007-09-30 23:58     ` Denys Vlasenko
  2007-10-01  3:15       ` Davide Libenzi
  0 siblings, 1 reply; 15+ messages in thread
From: Denys Vlasenko @ 2007-09-30 23:58 UTC (permalink / raw)
  To: Davide Libenzi; +Cc: Ulrich Drepper, linux-kernel, akpm

On Monday 01 October 2007 00:11, Davide Libenzi wrote:
> On Sun, 30 Sep 2007, Denys Vlasenko wrote:
> 
> > Hi Ulrich,
> > 
> > On Friday 28 September 2007 18:34, Ulrich Drepper wrote:
> > > One more small change to extend the availability of creation of
> > > file descriptors with FD_CLOEXEC set.  Adding a new command to
> > > fcntl() requires no new system call and the overall impact on
> > > code size is minimal.
> > 
> > Tangential question: do you have any idea how userspace can
> > safely do nonblocking read or write on a potentially-shared fd?
> > 
> > IIUC, currently it cannot be done without races:
> > 
> > old_flags = fcntl(fd, F_GETFL);
> > ...other process may change flags!...
> > fcntl(fd, F_SETFL, old_flags | O_NONBLOCK);
> > read(fd, ...)
> > ...other process may see flags changed under its feet!...
> > fcntl(fd, F_SETFL, old_flags);
> > 
> > Can this be fixed?
> 
> I'm not sure I understood your use case correctly. But if you have two 
> processes/threads randomly switching O_NONBLOCK on and off, your problems 
> arise not only at F_SETFL time.

My use case is: I want to do a nonblocking read on descriptor 0 (stdin).
It may be a pipe or a socket.

There may be other processes which share this descriptor with me,
I simply cannot know that. And they, too, may want to do reads on it.

I want to do nonblocking read in such a way that neither those other
processes will ever see fd switching to O_NONBLOCK and back, and
I also want to be safe from other processes doing the same.

I don't see how this can be done using standard unix primitives.
--
vda

* Re: F_DUPFD_CLOEXEC implementation
  2007-09-30  0:31 ` Denys Vlasenko
  2007-09-30 23:11   ` Davide Libenzi
@ 2007-10-01  0:59   ` Miquel van Smoorenburg
  1 sibling, 0 replies; 15+ messages in thread
From: Miquel van Smoorenburg @ 2007-10-01  0:59 UTC (permalink / raw)
  To: linux-kernel

In article <200709300131.49320.vda.linux@googlemail.com>,
Denys Vlasenko  <vda.linux@googlemail.com> wrote:
>Hi Ulrich,
>
>On Friday 28 September 2007 18:34, Ulrich Drepper wrote:
>> One more small change to extend the availability of creation of
>> file descriptors with FD_CLOEXEC set.  Adding a new command to
>> fcntl() requires no new system call and the overall impact on
>> code size is minimal.
>
>Tangential question: do you have any idea how userspace can
>safely do nonblocking read or write on a potentially-shared fd?
>
>IIUC, currently it cannot be done without races:
>
>old_flags = fcntl(fd, F_GETFL);
>...other process may change flags!...
>fcntl(fd, F_SETFL, old_flags | O_NONBLOCK);
>read(fd, ...)
>...other process may see flags changed under its feet!...
>fcntl(fd, F_SETFL, old_flags);
>
>Can this be fixed?

This is for sockets, right? Just use recv() instead of read().

	n = recv(filedesc, buffer, buflen, MSG_DONTWAIT);

... is equivalent to setting O_NONBLOCK for the duration of that one call.
See "man recv".

Mike.

* Re: F_DUPFD_CLOEXEC implementation
  2007-09-30 23:58     ` Denys Vlasenko
@ 2007-10-01  3:15       ` Davide Libenzi
  2007-10-01 10:07         ` Denys Vlasenko
  0 siblings, 1 reply; 15+ messages in thread
From: Davide Libenzi @ 2007-10-01  3:15 UTC (permalink / raw)
  To: Denys Vlasenko; +Cc: Ulrich Drepper, Linux Kernel Mailing List, Andrew Morton

On Mon, 1 Oct 2007, Denys Vlasenko wrote:

> My use case is: I want to do a nonblocking read on descriptor 0 (stdin).
> It may be a pipe or a socket.
> 
> There may be other processes which share this descriptor with me,
> I simply cannot know that. And they, too, may want to do reads on it.
> 
> I want to do nonblocking read in such a way that neither those other
> processes will ever see fd switching to O_NONBLOCK and back, and
> I also want to be safe from other processes doing the same.
> 
> I don't see how this can be done using standard unix primitives.

Indeed. You could simulate non-blocking using poll with zero timeout, but 
if another task may read/write on it, your following read/write may end up 
blocking even after a poll returned the required events.
One way to solve this would be some sort of readx/writex where you pass an 
extra flags parameter (this could be done with sys_indirect, assuming 
we ever get that into mainline) where you specify the non-blocking 
requirement for this call only, and not as a global per-file flag. Then, of 
course, you'll have to modify all the "file->f_flags & O_NONBLOCK" tests 
(and there are many of them) to check for that flag too (it could be a 
per-task_struct flag).
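
To make the poll-with-zero-timeout limitation concrete, here is a sketch of the simulation; the name try_read is made up. The comment marks the window in which another task sharing the descriptor can drain the data.

```c
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Simulated non-blocking read on a blocking descriptor.  Returns -1
   with errno == EAGAIN when poll() reports nothing pending.
   (Hypothetical helper; racy by construction.)  */
ssize_t
try_read (int fd, void *buf, size_t len)
{
  struct pollfd pfd = { .fd = fd, .events = POLLIN };
  int ready = poll (&pfd, 1, 0);   /* zero timeout: just a status check */

  if (ready == -1)
    return -1;
  if (ready == 0)
    {
      errno = EAGAIN;
      return -1;
    }

  /* RACE: another task reading the same open file may consume the data
     right here, in which case the read() below blocks after all.  */
  return read (fd, buf, len);
}
```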

- Davide

* Re: F_DUPFD_CLOEXEC implementation
  2007-10-01  3:15       ` Davide Libenzi
@ 2007-10-01 10:07         ` Denys Vlasenko
  2007-10-01 18:16           ` Al Viro
  0 siblings, 1 reply; 15+ messages in thread
From: Denys Vlasenko @ 2007-10-01 10:07 UTC (permalink / raw)
  To: Davide Libenzi; +Cc: Ulrich Drepper, Linux Kernel Mailing List, Andrew Morton

[-- Attachment #1: Type: text/plain, Size: 2193 bytes --]

On Monday 01 October 2007 04:15, Davide Libenzi wrote:
> On Mon, 1 Oct 2007, Denys Vlasenko wrote:
> 
> > My use case is: I want to do a nonblocking read on descriptor 0 (stdin).
> > It may be a pipe or a socket.
> > 
> > There may be other processes which share this descriptor with me,
> > I simply cannot know that. And they, too, may want to do reads on it.
> > 
> > I want to do nonblocking read in such a way that neither those other
> > processes will ever see fd switching to O_NONBLOCK and back, and
> > I also want to be safe from other processes doing the same.
> > 
> > I don't see how this can be done using standard unix primitives.
> 
> Indeed. You could simulate non-blocking using poll with zero timeout, but 
> if another task may read/write on it, your following read/write may end up 
> blocking even after a poll returned the required events.
> One way to solve this would be some sort of readx/writex where you pass an 
> extra flags parameter

We have that already. They are called send and recv. ;)

> (this could be done with sys_indirect, assuming 
> we ever get that into mainline) where you specify the non-blocking 
> requirement for this call only, and not as a global per-file flag. Then, of 
> course, you'll have to modify all the "file->f_flags & O_NONBLOCK" tests 
> (and there are many of them) to check for that flag too (it could be a 
> per-task_struct flag).

Attached patch detects send/recv(fd, buf, size, MSG_DONTWAIT) on
non-sockets and turns them into non-blocking write/read.
Since filp->f_flags appears to be read and modified without any locking,
I cannot modify it without potentially affecting other processes
accessing the same file through a shared struct file.

Therefore I simply make a temporary copy of struct file, set
O_NONBLOCK in it and pass it to vfs_read/write.
Is this heresy? ;) I see only one spinlock in struct file:

#ifdef CONFIG_EPOLL
        spinlock_t              f_ep_lock;
#endif /* #ifdef CONFIG_EPOLL */

Do I need to take it?

Also attached is ndelaytest.c which can be used to test that
send(MSG_DONTWAIT) indeed is failing with EAGAIN if write would block
and that other processes never see O_NONBLOCK set.

Comments?
--
vda

[-- Attachment #2: ndelaytest.c --]
[-- Type: text/x-csrc, Size: 1463 bytes --]

#include <sys/types.h>
#include <sys/socket.h>
#include <errno.h>
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <time.h>
#include <signal.h>

#define SECONDS 10

#define STR "."
//#define STR "123456789 123456789 123456789 123456789 "

/* To see send() resulting in EAGAIN:
 * strace -ff -o log ndelaytest | while sleep 11; do break; done
 * log.$PID:
 * send(1, "123456789 123456789 123456789 12"..., 40, MSG_DONTWAIT)
 *                = -1 EAGAIN (Resource temporarily unavailable)
 */

int main()
{
	pid_t pid;
	time_t t;
	int fl;

	puts("starting");
	t = time(0);

	pid = fork();
	if (pid == 0) {
		/* child */
		while ((time(0) - t) < SECONDS-1) {
#if 0
			/* Change this to "#if 1" and simply run the executable
			 * to see the race detection code in action */
#define OP "write"
			fcntl(1, F_SETFL, fcntl(1, F_GETFL) | O_NONBLOCK);
			fl = write(1, STR, sizeof(STR) - 1);
			fcntl(1, F_SETFL, fcntl(1, F_GETFL) & ~O_NONBLOCK);
#else
			/* This part tests whether send(MSG_DONTWAIT)
			 * is racy or not */
#define OP "send"
			fl = send(1, STR, sizeof(STR) - 1, MSG_DONTWAIT);
#endif
			if (fl < 0) {
				perror(OP);
				kill(getppid(), SIGKILL);
				return 1;
			}
		}
		return 0;
	}

	while ((time(0) - t) < SECONDS) {
		fl = fcntl(1, F_GETFL);
		if (fl & O_NONBLOCK) {
			fprintf(stderr, "NONBLOCK:1\n");
			kill(pid, SIGKILL);
			fcntl(1, F_SETFL, fl & ~O_NONBLOCK);
			return 1;
		}
	}
	fprintf(stderr, "NONBLOCK:0\n");
	return 0;
}

[-- Attachment #3: nonblock_linux-2.6.22-rc6.patch --]
[-- Type: text/x-diff, Size: 2902 bytes --]

--- linux-2.6.22-rc6.src/fs/read_write.c	Fri Jun 15 19:30:05 2007
+++ linux-2.6.22-rc6_ndelay/fs/read_write.c	Sun Aug 19 10:43:24 2007
@@ -15,6 +15,7 @@
 #include <linux/module.h>
 #include <linux/syscalls.h>
 #include <linux/pagemap.h>
+#include <linux/socket.h>
 #include "read_write.h"
 
 #include <asm/uaccess.h>
@@ -351,6 +352,36 @@
 static inline void file_pos_write(struct file *file, loff_t pos)
 {
 	file->f_pos = pos;
+}
+
+/* Helper for send/recv on non-sockets */
+ssize_t rw_with_flags(struct file *file, int fput_needed, void __user *buf, size_t count, unsigned flags)
+{
+	int err;
+	loff_t pos;
+	struct file *file_copy;
+
+	file_copy = file;
+	if (flags & MSG_DONTWAIT) {
+		/* We make copy even if O_NONBLOCK is already set. */
+		/* We don't want it to change under our feet. */
+		file_copy = kmalloc(sizeof(*file_copy), GFP_KERNEL);
+		memcpy(file_copy, file, sizeof(*file_copy));
+		file_copy->f_flags |= O_NONBLOCK;
+	}
+
+	pos = file_pos_read(file);
+	if (flags & MSG_OOB) /* MSG_OOB is reused to mean 'write' */
+		err = vfs_write(file_copy, buf, count, &pos);
+	else
+		err = vfs_read(file_copy, buf, count, &pos);
+	file_pos_write(file, pos);
+
+	if (flags & MSG_DONTWAIT) {
+		kfree(file_copy);
+	}
+	fput_light(file, fput_needed);
+	return err;
 }
 
 asmlinkage ssize_t sys_read(unsigned int fd, char __user * buf, size_t count)
--- linux-2.6.22-rc6.src/include/linux/fs.h	Wed Jun 27 21:24:18 2007
+++ linux-2.6.22-rc6_ndelay/include/linux/fs.h	Sun Aug 19 10:32:20 2007
@@ -1154,6 +1154,9 @@
 extern ssize_t vfs_writev(struct file *, const struct iovec __user *,
 		unsigned long, loff_t *);
 
+extern ssize_t rw_with_flags(struct file *, int, void __user *, size_t,
+		unsigned);
+
 /*
  * NOTE: write_inode, delete_inode, clear_inode, put_inode can be called
  * without the big kernel lock held in all filesystems.
--- linux-2.6.22-rc6.src/net/socket.c	Fri Jun 15 19:30:08 2007
+++ linux-2.6.22-rc6_ndelay/net/socket.c	Sun Aug 19 11:34:07 2007
@@ -1585,8 +1585,17 @@
 		goto out;
 
 	sock = sock_from_file(sock_file, &err);
-	if (!sock)
-		goto out_put;
+	if (!sock) {
+		if (addr)
+			goto out_put;
+		if (flags & ~MSG_DONTWAIT)
+			goto out_put;
+		/* it's not a socket, but we support a special case:
+		 * send(fd, buf, count, MSG_DONTWAIT)
+		 * (MSG_OOB is reused to mean 'write') */
+		return rw_with_flags(sock_file, fput_needed, buff, len, flags | MSG_OOB);
+	}
+
 	iov.iov_base = buff;
 	iov.iov_len = len;
 	msg.msg_name = NULL;
@@ -1646,8 +1655,15 @@
 		goto out;
 
 	sock = sock_from_file(sock_file, &err);
-	if (!sock)
-		goto out_put;
+	if (!sock) {
+		if (addr)
+			goto out_put;
+		if (flags & ~MSG_DONTWAIT)
+			goto out_put;
+		/* it's not a socket, but we support a special case:
+		 * recv(fd, ubuf, size, MSG_DONTWAIT) */
+		return rw_with_flags(sock_file, fput_needed, ubuf, size, flags);
+	}
 
 	msg.msg_control = NULL;
 	msg.msg_controllen = 0;

* Re: F_DUPFD_CLOEXEC implementation
  2007-10-01 10:07         ` Denys Vlasenko
@ 2007-10-01 18:16           ` Al Viro
  2007-10-01 18:49             ` Denys Vlasenko
  2007-10-01 18:53             ` Michael Tokarev
  0 siblings, 2 replies; 15+ messages in thread
From: Al Viro @ 2007-10-01 18:16 UTC (permalink / raw)
  To: Denys Vlasenko
  Cc: Davide Libenzi, Ulrich Drepper, Linux Kernel Mailing List,
	Andrew Morton

On Mon, Oct 01, 2007 at 11:07:15AM +0100, Denys Vlasenko wrote:
> Also attached is ndelaytest.c which can be used to test that
> send(MSG_DONTWAIT) indeed is failing with EAGAIN if write would block
> and that other processes never see O_NONBLOCK set.
> 
> Comments?

Never send patches during or approaching hangover?
	* it's on a bunch of cyclic lists.  Have its neighbor
go away while you are doing all that crap => boom
	* there's that thing called current position...  It gets buggered.
	* overwriting it while another task might be in the middle of
syscall involving it => boom
	* non-cooperative tasks reading *in* *parallel* from the same
opened file are going to have a lot more serious problems than agreeing
on O_NONBLOCK anyway, so I really don't understand what the hell is that for.


* Re: F_DUPFD_CLOEXEC implementation
  2007-10-01 18:16           ` Al Viro
@ 2007-10-01 18:49             ` Denys Vlasenko
  2007-10-01 19:04               ` Davide Libenzi
  2007-10-01 18:53             ` Michael Tokarev
  1 sibling, 1 reply; 15+ messages in thread
From: Denys Vlasenko @ 2007-10-01 18:49 UTC (permalink / raw)
  To: Al Viro
  Cc: Davide Libenzi, Ulrich Drepper, Linux Kernel Mailing List,
	Andrew Morton

On Monday 01 October 2007 19:16, Al Viro wrote:
> 	* it's on a bunch of cyclic lists.  Have its neighbor
> go away while you are doing all that crap => boom
> 	* there's that thing called current position...  It gets buggered.
> 	* overwriting it while another task might be in the middle of
> syscall involving it => boom

Hm, I suspected that it's heresy. Any idea how to do it cleanly?

> 	* non-cooperative tasks reading *in* *parallel* from the same
> opened file are going to have a lot more serious problems than agreeing
> on O_NONBLOCK anyway, so I really don't understand what the hell is that for.

They don't even need to read in parallel, just having shared fd is enough.
Think about pipes, sockets and terminals. A real-world scenario:

* a process started from shell (interactive or shell script)
* it sets O_NONBLOCK and does a read from fd 0...
* it gets killed (kill -9, whatever)
* shell suddenly has its fd 0 in O_NONBLOCK mode
* shell and all subsequent commands started from it unexpectedly have
  O_NONBLOCKed stdin.
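
The reason this happens is that O_NONBLOCK lives in the open file description, which is shared by every descriptor created through dup() or inherited across fork(). A small sketch that demonstrates the sharing within one process (the function name is invented, and it assumes fd does not already have O_NONBLOCK set):

```c
#include <fcntl.h>
#include <unistd.h>

/* Returns 1 if O_NONBLOCK set through a duplicate of fd is visible
   through fd itself, 0 if not, -1 on error.  The original flags are
   restored before returning.  (Hypothetical helper.)  */
int
nonblock_is_shared (int fd)
{
  int orig = fcntl (fd, F_GETFL);
  if (orig == -1)
    return -1;

  int copy = dup (fd);
  if (copy == -1)
    return -1;

  int rc = -1;
  if (fcntl (copy, F_SETFL, orig | O_NONBLOCK) == 0)
    {
      int seen = fcntl (fd, F_GETFL);   /* query the *other* descriptor */
      if (seen != -1)
        rc = (seen & O_NONBLOCK) != 0;
      fcntl (copy, F_SETFL, orig);      /* undo, as a killed process could not */
    }
  close (copy);
  return rc;
}
```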
--
vda

* Re: F_DUPFD_CLOEXEC implementation
  2007-10-01 18:16           ` Al Viro
  2007-10-01 18:49             ` Denys Vlasenko
@ 2007-10-01 18:53             ` Michael Tokarev
  1 sibling, 0 replies; 15+ messages in thread
From: Michael Tokarev @ 2007-10-01 18:53 UTC (permalink / raw)
  To: Al Viro
  Cc: Denys Vlasenko, Davide Libenzi, Ulrich Drepper,
	Linux Kernel Mailing List, Andrew Morton

Al Viro wrote:
> On Mon, Oct 01, 2007 at 11:07:15AM +0100, Denys Vlasenko wrote:
>> Also attached is ndelaytest.c which can be used to test that
>> send(MSG_DONTWAIT) indeed is failing with EAGAIN if write would block
>> and that other processes never see O_NONBLOCK set.
>>
>> Comments?
> 
> Never send patches during or approaching hangover?
> 	* it's on a bunch of cyclic lists.  Have its neighbor
> go away while you are doing all that crap => boom
> 	* there's that thing called current position...  It gets buggered.
> 	* overwriting it while another task might be in the middle of
> syscall involving it => boom
> 	* non-cooperative tasks reading *in* *parallel* from the same
> opened file are going to have a lot more serious problems than agreeing
> on O_NONBLOCK anyway, so I really don't understand what the hell is that for.

Good summary... ;)

But for the last part of the last item - sometimes, definitely more than
once, I wondered why there's no equivalent to recv(MSG_DONTWAIT) for
non-sockets -- why for sockets it's as simple as adding an option (a
single bit), while for all the rest it requires two fcntl calls...
Sometimes it's handy. ;)

Not that I'm arguing for or against such a feature anyway..

/mjt

* Re: F_DUPFD_CLOEXEC implementation
  2007-10-01 18:49             ` Denys Vlasenko
@ 2007-10-01 19:04               ` Davide Libenzi
  2007-10-02  9:28                 ` Denys Vlasenko
  0 siblings, 1 reply; 15+ messages in thread
From: Davide Libenzi @ 2007-10-01 19:04 UTC (permalink / raw)
  To: Denys Vlasenko
  Cc: Al Viro, Ulrich Drepper, Linux Kernel Mailing List, Andrew Morton

On Mon, 1 Oct 2007, Denys Vlasenko wrote:

> On Monday 01 October 2007 19:16, Al Viro wrote:
> > 	* it's on a bunch of cyclic lists.  Have its neighbor
> > go away while you are doing all that crap => boom
> > 	* there's that thing called current position...  It gets buggered.
> > 	* overwriting it while another task might be in the middle of
> > syscall involving it => boom
> 
> Hm, I suspected that it's heresy. Any idea how to do it cleanly?
> 
> > 	* non-cooperative tasks reading *in* *parallel* from the same
> > opened file are going to have a lot more serious problems than agreeing
> > on O_NONBLOCK anyway, so I really don't understand what the hell is that for.
> 
> They don't even need to read in parallel, just having shared fd is enough.
> Think about pipes, sockets and terminals. A real-world scenario:
> 
> * a process started from shell (interactive or shell script)
> * it sets O_NONBLOCK and does a read from fd 0...
> * it gets killed (kill -9, whatever)
> * shell suddenly has its fd 0 in O_NONBLOCK mode
> * shell and all subsequent commands started from it unexpectedly have
>   O_NONBLOCKed stdin.

I told you how in the previous email. You cannot use the:

1) set O_NONBLOCK
2) read/write
3) unset O_NONBLOCK

in a race-free fashion without wrapping it in a lock (something we 
don't want to do).



PS: send/recv are socket functions, and you really don't want to overload 
    them for other fds.



- Davide



* Re: F_DUPFD_CLOEXEC implementation
  2007-10-01 19:04               ` Davide Libenzi
@ 2007-10-02  9:28                 ` Denys Vlasenko
  2007-10-02 19:52                   ` Davide Libenzi
  0 siblings, 1 reply; 15+ messages in thread
From: Denys Vlasenko @ 2007-10-02  9:28 UTC (permalink / raw)
  To: Davide Libenzi
  Cc: Al Viro, Ulrich Drepper, Linux Kernel Mailing List, Andrew Morton

On Monday 01 October 2007 20:04, Davide Libenzi wrote:
> > They don't even need to read in parallel, just having shared fd is enough.
> > Think about pipes, sockets and terminals. A real-world scenario:
> > 
> > * a process started from shell (interactive or shell script)
> > * it sets O_NONBLOCK and does a read from fd 0...
> > * it gets killed (kill -9, whatever)
> > * shell suddenly has its fd 0 in O_NONBLOCK mode
> > * shell and all subsequent commands started from it unexpectedly have
> >   O_NONBLOCKed stdin.
> 
> I told you how in the previous email. You cannot use the:
> 
> 1) set O_NONBLOCK
> 2) read/write
> 3) unset O_NONBLOCK
> 
> in a race-free fashion without wrapping it in a lock (something we 
> don't want to do).

I'm confused. I am saying exactly this same thing: that I cannot
do it atomically using standard unix operations, but I still need
to do a nonblocking read. Why are you explaining to me that it
cannot be done? I *know*. I'm asking what API should be
added/extended to make it possible.

I have the following proposals:

* make recv(..., MSG_DONTWAIT) work on any fd

Sounds neat, but not trivial to implement in the current kernel.

* new fcntl command F_DUPFL: fcntl(fd, F_DUPFL, n):
  Analogous to F_DUPFD, but gives you an *unshared* copy of the fd.
  Further seeks, fcntl(fd, F_SETFL, O_NONBLOCK), etc. won't affect
  any other process.

How hard would it be to implement F_DUPFL in the current kernel?
--
vda

* Re: F_DUPFD_CLOEXEC implementation
  2007-10-02  9:28                 ` Denys Vlasenko
@ 2007-10-02 19:52                   ` Davide Libenzi
  0 siblings, 0 replies; 15+ messages in thread
From: Davide Libenzi @ 2007-10-02 19:52 UTC (permalink / raw)
  To: Denys Vlasenko
  Cc: Al Viro, Ulrich Drepper, Linux Kernel Mailing List, Andrew Morton

On Tue, 2 Oct 2007, Denys Vlasenko wrote:

> I have the following proposals:
> 
> * make recv(..., MSG_DONTWAIT) work on any fd
> 
> Sounds neat, but not trivial to implement in the current kernel.

This is mildly ugly, if you ask me. Those are socket functions, and the 
flags parameter carries some pretty network-specific meanings.

> * new fcntl command F_DUPFL: fcntl(fd, F_DUPFL, n):
>   Analogous to F_DUPFD, but gives you an *unshared* copy of the fd.
>   Further seeks, fcntl(fd, F_SETFL, O_NONBLOCK), etc. won't affect
>   any other process.

You'll need an ad-hoc copy function though, since your memcpy-based one is 
gonna explode even before memcpy returns ;) You'll have problems with 
ref-counting too. And that layer is not designed to cleanly support that 
operation.
Unfortunately the "clean" solution would involve changing a whole bunch of 
code, and I'm not sure it'd be worth it.

- Davide
