linux-api.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH/RFC] F_SETFL/Fasync BKL removal, now without unsightly global locks
@ 2009-02-02 18:20 Jonathan Corbet
       [not found] ` <1233598811-6871-1-git-send-email-corbet-T1hC0tSOHrs@public.gmane.org>
  0 siblings, 1 reply; 24+ messages in thread
From: Jonathan Corbet @ 2009-02-02 18:20 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: Andi Kleen, Oleg Nesterov, Andrew Morton, Al Viro, Davide Libenzi,
	David Miller, Christoph Hellwig, linux-api-u79uwXL29TY76Z2rM5mHXA,
	Matt Mackall, Alan Cox

Here is yet another attempt to remove the BKL from the handling of changes
to filp->f_flags and FASYNC changes.  This approach is a bit more complex,
and has been split into four patches.  The end-result is the removal of the
BKL and some excess code.

1) Use bitops for f_flags: turn f_flags into an unsigned long and
   manipulate its contents with atomic bitops.  New NR_* macros have been
   defined for the bits which are actually changed after open/create time.

2) Use a bitlock in epoll: use one of the f_flags bit to replace the
   ep_lock spinlock.  This (more than) mitigates the growth in struct file
   created by part 1).  The bitop will be a bit slower, but it should not
   be a performance-critical part of the epoll implementation.  (Ideal
   courtesy of Matt Mackall).

3) Move FASYNC bit handling into ->fasync(): this is how changes to that
   bit and calls to f_op->fasync() are kept atomic in the absence of the
   BKL.  Almost every FASYNC implementation uses fasync_helper(), so the
   actual bit manipulation is done there.  The one exception (for sockets)
   has been fixed up.  At this point, there is no more BKL in these paths.
   (Idea-suggested-by: Andi Kleen).

4) Rationalize fasync_helper() return value handling.  Some drivers mapped
   positive return values from fasync_helper() onto zero, most others did
   not bother.  This optional cleanup patch moves that mapping into the VFS
   code, making things consistent and enabling the removal of some 50 lines
   of code.  It's worth noting that this is technically an ABI change: some
   drivers which returned positive values from fcntl(..., F_SETFL, ...)
   will now return zero instead.  I sure *hope* that wouldn't break
   anything... 

To be extra-sure that I wasn't breaking anything subtle, I went and figured
out how to run the Linux Test Project suite.  Pretty amazing stuff - I've
never run a program which would put my system's load average at 7400
before.  If there are any regressions here, LTP did not find them.

As always, comments welcome.

jon


--
To unsubscribe from this list: send the line "unsubscribe linux-api" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 1/4] Use bit operations for file->f_flags
       [not found] ` <1233598811-6871-1-git-send-email-corbet-T1hC0tSOHrs@public.gmane.org>
@ 2009-02-02 18:20   ` Jonathan Corbet
       [not found]     ` <1233598811-6871-2-git-send-email-corbet-T1hC0tSOHrs@public.gmane.org>
  2009-02-02 18:20   ` [PATCH 2/4] Convert epoll to a bitlock Jonathan Corbet
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 24+ messages in thread
From: Jonathan Corbet @ 2009-02-02 18:20 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: Andi Kleen, Oleg Nesterov, Andrew Morton, Al Viro, Davide Libenzi,
	David Miller, Christoph Hellwig, linux-api-u79uwXL29TY76Z2rM5mHXA,
	Matt Mackall, Alan Cox, Jonathan Corbet

Turn struct file->f_flags into an unsigned long and use bitops to change
its contents.  This allows the safe manipulation of these flags without the
need for locks.  BKL use is removed where possible, but manipulations of
the FASYNC bit still require it (that will be fixed in a later patch).

Signed-off-by: Jonathan Corbet <corbet-T1hC0tSOHrs@public.gmane.org>
---
 arch/alpha/include/asm/fcntl.h    |    5 +++++
 arch/arm/include/asm/fcntl.h      |    1 +
 arch/blackfin/include/asm/fcntl.h |    1 +
 arch/h8300/include/asm/fcntl.h    |    1 +
 arch/m68k/include/asm/fcntl.h     |    1 +
 arch/mips/include/asm/fcntl.h     |    5 +++++
 arch/parisc/include/asm/fcntl.h   |    4 ++++
 arch/powerpc/include/asm/fcntl.h  |    1 +
 arch/sparc/include/asm/fcntl.h    |    6 ++++++
 drivers/char/tty_io.c             |    7 ++-----
 drivers/usb/gadget/file_storage.c |    6 ++++--
 fs/fcntl.c                        |   20 ++++++++++++++++++--
 fs/ioctl.c                        |   21 ++++++++++-----------
 fs/nfsd/vfs.c                     |    3 ++-
 include/asm-generic/fcntl.h       |    7 +++++++
 include/linux/fs.h                |    7 ++++++-
 ipc/mqueue.c                      |    5 +++--
 sound/core/oss/pcm_oss.c          |    3 ++-
 sound/oss/au1550_ac97.c           |    2 +-
 sound/oss/audio.c                 |    3 ++-
 sound/oss/sh_dac_audio.c          |    4 +++-
 sound/oss/swarm_cs4297a.c         |    2 +-
 sound/oss/vwsnd.c                 |    3 ++-
 23 files changed, 88 insertions(+), 30 deletions(-)

diff --git a/arch/alpha/include/asm/fcntl.h b/arch/alpha/include/asm/fcntl.h
index 25da001..de7172f 100644
--- a/arch/alpha/include/asm/fcntl.h
+++ b/arch/alpha/include/asm/fcntl.h
@@ -9,13 +9,18 @@
 #define O_NOCTTY	010000	/* not fcntl */
 
 #define O_NONBLOCK	 00004
+#define NR_O_NONBLOCK	 2
 #define O_APPEND	 00010
+#define NR_O_APPEND	 3
 #define O_SYNC		040000
+#define NR_O_SYNC	14
 #define O_DIRECTORY	0100000	/* must be a directory */
 #define O_NOFOLLOW	0200000 /* don't follow links */
 #define O_LARGEFILE	0400000 /* will be set by the kernel on every open */
 #define O_DIRECT	02000000 /* direct disk access - should check with OSF/1 */
+#define NR_O_DIRECT	19
 #define O_NOATIME	04000000
+#define NR_O_NOATIME	20
 #define O_CLOEXEC	010000000 /* set close_on_exec */
 
 #define F_GETLK		7
diff --git a/arch/arm/include/asm/fcntl.h b/arch/arm/include/asm/fcntl.h
index a80b660..e74b8d1 100644
--- a/arch/arm/include/asm/fcntl.h
+++ b/arch/arm/include/asm/fcntl.h
@@ -4,6 +4,7 @@
 #define O_DIRECTORY	 040000	/* must be a directory */
 #define O_NOFOLLOW	0100000	/* don't follow links */
 #define O_DIRECT	0200000	/* direct disk access hint - currently ignored */
+#define NR_O_DIRECT	16
 #define O_LARGEFILE	0400000
 
 #include <asm-generic/fcntl.h>
diff --git a/arch/blackfin/include/asm/fcntl.h b/arch/blackfin/include/asm/fcntl.h
index 9c40371..fe4c643 100644
--- a/arch/blackfin/include/asm/fcntl.h
+++ b/arch/blackfin/include/asm/fcntl.h
@@ -6,6 +6,7 @@
 #define O_DIRECTORY	 040000	/* must be a directory */
 #define O_NOFOLLOW	0100000	/* don't follow links */
 #define O_DIRECT	0200000	/* direct disk access hint - currently ignored */
+#define NR_O_DIRECT	16
 #define O_LARGEFILE	0400000
 
 #include <asm-generic/fcntl.h>
diff --git a/arch/h8300/include/asm/fcntl.h b/arch/h8300/include/asm/fcntl.h
index 1952cb2..c1cfbdd 100644
--- a/arch/h8300/include/asm/fcntl.h
+++ b/arch/h8300/include/asm/fcntl.h
@@ -4,6 +4,7 @@
 #define O_DIRECTORY	040000	/* must be a directory */
 #define O_NOFOLLOW	0100000	/* don't follow links */
 #define O_DIRECT	0200000	/* direct disk access hint - currently ignored */
+#define NR_O_DIRECT	16
 #define O_LARGEFILE	0400000
 
 #include <asm-generic/fcntl.h>
diff --git a/arch/m68k/include/asm/fcntl.h b/arch/m68k/include/asm/fcntl.h
index 1c369b2..a199f04 100644
--- a/arch/m68k/include/asm/fcntl.h
+++ b/arch/m68k/include/asm/fcntl.h
@@ -4,6 +4,7 @@
 #define O_DIRECTORY	040000	/* must be a directory */
 #define O_NOFOLLOW	0100000	/* don't follow links */
 #define O_DIRECT	0200000	/* direct disk access hint - currently ignored */
+#define NR_O_DIRECT	16
 #define O_LARGEFILE	0400000
 
 #include <asm-generic/fcntl.h>
diff --git a/arch/mips/include/asm/fcntl.h b/arch/mips/include/asm/fcntl.h
index 2a52333..6bbcbf9 100644
--- a/arch/mips/include/asm/fcntl.h
+++ b/arch/mips/include/asm/fcntl.h
@@ -10,15 +10,20 @@
 
 
 #define O_APPEND	0x0008
+#define NR_O_APPEND	3
 #define O_SYNC		0x0010
+#define NR_O_SYNC	4
 #define O_NONBLOCK	0x0080
+#define NR_O_NONBLOCK	7
 #define O_CREAT         0x0100	/* not fcntl */
 #define O_TRUNC		0x0200	/* not fcntl */
 #define O_EXCL		0x0400	/* not fcntl */
 #define O_NOCTTY	0x0800	/* not fcntl */
 #define FASYNC		0x1000	/* fcntl, for BSD compatibility */
+#define NR_FASYNC	12
 #define O_LARGEFILE	0x2000	/* allow large file opens */
 #define O_DIRECT	0x8000	/* direct disk access hint */
+#define NR_O_DIRECT	15
 
 #define F_GETLK		14
 #define F_SETLK		6
diff --git a/arch/parisc/include/asm/fcntl.h b/arch/parisc/include/asm/fcntl.h
index 1e1c824..7d33fe7 100644
--- a/arch/parisc/include/asm/fcntl.h
+++ b/arch/parisc/include/asm/fcntl.h
@@ -4,12 +4,16 @@
 /* open/fcntl - O_SYNC is only implemented on blocks devices and on files
    located on an ext2 file system */
 #define O_APPEND	000000010
+#define NR_O_APPEND	3
 #define O_BLKSEEK	000000100 /* HPUX only */
 #define O_CREAT		000000400 /* not fcntl */
 #define O_EXCL		000002000 /* not fcntl */
 #define O_LARGEFILE	000004000
 #define O_SYNC		000100000
+#define NR_O_SYNC	15
 #define O_NONBLOCK	000200004 /* HPUX has separate NDELAY & NONBLOCK */
+#define NR_O_NONBLOCK	16	  /* WTF, assign one to each... */
+#define NR_O_NDELAY	 3
 #define O_NOCTTY	000400000 /* not fcntl */
 #define O_DSYNC		001000000 /* HPUX only */
 #define O_RSYNC		002000000 /* HPUX only */
diff --git a/arch/powerpc/include/asm/fcntl.h b/arch/powerpc/include/asm/fcntl.h
index ce5c451..2453257 100644
--- a/arch/powerpc/include/asm/fcntl.h
+++ b/arch/powerpc/include/asm/fcntl.h
@@ -5,6 +5,7 @@
 #define O_NOFOLLOW      0100000	/* don't follow links */
 #define O_LARGEFILE     0200000
 #define O_DIRECT	0400000	/* direct disk access hint */
+#define NR_O_DIRECT	17
 
 #include <asm-generic/fcntl.h>
 
diff --git a/arch/sparc/include/asm/fcntl.h b/arch/sparc/include/asm/fcntl.h
index d4d9c9d..ffc0268 100644
--- a/arch/sparc/include/asm/fcntl.h
+++ b/arch/sparc/include/asm/fcntl.h
@@ -4,12 +4,16 @@
 /* open/fcntl - O_SYNC is only implemented on blocks devices and on files
    located on an ext2 file system */
 #define O_APPEND	0x0008
+#define NR_O_APPEND	3
 #define FASYNC		0x0040	/* fcntl, for BSD compatibility */
+#define NR_FASYNC	6
 #define O_CREAT		0x0200	/* not fcntl */
 #define O_TRUNC		0x0400	/* not fcntl */
 #define O_EXCL		0x0800	/* not fcntl */
 #define O_SYNC		0x2000
+#define NR_O_SYNC	13
 #define O_NONBLOCK	0x4000
+#define NR_O_NONBLOCK	14
 #if defined(__sparc__) && defined(__arch64__)
 #define O_NDELAY	0x0004
 #else
@@ -18,7 +22,9 @@
 #define O_NOCTTY	0x8000	/* not fcntl */
 #define O_LARGEFILE	0x40000
 #define O_DIRECT        0x100000 /* direct disk access hint */
+#define NR_O_DIRECT	20
 #define O_NOATIME	0x200000
+#define NR_O_NOATIME	21
 #define O_CLOEXEC	0x400000
 
 #define F_GETOWN	5	/*  for sockets. */
diff --git a/drivers/char/tty_io.c b/drivers/char/tty_io.c
index bc84e12..ad8d884 100644
--- a/drivers/char/tty_io.c
+++ b/drivers/char/tty_io.c
@@ -2162,13 +2162,10 @@ static int fionbio(struct file *file, int __user *p)
 	if (get_user(nonblock, p))
 		return -EFAULT;
 
-	/* file->f_flags is still BKL protected in the fs layer - vomit */
-	lock_kernel();
 	if (nonblock)
-		file->f_flags |= O_NONBLOCK;
+		set_bit(NR_O_NONBLOCK, &file->f_flags);
 	else
-		file->f_flags &= ~O_NONBLOCK;
-	unlock_kernel();
+		clear_bit(NR_O_NONBLOCK, &file->f_flags);
 	return 0;
 }
 
diff --git a/drivers/usb/gadget/file_storage.c b/drivers/usb/gadget/file_storage.c
index b10fa31..ac99b58 100644
--- a/drivers/usb/gadget/file_storage.c
+++ b/drivers/usb/gadget/file_storage.c
@@ -247,6 +247,7 @@
 #include <linux/string.h>
 #include <linux/freezer.h>
 #include <linux/utsname.h>
+#include <linux/bitops.h>
 
 #include <linux/usb/ch9.h>
 #include <linux/usb/gadget.h>
@@ -1711,7 +1712,8 @@ static int do_write(struct fsg_dev *fsg)
 		curlun->sense_data = SS_WRITE_PROTECTED;
 		return -EINVAL;
 	}
-	curlun->filp->f_flags &= ~O_SYNC;	// Default is not to wait
+	/* Default is not to wait */
+	clear_bit(NR_O_SYNC, &curlun->filp->f_flags);
 
 	/* Get the starting Logical Block Address and check that it's
 	 * not too big */
@@ -1729,7 +1731,7 @@ static int do_write(struct fsg_dev *fsg)
 			return -EINVAL;
 		}
 		if (fsg->cmnd[1] & 0x08)	// FUA
-			curlun->filp->f_flags |= O_SYNC;
+			set_bit(NR_O_SYNC, &curlun->filp->f_flags);
 	}
 	if (lba >= curlun->num_sectors) {
 		curlun->sense_data = SS_LOGICAL_BLOCK_ADDRESS_OUT_OF_RANGE;
diff --git a/fs/fcntl.c b/fs/fcntl.c
index bd215cc..3affd3e 100644
--- a/fs/fcntl.c
+++ b/fs/fcntl.c
@@ -20,6 +20,7 @@
 #include <linux/rcupdate.h>
 #include <linux/pid_namespace.h>
 #include <linux/smp_lock.h>
+#include <linux/bitops.h>
 
 #include <asm/poll.h>
 #include <asm/siginfo.h>
@@ -141,7 +142,14 @@ SYSCALL_DEFINE1(dup, unsigned int, fildes)
 	return ret;
 }
 
-#define SETFL_MASK (O_APPEND | O_NONBLOCK | O_NDELAY | FASYNC | O_DIRECT | O_NOATIME)
+static void tweak_flags_bit(int nr, int on, unsigned long *flags)
+{
+	if (on)
+		set_bit(nr, flags);
+	else
+		clear_bit(nr, flags);
+}
+
 
 static int setfl(int fd, struct file * filp, unsigned long arg)
 {
@@ -187,9 +195,17 @@ static int setfl(int fd, struct file * filp, unsigned long arg)
 			if (error < 0)
 				goto out;
 		}
+		change_bit(NR_FASYNC, &filp->f_flags);
 	}
 
-	filp->f_flags = (arg & SETFL_MASK) | (filp->f_flags & ~SETFL_MASK);
+	/*
+	 * Now that we use bitops we have to tweak each bit individually.
+	 */
+	tweak_flags_bit(NR_O_APPEND, arg & O_APPEND, &filp->f_flags);
+	tweak_flags_bit(NR_O_NONBLOCK, arg & O_NONBLOCK, &filp->f_flags);
+	tweak_flags_bit(NR_O_NDELAY, arg & O_NDELAY, &filp->f_flags);
+	tweak_flags_bit(NR_O_DIRECT, arg & O_DIRECT, &filp->f_flags);
+	tweak_flags_bit(NR_O_NOATIME, arg & O_NOATIME, &filp->f_flags);
  out:
 	unlock_kernel();
 	return error;
diff --git a/fs/ioctl.c b/fs/ioctl.c
index 240ec63..fc0db36 100644
--- a/fs/ioctl.c
+++ b/fs/ioctl.c
@@ -392,22 +392,24 @@ static int file_ioctl(struct file *filp, unsigned int cmd,
 
 static int ioctl_fionbio(struct file *filp, int __user *argp)
 {
-	unsigned int flag;
 	int on, error;
 
 	error = get_user(on, argp);
 	if (error)
 		return error;
-	flag = O_NONBLOCK;
 #ifdef __sparc__
 	/* SunOS compatibility item. */
-	if (O_NONBLOCK != O_NDELAY)
-		flag |= O_NDELAY;
+	if (O_NONBLOCK != O_NDELAY) {
+		if (on)
+			set_bit(NR_O_NDELAY, &filp->f_flags);
+		else
+			clear_bit(NR_O_NDELAY, &filp->f_flags);
+	}
 #endif
 	if (on)
-		filp->f_flags |= flag;
+		set_bit(NR_O_NONBLOCK, &filp->f_flags);
 	else
-		filp->f_flags &= ~flag;
+		clear_bit(NR_O_NONBLOCK, &filp->f_flags);
 	return error;
 }
 
@@ -433,9 +435,9 @@ static int ioctl_fioasync(unsigned int fd, struct file *filp,
 		return error;
 
 	if (on)
-		filp->f_flags |= FASYNC;
+		set_bit(NR_FASYNC, &filp->f_flags);
 	else
-		filp->f_flags &= ~FASYNC;
+		clear_bit(NR_FASYNC, &filp->f_flags);
 	return error;
 }
 
@@ -499,10 +501,7 @@ int do_vfs_ioctl(struct file *filp, unsigned int fd, unsigned int cmd,
 		break;
 
 	case FIONBIO:
-		/* BKL needed to avoid races tweaking f_flags */
-		lock_kernel();
 		error = ioctl_fionbio(filp, argp);
-		unlock_kernel();
 		break;
 
 	case FIOASYNC:
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 6e50aaa..1630a9e 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -55,6 +55,7 @@
 #include <linux/security.h>
 #endif /* CONFIG_NFSD_V4 */
 #include <linux/jhash.h>
+#include <linux/bitops.h>
 
 #include <asm/uaccess.h>
 
@@ -999,7 +1000,7 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp, struct file *file,
 	if (!EX_ISSYNC(exp))
 		stable = 0;
 	if (stable && !EX_WGATHER(exp))
-		file->f_flags |= O_SYNC;
+		set_bit(NR_O_SYNC, &file->f_flags);
 
 	/* Write the data. */
 	oldfs = get_fs(); set_fs(KERNEL_DS);
diff --git a/include/asm-generic/fcntl.h b/include/asm-generic/fcntl.h
index b847741..41e9f88 100644
--- a/include/asm-generic/fcntl.h
+++ b/include/asm-generic/fcntl.h
@@ -23,18 +23,23 @@
 #endif
 #ifndef O_APPEND
 #define O_APPEND	00002000
+#define NR_O_APPEND	10
 #endif
 #ifndef O_NONBLOCK
 #define O_NONBLOCK	00004000
+#define NR_O_NONBLOCK	11
 #endif
 #ifndef O_SYNC
 #define O_SYNC		00010000
+#define NR_O_SYNC	12
 #endif
 #ifndef FASYNC
 #define FASYNC		00020000	/* fcntl, for BSD compatibility */
+#define NR_FASYNC	13
 #endif
 #ifndef O_DIRECT
 #define O_DIRECT	00040000	/* direct disk access hint */
+#define NR_O_DIRECT	14
 #endif
 #ifndef O_LARGEFILE
 #define O_LARGEFILE	00100000
@@ -47,12 +52,14 @@
 #endif
 #ifndef O_NOATIME
 #define O_NOATIME	01000000
+#define NR_O_NOATIME	18
 #endif
 #ifndef O_CLOEXEC
 #define O_CLOEXEC	02000000	/* set close_on_exec */
 #endif
 #ifndef O_NDELAY
 #define O_NDELAY	O_NONBLOCK
+#define NR_O_NDELAY	NR_O_NONBLOCK
 #endif
 
 #define F_DUPFD		0	/* dup */
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 6022f44..9d7cb0e 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -829,6 +829,11 @@ static inline int ra_has_index(struct file_ra_state *ra, pgoff_t index)
 #define FILE_MNT_WRITE_TAKEN	1
 #define FILE_MNT_WRITE_RELEASED	2
 
+/*
+ * Locking: f_flags needs to be manipulated with atomic bitops.  The
+ * 	    one exception is when it's known that there are no other
+ *	    references - in the open/create path, primarily.
+ */
 struct file {
 	/*
 	 * fu_list becomes invalid after file_free is called and queued via
@@ -843,7 +848,7 @@ struct file {
 #define f_vfsmnt	f_path.mnt
 	const struct file_operations	*f_op;
 	atomic_long_t		f_count;
-	unsigned int 		f_flags;
+	unsigned long 		f_flags;
 	fmode_t			f_mode;
 	loff_t			f_pos;
 	struct fown_struct	f_owner;
diff --git a/ipc/mqueue.c b/ipc/mqueue.c
index 54b4077..1be0a62 100644
--- a/ipc/mqueue.c
+++ b/ipc/mqueue.c
@@ -31,6 +31,7 @@
 #include <linux/mutex.h>
 #include <linux/nsproxy.h>
 #include <linux/pid.h>
+#include <linux/bitops.h>
 
 #include <net/sock.h>
 #include "util.h"
@@ -1157,9 +1158,9 @@ SYSCALL_DEFINE3(mq_getsetattr, mqd_t, mqdes,
 	if (u_mqstat) {
 		audit_mq_getsetattr(mqdes, &mqstat);
 		if (mqstat.mq_flags & O_NONBLOCK)
-			filp->f_flags |= O_NONBLOCK;
+			set_bit(NR_O_NONBLOCK, &filp->f_flags);
 		else
-			filp->f_flags &= ~O_NONBLOCK;
+			clear_bit(NR_O_NONBLOCK, &filp->f_flags);
 
 		inode->i_atime = inode->i_ctime = CURRENT_TIME;
 	}
diff --git a/sound/core/oss/pcm_oss.c b/sound/core/oss/pcm_oss.c
index e178366..746e38e 100644
--- a/sound/core/oss/pcm_oss.c
+++ b/sound/core/oss/pcm_oss.c
@@ -26,6 +26,7 @@
 #define OSS_DEBUG
 #endif
 
+#include <linux/bitops.h>
 #include <linux/init.h>
 #include <linux/slab.h>
 #include <linux/time.h>
@@ -1895,7 +1896,7 @@ static int snd_pcm_oss_set_fragment(struct snd_pcm_oss_file *pcm_oss_file, unsig
 
 static int snd_pcm_oss_nonblock(struct file * file)
 {
-	file->f_flags |= O_NONBLOCK;
+	set_bit(NR_O_NONBLOCK, &file->f_flags);
 	return 0;
 }
 
diff --git a/sound/oss/au1550_ac97.c b/sound/oss/au1550_ac97.c
index 81e1f44..87305cc 100644
--- a/sound/oss/au1550_ac97.c
+++ b/sound/oss/au1550_ac97.c
@@ -1627,7 +1627,7 @@ au1550_ioctl(struct inode *inode, struct file *file, unsigned int cmd,
 				    sizeof(abinfo)) ? -EFAULT : 0;
 
 	case SNDCTL_DSP_NONBLOCK:
-		file->f_flags |= O_NONBLOCK;
+		set_bit(NR_O_NONBLOCK, &file->f_flags);
 		return 0;
 
 	case SNDCTL_DSP_GETODELAY:
diff --git a/sound/oss/audio.c b/sound/oss/audio.c
index 89bd27a..0837b06 100644
--- a/sound/oss/audio.c
+++ b/sound/oss/audio.c
@@ -29,6 +29,7 @@
 #include <linux/stddef.h>
 #include <linux/string.h>
 #include <linux/kmod.h>
+#include <linux/bitops.h>
 
 #include "sound_config.h"
 #include "ulaw.h"
@@ -433,7 +434,7 @@ int audio_ioctl(int dev, struct file *file, unsigned int cmd, void __user *arg)
 			return dma_ioctl(dev, cmd, arg);
 		
 		case SNDCTL_DSP_NONBLOCK:
-			file->f_flags |= O_NONBLOCK;
+			set_bit(NR_O_NONBLOCK, &file->f_flags);
 			return 0;
 
 		case SNDCTL_DSP_GETCAPS:
diff --git a/sound/oss/sh_dac_audio.c b/sound/oss/sh_dac_audio.c
index e5d4239..532f4ac 100644
--- a/sound/oss/sh_dac_audio.c
+++ b/sound/oss/sh_dac_audio.c
@@ -18,6 +18,8 @@
 #include <linux/sound.h>
 #include <linux/soundcard.h>
 #include <linux/interrupt.h>
+#include <linux/bitops.h>
+
 #include <asm/io.h>
 #include <asm/uaccess.h>
 #include <asm/irq.h>
@@ -135,7 +137,7 @@ static int dac_audio_ioctl(struct inode *inode, struct file *file,
 		return put_user(AFMT_U8, (int *)arg);
 
 	case SNDCTL_DSP_NONBLOCK:
-		file->f_flags |= O_NONBLOCK;
+		set_bit(NR_O_NONBLOCK, &file->f_flags);
 		return 0;
 
 	case SNDCTL_DSP_GETCAPS:
diff --git a/sound/oss/swarm_cs4297a.c b/sound/oss/swarm_cs4297a.c
index 41562ec..fdd9715 100644
--- a/sound/oss/swarm_cs4297a.c
+++ b/sound/oss/swarm_cs4297a.c
@@ -2200,7 +2200,7 @@ static int cs4297a_ioctl(struct inode *inode, struct file *file,
 				    sizeof(abinfo)) ? -EFAULT : 0;
 
 	case SNDCTL_DSP_NONBLOCK:
-		file->f_flags |= O_NONBLOCK;
+		set_bit(NR_O_NONBLOCK, &file->f_flags);
 		return 0;
 
 	case SNDCTL_DSP_GETODELAY:
diff --git a/sound/oss/vwsnd.c b/sound/oss/vwsnd.c
index 78b8acc..72b9a65 100644
--- a/sound/oss/vwsnd.c
+++ b/sound/oss/vwsnd.c
@@ -149,6 +149,7 @@
 #include <linux/wait.h>
 #include <linux/interrupt.h>
 #include <linux/mutex.h>
+#include <linux/bitops.h>
 
 #include <asm/visws/cobalt.h>
 
@@ -2673,7 +2674,7 @@ static int vwsnd_audio_do_ioctl(struct inode *inode,
 
 	case SNDCTL_DSP_NONBLOCK:	/* _SIO  ('P',14) */
 		DBGX("SNDCTL_DSP_NONBLOCK\n");
-		file->f_flags |= O_NONBLOCK;
+		set_bit(NR_O_NONBLOCK, &file->f_flags);
 		return 0;
 
 	case SNDCTL_DSP_RESET:		/* _SIO  ('P', 0) */
-- 
1.6.1.1

--
To unsubscribe from this list: send the line "unsubscribe linux-api" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 2/4] Convert epoll to a bitlock
       [not found] ` <1233598811-6871-1-git-send-email-corbet-T1hC0tSOHrs@public.gmane.org>
  2009-02-02 18:20   ` [PATCH 1/4] Use bit operations for file->f_flags Jonathan Corbet
@ 2009-02-02 18:20   ` Jonathan Corbet
       [not found]     ` <1233598811-6871-3-git-send-email-corbet-T1hC0tSOHrs@public.gmane.org>
  2009-02-02 18:20   ` [PATCH 3/4] Move FASYNC bit handling to f_op->fasync() Jonathan Corbet
  2009-02-02 18:20   ` [PATCH 4/4] Rationalize fasync return values Jonathan Corbet
  3 siblings, 1 reply; 24+ messages in thread
From: Jonathan Corbet @ 2009-02-02 18:20 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: Andi Kleen, Oleg Nesterov, Andrew Morton, Al Viro, Davide Libenzi,
	David Miller, Christoph Hellwig, linux-api-u79uwXL29TY76Z2rM5mHXA,
	Matt Mackall, Alan Cox, Jonathan Corbet

Matt Mackall suggested converting epoll's ep_lock to a bitlock as a way of
saving space in struct file.  This patch makes that change.

Signed-off-by: Jonathan Corbet <corbet-T1hC0tSOHrs@public.gmane.org>
---
 fs/eventpoll.c              |   13 ++++++++-----
 fs/fcntl.c                  |    2 +-
 include/asm-generic/fcntl.h |    4 ++++
 include/linux/eventpoll.h   |    2 +-
 include/linux/fs.h          |    1 -
 5 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index ba2f9ec..95fae01 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -25,6 +25,7 @@
 #include <linux/list.h>
 #include <linux/hash.h>
 #include <linux/spinlock.h>
+#include <linux/bit_spinlock.h>
 #include <linux/syscalls.h>
 #include <linux/rbtree.h>
 #include <linux/wait.h>
@@ -427,10 +428,10 @@ static int ep_remove(struct eventpoll *ep, struct epitem *epi)
 	ep_unregister_pollwait(ep, epi);
 
 	/* Remove the current item from the list of epoll hooks */
-	spin_lock(&file->f_ep_lock);
+	bit_spin_lock(NR_EPOLL_FILE_LOCK, &file->f_flags);
 	if (ep_is_linked(&epi->fllink))
 		list_del_init(&epi->fllink);
-	spin_unlock(&file->f_ep_lock);
+	bit_spin_unlock(NR_EPOLL_FILE_LOCK, &file->f_flags);
 
 	rb_erase(&epi->rbn, &ep->rbr);
 
@@ -549,7 +550,7 @@ void eventpoll_release_file(struct file *file)
 	struct epitem *epi;
 
 	/*
-	 * We don't want to get "file->f_ep_lock" because it is not
+	 * We don't want to get EPOLL_FILE_LOCK because it is not
 	 * necessary. It is not necessary because we're in the "struct file"
 	 * cleanup path, and this means that noone is using this file anymore.
 	 * So, for example, epoll_ctl() cannot hit here sicne if we reach this
@@ -558,6 +559,8 @@ void eventpoll_release_file(struct file *file)
 	 * will correctly serialize the operation. We do need to acquire
 	 * "ep->mtx" after "epmutex" because ep_remove() requires it when called
 	 * from anywhere but ep_free().
+	 *
+	 * Besides, ep_remove() takes the lock.
 	 */
 	mutex_lock(&epmutex);
 
@@ -800,9 +803,9 @@ static int ep_insert(struct eventpoll *ep, struct epoll_event *event,
 		goto error_unregister;
 
 	/* Add the current item to the list of active epoll hook for this file */
-	spin_lock(&tfile->f_ep_lock);
+	bit_spin_lock(NR_EPOLL_FILE_LOCK, &tfile->f_flags);
 	list_add_tail(&epi->fllink, &tfile->f_ep_links);
-	spin_unlock(&tfile->f_ep_lock);
+	bit_spin_unlock(NR_EPOLL_FILE_LOCK, &tfile->f_flags);
 
 	/*
 	 * Add the current item to the RB tree. All RB tree operations are
diff --git a/fs/fcntl.c b/fs/fcntl.c
index 3affd3e..756119e 100644
--- a/fs/fcntl.c
+++ b/fs/fcntl.c
@@ -299,7 +299,7 @@ static long do_fcntl(int fd, unsigned int cmd, unsigned long arg,
 		set_close_on_exec(fd, arg & FD_CLOEXEC);
 		break;
 	case F_GETFL:
-		err = filp->f_flags;
+		err = filp->f_flags & ~EPOLL_FILE_LOCK;
 		break;
 	case F_SETFL:
 		err = setfl(fd, filp, arg);
diff --git a/include/asm-generic/fcntl.h b/include/asm-generic/fcntl.h
index 41e9f88..d56a513 100644
--- a/include/asm-generic/fcntl.h
+++ b/include/asm-generic/fcntl.h
@@ -62,6 +62,10 @@
 #define NR_O_NDELAY	NR_O_NONBLOCK
 #endif
 
+/* Use bit 31 for epoll locking */
+#define NR_EPOLL_FILE_LOCK 31
+#define EPOLL_FILE_LOCK (1 << NR_EPOLL_FILE_LOCK)
+
 #define F_DUPFD		0	/* dup */
 #define F_GETFD		1	/* get close_on_exec */
 #define F_SETFD		2	/* set/clear close_on_exec */
diff --git a/include/linux/eventpoll.h b/include/linux/eventpoll.h
index f1e1d3c..7b5c58a 100644
--- a/include/linux/eventpoll.h
+++ b/include/linux/eventpoll.h
@@ -61,7 +61,7 @@ struct file;
 static inline void eventpoll_init_file(struct file *file)
 {
 	INIT_LIST_HEAD(&file->f_ep_links);
-	spin_lock_init(&file->f_ep_lock);
+	clear_bit(NR_EPOLL_FILE_LOCK, &file->f_flags);
 }
 
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 9d7cb0e..ecd9a5b 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -865,7 +865,6 @@ struct file {
 #ifdef CONFIG_EPOLL
 	/* Used by fs/eventpoll.c to link all the hooks to this file */
 	struct list_head	f_ep_links;
-	spinlock_t		f_ep_lock;
 #endif /* #ifdef CONFIG_EPOLL */
 	struct address_space	*f_mapping;
 #ifdef CONFIG_DEBUG_WRITECOUNT
-- 
1.6.1.1

--
To unsubscribe from this list: send the line "unsubscribe linux-api" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 3/4] Move FASYNC bit handling to f_op->fasync()
       [not found] ` <1233598811-6871-1-git-send-email-corbet-T1hC0tSOHrs@public.gmane.org>
  2009-02-02 18:20   ` [PATCH 1/4] Use bit operations for file->f_flags Jonathan Corbet
  2009-02-02 18:20   ` [PATCH 2/4] Convert epoll to a bitlock Jonathan Corbet
@ 2009-02-02 18:20   ` Jonathan Corbet
  2009-02-02 18:20   ` [PATCH 4/4] Rationalize fasync return values Jonathan Corbet
  3 siblings, 0 replies; 24+ messages in thread
From: Jonathan Corbet @ 2009-02-02 18:20 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: Andi Kleen, Oleg Nesterov, Andrew Morton, Al Viro, Davide Libenzi,
	David Miller, Christoph Hellwig, linux-api-u79uwXL29TY76Z2rM5mHXA,
	Matt Mackall, Alan Cox, Jonathan Corbet

Removing the BKL from FASYNC handling ran into the challenge of keeping the
setting of the FASYNC bit in filp->f_flags atomic with regard to calls to
the underlying fasync() function.  Andi Kleen suggested moving the handling
of that bit into fasync(); this patch does exactly that.

As it happens, every fasync() implementation in the kernel with one
exception calls fasync_helper().  So, if we make fasync_helper() set the
FASYNC bit, we can avoid making any changes to the other fasync()
functions - as long as those functions, themselves, have proper locking.
Most fasync() implementations do nothing but call fasync_helper() - which
has its own lock - so they are easily verified as correct.  The BKL had
already been pushed down into the few exceptions.

The networking code has its own version of fasync_helper(), so that code
has been augmented with explicit FASYNC bit handling.

Signed-off-by: Jonathan Corbet <corbet-T1hC0tSOHrs@public.gmane.org>
---
 fs/fcntl.c   |   21 ++++++++-------------
 fs/ioctl.c   |   11 +----------
 net/socket.c |    2 ++
 3 files changed, 11 insertions(+), 23 deletions(-)

diff --git a/fs/fcntl.c b/fs/fcntl.c
index 756119e..7b641d9 100644
--- a/fs/fcntl.c
+++ b/fs/fcntl.c
@@ -185,19 +185,14 @@ static int setfl(int fd, struct file * filp, unsigned long arg)
 		return error;
 
 	/*
-	 * We still need a lock here for now to keep multiple FASYNC calls
-	 * from racing with each other.
+	 * ->fasync() is responsible for setting the FASYNC bit.
 	 */
-	lock_kernel();
-	if ((arg ^ filp->f_flags) & FASYNC) {
-		if (filp->f_op && filp->f_op->fasync) {
-			error = filp->f_op->fasync(fd, filp, (arg & FASYNC) != 0);
-			if (error < 0)
-				goto out;
-		}
-		change_bit(NR_FASYNC, &filp->f_flags);
+	if (((arg ^ filp->f_flags) & FASYNC) && filp->f_op &&
+			filp->f_op->fasync) {
+		error = filp->f_op->fasync(fd, filp, (arg & FASYNC) != 0);
+		if (error < 0)
+			goto out;
 	}
-
 	/*
 	 * Now that we use bitops we have to tweak each bit individually.
 	 */
@@ -207,7 +202,6 @@ static int setfl(int fd, struct file * filp, unsigned long arg)
 	tweak_flags_bit(NR_O_DIRECT, arg & O_DIRECT, &filp->f_flags);
 	tweak_flags_bit(NR_O_NOATIME, arg & O_NOATIME, &filp->f_flags);
  out:
-	unlock_kernel();
 	return error;
 }
 
@@ -532,7 +526,7 @@ static DEFINE_RWLOCK(fasync_lock);
 static struct kmem_cache *fasync_cache __read_mostly;
 
 /*
- * fasync_helper() is used by some character device drivers (mainly mice)
+ * fasync_helper() is used by almost all character device drivers
  * to set up the fasync queue. It returns negative on error, 0 if it did
  * no changes and positive if it added/deleted the entry.
  */
@@ -548,6 +542,7 @@ int fasync_helper(int fd, struct file * filp, int on, struct fasync_struct **fap
 			return -ENOMEM;
 	}
 	write_lock_irq(&fasync_lock);
+	tweak_flags_bit(NR_FASYNC, on, &filp->f_flags);
 	for (fp = fapp; (fa = *fp) != NULL; fp = &fa->fa_next) {
 		if (fa->fa_file == filp) {
 			if(on) {
diff --git a/fs/ioctl.c b/fs/ioctl.c
index fc0db36..bc32cb2 100644
--- a/fs/ioctl.c
+++ b/fs/ioctl.c
@@ -427,17 +427,11 @@ static int ioctl_fioasync(unsigned int fd, struct file *filp,
 	/* Did FASYNC state change ? */
 	if ((flag ^ filp->f_flags) & FASYNC) {
 		if (filp->f_op && filp->f_op->fasync)
+			/* fasync() adjusts filp->f_flags */
 			error = filp->f_op->fasync(fd, filp, on);
 		else
 			error = -ENOTTY;
 	}
-	if (error)
-		return error;
-
-	if (on)
-		set_bit(NR_FASYNC, &filp->f_flags);
-	else
-		clear_bit(NR_FASYNC, &filp->f_flags);
 	return error;
 }
 
@@ -505,10 +499,7 @@ int do_vfs_ioctl(struct file *filp, unsigned int fd, unsigned int cmd,
 		break;
 
 	case FIOASYNC:
-		/* BKL needed to avoid races tweaking f_flags */
-		lock_kernel();
 		error = ioctl_fioasync(fd, filp, argp);
-		unlock_kernel();
 		break;
 
 	case FIOQSIZE:
diff --git a/net/socket.c b/net/socket.c
index 35dd737..2b1e380 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -1037,6 +1037,7 @@ static int sock_fasync(int fd, struct file *filp, int on)
 			break;
 
 	if (on) {
+		set_bit(NR_FASYNC, &filp->f_flags);
 		if (fa != NULL) {
 			write_lock_bh(&sk->sk_callback_lock);
 			fa->fa_fd = fd;
@@ -1053,6 +1054,7 @@ static int sock_fasync(int fd, struct file *filp, int on)
 		sock->fasync_list = fna;
 		write_unlock_bh(&sk->sk_callback_lock);
 	} else {
+		clear_bit(NR_FASYNC, &filp->f_flags);
 		if (fa != NULL) {
 			write_lock_bh(&sk->sk_callback_lock);
 			*prev = fa->fa_next;
-- 
1.6.1.1

--
To unsubscribe from this list: send the line "unsubscribe linux-api" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 4/4] Rationalize fasync return values
       [not found] ` <1233598811-6871-1-git-send-email-corbet-T1hC0tSOHrs@public.gmane.org>
                     ` (2 preceding siblings ...)
  2009-02-02 18:20   ` [PATCH 3/4] Move FASYNC bit handling to f_op->fasync() Jonathan Corbet
@ 2009-02-02 18:20   ` Jonathan Corbet
  3 siblings, 0 replies; 24+ messages in thread
From: Jonathan Corbet @ 2009-02-02 18:20 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: Andi Kleen, Oleg Nesterov, Andrew Morton, Al Viro, Davide Libenzi,
	David Miller, Christoph Hellwig, linux-api-u79uwXL29TY76Z2rM5mHXA,
	Matt Mackall, Alan Cox, Jonathan Corbet

Most fasync implementations do something like:

     return fasync_helper(...);

But fasync_helper() will return a positive value at times - a feature used
in at least one place.  Thus, a number of other drivers do:

     err = fasync_helper(...);
     if (err < 0)
             return err;
     return 0;

In the interests of consistency and more concise code, it makes sense to
map positive return values onto zero where ->fasync() is called.

Signed-off-by: Jonathan Corbet <corbet-T1hC0tSOHrs@public.gmane.org>
---
 drivers/char/sonypi.c              |    7 +------
 drivers/gpu/drm/drm_fops.c         |    6 +-----
 drivers/hid/usbhid/hiddev.c        |    5 +----
 drivers/ieee1394/dv1394.c          |    6 +-----
 drivers/input/evdev.c              |    5 +----
 drivers/input/joydev.c             |    5 +----
 drivers/input/mousedev.c           |    5 +----
 drivers/input/serio/serio_raw.c    |    4 +---
 drivers/net/wan/cosa.c             |    4 ++--
 drivers/platform/x86/sony-laptop.c |    7 +------
 drivers/scsi/sg.c                  |    4 +---
 fs/fcntl.c                         |    2 ++
 fs/ioctl.c                         |    2 +-
 fs/pipe.c                          |   17 +++--------------
 sound/core/control.c               |    7 ++-----
 sound/core/pcm_native.c            |    4 +---
 sound/core/timer.c                 |    6 +-----
 17 files changed, 22 insertions(+), 74 deletions(-)

diff --git a/drivers/char/sonypi.c b/drivers/char/sonypi.c
index f437443..fd3dced 100644
--- a/drivers/char/sonypi.c
+++ b/drivers/char/sonypi.c
@@ -888,12 +888,7 @@ found:
 
 static int sonypi_misc_fasync(int fd, struct file *filp, int on)
 {
-	int retval;
-
-	retval = fasync_helper(fd, filp, on, &sonypi_device.fifo_async);
-	if (retval < 0)
-		return retval;
-	return 0;
+	return fasync_helper(fd, filp, on, &sonypi_device.fifo_async);
 }
 
 static int sonypi_misc_release(struct inode *inode, struct file *file)
diff --git a/drivers/gpu/drm/drm_fops.c b/drivers/gpu/drm/drm_fops.c
index b06a537..9ec9d5d 100644
--- a/drivers/gpu/drm/drm_fops.c
+++ b/drivers/gpu/drm/drm_fops.c
@@ -337,14 +337,10 @@ int drm_fasync(int fd, struct file *filp, int on)
 {
 	struct drm_file *priv = filp->private_data;
 	struct drm_device *dev = priv->minor->dev;
-	int retcode;
 
 	DRM_DEBUG("fd = %d, device = 0x%lx\n", fd,
 		  (long)old_encode_dev(priv->minor->device));
-	retcode = fasync_helper(fd, filp, on, &dev->buf_async);
-	if (retcode < 0)
-		return retcode;
-	return 0;
+	return fasync_helper(fd, filp, on, &dev->buf_async);
 }
 EXPORT_SYMBOL(drm_fasync);
 
diff --git a/drivers/hid/usbhid/hiddev.c b/drivers/hid/usbhid/hiddev.c
index d73eea3..55828da 100644
--- a/drivers/hid/usbhid/hiddev.c
+++ b/drivers/hid/usbhid/hiddev.c
@@ -227,12 +227,9 @@ void hiddev_report_event(struct hid_device *hid, struct hid_report *report)
  */
 static int hiddev_fasync(int fd, struct file *file, int on)
 {
-	int retval;
 	struct hiddev_list *list = file->private_data;
 
-	retval = fasync_helper(fd, file, on, &list->fasync);
-
-	return retval < 0 ? retval : 0;
+	return fasync_helper(fd, file, on, &list->fasync);
 }
 
 
diff --git a/drivers/ieee1394/dv1394.c b/drivers/ieee1394/dv1394.c
index a329e6b..22e5487 100644
--- a/drivers/ieee1394/dv1394.c
+++ b/drivers/ieee1394/dv1394.c
@@ -1325,11 +1325,7 @@ static int dv1394_fasync(int fd, struct file *file, int on)
 
 	struct video_card *video = file_to_video_card(file);
 
-	int retval = fasync_helper(fd, file, on, &video->fasync);
-
-	if (retval < 0)
-		return retval;
-        return 0;
+	return fasync_helper(fd, file, on, &video->fasync);
 }
 
 static ssize_t dv1394_write(struct file *file, const char __user *buffer, size_t count, loff_t *ppos)
diff --git a/drivers/input/evdev.c b/drivers/input/evdev.c
index ed8baa0..7a7a026 100644
--- a/drivers/input/evdev.c
+++ b/drivers/input/evdev.c
@@ -94,11 +94,8 @@ static void evdev_event(struct input_handle *handle,
 static int evdev_fasync(int fd, struct file *file, int on)
 {
 	struct evdev_client *client = file->private_data;
-	int retval;
-
-	retval = fasync_helper(fd, file, on, &client->fasync);
 
-	return retval < 0 ? retval : 0;
+	return fasync_helper(fd, file, on, &client->fasync);
 }
 
 static int evdev_flush(struct file *file, fl_owner_t id)
diff --git a/drivers/input/joydev.c b/drivers/input/joydev.c
index 6f23662..4224f01 100644
--- a/drivers/input/joydev.c
+++ b/drivers/input/joydev.c
@@ -159,12 +159,9 @@ static void joydev_event(struct input_handle *handle,
 
 static int joydev_fasync(int fd, struct file *file, int on)
 {
-	int retval;
 	struct joydev_client *client = file->private_data;
 
-	retval = fasync_helper(fd, file, on, &client->fasync);
-
-	return retval < 0 ? retval : 0;
+	return fasync_helper(fd, file, on, &client->fasync);
 }
 
 static void joydev_free(struct device *dev)
diff --git a/drivers/input/mousedev.c b/drivers/input/mousedev.c
index ef99a7e..17fd6d4 100644
--- a/drivers/input/mousedev.c
+++ b/drivers/input/mousedev.c
@@ -403,12 +403,9 @@ static void mousedev_event(struct input_handle *handle,
 
 static int mousedev_fasync(int fd, struct file *file, int on)
 {
-	int retval;
 	struct mousedev_client *client = file->private_data;
 
-	retval = fasync_helper(fd, file, on, &client->fasync);
-
-	return retval < 0 ? retval : 0;
+	return fasync_helper(fd, file, on, &client->fasync);
 }
 
 static void mousedev_free(struct device *dev)
diff --git a/drivers/input/serio/serio_raw.c b/drivers/input/serio/serio_raw.c
index 06bbd0e..b03009b 100644
--- a/drivers/input/serio/serio_raw.c
+++ b/drivers/input/serio/serio_raw.c
@@ -58,10 +58,8 @@ static unsigned int serio_raw_no;
 static int serio_raw_fasync(int fd, struct file *file, int on)
 {
 	struct serio_raw_list *list = file->private_data;
-	int retval;
 
-	retval = fasync_helper(fd, file, on, &list->fasync);
-	return retval < 0 ? retval : 0;
+	return fasync_helper(fd, file, on, &list->fasync);
 }
 
 static struct serio_raw *serio_raw_locate(int minor)
diff --git a/drivers/net/wan/cosa.c b/drivers/net/wan/cosa.c
index d80b72e..ce753e9 100644
--- a/drivers/net/wan/cosa.c
+++ b/drivers/net/wan/cosa.c
@@ -993,8 +993,8 @@ static struct fasync_struct *fasync[256] = { NULL, };
 static int cosa_fasync(struct inode *inode, struct file *file, int on)
 {
         int port = iminor(inode);
-        int rv = fasync_helper(inode, file, on, &fasync[port]);
-        return rv < 0 ? rv : 0;
+
+	return fasync_helper(inode, file, on, &fasync[port]);
 }
 #endif
 
diff --git a/drivers/platform/x86/sony-laptop.c b/drivers/platform/x86/sony-laptop.c
index 537959d..bc8996c 100644
--- a/drivers/platform/x86/sony-laptop.c
+++ b/drivers/platform/x86/sony-laptop.c
@@ -1917,12 +1917,7 @@ static struct sonypi_compat_s sonypi_compat = {
 
 static int sonypi_misc_fasync(int fd, struct file *filp, int on)
 {
-	int retval;
-
-	retval = fasync_helper(fd, filp, on, &sonypi_compat.fifo_async);
-	if (retval < 0)
-		return retval;
-	return 0;
+	return fasync_helper(fd, filp, on, &sonypi_compat.fifo_async);
 }
 
 static int sonypi_misc_release(struct inode *inode, struct file *file)
diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 8f0bd3f..b656cde 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -1154,7 +1154,6 @@ sg_poll(struct file *filp, poll_table * wait)
 static int
 sg_fasync(int fd, struct file *filp, int mode)
 {
-	int retval;
 	Sg_device *sdp;
 	Sg_fd *sfp;
 
@@ -1163,8 +1162,7 @@ sg_fasync(int fd, struct file *filp, int mode)
 	SCSI_LOG_TIMEOUT(3, printk("sg_fasync: %s, mode=%d\n",
 				   sdp->disk->disk_name, mode));
 
-	retval = fasync_helper(fd, filp, mode, &sfp->async_qp);
-	return (retval < 0) ? retval : 0;
+	return fasync_helper(fd, filp, mode, &sfp->async_qp);
 }
 
 static int
diff --git a/fs/fcntl.c b/fs/fcntl.c
index 7b641d9..7a74571 100644
--- a/fs/fcntl.c
+++ b/fs/fcntl.c
@@ -192,6 +192,8 @@ static int setfl(int fd, struct file * filp, unsigned long arg)
 		error = filp->f_op->fasync(fd, filp, (arg & FASYNC) != 0);
 		if (error < 0)
 			goto out;
+		if (error > 0)
+			error = 0;
 	}
 	/*
 	 * Now that we use bitops we have to tweak each bit individually.
diff --git a/fs/ioctl.c b/fs/ioctl.c
index bc32cb2..fcfd0e6 100644
--- a/fs/ioctl.c
+++ b/fs/ioctl.c
@@ -432,7 +432,7 @@ static int ioctl_fioasync(unsigned int fd, struct file *filp,
 		else
 			error = -ENOTTY;
 	}
-	return error;
+	return error < 0 ? error : 0;
 }
 
 static int ioctl_fsfreeze(struct file *filp)
diff --git a/fs/pipe.c b/fs/pipe.c
index 3a48ba5..900de67 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -667,10 +667,7 @@ pipe_read_fasync(int fd, struct file *filp, int on)
 	retval = fasync_helper(fd, filp, on, &inode->i_pipe->fasync_readers);
 	mutex_unlock(&inode->i_mutex);
 
-	if (retval < 0)
-		return retval;
-
-	return 0;
+	return retval;
 }
 
 
@@ -684,10 +681,7 @@ pipe_write_fasync(int fd, struct file *filp, int on)
 	retval = fasync_helper(fd, filp, on, &inode->i_pipe->fasync_writers);
 	mutex_unlock(&inode->i_mutex);
 
-	if (retval < 0)
-		return retval;
-
-	return 0;
+	return retval;
 }
 
 
@@ -701,16 +695,11 @@ pipe_rdwr_fasync(int fd, struct file *filp, int on)
 	mutex_lock(&inode->i_mutex);
 
 	retval = fasync_helper(fd, filp, on, &pipe->fasync_readers);
-
 	if (retval >= 0)
 		retval = fasync_helper(fd, filp, on, &pipe->fasync_writers);
 
 	mutex_unlock(&inode->i_mutex);
-
-	if (retval < 0)
-		return retval;
-
-	return 0;
+	return retval;
 }
 
 
diff --git a/sound/core/control.c b/sound/core/control.c
index 636b3b5..4b20fa2 100644
--- a/sound/core/control.c
+++ b/sound/core/control.c
@@ -1373,12 +1373,9 @@ EXPORT_SYMBOL(snd_ctl_unregister_ioctl_compat);
 static int snd_ctl_fasync(int fd, struct file * file, int on)
 {
 	struct snd_ctl_file *ctl;
-	int err;
+
 	ctl = file->private_data;
-	err = fasync_helper(fd, file, on, &ctl->fasync);
-	if (err < 0)
-		return err;
-	return 0;
+	return fasync_helper(fd, file, on, &ctl->fasync);
 }
 
 /*
diff --git a/sound/core/pcm_native.c b/sound/core/pcm_native.c
index a789efc..a75c194 100644
--- a/sound/core/pcm_native.c
+++ b/sound/core/pcm_native.c
@@ -3246,9 +3246,7 @@ static int snd_pcm_fasync(int fd, struct file * file, int on)
 	err = fasync_helper(fd, file, on, &runtime->fasync);
 out:
 	unlock_kernel();
-	if (err < 0)
-		return err;
-	return 0;
+	return err;
 }
 
 /*
diff --git a/sound/core/timer.c b/sound/core/timer.c
index 7965320..3f0050d 100644
--- a/sound/core/timer.c
+++ b/sound/core/timer.c
@@ -1825,13 +1825,9 @@ static long snd_timer_user_ioctl(struct file *file, unsigned int cmd,
 static int snd_timer_user_fasync(int fd, struct file * file, int on)
 {
 	struct snd_timer_user *tu;
-	int err;
 
 	tu = file->private_data;
-	err = fasync_helper(fd, file, on, &tu->fasync);
-        if (err < 0)
-		return err;
-	return 0;
+	return fasync_helper(fd, file, on, &tu->fasync);
 }
 
 static ssize_t snd_timer_user_read(struct file *file, char __user *buffer,
-- 
1.6.1.1

--
To unsubscribe from this list: send the line "unsubscribe linux-api" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/4] Use bit operations for file->f_flags
       [not found]     ` <1233598811-6871-2-git-send-email-corbet-T1hC0tSOHrs@public.gmane.org>
@ 2009-02-03 21:37       ` Andrew Morton
  0 siblings, 0 replies; 24+ messages in thread
From: Andrew Morton @ 2009-02-03 21:37 UTC (permalink / raw)
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA, andi-Vw/NltI1exuRpAAqCnN02g,
	oleg-H+wXaHxf7aLQT0dZR+AlfA,
	viro-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn,
	davidel-AhlLAIvw+VEjIGhXcJzhZg, davem-fT/PcQaiUtIeIZ0/mPfg9Q,
	hch-jcswGhMUV9g, linux-api-u79uwXL29TY76Z2rM5mHXA,
	mpm-VDJrAJ4Gl5ZBDgjK7y7TUQ, alan-qBU/x9rampVanCEyBjwyrvXRex20P6io,
	corbet-T1hC0tSOHrs

On Mon,  2 Feb 2009 11:20:08 -0700
Jonathan Corbet <corbet-T1hC0tSOHrs@public.gmane.org> wrote:

> Turn struct file->f_flags into an unsigned long and use bitops to change
> its contents.  This allows the safe manipulation of these flags without the
> need for locks.  BKL use is removed where possible, but manipulations of
> the FASYNC bit still require it (that will be fixed in a later patch).
> 
> ...
>
> diff --git a/arch/alpha/include/asm/fcntl.h b/arch/alpha/include/asm/fcntl.h
> index 25da001..de7172f 100644
> --- a/arch/alpha/include/asm/fcntl.h
> +++ b/arch/alpha/include/asm/fcntl.h
> @@ -9,13 +9,18 @@
>  #define O_NOCTTY	010000	/* not fcntl */
>  
>  #define O_NONBLOCK	 00004
> +#define NR_O_NONBLOCK	 2
>  #define O_APPEND	 00010
> +#define NR_O_APPEND	 3
>  #define O_SYNC		040000
> +#define NR_O_SYNC	14
>  #define O_DIRECTORY	0100000	/* must be a directory */
>  #define O_NOFOLLOW	0200000 /* don't follow links */
>  #define O_LARGEFILE	0400000 /* will be set by the kernel on every open */
>  #define O_DIRECT	02000000 /* direct disk access - should check with OSF/1 */
> +#define NR_O_DIRECT	19
>  #define O_NOATIME	04000000
> +#define NR_O_NOATIME	20
>  #define O_CLOEXEC	010000000 /* set close_on_exec */
>  
>  #define F_GETLK		7
> diff --git a/arch/arm/include/asm/fcntl.h b/arch/arm/include/asm/fcntl.h
> index a80b660..e74b8d1 100644
> --- a/arch/arm/include/asm/fcntl.h
> +++ b/arch/arm/include/asm/fcntl.h
> @@ -4,6 +4,7 @@
>  #define O_DIRECTORY	 040000	/* must be a directory */
>  #define O_NOFOLLOW	0100000	/* don't follow links */
>  #define O_DIRECT	0200000	/* direct disk access hint - currently ignored */
> +#define NR_O_DIRECT	16
>  #define O_LARGEFILE	0400000
>  
>  #include <asm-generic/fcntl.h>

hm, it's not exactly a vision of splendour, is it.

I suppose we could/should do

#define O_NONBLOCK	(1 << NR_O_NONBLOCK)

but that doesn't prevent people from accidentally doing open-coded &/|
in the future.  Perhaps renaming f_flags to f__flags_atomic (stupid
name?) would help.

> +static void tweak_flags_bit(int nr, int on, unsigned long *flags)
> +{
> +	if (on)
> +		set_bit(nr, flags);
> +	else
> +		clear_bit(nr, flags);
> +}

I see this construct fairly frequently.  I think.  Someone(tm) should
check ;) If correct I guess we should have some generic version, put it
in include/linux/bitops.h.

>  
>  static int setfl(int fd, struct file * filp, unsigned long arg)
>  {
> @@ -187,9 +195,17 @@ static int setfl(int fd, struct file * filp, unsigned long arg)
>  			if (error < 0)
>  				goto out;
>  		}
> +		change_bit(NR_FASYNC, &filp->f_flags);
>  	}
>  
> -	filp->f_flags = (arg & SETFL_MASK) | (filp->f_flags & ~SETFL_MASK);
> +	/*
> +	 * Now that we use bitops we have to tweak each bit individually.
> +	 */
> +	tweak_flags_bit(NR_O_APPEND, arg & O_APPEND, &filp->f_flags);
> +	tweak_flags_bit(NR_O_NONBLOCK, arg & O_NONBLOCK, &filp->f_flags);
> +	tweak_flags_bit(NR_O_NDELAY, arg & O_NDELAY, &filp->f_flags);
> +	tweak_flags_bit(NR_O_DIRECT, arg & O_DIRECT, &filp->f_flags);
> +	tweak_flags_bit(NR_O_NOATIME, arg & O_NOATIME, &filp->f_flags);

Could pass the file* to tweak_flags_bit() - that would generate less
code.  Unless the compiler chooses to inline tweak_flags_bit().

>   out:
>  	unlock_kernel();
>  	return error;
> diff --git a/fs/ioctl.c b/fs/ioctl.c
> index 240ec63..fc0db36 100644
> --- a/fs/ioctl.c
> +++ b/fs/ioctl.c
> @@ -392,22 +392,24 @@ static int file_ioctl(struct file *filp, unsigned int cmd,
>  
>  static int ioctl_fionbio(struct file *filp, int __user *argp)
>  {
> -	unsigned int flag;
>  	int on, error;
>  
>  	error = get_user(on, argp);
>  	if (error)
>  		return error;
> -	flag = O_NONBLOCK;
>  #ifdef __sparc__
>  	/* SunOS compatibility item. */
> -	if (O_NONBLOCK != O_NDELAY)
> -		flag |= O_NDELAY;
> +	if (O_NONBLOCK != O_NDELAY) {
> +		if (on)
> +			set_bit(NR_O_NDELAY, &filp->f_flags);
> +		else
> +			clear_bit(NR_O_NDELAY, &filp->f_flags);

hey, I recognise that!

> +	}
>  #endif
>  	if (on)
> -		filp->f_flags |= flag;
> +		set_bit(NR_O_NONBLOCK, &filp->f_flags);
>  	else
> -		filp->f_flags &= ~flag;
> +		clear_bit(NR_O_NONBLOCK, &filp->f_flags);

and that.


as for the whole patchset: gee, that global lock you had is looking
attractive ;)

--
To unsubscribe from this list: send the line "unsubscribe linux-api" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/4] Convert epoll to a bitlock
       [not found]     ` <1233598811-6871-3-git-send-email-corbet-T1hC0tSOHrs@public.gmane.org>
@ 2009-02-03 21:39       ` Andrew Morton
       [not found]         ` <20090203133942.2ecec281.akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
                           ` (2 more replies)
  0 siblings, 3 replies; 24+ messages in thread
From: Andrew Morton @ 2009-02-03 21:39 UTC (permalink / raw)
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA, andi-Vw/NltI1exuRpAAqCnN02g,
	oleg-H+wXaHxf7aLQT0dZR+AlfA,
	viro-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn,
	davidel-AhlLAIvw+VEjIGhXcJzhZg, davem-fT/PcQaiUtIeIZ0/mPfg9Q,
	hch-jcswGhMUV9g, linux-api-u79uwXL29TY76Z2rM5mHXA,
	mpm-VDJrAJ4Gl5ZBDgjK7y7TUQ, alan-qBU/x9rampVanCEyBjwyrvXRex20P6io,
	corbet-T1hC0tSOHrs

On Mon,  2 Feb 2009 11:20:09 -0700
Jonathan Corbet <corbet-T1hC0tSOHrs@public.gmane.org> wrote:

> Matt Mackall suggested converting epoll's ep_lock to a bitlock as a way of
> saving space in struct file.  This patch makes that change.

hrm.  bit_spin_lock() makes people upset (large penguiny people).  iirc
it doesn't have all the correct/well-understood memory/compiler
ordering semantics which spinlocks have.  And lockdep doesn't know about
it.

--
To unsubscribe from this list: send the line "unsubscribe linux-api" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/4] Convert epoll to a bitlock
       [not found]         ` <20090203133942.2ecec281.akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
@ 2009-02-03 21:55           ` Eric Dumazet
       [not found]             ` <4988BD4E.8080206-fPLkHRcR87vqlBn2x/YWAg@public.gmane.org>
  0 siblings, 1 reply; 24+ messages in thread
From: Eric Dumazet @ 2009-02-03 21:55 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Jonathan Corbet, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	andi-Vw/NltI1exuRpAAqCnN02g, oleg-H+wXaHxf7aLQT0dZR+AlfA,
	viro-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn,
	davidel-AhlLAIvw+VEjIGhXcJzhZg, davem-fT/PcQaiUtIeIZ0/mPfg9Q,
	hch-jcswGhMUV9g, linux-api-u79uwXL29TY76Z2rM5mHXA,
	mpm-VDJrAJ4Gl5ZBDgjK7y7TUQ, alan-qBU/x9rampVanCEyBjwyrvXRex20P6io

Andrew Morton a écrit :
> On Mon,  2 Feb 2009 11:20:09 -0700
> Jonathan Corbet <corbet-T1hC0tSOHrs@public.gmane.org> wrote:
> 
>> Matt Mackall suggested converting epoll's ep_lock to a bitlock as a way of
>> saving space in struct file.  This patch makes that change.
> 
> hrm.  bit_spin_lock() makes people upset (large penguiny people).  iirc
> it doesn't have all the correct/well-understood memory/compiler
> ordering semantics which spinlocks have.  And lockdep doesn't know about
> it.
> 

In a previous attempt (2005), I suggested using a single global lock.

http://search.luky.org/linux-kernel.2005/msg50862.html

Probably an array of hashed spinlocks would be more than enough.



--
To unsubscribe from this list: send the line "unsubscribe linux-api" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/4] Convert epoll to a bitlock
       [not found]             ` <4988BD4E.8080206-fPLkHRcR87vqlBn2x/YWAg@public.gmane.org>
@ 2009-02-03 22:05               ` Andrew Morton
       [not found]                 ` <20090203140543.6e915f97.akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
  2009-02-03 23:08               ` Davide Libenzi
  1 sibling, 1 reply; 24+ messages in thread
From: Andrew Morton @ 2009-02-03 22:05 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: corbet-T1hC0tSOHrs, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	andi-Vw/NltI1exuRpAAqCnN02g, oleg-H+wXaHxf7aLQT0dZR+AlfA,
	viro-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn,
	davidel-AhlLAIvw+VEjIGhXcJzhZg, davem-fT/PcQaiUtIeIZ0/mPfg9Q,
	hch-jcswGhMUV9g, linux-api-u79uwXL29TY76Z2rM5mHXA,
	mpm-VDJrAJ4Gl5ZBDgjK7y7TUQ, alan-qBU/x9rampVanCEyBjwyrvXRex20P6io

On Tue, 03 Feb 2009 22:55:26 +0100
Eric Dumazet <dada1-fPLkHRcR87vqlBn2x/YWAg@public.gmane.org> wrote:

> Andrew Morton a __crit :
> > On Mon,  2 Feb 2009 11:20:09 -0700
> > Jonathan Corbet <corbet-T1hC0tSOHrs@public.gmane.org> wrote:
> > 
> >> Matt Mackall suggested converting epoll's ep_lock to a bitlock as a way of
> >> saving space in struct file.  This patch makes that change.
> > 
> > hrm.  bit_spin_lock() makes people upset (large penguiny people).  iirc
> > it doesn't have all the correct/well-understood memory/compiler
> > ordering semantics which spinlocks have.  And lockdep doesn't know about
> > it.
> > 
> 
> In a previous attempt (2005), I suggested using a single global lock.
> 
> http://search.luky.org/linux-kernel.2005/msg50862.html

ok..

> Probably an array of hashed spinlocks would be more than enough.
> 

yes, f_ep_lock is a teeny innermost lock.  Perhaps using
f->f_dentry->d_inode->i_lock would be a decent speed/space compromise.


--
To unsubscribe from this list: send the line "unsubscribe linux-api" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/4] Convert epoll to a bitlock
       [not found]                 ` <20090203140543.6e915f97.akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
@ 2009-02-03 22:22                   ` Matt Mackall
  2009-02-03 22:37                     ` Jonathan Corbet
  0 siblings, 1 reply; 24+ messages in thread
From: Matt Mackall @ 2009-02-03 22:22 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Eric Dumazet, corbet-T1hC0tSOHrs,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, andi-Vw/NltI1exuRpAAqCnN02g,
	oleg-H+wXaHxf7aLQT0dZR+AlfA,
	viro-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn,
	davidel-AhlLAIvw+VEjIGhXcJzhZg, davem-fT/PcQaiUtIeIZ0/mPfg9Q,
	hch-jcswGhMUV9g, linux-api-u79uwXL29TY76Z2rM5mHXA,
	alan-qBU/x9rampVanCEyBjwyrvXRex20P6io

On Tue, 2009-02-03 at 14:05 -0800, Andrew Morton wrote:
> On Tue, 03 Feb 2009 22:55:26 +0100
> Eric Dumazet <dada1-fPLkHRcR87vqlBn2x/YWAg@public.gmane.org> wrote:
> 
> > Andrew Morton a __crit :
> > > On Mon,  2 Feb 2009 11:20:09 -0700
> > > Jonathan Corbet <corbet-T1hC0tSOHrs@public.gmane.org> wrote:
> > > 
> > >> Matt Mackall suggested converting epoll's ep_lock to a bitlock as a way of
> > >> saving space in struct file.  This patch makes that change.
> > > 
> > > hrm.  bit_spin_lock() makes people upset (large penguiny people).  iirc
> > > it doesn't have all the correct/well-understood memory/compiler
> > > ordering semantics which spinlocks have.  And lockdep doesn't know about
> > > it.
> > > 
> > 
> > In a previous attempt (2005), I suggested using a single global lock.
> > 
> > http://search.luky.org/linux-kernel.2005/msg50862.html
> 
> ok..
> 
> > Probably an array of hashed spinlocks would be more than enough.
> > 
> 
> yes, f_ep_lock is a teeny innermost lock.  Perhaps using
> f->f_dentry->d_inode->i_lock would be a decent speed/space compromise.

That seems eminently reasonable.

But that re-opens the question of what to do about poor Jon's quest.

I got confused halfway through as he went from using a global fasync
spinlock to a non-locked but atomic flag bit. Not sure why using a
per-file or per-inode lock doesn't work for the fasync code.

-- 
http://selenic.com : development and support for Mercurial and Linux


--
To unsubscribe from this list: send the line "unsubscribe linux-api" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/4] Convert epoll to a bitlock
  2009-02-03 22:22                   ` Matt Mackall
@ 2009-02-03 22:37                     ` Jonathan Corbet
       [not found]                       ` <20090203153740.363d0a04-vw3g6Xz/EtPk1uMJSBkQmQ@public.gmane.org>
  0 siblings, 1 reply; 24+ messages in thread
From: Jonathan Corbet @ 2009-02-03 22:37 UTC (permalink / raw)
  To: Matt Mackall
  Cc: Andrew Morton, Eric Dumazet, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	andi-Vw/NltI1exuRpAAqCnN02g, oleg-H+wXaHxf7aLQT0dZR+AlfA,
	viro-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn,
	davidel-AhlLAIvw+VEjIGhXcJzhZg, davem-fT/PcQaiUtIeIZ0/mPfg9Q,
	hch-jcswGhMUV9g, linux-api-u79uwXL29TY76Z2rM5mHXA,
	alan-qBU/x9rampVanCEyBjwyrvXRex20P6io

On Tue, 03 Feb 2009 16:22:02 -0600
Matt Mackall <mpm-VDJrAJ4Gl5ZBDgjK7y7TUQ@public.gmane.org> wrote:

> But that re-opens the question of what to do about poor Jon's quest.
>
> I got confused halfway through as he went from using a global fasync
> spinlock to a non-locked but atomic flag bit. Not sure why using a
> per-file or per-inode lock doesn't work for the fasync code.

No per-file lock because (1) there is resistance to growing struct
file, and (2) lockless algorithms are all the rage now.  Additionally,
solving the fasync-atomicity problem with locks requires the use of a
mutex (or the BKL) rather than a spinlock.  I suppose we could combine
a global f_flags lock with the set-FASYNC-in-fasync_helper() bits.

At this point Poor Jon sees a fork in the road as he contemplates the
future of his quest:

- Go with this patch set, perhaps with a bit of cleanup as suggested by
  Andrew.

- Go back to the global lock.

- Go away, leave the BKL in place, and wait for somebody smarter to
  attack the problem.

Any wise guidance would be most welcome...

jon
--
To unsubscribe from this list: send the line "unsubscribe linux-api" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/4] Convert epoll to a bitlock
       [not found]                       ` <20090203153740.363d0a04-vw3g6Xz/EtPk1uMJSBkQmQ@public.gmane.org>
@ 2009-02-03 22:53                         ` Andrew Morton
       [not found]                           ` <20090203145346.8df40277.akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
  0 siblings, 1 reply; 24+ messages in thread
From: Andrew Morton @ 2009-02-03 22:53 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: mpm-VDJrAJ4Gl5ZBDgjK7y7TUQ, dada1-fPLkHRcR87vqlBn2x/YWAg,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, andi-Vw/NltI1exuRpAAqCnN02g,
	oleg-H+wXaHxf7aLQT0dZR+AlfA,
	viro-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn,
	davidel-AhlLAIvw+VEjIGhXcJzhZg, davem-fT/PcQaiUtIeIZ0/mPfg9Q,
	hch-jcswGhMUV9g, linux-api-u79uwXL29TY76Z2rM5mHXA,
	alan-qBU/x9rampVanCEyBjwyrvXRex20P6io

On Tue, 3 Feb 2009 15:37:40 -0700
Jonathan Corbet <corbet-T1hC0tSOHrs@public.gmane.org> wrote:

> On Tue, 03 Feb 2009 16:22:02 -0600
> Matt Mackall <mpm-VDJrAJ4Gl5ZBDgjK7y7TUQ@public.gmane.org> wrote:
> 
> > But that re-opens the question of what to do about poor Jon's quest.
> >
> > I got confused halfway through as he went from using a global fasync
> > spinlock to a non-locked but atomic flag bit. Not sure why using a
> > per-file or per-inode lock doesn't work for the fasync code.
> 
> No per-file lock because (1) there is resistance to growing struct
> file, and (2) lockless algorithms are all the rage now.  Additionally,
> solving the fasync-atomicity problem with locks requires the use of a
> mutex (or the BKL) rather than a spinlock.  I suppose we could combine
> a global f_flags lock with the set-FASYNC-in-fasync_helper() bits.
> 
> At this point Poor Jon sees a fork in the road as he contemplates the
> future of his quest:
> 
> - Go with this patch set, perhaps with a bit of cleanup as suggested by
>   Andrew.
> 
> - Go back to the global lock.
> 
> - Go away, leave the BKL in place, and wait for somebody smarter to
>   attack the problem.

Well.  We _could_ whack part of this nut with my usual hammer: protect
f_flags with file->f_dentry->d_inode->i_lock.  IIRC there was some
objection to that - performance?

One problem here seems to be that we're trying to change multiple
things at the same time.  We can blame the BKL for that.

Can we break the problem into manageable chunks?  Your patchset did
that, I guess.  What were those chunks again? ;)
--
To unsubscribe from this list: send the line "unsubscribe linux-api" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/4] Convert epoll to a bitlock
       [not found]             ` <4988BD4E.8080206-fPLkHRcR87vqlBn2x/YWAg@public.gmane.org>
  2009-02-03 22:05               ` Andrew Morton
@ 2009-02-03 23:08               ` Davide Libenzi
       [not found]                 ` <alpine.DEB.1.10.0902031446280.23050-GPJ85BhbkB8RepQJljzAVbITYcZ0+W3JAL8bYrjMMd8@public.gmane.org>
  1 sibling, 1 reply; 24+ messages in thread
From: Davide Libenzi @ 2009-02-03 23:08 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Andrew Morton, Jonathan Corbet, Linux Kernel Mailing List,
	andi-Vw/NltI1exuRpAAqCnN02g, oleg-H+wXaHxf7aLQT0dZR+AlfA,
	viro-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn, David Miller,
	hch-jcswGhMUV9g, linux-api-u79uwXL29TY76Z2rM5mHXA,
	mpm-VDJrAJ4Gl5ZBDgjK7y7TUQ, Alan Cox

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1264 bytes --]

On Tue, 3 Feb 2009, Eric Dumazet wrote:

> Andrew Morton a écrit :
> > On Mon,  2 Feb 2009 11:20:09 -0700
> > Jonathan Corbet <corbet-T1hC0tSOHrs@public.gmane.org> wrote:
> > 
> >> Matt Mackall suggested converting epoll's ep_lock to a bitlock as a way of
> >> saving space in struct file.  This patch makes that change.
> > 
> > hrm.  bit_spin_lock() makes people upset (large penguiny people).  iirc
> > it doesn't have all the correct/well-understood memory/compiler
> > ordering semantics which spinlocks have.  And lockdep doesn't know about
> > it.
> > 
> 
> In a previous attempt (2005), I suggested using a single global lock.
> 
> http://search.luky.org/linux-kernel.2005/msg50862.html
> 
> Probably an array of hashed spinlocks would be more than enough.

That could be done, although I'm not sure it's worth going that way to 
save 4 bytes. The effective saving rate is not even 4/sizeof(struct file) 
since struct file never comes alone, and when you allocate a struct file 
you always carry more allocations behind (at least for the cases where you 
tend to have a lot of them around, so size would matter).
The add/remove path in epoll is not a super-hot one, so it could be done. 
I dunno how this change matter with the patchset though.



- Davide

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/4] Convert epoll to a bitlock
       [not found]                           ` <20090203145346.8df40277.akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
@ 2009-02-03 23:09                             ` Davide Libenzi
       [not found]                               ` <alpine.DEB.1.10.0902031508280.23050-GPJ85BhbkB8RepQJljzAVbITYcZ0+W3JAL8bYrjMMd8@public.gmane.org>
  2009-02-03 23:19                             ` Jonathan Corbet
  1 sibling, 1 reply; 24+ messages in thread
From: Davide Libenzi @ 2009-02-03 23:09 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Jonathan Corbet, mpm-VDJrAJ4Gl5ZBDgjK7y7TUQ,
	dada1-fPLkHRcR87vqlBn2x/YWAg, Linux Kernel Mailing List,
	andi-Vw/NltI1exuRpAAqCnN02g, oleg-H+wXaHxf7aLQT0dZR+AlfA,
	viro-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn, Davide Libenzi,
	David Miller, hch-jcswGhMUV9g, linux-api-u79uwXL29TY76Z2rM5mHXA,
	Alan Cox

On Tue, 3 Feb 2009, Andrew Morton wrote:

> Well.  We _could_ whack part of this nut with my usual hammer: protect
> f_flags with file->f_dentry->d_inode->i_lock.  IIRC there was some
> objection to that - performance?

If you meant it for epoll, that uses a single shared inode, that'd turn to 
be a global lock.


- Davide


--
To unsubscribe from this list: send the line "unsubscribe linux-api" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/4] Convert epoll to a bitlock
       [not found]                               ` <alpine.DEB.1.10.0902031508280.23050-GPJ85BhbkB8RepQJljzAVbITYcZ0+W3JAL8bYrjMMd8@public.gmane.org>
@ 2009-02-03 23:12                                 ` Davide Libenzi
  0 siblings, 0 replies; 24+ messages in thread
From: Davide Libenzi @ 2009-02-03 23:12 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Jonathan Corbet, mpm-VDJrAJ4Gl5ZBDgjK7y7TUQ,
	dada1-fPLkHRcR87vqlBn2x/YWAg, Linux Kernel Mailing List,
	andi-Vw/NltI1exuRpAAqCnN02g, oleg-H+wXaHxf7aLQT0dZR+AlfA,
	viro-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn, David Miller,
	hch-jcswGhMUV9g, linux-api-u79uwXL29TY76Z2rM5mHXA, Alan Cox

On Tue, 3 Feb 2009, Davide Libenzi wrote:

> On Tue, 3 Feb 2009, Andrew Morton wrote:
> 
> > Well.  We _could_ whack part of this nut with my usual hammer: protect
> > f_flags with file->f_dentry->d_inode->i_lock.  IIRC there was some
> > objection to that - performance?
> 
> If you meant it for epoll, that uses a single shared inode, that'd turn to 
> be a global lock.

Course you couldn't have meant it for epoll, so forget about the comment :)


- Davide


--
To unsubscribe from this list: send the line "unsubscribe linux-api" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/4] Convert epoll to a bitlock
       [not found]                           ` <20090203145346.8df40277.akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
  2009-02-03 23:09                             ` Davide Libenzi
@ 2009-02-03 23:19                             ` Jonathan Corbet
       [not found]                               ` <20090203161931.4054a25e-vw3g6Xz/EtPk1uMJSBkQmQ@public.gmane.org>
  1 sibling, 1 reply; 24+ messages in thread
From: Jonathan Corbet @ 2009-02-03 23:19 UTC (permalink / raw)
  To: Andrew Morton
  Cc: mpm-VDJrAJ4Gl5ZBDgjK7y7TUQ, dada1-fPLkHRcR87vqlBn2x/YWAg,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, andi-Vw/NltI1exuRpAAqCnN02g,
	oleg-H+wXaHxf7aLQT0dZR+AlfA,
	viro-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn,
	davidel-AhlLAIvw+VEjIGhXcJzhZg, davem-fT/PcQaiUtIeIZ0/mPfg9Q,
	hch-jcswGhMUV9g, linux-api-u79uwXL29TY76Z2rM5mHXA,
	alan-qBU/x9rampVanCEyBjwyrvXRex20P6io

On Tue, 3 Feb 2009 14:53:46 -0800
Andrew Morton <akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org> wrote:

> Well.  We _could_ whack part of this nut with my usual hammer: protect
> f_flags with file->f_dentry->d_inode->i_lock.  IIRC there was some
> objection to that - performance?

Andi has objected to the addition of locks, but i_lock is maybe
sufficiently dispersed to pass muster there.  I had an instinctive
reaction to using a lock which is three pointers away, but I can get
over that.  I'll admit a bit of ignorance, though: if a given struct
file exists, do we know for sure that file->f_dentry->d_inode exists?

> One problem here seems to be that we're trying to change multiple
> things at the same time.  We can blame the BKL for that.
> 
> Can we break the problem into manageable chunks?  Your patchset did
> that, I guess.  What were those chunks again? ;)

I'm not really sure how to break it down any further.  If we take the
i_lock approach, the chunks would be something like:

 1) Use i_lock to protect accesses to f_flags.  This would enable some 
    BKL usage to be removed, but would not fix fasync.

 2) Move responsibility for the FASYNC bit into ->fasync(), with
    fasync_helper() doing it in almost all situations.  The remaining
    BKL usage would then go away.

 3) The same optional fasync() return values cleanup.

Make sense?

jon
--
To unsubscribe from this list: send the line "unsubscribe linux-api" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/4] Convert epoll to a bitlock
       [not found]                               ` <20090203161931.4054a25e-vw3g6Xz/EtPk1uMJSBkQmQ@public.gmane.org>
@ 2009-02-03 23:29                                 ` Andrew Morton
  2009-02-04  7:13                                 ` Christoph Hellwig
  1 sibling, 0 replies; 24+ messages in thread
From: Andrew Morton @ 2009-02-03 23:29 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: mpm-VDJrAJ4Gl5ZBDgjK7y7TUQ, dada1-fPLkHRcR87vqlBn2x/YWAg,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, andi-Vw/NltI1exuRpAAqCnN02g,
	oleg-H+wXaHxf7aLQT0dZR+AlfA,
	viro-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn,
	davidel-AhlLAIvw+VEjIGhXcJzhZg, davem-fT/PcQaiUtIeIZ0/mPfg9Q,
	hch-jcswGhMUV9g, linux-api-u79uwXL29TY76Z2rM5mHXA,
	alan-qBU/x9rampVanCEyBjwyrvXRex20P6io

On Tue, 3 Feb 2009 16:19:31 -0700
Jonathan Corbet <corbet-T1hC0tSOHrs@public.gmane.org> wrote:

> On Tue, 3 Feb 2009 14:53:46 -0800
> Andrew Morton <akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org> wrote:
> 
> > Well.  We _could_ whack part of this nut with my usual hammer: protect
> > f_flags with file->f_dentry->d_inode->i_lock.  IIRC there was some
> > objection to that - performance?
> 
> Andi has objected to the addition of locks, but i_lock is maybe
> sufficiently dispersed to pass muster there.

Hope so.

I'd wrap it in a lock_file_flags(file*) thing so we can change it later
on (add a lock to struct file, take a global, lock, etc).

>  I had an instinctive
> reaction to using a lock which is three pointers away, but I can get
> over that.  I'll admit a bit of ignorance, though: if a given struct
> file exists, do we know for sure that file->f_dentry->d_inode exists?

It should.  A NULL ->d_inode especially signifies a negative dentry.

> > One problem here seems to be that we're trying to change multiple
> > things at the same time.  We can blame the BKL for that.
> > 
> > Can we break the problem into manageable chunks?  Your patchset did
> > that, I guess.  What were those chunks again? ;)
> 
> I'm not really sure how to break it down any further.  If we take the
> i_lock approach, the chunks would be something like:
> 
>  1) Use i_lock to protect accesses to f_flags.  This would enable some 
>     BKL usage to be removed, but would not fix fasync.
> 
>  2) Move responsibility for the FASYNC bit into ->fasync(), with
>     fasync_helper() doing it in almost all situations.  The remaining
>     BKL usage would then go away.
> 
>  3) The same optional fasync() return values cleanup.
> 
> Make sense?

yup.

If the ->i_lock think is no good then we can trivially switch over to a
global lock.  Heck, we could even go back to lock_kernel() ;)

--
To unsubscribe from this list: send the line "unsubscribe linux-api" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/4] Convert epoll to a bitlock
  2009-02-03 21:39       ` Andrew Morton
       [not found]         ` <20090203133942.2ecec281.akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
@ 2009-02-04  1:00         ` wli
  2009-02-04  4:54         ` Nick Piggin
  2 siblings, 0 replies; 24+ messages in thread
From: wli @ 2009-02-04  1:00 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Jonathan Corbet, linux-kernel, andi, oleg, viro, davidel, davem,
	hch, linux-api, mpm, alan

On Mon,  2 Feb 2009 11:20:09 -0700 Jonathan Corbet <corbet@lwn.net> wrote:
>> Matt Mackall suggested converting epoll's ep_lock to a bitlock as a way of
>> saving space in struct file.  This patch makes that change.

On Tue, Feb 03, 2009 at 01:39:42PM -0800, Andrew Morton wrote:
> hrm.  bit_spin_lock() makes people upset (large penguiny people).  iirc
> it doesn't have all the correct/well-understood memory/compiler
> ordering semantics which spinlocks have.  And lockdep doesn't know about
> it.

ISTR the memory/compiler ordering semantics bits coming up when it was
still pte_chain_lock(), but not the entirety of it. I think
smp_mb__after_clear_bit() and/or smp_mb__before_clear_bit() turned out
to be needed in the unlock function, and they're there now with
clear_bit_unlock() et al. lockdep I'm less sure about, but suspect
the objects with bit locks embedded in them are too numerous to feasibly
track, e.g. there may be several bh's per-page, and how numerous they
are tends to be why they're using bit locks in the first place.


-- wli

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/4] Convert epoll to a bitlock
       [not found]                 ` <alpine.DEB.1.10.0902031446280.23050-GPJ85BhbkB8RepQJljzAVbITYcZ0+W3JAL8bYrjMMd8@public.gmane.org>
@ 2009-02-04  2:48                   ` Eric Dumazet
  0 siblings, 0 replies; 24+ messages in thread
From: Eric Dumazet @ 2009-02-04  2:48 UTC (permalink / raw)
  To: Davide Libenzi
  Cc: Andrew Morton, Jonathan Corbet, Linux Kernel Mailing List,
	andi-Vw/NltI1exuRpAAqCnN02g, oleg-H+wXaHxf7aLQT0dZR+AlfA,
	viro-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn, David Miller,
	hch-jcswGhMUV9g, linux-api-u79uwXL29TY76Z2rM5mHXA,
	mpm-VDJrAJ4Gl5ZBDgjK7y7TUQ, Alan Cox

Davide Libenzi a écrit :
> On Tue, 3 Feb 2009, Eric Dumazet wrote:
> 
>> Andrew Morton a écrit :
>>> On Mon,  2 Feb 2009 11:20:09 -0700
>>> Jonathan Corbet <corbet-T1hC0tSOHrs@public.gmane.org> wrote:
>>>
>>>> Matt Mackall suggested converting epoll's ep_lock to a bitlock as a way of
>>>> saving space in struct file.  This patch makes that change.
>>> hrm.  bit_spin_lock() makes people upset (large penguiny people).  iirc
>>> it doesn't have all the correct/well-understood memory/compiler
>>> ordering semantics which spinlocks have.  And lockdep doesn't know about
>>> it.
>>>
>> In a previous attempt (2005), I suggested using a single global lock.
>>
>> http://search.luky.org/linux-kernel.2005/msg50862.html
>>
>> Probably an array of hashed spinlocks would be more than enough.
> 
> That could be done, although I'm not sure it's worth going that way to 
> save 4 bytes. The effective saving rate is not even 4/sizeof(struct file) 
> since struct file never comes alone, and when you allocate a struct file 
> you always carry more allocations behind (at least for the cases where you 
> tend to have a lot of them around, so size would matter).
> The add/remove path in epoll is not a super-hot one, so it could be done. 
> I dunno how this change matter with the patchset though.

Back in 2005, I saved 4 bytes per file, and because of HWCACHE alignment, sizeof(struct file)
shrinked by 64 bytes. With more than 1.000.000 sockets opened on a busy server, it saved
64 MB of ram. At that time, this mattered (8GB of ram), but in 2009, 64 MB is so small
I dont care anymore about sizeof(struct file)

AFAIK, I just checked on x86_64 and got : sizeof(struct file)=0xc0 , so thats perfect :)

(Only thing I still do is to move private_data in the first cache line of struct file, because
it speedups a lot socket operation, when dealing with 1.000.000 sockets : one cache line miss
avoided per socket syscall)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 6022f44..03b2227 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -842,6 +842,8 @@ struct file {
 #define f_dentry       f_path.dentry
 #define f_vfsmnt       f_path.mnt
        const struct file_operations    *f_op;
+       /* needed for tty driver, and maybe others */
+       void                    *private_data;
        atomic_long_t           f_count;
        unsigned int            f_flags;
        fmode_t                 f_mode;
@@ -854,8 +856,6 @@ struct file {
 #ifdef CONFIG_SECURITY
        void                    *f_security;
 #endif
-       /* needed for tty driver, and maybe others */
-       void                    *private_data;

 #ifdef CONFIG_EPOLL
        /* Used by fs/eventpoll.c to link all the hooks to this file */


--
To unsubscribe from this list: send the line "unsubscribe linux-api" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/4] Convert epoll to a bitlock
  2009-02-03 21:39       ` Andrew Morton
       [not found]         ` <20090203133942.2ecec281.akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
  2009-02-04  1:00         ` wli
@ 2009-02-04  4:54         ` Nick Piggin
  2 siblings, 0 replies; 24+ messages in thread
From: Nick Piggin @ 2009-02-04  4:54 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Jonathan Corbet, linux-kernel, andi, oleg, viro, davidel, davem,
	hch, linux-api, mpm, alan

On Wednesday 04 February 2009 08:39:42 Andrew Morton wrote:
> On Mon,  2 Feb 2009 11:20:09 -0700
>
> Jonathan Corbet <corbet@lwn.net> wrote:
> > Matt Mackall suggested converting epoll's ep_lock to a bitlock as a way
> > of saving space in struct file.  This patch makes that change.
>
> hrm.  bit_spin_lock() makes people upset (large penguiny people).  iirc
> it doesn't have all the correct/well-understood memory/compiler
> ordering semantics which spinlocks have.

Bit spinlocks are OK now they are using lock bitops. Of course, they are
even nicer (and become very comparable to spinlocks) if you can use the
non-atomic unlock __bit_spin_unlock().

But it seems probably like another approach being discussed is a better
idea anyway.

> And lockdep doesn't know about
> it.

This is a valid concern. Someone had a patch to add lockdep support to it,
which might be a good idea to do. But for small inner locks, maybe not a
huge problem.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/4] Convert epoll to a bitlock
       [not found]                               ` <20090203161931.4054a25e-vw3g6Xz/EtPk1uMJSBkQmQ@public.gmane.org>
  2009-02-03 23:29                                 ` Andrew Morton
@ 2009-02-04  7:13                                 ` Christoph Hellwig
       [not found]                                   ` <20090204071320.GA19348-jcswGhMUV9g@public.gmane.org>
  1 sibling, 1 reply; 24+ messages in thread
From: Christoph Hellwig @ 2009-02-04  7:13 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: Andrew Morton, mpm-VDJrAJ4Gl5ZBDgjK7y7TUQ,
	dada1-fPLkHRcR87vqlBn2x/YWAg, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	andi-Vw/NltI1exuRpAAqCnN02g, oleg-H+wXaHxf7aLQT0dZR+AlfA,
	viro-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn,
	davidel-AhlLAIvw+VEjIGhXcJzhZg, davem-fT/PcQaiUtIeIZ0/mPfg9Q,
	hch-jcswGhMUV9g, linux-api-u79uwXL29TY76Z2rM5mHXA,
	alan-qBU/x9rampVanCEyBjwyrvXRex20P6io

On Tue, Feb 03, 2009 at 04:19:31PM -0700, Jonathan Corbet wrote:
>  1) Use i_lock to protect accesses to f_flags.  This would enable some 
>     BKL usage to be removed, but would not fix fasync.

What about just turning f_ep_lock into f_lock and using it?

>  2) Move responsibility for the FASYNC bit into ->fasync(), with
>     fasync_helper() doing it in almost all situations.  The remaining
>     BKL usage would then go away.
> 
>  3) The same optional fasync() return values cleanup.

These two sound like a good thing to do no matter what the final locking
looks like.  I think they should be moved to the front of the patch
series and queued up no matter what.

--
To unsubscribe from this list: send the line "unsubscribe linux-api" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/4] Convert epoll to a bitlock
       [not found]                                   ` <20090204071320.GA19348-jcswGhMUV9g@public.gmane.org>
@ 2009-02-04  7:20                                     ` Nick Piggin
       [not found]                                       ` <200902041820.20535.nickpiggin-/E1597aS9LT0CCvOHzKKcA@public.gmane.org>
  2009-02-04 16:51                                     ` Davide Libenzi
  1 sibling, 1 reply; 24+ messages in thread
From: Nick Piggin @ 2009-02-04  7:20 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jonathan Corbet, Andrew Morton, mpm-VDJrAJ4Gl5ZBDgjK7y7TUQ,
	dada1-fPLkHRcR87vqlBn2x/YWAg, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	andi-Vw/NltI1exuRpAAqCnN02g, oleg-H+wXaHxf7aLQT0dZR+AlfA,
	viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn,
	davidel-AhlLAIvw+VEjIGhXcJzhZg, davem-fT/PcQaiUtIeIZ0/mPfg9Q,
	linux-api-u79uwXL29TY76Z2rM5mHXA,
	alan-qBU/x9rampVanCEyBjwyrvXRex20P6io

On Wednesday 04 February 2009 18:13:20 Christoph Hellwig wrote:
> On Tue, Feb 03, 2009 at 04:19:31PM -0700, Jonathan Corbet wrote:
> >  1) Use i_lock to protect accesses to f_flags.  This would enable some
> >     BKL usage to be removed, but would not fix fasync.
>
> What about just turning f_ep_lock into f_lock and using it?

Ah, yes I was going to say that too, but I confused i_lock with i_mutex
because it sounded like Jon needed a sleeping lock here?

Agree f_lock would be good. It might come in handy for other things too.

--
To unsubscribe from this list: send the line "unsubscribe linux-api" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/4] Convert epoll to a bitlock
       [not found]                                       ` <200902041820.20535.nickpiggin-/E1597aS9LT0CCvOHzKKcA@public.gmane.org>
@ 2009-02-04 13:34                                         ` Jonathan Corbet
  0 siblings, 0 replies; 24+ messages in thread
From: Jonathan Corbet @ 2009-02-04 13:34 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Christoph Hellwig, Andrew Morton, mpm-VDJrAJ4Gl5ZBDgjK7y7TUQ,
	dada1-fPLkHRcR87vqlBn2x/YWAg, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	andi-Vw/NltI1exuRpAAqCnN02g, oleg-H+wXaHxf7aLQT0dZR+AlfA,
	viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn,
	davidel-AhlLAIvw+VEjIGhXcJzhZg, davem-fT/PcQaiUtIeIZ0/mPfg9Q,
	linux-api-u79uwXL29TY76Z2rM5mHXA,
	alan-qBU/x9rampVanCEyBjwyrvXRex20P6io

On Wed, 4 Feb 2009 18:20:18 +1100
Nick Piggin <nickpiggin-/E1597aS9LT0CCvOHzKKcA@public.gmane.org> wrote:

> On Wednesday 04 February 2009 18:13:20 Christoph Hellwig wrote:
> > On Tue, Feb 03, 2009 at 04:19:31PM -0700, Jonathan Corbet wrote:  
> > >  1) Use i_lock to protect accesses to f_flags.  This would enable
> > > some BKL usage to be removed, but would not fix fasync.  
> >
> > What about just turning f_ep_lock into f_lock and using it?  
> 
> Ah, yes I was going to say that too, but I confused i_lock with
> i_mutex because it sounded like Jon needed a sleeping lock here?

Sigh, obviously that's what I should do.  Sorry for being so dense.
Consider it done.

[About sleeping locks: *if* one puts a lock around ->fasync(), it needs
to be a sleeping lock.  But moving FASYNC bit handling down gets rid of
the need to do that, so f_lock would be fine.]

Thanks,

jon
--
To unsubscribe from this list: send the line "unsubscribe linux-api" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/4] Convert epoll to a bitlock
       [not found]                                   ` <20090204071320.GA19348-jcswGhMUV9g@public.gmane.org>
  2009-02-04  7:20                                     ` Nick Piggin
@ 2009-02-04 16:51                                     ` Davide Libenzi
  1 sibling, 0 replies; 24+ messages in thread
From: Davide Libenzi @ 2009-02-04 16:51 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jonathan Corbet, Andrew Morton, mpm-VDJrAJ4Gl5ZBDgjK7y7TUQ,
	dada1-fPLkHRcR87vqlBn2x/YWAg, Linux Kernel Mailing List,
	andi-Vw/NltI1exuRpAAqCnN02g, oleg-H+wXaHxf7aLQT0dZR+AlfA,
	viro-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn, David Miller,
	linux-api-u79uwXL29TY76Z2rM5mHXA, Alan Cox

On Wed, 4 Feb 2009, Christoph Hellwig wrote:

> On Tue, Feb 03, 2009 at 04:19:31PM -0700, Jonathan Corbet wrote:
> >  1) Use i_lock to protect accesses to f_flags.  This would enable some 
> >     BKL usage to be removed, but would not fix fasync.
> 
> What about just turning f_ep_lock into f_lock and using it?

Looks like a good idea.


- Davide


--
To unsubscribe from this list: send the line "unsubscribe linux-api" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2009-02-04 16:51 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-02-02 18:20 [PATCH/RFC] F_SETFL/Fasync BKL removal, now without unsightly global locks Jonathan Corbet
     [not found] ` <1233598811-6871-1-git-send-email-corbet-T1hC0tSOHrs@public.gmane.org>
2009-02-02 18:20   ` [PATCH 1/4] Use bit operations for file->f_flags Jonathan Corbet
     [not found]     ` <1233598811-6871-2-git-send-email-corbet-T1hC0tSOHrs@public.gmane.org>
2009-02-03 21:37       ` Andrew Morton
2009-02-02 18:20   ` [PATCH 2/4] Convert epoll to a bitlock Jonathan Corbet
     [not found]     ` <1233598811-6871-3-git-send-email-corbet-T1hC0tSOHrs@public.gmane.org>
2009-02-03 21:39       ` Andrew Morton
     [not found]         ` <20090203133942.2ecec281.akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
2009-02-03 21:55           ` Eric Dumazet
     [not found]             ` <4988BD4E.8080206-fPLkHRcR87vqlBn2x/YWAg@public.gmane.org>
2009-02-03 22:05               ` Andrew Morton
     [not found]                 ` <20090203140543.6e915f97.akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
2009-02-03 22:22                   ` Matt Mackall
2009-02-03 22:37                     ` Jonathan Corbet
     [not found]                       ` <20090203153740.363d0a04-vw3g6Xz/EtPk1uMJSBkQmQ@public.gmane.org>
2009-02-03 22:53                         ` Andrew Morton
     [not found]                           ` <20090203145346.8df40277.akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
2009-02-03 23:09                             ` Davide Libenzi
     [not found]                               ` <alpine.DEB.1.10.0902031508280.23050-GPJ85BhbkB8RepQJljzAVbITYcZ0+W3JAL8bYrjMMd8@public.gmane.org>
2009-02-03 23:12                                 ` Davide Libenzi
2009-02-03 23:19                             ` Jonathan Corbet
     [not found]                               ` <20090203161931.4054a25e-vw3g6Xz/EtPk1uMJSBkQmQ@public.gmane.org>
2009-02-03 23:29                                 ` Andrew Morton
2009-02-04  7:13                                 ` Christoph Hellwig
     [not found]                                   ` <20090204071320.GA19348-jcswGhMUV9g@public.gmane.org>
2009-02-04  7:20                                     ` Nick Piggin
     [not found]                                       ` <200902041820.20535.nickpiggin-/E1597aS9LT0CCvOHzKKcA@public.gmane.org>
2009-02-04 13:34                                         ` Jonathan Corbet
2009-02-04 16:51                                     ` Davide Libenzi
2009-02-03 23:08               ` Davide Libenzi
     [not found]                 ` <alpine.DEB.1.10.0902031446280.23050-GPJ85BhbkB8RepQJljzAVbITYcZ0+W3JAL8bYrjMMd8@public.gmane.org>
2009-02-04  2:48                   ` Eric Dumazet
2009-02-04  1:00         ` wli
2009-02-04  4:54         ` Nick Piggin
2009-02-02 18:20   ` [PATCH 3/4] Move FASYNC bit handling to f_op->fasync() Jonathan Corbet
2009-02-02 18:20   ` [PATCH 4/4] Rationalize fasync return values Jonathan Corbet

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).