linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Improve lseek scalability
@ 2011-08-22 20:49 Andi Kleen
  2011-08-22 20:49 ` [PATCH 1/7] BTRFS: Fix lseek return value for error Andi Kleen
                   ` (6 more replies)
  0 siblings, 7 replies; 13+ messages in thread
From: Andi Kleen @ 2011-08-22 20:49 UTC (permalink / raw)
  To: viro; +Cc: hch, linux-fsdevel, linux-kernel

Currently generic_file_llseek users synchronize all on the inode i_mutex,
which is very heavy handed because it affects even different processes.

This patchkit attempts to make generic_file_llseek (mostly) lockless.

For details see the individual patches.

This is a repost of the earlier version which got some reviews.

v2: Forward ported to recent kernel. Add SEEK_DATA/HOLE support.
Fix a nasty SEEK_END bug in the previous version.

-Andi

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/7] BTRFS: Fix lseek return value for error
  2011-08-22 20:49 Improve lseek scalability Andi Kleen
@ 2011-08-22 20:49 ` Andi Kleen
  2011-08-22 20:49 ` [PATCH 2/7] VFS: Do (nearly) lockless generic_file_llseek Andi Kleen
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: Andi Kleen @ 2011-08-22 20:49 UTC (permalink / raw)
  To: viro; +Cc: hch, linux-fsdevel, linux-kernel, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

Introduced by 9a4327ca1f45f82edad7dc0a4e52ce9316e0950c

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 fs/btrfs/file.c |   13 +++++++------
 1 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index e7872e4..c6e493f 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1814,19 +1814,17 @@ static loff_t btrfs_file_llseek(struct file *file, loff_t offset, int origin)
 	case SEEK_DATA:
 	case SEEK_HOLE:
 		ret = find_desired_extent(inode, &offset, origin);
-		if (ret) {
-			mutex_unlock(&inode->i_mutex);
-			return ret;
-		}
+		if (ret)
+			goto error;
 	}
 
 	if (offset < 0 && !(file->f_mode & FMODE_UNSIGNED_OFFSET)) {
 		ret = -EINVAL;
-		goto out;
+		goto error;
 	}
 	if (offset > inode->i_sb->s_maxbytes) {
 		ret = -EINVAL;
-		goto out;
+		goto error;
 	}
 
 	/* Special lock needed here? */
@@ -1837,6 +1835,9 @@ static loff_t btrfs_file_llseek(struct file *file, loff_t offset, int origin)
 out:
 	mutex_unlock(&inode->i_mutex);
 	return offset;
+error:
+	mutex_unlock(&inode->i_mutex);
+	return ret;
 }
 
 const struct file_operations btrfs_file_operations = {
-- 
1.7.4.4


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 2/7] VFS: Do (nearly) lockless generic_file_llseek
  2011-08-22 20:49 Improve lseek scalability Andi Kleen
  2011-08-22 20:49 ` [PATCH 1/7] BTRFS: Fix lseek return value for error Andi Kleen
@ 2011-08-22 20:49 ` Andi Kleen
  2011-08-22 20:49 ` [PATCH 3/7] VFS: Make generic lseek lockless safe Andi Kleen
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: Andi Kleen @ 2011-08-22 20:49 UTC (permalink / raw)
  To: viro; +Cc: hch, linux-fsdevel, linux-kernel, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

The i_mutex lock use of generic _file_llseek hurts.  Independent processes
accessing the same file synchronize over a single lock, even though
they have no need for synchronization at all.

Under high utilization this can cause llseek to scale very poorly on larger
systems.

This patch does some rethinking of the llseek locking model:

First the 64bit f_pos is not necessarily atomic without locks
on 32bit systems. This can already cause races with read() today.
This was discussed on linux-kernel in the past and deemed acceptable.
The patch does not change that.

Let's look at the different seek variants:

SEEK_SET: Doesn't really need any locking.
If there's a race one writer wins, the other loses.

For 32bit the non atomic update races against read()
stay the same. Without a lock they can also happen
against write() now.  The read() race was deemed
acceptable in past discussions, and I think if it's
ok for read it's ok for write too.

=> Don't need a lock.

SEEK_END: This behaves like SEEK_SET plus it reads
the maximum size too. Reading the maximum size would have the
32bit atomic problem. But luckily we already have a way to read
the maximum size without locking (i_size_read), so we
can just use that instead.

Without i_mutex there is no synchronization with write() anymore,
however since the write() update is atomic on 64bit it just behaves
like another racy SEEK_SET.  On non atomic 32bit it's the same
as SEEK_SET.

=> Don't need a lock, but need to use i_size_read()

SEEK_CUR: This has a read-modify-write race window
on the same file. One could argue that any application
doing unsynchronized seeks on the same file is already broken.
But for the sake of not adding a regression here I'm
using the file->f_lock to synchronize this. Using this
lock is much better than the inode mutex because it doesn't
synchronize between processes.

=> So still need a lock, but can use a cheap local one.

This patch implements this new scheme in generic_file_llseek.
I dropped generic_file_llseek_unlocked and changed all callers.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 fs/btrfs/file.c    |    2 +-
 fs/cifs/cifsfs.c   |    2 +-
 fs/gfs2/file.c     |    4 ++--
 fs/nfs/file.c      |    5 +++--
 fs/read_write.c    |   33 ++++++---------------------------
 include/linux/fs.h |    2 --
 6 files changed, 13 insertions(+), 35 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index c6e493f..6107b94 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1809,7 +1809,7 @@ static loff_t btrfs_file_llseek(struct file *file, loff_t offset, int origin)
 	switch (origin) {
 	case SEEK_END:
 	case SEEK_CUR:
-		offset = generic_file_llseek_unlocked(file, offset, origin);
+		offset = generic_file_llseek(file, offset, origin);
 		goto out;
 	case SEEK_DATA:
 	case SEEK_HOLE:
diff --git a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c
index f93eb94..e43641b 100644
--- a/fs/cifs/cifsfs.c
+++ b/fs/cifs/cifsfs.c
@@ -721,7 +721,7 @@ static loff_t cifs_llseek(struct file *file, loff_t offset, int origin)
 		if (rc < 0)
 			return (loff_t)rc;
 	}
-	return generic_file_llseek_unlocked(file, offset, origin);
+	return generic_file_llseek(file, offset, origin);
 }
 
 static int cifs_setlease(struct file *file, long arg, struct file_lock **lease)
diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index edeb9e8..fe6bc02 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -63,11 +63,11 @@ static loff_t gfs2_llseek(struct file *file, loff_t offset, int origin)
 		error = gfs2_glock_nq_init(ip->i_gl, LM_ST_SHARED, LM_FLAG_ANY,
 					   &i_gh);
 		if (!error) {
-			error = generic_file_llseek_unlocked(file, offset, origin);
+			error = generic_file_llseek(file, offset, origin);
 			gfs2_glock_dq_uninit(&i_gh);
 		}
 	} else
-		error = generic_file_llseek_unlocked(file, offset, origin);
+		error = generic_file_llseek(file, offset, origin);
 
 	return error;
 }
diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 28b8c3f..12623ab 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -198,11 +198,12 @@ static loff_t nfs_file_llseek(struct file *filp, loff_t offset, int origin)
 		if (retval < 0)
 			return (loff_t)retval;
 
+		/* AK: should drop this lock. Unlikely to be needed. */
 		spin_lock(&inode->i_lock);
-		loff = generic_file_llseek_unlocked(filp, offset, origin);
+		loff = generic_file_llseek(filp, offset, origin);
 		spin_unlock(&inode->i_lock);
 	} else
-		loff = generic_file_llseek_unlocked(filp, offset, origin);
+		loff = generic_file_llseek(filp, offset, origin);
 	return loff;
 }
 
diff --git a/fs/read_write.c b/fs/read_write.c
index 179f1c3..24f0001 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -36,22 +36,23 @@ static inline int unsigned_offsets(struct file *file)
 }
 
 /**
- * generic_file_llseek_unlocked - lockless generic llseek implementation
+ * generic_file_llseek - generic llseek implementation for regular files
  * @file:	file structure to seek on
  * @offset:	file offset to seek to
  * @origin:	type of seek
  *
- * Updates the file offset to the value specified by @offset and @origin.
- * Locking must be provided by the caller.
+ * This is a generic implemenation of ->llseek useable for all normal local
+ * filesystems.  It just updates the file offset to the value specified by
+ * @offset and @origin under i_mutex.
  */
 loff_t
-generic_file_llseek_unlocked(struct file *file, loff_t offset, int origin)
+generic_file_llseek(struct file *file, loff_t offset, int origin)
 {
 	struct inode *inode = file->f_mapping->host;
 
 	switch (origin) {
 	case SEEK_END:
-		offset += inode->i_size;
+		offset += i_size_read(inode);
 		break;
 	case SEEK_CUR:
 		/*
@@ -96,28 +97,6 @@ generic_file_llseek_unlocked(struct file *file, loff_t offset, int origin)
 
 	return offset;
 }
-EXPORT_SYMBOL(generic_file_llseek_unlocked);
-
-/**
- * generic_file_llseek - generic llseek implementation for regular files
- * @file:	file structure to seek on
- * @offset:	file offset to seek to
- * @origin:	type of seek
- *
- * This is a generic implemenation of ->llseek useable for all normal local
- * filesystems.  It just updates the file offset to the value specified by
- * @offset and @origin under i_mutex.
- */
-loff_t generic_file_llseek(struct file *file, loff_t offset, int origin)
-{
-	loff_t rval;
-
-	mutex_lock(&file->f_dentry->d_inode->i_mutex);
-	rval = generic_file_llseek_unlocked(file, offset, origin);
-	mutex_unlock(&file->f_dentry->d_inode->i_mutex);
-
-	return rval;
-}
 EXPORT_SYMBOL(generic_file_llseek);
 
 /**
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 178cdb4..7a515d1 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2394,8 +2394,6 @@ file_ra_state_init(struct file_ra_state *ra, struct address_space *mapping);
 extern loff_t noop_llseek(struct file *file, loff_t offset, int origin);
 extern loff_t no_llseek(struct file *file, loff_t offset, int origin);
 extern loff_t generic_file_llseek(struct file *file, loff_t offset, int origin);
-extern loff_t generic_file_llseek_unlocked(struct file *file, loff_t offset,
-			int origin);
 extern int generic_file_open(struct inode * inode, struct file * filp);
 extern int nonseekable_open(struct inode * inode, struct file * filp);
 
-- 
1.7.4.4


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 3/7] VFS: Make generic lseek lockless safe
  2011-08-22 20:49 Improve lseek scalability Andi Kleen
  2011-08-22 20:49 ` [PATCH 1/7] BTRFS: Fix lseek return value for error Andi Kleen
  2011-08-22 20:49 ` [PATCH 2/7] VFS: Do (nearly) lockless generic_file_llseek Andi Kleen
@ 2011-08-22 20:49 ` Andi Kleen
  2011-08-22 20:49 ` [PATCH 4/7] VFS: Add generic_file_llseek_size Andi Kleen
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: Andi Kleen @ 2011-08-22 20:49 UTC (permalink / raw)
  To: viro; +Cc: hch, linux-fsdevel, linux-kernel, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

- use f_lock to protect SEEK_CUR
- use i_size_read to safely read file sizes on 32bit

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 fs/read_write.c    |   52 +++++++++++++++++++++++++++++++++++-----------------
 include/linux/fs.h |    3 ++-
 2 files changed, 37 insertions(+), 18 deletions(-)

diff --git a/fs/read_write.c b/fs/read_write.c
index 24f0001..8e8aab3 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -35,6 +35,21 @@ static inline int unsigned_offsets(struct file *file)
 	return file->f_mode & FMODE_UNSIGNED_OFFSET;
 }
 
+static loff_t lseek_execute(struct file *file, struct inode *inode, loff_t offset,
+			    loff_t maxsize)
+{
+	if (offset < 0 && !unsigned_offsets(file))
+		return -EINVAL;
+	if (offset > maxsize)
+		return -EINVAL;
+
+	if (offset != file->f_pos) {
+		file->f_pos = offset;
+		file->f_version = 0;
+	}
+	return offset;
+}
+
 /**
  * generic_file_llseek - generic llseek implementation for regular files
  * @file:	file structure to seek on
@@ -44,6 +59,12 @@ static inline int unsigned_offsets(struct file *file)
  * This is a generic implemenation of ->llseek useable for all normal local
  * filesystems.  It just updates the file offset to the value specified by
  * @offset and @origin under i_mutex.
+ *
+ * Synchronization:
+ * SEEK_SET is unsynchronized (but atomic on 64bit platforms)
+ * SEEK_CUR is synchronized against other SEEK_CURs, but not read/writes.
+ * read/writes behave like SEEK_SET against seeks.
+ * SEEK_END
  */
 loff_t
 generic_file_llseek(struct file *file, loff_t offset, int origin)
@@ -63,14 +84,22 @@ generic_file_llseek(struct file *file, loff_t offset, int origin)
 		 */
 		if (offset == 0)
 			return file->f_pos;
-		offset += file->f_pos;
-		break;
+		/*
+		 * f_lock protects against read/modify/write race with other
+		 * SEEK_CURs. Note that parallel writes and reads behave
+		 * like SEEK_SET.
+		 */
+		spin_lock(&file->f_lock);
+		offset = lseek_execute(file, inode, file->f_pos + offset, 
+				       inode->i_sb->s_maxbytes);
+		spin_unlock(&file->f_lock);
+		return offset;
 	case SEEK_DATA:
 		/*
 		 * In the generic case the entire file is data, so as long as
 		 * offset isn't at the end of the file then the offset is data.
 		 */
-		if (offset >= inode->i_size)
+		if (offset >= i_size_read(inode))
 			return -ENXIO;
 		break;
 	case SEEK_HOLE:
@@ -78,24 +107,13 @@ generic_file_llseek(struct file *file, loff_t offset, int origin)
 		 * There is a virtual hole at the end of the file, so as long as
 		 * offset isn't i_size or larger, return i_size.
 		 */
-		if (offset >= inode->i_size)
+		if (offset >= i_size_read(inode))
 			return -ENXIO;
-		offset = inode->i_size;
+		offset = i_size_read(inode);
 		break;
 	}
 
-	if (offset < 0 && !unsigned_offsets(file))
-		return -EINVAL;
-	if (offset > inode->i_sb->s_maxbytes)
-		return -EINVAL;
-
-	/* Special lock needed here? */
-	if (offset != file->f_pos) {
-		file->f_pos = offset;
-		file->f_version = 0;
-	}
-
-	return offset;
+	return lseek_execute(file, inode, offset, inode->i_sb->s_maxbytes);
 }
 EXPORT_SYMBOL(generic_file_llseek);
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 7a515d1..3417259 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -965,7 +965,8 @@ struct file {
 #define f_dentry	f_path.dentry
 #define f_vfsmnt	f_path.mnt
 	const struct file_operations	*f_op;
-	spinlock_t		f_lock;  /* f_ep_links, f_flags, no IRQ */
+	spinlock_t		f_lock;  /* f_ep_links, f_flags, no IRQ,
+					    SEEK_SET */
 #ifdef CONFIG_SMP
 	int			f_sb_list_cpu;
 #endif
-- 
1.7.4.4

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 4/7] VFS: Add generic_file_llseek_size
  2011-08-22 20:49 Improve lseek scalability Andi Kleen
                   ` (2 preceding siblings ...)
  2011-08-22 20:49 ` [PATCH 3/7] VFS: Make generic lseek lockless safe Andi Kleen
@ 2011-08-22 20:49 ` Andi Kleen
  2011-08-23  0:08   ` Andreas Dilger
  2011-08-23 12:51   ` Arnd Bergmann
  2011-08-22 20:49 ` [PATCH 5/7] LSEEK: EXT4: Replace cut'n'pasted llseek code with generic_file_llseek_size Andi Kleen
                   ` (2 subsequent siblings)
  6 siblings, 2 replies; 13+ messages in thread
From: Andi Kleen @ 2011-08-22 20:49 UTC (permalink / raw)
  To: viro; +Cc: hch, linux-fsdevel, linux-kernel, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

Add a generic_file_llseek variant to the VFS that allows passing in
the current file size, instead of always using inode->i_size.
This can be used to eliminate some cut'n'paste seek code in ext4.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 fs/read_write.c    |   34 +++++++++++++++++++++++++++-------
 include/linux/fs.h |    2 ++
 2 files changed, 29 insertions(+), 7 deletions(-)

diff --git a/fs/read_write.c b/fs/read_write.c
index 8e8aab3..3d55cc6 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -51,14 +51,14 @@ static loff_t lseek_execute(struct file *file, struct inode *inode, loff_t offse
 }
 
 /**
- * generic_file_llseek - generic llseek implementation for regular files
+ * generic_file_llseek_size - generic llseek implementation for regular files
  * @file:	file structure to seek on
  * @offset:	file offset to seek to
  * @origin:	type of seek
+ * @size:       max size of file system
  *
- * This is a generic implemenation of ->llseek useable for all normal local
- * filesystems.  It just updates the file offset to the value specified by
- * @offset and @origin under i_mutex.
+ * Variant of generic_file_llseek that allows passing in a custom
+ * file size.
  *
  * Synchronization:
  * SEEK_SET is unsynchronized (but atomic on 64bit platforms)
@@ -67,7 +67,8 @@ static loff_t lseek_execute(struct file *file, struct inode *inode, loff_t offse
  * SEEK_END
  */
 loff_t
-generic_file_llseek(struct file *file, loff_t offset, int origin)
+generic_file_llseek_size(struct file *file, loff_t offset, int origin, 
+                         loff_t maxsize)
 {
 	struct inode *inode = file->f_mapping->host;
 
@@ -91,7 +92,7 @@ generic_file_llseek(struct file *file, loff_t offset, int origin)
 		 */
 		spin_lock(&file->f_lock);
 		offset = lseek_execute(file, inode, file->f_pos + offset, 
-				       inode->i_sb->s_maxbytes);
+				       maxsize);
 		spin_unlock(&file->f_lock);
 		return offset;
 	case SEEK_DATA:
@@ -113,7 +114,26 @@ generic_file_llseek(struct file *file, loff_t offset, int origin)
 		break;
 	}
 
-	return lseek_execute(file, inode, offset, inode->i_sb->s_maxbytes);
+	return lseek_execute(file, inode, offset, maxsize);
+}
+EXPORT_SYMBOL(generic_file_llseek_size);
+
+/**
+ * generic_file_llseek - generic llseek implementation for regular files
+ * @file:	file structure to seek on
+ * @offset:	file offset to seek to
+ * @origin:	type of seek
+ *
+ * This is a generic implemenation of ->llseek useable for all normal local
+ * filesystems.  It just updates the file offset to the value specified by
+ * @offset and @origin under i_mutex.
+ */
+loff_t generic_file_llseek(struct file *file, loff_t offset, int origin)
+{
+	struct inode *inode = file->f_mapping->host;
+
+	return generic_file_llseek_size(file, offset, origin, 
+					inode->i_sb->s_maxbytes);
 }
 EXPORT_SYMBOL(generic_file_llseek);
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 3417259..d64b601 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2395,6 +2395,8 @@ file_ra_state_init(struct file_ra_state *ra, struct address_space *mapping);
 extern loff_t noop_llseek(struct file *file, loff_t offset, int origin);
 extern loff_t no_llseek(struct file *file, loff_t offset, int origin);
 extern loff_t generic_file_llseek(struct file *file, loff_t offset, int origin);
+extern loff_t generic_file_llseek_size(struct file *file, loff_t offset, int origin,
+				       loff_t size);
 extern int generic_file_open(struct inode * inode, struct file * filp);
 extern int nonseekable_open(struct inode * inode, struct file * filp);
 
-- 
1.7.4.4


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 5/7] LSEEK: EXT4: Replace cut'n'pasted llseek code with generic_file_llseek_size
  2011-08-22 20:49 Improve lseek scalability Andi Kleen
                   ` (3 preceding siblings ...)
  2011-08-22 20:49 ` [PATCH 4/7] VFS: Add generic_file_llseek_size Andi Kleen
@ 2011-08-22 20:49 ` Andi Kleen
  2011-08-22 20:49 ` [PATCH 6/7] LSEEK: NFS: Drop unnecessary locking in llseek Andi Kleen
  2011-08-22 20:49 ` [PATCH 7/7] LSEEK: BTRFS: Avoid i_mutex for SEEK_{CUR,SET,END} Andi Kleen
  6 siblings, 0 replies; 13+ messages in thread
From: Andi Kleen @ 2011-08-22 20:49 UTC (permalink / raw)
  To: viro; +Cc: hch, linux-fsdevel, linux-kernel, Andi Kleen, tytso

From: Andi Kleen <ak@linux.intel.com>

This gives ext4 the benefits of unlocked llseek.

Cc: tytso@mit.edu
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 fs/ext4/file.c |   47 +----------------------------------------------
 1 files changed, 1 insertions(+), 46 deletions(-)

diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index e4095e9..b9548f4 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -224,53 +224,8 @@ loff_t ext4_llseek(struct file *file, loff_t offset, int origin)
 		maxbytes = EXT4_SB(inode->i_sb)->s_bitmap_maxbytes;
 	else
 		maxbytes = inode->i_sb->s_maxbytes;
-	mutex_lock(&inode->i_mutex);
-	switch (origin) {
-	case SEEK_END:
-		offset += inode->i_size;
-		break;
-	case SEEK_CUR:
-		if (offset == 0) {
-			mutex_unlock(&inode->i_mutex);
-			return file->f_pos;
-		}
-		offset += file->f_pos;
-		break;
-	case SEEK_DATA:
-		/*
-		 * In the generic case the entire file is data, so as long as
-		 * offset isn't at the end of the file then the offset is data.
-		 */
-		if (offset >= inode->i_size) {
-			mutex_unlock(&inode->i_mutex);
-			return -ENXIO;
-		}
-		break;
-	case SEEK_HOLE:
-		/*
-		 * There is a virtual hole at the end of the file, so as long as
-		 * offset isn't i_size or larger, return i_size.
-		 */
-		if (offset >= inode->i_size) {
-			mutex_unlock(&inode->i_mutex);
-			return -ENXIO;
-		}
-		offset = inode->i_size;
-		break;
-	}
-
-	if (offset < 0 || offset > maxbytes) {
-		mutex_unlock(&inode->i_mutex);
-		return -EINVAL;
-	}
-
-	if (offset != file->f_pos) {
-		file->f_pos = offset;
-		file->f_version = 0;
-	}
-	mutex_unlock(&inode->i_mutex);
 
-	return offset;
+	return generic_file_llseek_size(file, offset, origin, maxbytes);
 }
 
 const struct file_operations ext4_file_operations = {
-- 
1.7.4.4


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 6/7] LSEEK: NFS: Drop unnecessary locking in llseek
  2011-08-22 20:49 Improve lseek scalability Andi Kleen
                   ` (4 preceding siblings ...)
  2011-08-22 20:49 ` [PATCH 5/7] LSEEK: EXT4: Replace cut'n'pasted llseek code with generic_file_llseek_size Andi Kleen
@ 2011-08-22 20:49 ` Andi Kleen
  2011-08-22 20:49 ` [PATCH 7/7] LSEEK: BTRFS: Avoid i_mutex for SEEK_{CUR,SET,END} Andi Kleen
  6 siblings, 0 replies; 13+ messages in thread
From: Andi Kleen @ 2011-08-22 20:49 UTC (permalink / raw)
  To: viro; +Cc: hch, linux-fsdevel, linux-kernel, Andi Kleen, Trond.Myklebust

From: Andi Kleen <ak@linux.intel.com>

This makes NFS follow the standard generic_file_llseek locking scheme.

Cc: Trond.Myklebust@netapp.com
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 fs/nfs/file.c |   11 ++---------
 1 files changed, 2 insertions(+), 9 deletions(-)

diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 12623ab..91c01f0 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -180,8 +180,6 @@ force_reval:
 
 static loff_t nfs_file_llseek(struct file *filp, loff_t offset, int origin)
 {
-	loff_t loff;
-
 	dprintk("NFS: llseek file(%s/%s, %lld, %d)\n",
 			filp->f_path.dentry->d_parent->d_name.name,
 			filp->f_path.dentry->d_name.name,
@@ -197,14 +195,9 @@ static loff_t nfs_file_llseek(struct file *filp, loff_t offset, int origin)
 		int retval = nfs_revalidate_file_size(inode, filp);
 		if (retval < 0)
 			return (loff_t)retval;
+	}
 
-		/* AK: should drop this lock. Unlikely to be needed. */
-		spin_lock(&inode->i_lock);
-		loff = generic_file_llseek(filp, offset, origin);
-		spin_unlock(&inode->i_lock);
-	} else
-		loff = generic_file_llseek(filp, offset, origin);
-	return loff;
+	return generic_file_llseek(filp, offset, origin);
 }
 
 /*
-- 
1.7.4.4

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 7/7] LSEEK: BTRFS: Avoid i_mutex for SEEK_{CUR,SET,END}
  2011-08-22 20:49 Improve lseek scalability Andi Kleen
                   ` (5 preceding siblings ...)
  2011-08-22 20:49 ` [PATCH 6/7] LSEEK: NFS: Drop unnecessary locking in llseek Andi Kleen
@ 2011-08-22 20:49 ` Andi Kleen
  6 siblings, 0 replies; 13+ messages in thread
From: Andi Kleen @ 2011-08-22 20:49 UTC (permalink / raw)
  To: viro; +Cc: hch, linux-fsdevel, linux-kernel, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

Don't need the i_mutex for those cases, only for SEEK_HOLE/DATA.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 fs/btrfs/file.c |   18 ++++++------------
 1 files changed, 6 insertions(+), 12 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 6107b94..35f5f13 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1805,18 +1805,13 @@ static loff_t btrfs_file_llseek(struct file *file, loff_t offset, int origin)
 	struct inode *inode = file->f_mapping->host;
 	int ret;
 
+	if (origin != SEEK_DATA && origin != SEEK_HOLE)
+		return generic_file_llseek(file, offset, origin);
+
 	mutex_lock(&inode->i_mutex);
-	switch (origin) {
-	case SEEK_END:
-	case SEEK_CUR:
-		offset = generic_file_llseek(file, offset, origin);
-		goto out;
-	case SEEK_DATA:
-	case SEEK_HOLE:
-		ret = find_desired_extent(inode, &offset, origin);
-		if (ret)
-			goto error;
-	}
+	ret = find_desired_extent(inode, &offset, origin);
+	if (ret)
+		goto error;
 
 	if (offset < 0 && !(file->f_mode & FMODE_UNSIGNED_OFFSET)) {
 		ret = -EINVAL;
@@ -1832,7 +1827,6 @@ static loff_t btrfs_file_llseek(struct file *file, loff_t offset, int origin)
 		file->f_pos = offset;
 		file->f_version = 0;
 	}
-out:
 	mutex_unlock(&inode->i_mutex);
 	return offset;
 error:
-- 
1.7.4.4


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 4/7] VFS: Add generic_file_llseek_size
  2011-08-22 20:49 ` [PATCH 4/7] VFS: Add generic_file_llseek_size Andi Kleen
@ 2011-08-23  0:08   ` Andreas Dilger
  2011-08-23  0:10     ` Andi Kleen
  2011-08-23 12:51   ` Arnd Bergmann
  1 sibling, 1 reply; 13+ messages in thread
From: Andreas Dilger @ 2011-08-23  0:08 UTC (permalink / raw)
  To: Andi Kleen; +Cc: viro, hch, linux-fsdevel, linux-kernel, Andi Kleen

On 2011-08-22, at 2:49 PM, Andi Kleen wrote:
> From: Andi Kleen <ak@linux.intel.com>
> 
> Add a generic_file_llseek variant to the VFS that allows passing in
> the current file size, instead of always using inode->i_size.
> This can be used to eliminate some cut'n'paste seek code in ext4.

I think your commit comment is incorrect.  It should read:

Add a generic_file_llseek_size() variant to the VFS that allows passing
in the maximum file size, instead of always using inode->i_sb->s_maxbytes.
This can be used to eliminate some cut'n'paste seek code in ext4.

Also, the function prototype in fs.h should take "loff_t maxbytes" instead
of "loff_t size".

> Signed-off-by: Andi Kleen <ak@linux.intel.com>
> ---
> fs/read_write.c    |   34 +++++++++++++++++++++++++++-------
> include/linux/fs.h |    2 ++
> 2 files changed, 29 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/read_write.c b/fs/read_write.c
> index 8e8aab3..3d55cc6 100644
> --- a/fs/read_write.c
> +++ b/fs/read_write.c
> @@ -51,14 +51,14 @@ static loff_t lseek_execute(struct file *file, struct inode *inode, loff_t offse
> }
> 
> /**
> - * generic_file_llseek - generic llseek implementation for regular files
> + * generic_file_llseek_size - generic llseek implementation for regular files
>  * @file:	file structure to seek on
>  * @offset:	file offset to seek to
>  * @origin:	type of seek
> + * @size:       max size of file system
>  *
> - * This is a generic implemenation of ->llseek useable for all normal local
> - * filesystems.  It just updates the file offset to the value specified by
> - * @offset and @origin under i_mutex.
> + * Variant of generic_file_llseek that allows passing in a custom
> + * file size.
>  *
>  * Synchronization:
>  * SEEK_SET is unsynchronized (but atomic on 64bit platforms)
> @@ -67,7 +67,8 @@ static loff_t lseek_execute(struct file *file, struct inode *inode, loff_t offse
>  * SEEK_END
>  */
> loff_t
> -generic_file_llseek(struct file *file, loff_t offset, int origin)
> +generic_file_llseek_size(struct file *file, loff_t offset, int origin, 
> +                         loff_t maxsize)
> {
> 	struct inode *inode = file->f_mapping->host;
> 
> @@ -91,7 +92,7 @@ generic_file_llseek(struct file *file, loff_t offset, int origin)
> 		 */
> 		spin_lock(&file->f_lock);
> 		offset = lseek_execute(file, inode, file->f_pos + offset, 
> -				       inode->i_sb->s_maxbytes);
> +				       maxsize);
> 		spin_unlock(&file->f_lock);
> 		return offset;
> 	case SEEK_DATA:
> @@ -113,7 +114,26 @@ generic_file_llseek(struct file *file, loff_t offset, int origin)
> 		break;
> 	}
> 
> -	return lseek_execute(file, inode, offset, inode->i_sb->s_maxbytes);
> +	return lseek_execute(file, inode, offset, maxsize);
> +}
> +EXPORT_SYMBOL(generic_file_llseek_size);
> +
> +/**
> + * generic_file_llseek - generic llseek implementation for regular files
> + * @file:	file structure to seek on
> + * @offset:	file offset to seek to
> + * @origin:	type of seek
> + *
> + * This is a generic implemenation of ->llseek useable for all normal local
> + * filesystems.  It just updates the file offset to the value specified by
> + * @offset and @origin under i_mutex.
> + */
> +loff_t generic_file_llseek(struct file *file, loff_t offset, int origin)
> +{
> +	struct inode *inode = file->f_mapping->host;
> +
> +	return generic_file_llseek_size(file, offset, origin, 
> +					inode->i_sb->s_maxbytes);
> }
> EXPORT_SYMBOL(generic_file_llseek);
> 
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 3417259..d64b601 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -2395,6 +2395,8 @@ file_ra_state_init(struct file_ra_state *ra, struct address_space *mapping);
> extern loff_t noop_llseek(struct file *file, loff_t offset, int origin);
> extern loff_t no_llseek(struct file *file, loff_t offset, int origin);
> extern loff_t generic_file_llseek(struct file *file, loff_t offset, int origin);
> +extern loff_t generic_file_llseek_size(struct file *file, loff_t offset, int origin,
> +				       loff_t size);
> extern int generic_file_open(struct inode * inode, struct file * filp);
> extern int nonseekable_open(struct inode * inode, struct file * filp);
> 
> -- 
> 1.7.4.4
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


Cheers, Andreas






^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 4/7] VFS: Add generic_file_llseek_size
  2011-08-23  0:08   ` Andreas Dilger
@ 2011-08-23  0:10     ` Andi Kleen
  0 siblings, 0 replies; 13+ messages in thread
From: Andi Kleen @ 2011-08-23  0:10 UTC (permalink / raw)
  To: Andreas Dilger; +Cc: Andi Kleen, viro, hch, linux-fsdevel, linux-kernel


> I think your commit comment is incorrect.  It should read:
>
> Add a generic_file_llseek_size() variant to the VFS that allows passing
> in the maximum file size, instead of always using inode->i_sb->s_maxbytes.
> This can be used to eliminate some cut'n'paste seek code in ext4.
>
> Also, the function prototype in fs.h should take "loff_t maxbytes" instead
> of "loff_t size".

Yes you're right. Will fix.

Thanks for the review.

-Andi



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 4/7] VFS: Add generic_file_llseek_size
  2011-08-22 20:49 ` [PATCH 4/7] VFS: Add generic_file_llseek_size Andi Kleen
  2011-08-23  0:08   ` Andreas Dilger
@ 2011-08-23 12:51   ` Arnd Bergmann
  2011-08-23 16:07     ` Andi Kleen
  1 sibling, 1 reply; 13+ messages in thread
From: Arnd Bergmann @ 2011-08-23 12:51 UTC (permalink / raw)
  To: Andi Kleen; +Cc: viro, hch, linux-fsdevel, linux-kernel, Andi Kleen

On Monday 22 August 2011, Andi Kleen wrote:
> From: Andi Kleen <ak@linux.intel.com>
> 
> Add a generic_file_llseek variant to the VFS that allows passing in
> the current file size, instead of always using inode->i_size.
> This can be used to eliminate some cut'n'paste seek code in ext4.
> 
> Signed-off-by: Andi Kleen <ak@linux.intel.com>

Ah, very nice. I think you can use the same trick to remove duplication
in default_llseek by always setting the limit to MAX_LFS_FILESIZE.

	Arnd

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 4/7] VFS: Add generic_file_llseek_size
  2011-08-23 12:51   ` Arnd Bergmann
@ 2011-08-23 16:07     ` Andi Kleen
  0 siblings, 0 replies; 13+ messages in thread
From: Andi Kleen @ 2011-08-23 16:07 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Andi Kleen, viro, hch, linux-fsdevel, linux-kernel, Andi Kleen

On Tue, Aug 23, 2011 at 02:51:56PM +0200, Arnd Bergmann wrote:
> On Monday 22 August 2011, Andi Kleen wrote:
> > From: Andi Kleen <ak@linux.intel.com>
> > 
> > Add a generic_file_llseek variant to the VFS that allows passing in
> > the current file size, instead of always using inode->i_size.
> > This can be used to eliminate some cut'n'paste seek code in ext4.
> > 
> > Signed-off-by: Andi Kleen <ak@linux.intel.com>
> 
> Ah, very nice. I think you can use the same trick to remove duplication
> in default_llseek by always setting the limit to MAX_LFS_FILESIZE.

Not in this patchkit, but some other time perhaps.

-Andi

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 3/7] VFS: Make generic lseek lockless safe
  2011-09-15 23:06 Improve lseek scalability v3 Andi Kleen
@ 2011-09-15 23:06 ` Andi Kleen
  0 siblings, 0 replies; 13+ messages in thread
From: Andi Kleen @ 2011-09-15 23:06 UTC (permalink / raw)
  To: viro; +Cc: linux-fsdevel, linux-kernel, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

- use f_lock to protect SEEK_CUR
- use i_size_read to safely read file sizes on 32bit

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 fs/read_write.c    |   52 +++++++++++++++++++++++++++++++++++-----------------
 include/linux/fs.h |    3 ++-
 2 files changed, 37 insertions(+), 18 deletions(-)

diff --git a/fs/read_write.c b/fs/read_write.c
index 24f0001..8e8aab3 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -35,6 +35,21 @@ static inline int unsigned_offsets(struct file *file)
 	return file->f_mode & FMODE_UNSIGNED_OFFSET;
 }
 
+static loff_t lseek_execute(struct file *file, struct inode *inode, loff_t offset,
+			    loff_t maxsize)
+{
+	if (offset < 0 && !unsigned_offsets(file))
+		return -EINVAL;
+	if (offset > maxsize)
+		return -EINVAL;
+
+	if (offset != file->f_pos) {
+		file->f_pos = offset;
+		file->f_version = 0;
+	}
+	return offset;
+}
+
 /**
  * generic_file_llseek - generic llseek implementation for regular files
  * @file:	file structure to seek on
@@ -44,6 +59,12 @@ static inline int unsigned_offsets(struct file *file)
  * This is a generic implemenation of ->llseek useable for all normal local
  * filesystems.  It just updates the file offset to the value specified by
  * @offset and @origin under i_mutex.
+ *
+ * Synchronization:
+ * SEEK_SET is unsynchronized (but atomic on 64bit platforms)
+ * SEEK_CUR is synchronized against other SEEK_CURs, but not read/writes.
+ * read/writes behave like SEEK_SET against seeks.
+ * SEEK_END
  */
 loff_t
 generic_file_llseek(struct file *file, loff_t offset, int origin)
@@ -63,14 +84,22 @@ generic_file_llseek(struct file *file, loff_t offset, int origin)
 		 */
 		if (offset == 0)
 			return file->f_pos;
-		offset += file->f_pos;
-		break;
+		/*
+		 * f_lock protects against read/modify/write race with other
+		 * SEEK_CURs. Note that parallel writes and reads behave
+		 * like SEEK_SET.
+		 */
+		spin_lock(&file->f_lock);
+		offset = lseek_execute(file, inode, file->f_pos + offset, 
+				       inode->i_sb->s_maxbytes);
+		spin_unlock(&file->f_lock);
+		return offset;
 	case SEEK_DATA:
 		/*
 		 * In the generic case the entire file is data, so as long as
 		 * offset isn't at the end of the file then the offset is data.
 		 */
-		if (offset >= inode->i_size)
+		if (offset >= i_size_read(inode))
 			return -ENXIO;
 		break;
 	case SEEK_HOLE:
@@ -78,24 +107,13 @@ generic_file_llseek(struct file *file, loff_t offset, int origin)
 		 * There is a virtual hole at the end of the file, so as long as
 		 * offset isn't i_size or larger, return i_size.
 		 */
-		if (offset >= inode->i_size)
+		if (offset >= i_size_read(inode))
 			return -ENXIO;
-		offset = inode->i_size;
+		offset = i_size_read(inode);
 		break;
 	}
 
-	if (offset < 0 && !unsigned_offsets(file))
-		return -EINVAL;
-	if (offset > inode->i_sb->s_maxbytes)
-		return -EINVAL;
-
-	/* Special lock needed here? */
-	if (offset != file->f_pos) {
-		file->f_pos = offset;
-		file->f_version = 0;
-	}
-
-	return offset;
+	return lseek_execute(file, inode, offset, inode->i_sb->s_maxbytes);
 }
 EXPORT_SYMBOL(generic_file_llseek);
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index d9efdf7..fdcf2eb 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -965,7 +965,8 @@ struct file {
 #define f_dentry	f_path.dentry
 #define f_vfsmnt	f_path.mnt
 	const struct file_operations	*f_op;
-	spinlock_t		f_lock;  /* f_ep_links, f_flags, no IRQ */
+	spinlock_t		f_lock;  /* f_ep_links, f_flags, no IRQ,
+					    SEEK_SET */
 #ifdef CONFIG_SMP
 	int			f_sb_list_cpu;
 #endif
-- 
1.7.4.4


^ permalink raw reply related	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2011-09-15 23:07 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-08-22 20:49 Improve lseek scalability Andi Kleen
2011-08-22 20:49 ` [PATCH 1/7] BTRFS: Fix lseek return value for error Andi Kleen
2011-08-22 20:49 ` [PATCH 2/7] VFS: Do (nearly) lockless generic_file_llseek Andi Kleen
2011-08-22 20:49 ` [PATCH 3/7] VFS: Make generic lseek lockless safe Andi Kleen
2011-08-22 20:49 ` [PATCH 4/7] VFS: Add generic_file_llseek_size Andi Kleen
2011-08-23  0:08   ` Andreas Dilger
2011-08-23  0:10     ` Andi Kleen
2011-08-23 12:51   ` Arnd Bergmann
2011-08-23 16:07     ` Andi Kleen
2011-08-22 20:49 ` [PATCH 5/7] LSEEK: EXT4: Replace cut'n'pasted llseek code with generic_file_llseek_size Andi Kleen
2011-08-22 20:49 ` [PATCH 6/7] LSEEK: NFS: Drop unnecessary locking in llseek Andi Kleen
2011-08-22 20:49 ` [PATCH 7/7] LSEEK: BTRFS: Avoid i_mutex for SEEK_{CUR,SET,END} Andi Kleen
  -- strict thread matches above, loose matches on Subject: below --
2011-09-15 23:06 Improve lseek scalability v3 Andi Kleen
2011-09-15 23:06 ` [PATCH 3/7] VFS: Make generic lseek lockless safe Andi Kleen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).