linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [rfc patch 0/4] splice: cleanups and fixes
@ 2008-06-21 15:46 Miklos Szeredi
  2008-06-21 15:46 ` [rfc patch 1/4] splice: fix comment Miklos Szeredi
                   ` (4 more replies)
  0 siblings, 5 replies; 38+ messages in thread
From: Miklos Szeredi @ 2008-06-21 15:46 UTC (permalink / raw)
  To: jens.axboe; +Cc: linux-fsdevel, linux-kernel, linux-mm, torvalds

Brian Wang reported some problems with NFS export of fuse filesystems,
which turned out to be bad interaction between splice (used by nfsd)
and page cache invalidation.

I looked at the splice code, and found quite a bit of dead code,
duplication, and unnecessary complication.  This patchset attempts to
resolve those, in addition to fixing the invalidation issues.  Some
optimizations are lost in the process like the gang page lookup for
the fully cached case, and I can't really tell if these are important
enough to warrant the extra complexity.

I did minimal testing to verify that splice(2) on regular files still
works.  And since generic_file_splice_read() now shares most of its
code with generic_file_aio_read(), there's not much to go wrong in
there.  That said, it needs more testing...

Comments?

Thanks,
Miklos
--

 drivers/block/loop.c      |    5 
 fs/nfsd/vfs.c             |    9 -
 fs/pipe.c                 |   58 -------
 fs/splice.c               |  371 +++++-----------------------------------------
 include/linux/fs.h        |    2 
 include/linux/pipe_fs_i.h |   36 ----
 kernel/relay.c            |    2 
 mm/filemap.c              |    2 
 net/core/skbuff.c         |    9 -
 9 files changed, 47 insertions(+), 447 deletions(-)

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [rfc patch 1/4] splice: fix comment
  2008-06-21 15:46 [rfc patch 0/4] splice: cleanups and fixes Miklos Szeredi
@ 2008-06-21 15:46 ` Miklos Szeredi
  2008-06-21 15:46 ` [rfc patch 2/4] splice: remove steal from pipe_buf_operations Miklos Szeredi
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 38+ messages in thread
From: Miklos Szeredi @ 2008-06-21 15:46 UTC (permalink / raw)
  To: jens.axboe; +Cc: linux-fsdevel, linux-kernel, linux-mm, torvalds

[-- Attachment #1: splice_fix_comment.patch --]
[-- Type: text/plain, Size: 687 bytes --]

From: Miklos Szeredi <mszeredi@suse.cz>

Clearing SPLICE_F_NONBLOCK means: block.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---
 fs/splice.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6/fs/splice.c
===================================================================
--- linux-2.6.orig/fs/splice.c	2008-06-19 14:58:10.000000000 +0200
+++ linux-2.6/fs/splice.c	2008-06-20 13:12:37.000000000 +0200
@@ -978,7 +978,7 @@ ssize_t splice_direct_to_actor(struct fi
 	flags = sd->flags;
 
 	/*
-	 * Don't block on output, we have to drain the direct pipe.
+	 * Block on output, we have to drain the direct pipe.
 	 */
 	sd->flags &= ~SPLICE_F_NONBLOCK;
 

--

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [rfc patch 2/4] splice: remove steal from pipe_buf_operations
  2008-06-21 15:46 [rfc patch 0/4] splice: cleanups and fixes Miklos Szeredi
  2008-06-21 15:46 ` [rfc patch 1/4] splice: fix comment Miklos Szeredi
@ 2008-06-21 15:46 ` Miklos Szeredi
  2008-06-24  8:01   ` Jens Axboe
  2008-06-21 15:46 ` [rfc patch 3/4] splice: remove confirm " Miklos Szeredi
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 38+ messages in thread
From: Miklos Szeredi @ 2008-06-21 15:46 UTC (permalink / raw)
  To: jens.axboe; +Cc: linux-fsdevel, linux-kernel, linux-mm, torvalds

[-- Attachment #1: splice_remove_steal.patch --]
[-- Type: text/plain, Size: 7980 bytes --]

From: Miklos Szeredi <mszeredi@suse.cz>

The 'steal' operation hasn't been used for some time.  Remove it and
the associated dead code.  If it's needed in the future, it can always
be easily restored.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---
 fs/pipe.c                 |   31 -----------------------
 fs/splice.c               |   62 ----------------------------------------------
 include/linux/pipe_fs_i.h |   20 ++------------
 kernel/relay.c            |    1 
 net/core/skbuff.c         |    8 -----
 5 files changed, 3 insertions(+), 119 deletions(-)

Index: linux-2.6/fs/pipe.c
===================================================================
--- linux-2.6.orig/fs/pipe.c	2008-06-21 10:17:43.000000000 +0200
+++ linux-2.6/fs/pipe.c	2008-06-21 11:46:52.000000000 +0200
@@ -209,36 +209,6 @@ void generic_pipe_buf_unmap(struct pipe_
 }
 
 /**
- * generic_pipe_buf_steal - attempt to take ownership of a &pipe_buffer
- * @pipe:	the pipe that the buffer belongs to
- * @buf:	the buffer to attempt to steal
- *
- * Description:
- *	This function attempts to steal the &struct page attached to
- *	@buf. If successful, this function returns 0 and returns with
- *	the page locked. The caller may then reuse the page for whatever
- *	he wishes; the typical use is insertion into a different file
- *	page cache.
- */
-int generic_pipe_buf_steal(struct pipe_inode_info *pipe,
-			   struct pipe_buffer *buf)
-{
-	struct page *page = buf->page;
-
-	/*
-	 * A reference of one is golden, that means that the owner of this
-	 * page is the only one holding a reference to it. lock the page
-	 * and return OK.
-	 */
-	if (page_count(page) == 1) {
-		lock_page(page);
-		return 0;
-	}
-
-	return 1;
-}
-
-/**
  * generic_pipe_buf_get - get a reference to a &struct pipe_buffer
  * @pipe:	the pipe that the buffer belongs to
  * @buf:	the buffer to get a reference to
@@ -274,7 +244,6 @@ static const struct pipe_buf_operations 
 	.unmap = generic_pipe_buf_unmap,
 	.confirm = generic_pipe_buf_confirm,
 	.release = anon_pipe_buf_release,
-	.steal = generic_pipe_buf_steal,
 	.get = generic_pipe_buf_get,
 };
 
Index: linux-2.6/fs/splice.c
===================================================================
--- linux-2.6.orig/fs/splice.c	2008-06-21 10:19:01.000000000 +0200
+++ linux-2.6/fs/splice.c	2008-06-21 11:46:52.000000000 +0200
@@ -30,56 +30,6 @@
 #include <linux/uio.h>
 #include <linux/security.h>
 
-/*
- * Attempt to steal a page from a pipe buffer. This should perhaps go into
- * a vm helper function, it's already simplified quite a bit by the
- * addition of remove_mapping(). If success is returned, the caller may
- * attempt to reuse this page for another destination.
- */
-static int page_cache_pipe_buf_steal(struct pipe_inode_info *pipe,
-				     struct pipe_buffer *buf)
-{
-	struct page *page = buf->page;
-	struct address_space *mapping;
-
-	lock_page(page);
-
-	mapping = page_mapping(page);
-	if (mapping) {
-		WARN_ON(!PageUptodate(page));
-
-		/*
-		 * At least for ext2 with nobh option, we need to wait on
-		 * writeback completing on this page, since we'll remove it
-		 * from the pagecache.  Otherwise truncate wont wait on the
-		 * page, allowing the disk blocks to be reused by someone else
-		 * before we actually wrote our data to them. fs corruption
-		 * ensues.
-		 */
-		wait_on_page_writeback(page);
-
-		if (PagePrivate(page) && !try_to_release_page(page, GFP_KERNEL))
-			goto out_unlock;
-
-		/*
-		 * If we succeeded in removing the mapping, set LRU flag
-		 * and return good.
-		 */
-		if (remove_mapping(mapping, page)) {
-			buf->flags |= PIPE_BUF_FLAG_LRU;
-			return 0;
-		}
-	}
-
-	/*
-	 * Raced with truncate or failed to remove page from current
-	 * address space, unlock and return failure.
-	 */
-out_unlock:
-	unlock_page(page);
-	return 1;
-}
-
 static void page_cache_pipe_buf_release(struct pipe_inode_info *pipe,
 					struct pipe_buffer *buf)
 {
@@ -135,27 +85,15 @@ static const struct pipe_buf_operations 
 	.unmap = generic_pipe_buf_unmap,
 	.confirm = page_cache_pipe_buf_confirm,
 	.release = page_cache_pipe_buf_release,
-	.steal = page_cache_pipe_buf_steal,
 	.get = generic_pipe_buf_get,
 };
 
-static int user_page_pipe_buf_steal(struct pipe_inode_info *pipe,
-				    struct pipe_buffer *buf)
-{
-	if (!(buf->flags & PIPE_BUF_FLAG_GIFT))
-		return 1;
-
-	buf->flags |= PIPE_BUF_FLAG_LRU;
-	return generic_pipe_buf_steal(pipe, buf);
-}
-
 static const struct pipe_buf_operations user_page_pipe_buf_ops = {
 	.can_merge = 0,
 	.map = generic_pipe_buf_map,
 	.unmap = generic_pipe_buf_unmap,
 	.confirm = generic_pipe_buf_confirm,
 	.release = page_cache_pipe_buf_release,
-	.steal = user_page_pipe_buf_steal,
 	.get = generic_pipe_buf_get,
 };
 
Index: linux-2.6/include/linux/pipe_fs_i.h
===================================================================
--- linux-2.6.orig/include/linux/pipe_fs_i.h	2008-06-21 10:17:43.000000000 +0200
+++ linux-2.6/include/linux/pipe_fs_i.h	2008-06-21 11:46:52.000000000 +0200
@@ -61,16 +61,13 @@ struct pipe_inode_info {
  * Note on the nesting of these functions:
  *
  * ->confirm()
- *	->steal()
- *	...
  *	->map()
  *	...
  *	->unmap()
  *
- * That is, ->map() must be called on a confirmed buffer,
- * same goes for ->steal(). See below for the meaning of each
- * operation. Also see kerneldoc in fs/pipe.c for the pipe
- * and generic variants of these hooks.
+ * That is, ->map() must be called on a confirmed buffer. See below
+ * for the meaning of each operation. Also see kerneldoc in fs/pipe.c
+ * for the pipe and generic variants of these hooks.
  */
 struct pipe_buf_operations {
 	/*
@@ -115,16 +112,6 @@ struct pipe_buf_operations {
 	void (*release)(struct pipe_inode_info *, struct pipe_buffer *);
 
 	/*
-	 * Attempt to take ownership of the pipe buffer and its contents.
-	 * ->steal() returns 0 for success, in which case the contents
-	 * of the pipe (the buf->page) is locked and now completely owned
-	 * by the caller. The page may then be transferred to a different
-	 * mapping, the most often used case is insertion into different
-	 * file address space cache.
-	 */
-	int (*steal)(struct pipe_inode_info *, struct pipe_buffer *);
-
-	/*
 	 * Get a reference to the pipe buffer.
 	 */
 	void (*get)(struct pipe_inode_info *, struct pipe_buffer *);
@@ -146,6 +133,5 @@ void *generic_pipe_buf_map(struct pipe_i
 void generic_pipe_buf_unmap(struct pipe_inode_info *, struct pipe_buffer *, void *);
 void generic_pipe_buf_get(struct pipe_inode_info *, struct pipe_buffer *);
 int generic_pipe_buf_confirm(struct pipe_inode_info *, struct pipe_buffer *);
-int generic_pipe_buf_steal(struct pipe_inode_info *, struct pipe_buffer *);
 
 #endif
Index: linux-2.6/net/core/skbuff.c
===================================================================
--- linux-2.6.orig/net/core/skbuff.c	2008-06-21 10:17:43.000000000 +0200
+++ linux-2.6/net/core/skbuff.c	2008-06-21 11:46:52.000000000 +0200
@@ -88,13 +88,6 @@ static void sock_pipe_buf_get(struct pip
 	skb_get(skb);
 }
 
-static int sock_pipe_buf_steal(struct pipe_inode_info *pipe,
-			       struct pipe_buffer *buf)
-{
-	return 1;
-}
-
-
 /* Pipe buffer operations for a socket. */
 static struct pipe_buf_operations sock_pipe_buf_ops = {
 	.can_merge = 0,
@@ -102,7 +95,6 @@ static struct pipe_buf_operations sock_p
 	.unmap = generic_pipe_buf_unmap,
 	.confirm = generic_pipe_buf_confirm,
 	.release = sock_pipe_buf_release,
-	.steal = sock_pipe_buf_steal,
 	.get = sock_pipe_buf_get,
 };
 
Index: linux-2.6/kernel/relay.c
===================================================================
--- linux-2.6.orig/kernel/relay.c	2008-06-21 11:46:52.000000000 +0200
+++ linux-2.6/kernel/relay.c	2008-06-21 11:47:05.000000000 +0200
@@ -1081,7 +1081,6 @@ static struct pipe_buf_operations relay_
 	.unmap = generic_pipe_buf_unmap,
 	.confirm = generic_pipe_buf_confirm,
 	.release = relay_pipe_buf_release,
-	.steal = generic_pipe_buf_steal,
 	.get = generic_pipe_buf_get,
 };
 

--

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [rfc patch 3/4] splice: remove confirm from pipe_buf_operations
  2008-06-21 15:46 [rfc patch 0/4] splice: cleanups and fixes Miklos Szeredi
  2008-06-21 15:46 ` [rfc patch 1/4] splice: fix comment Miklos Szeredi
  2008-06-21 15:46 ` [rfc patch 2/4] splice: remove steal from pipe_buf_operations Miklos Szeredi
@ 2008-06-21 15:46 ` Miklos Szeredi
  2008-06-24  8:04   ` Jens Axboe
  2008-06-21 15:46 ` [rfc patch 4/4] splice: use do_generic_file_read() Miklos Szeredi
  2008-06-21 17:20 ` [rfc patch 0/4] splice: cleanups and fixes Subrata Modak
  4 siblings, 1 reply; 38+ messages in thread
From: Miklos Szeredi @ 2008-06-21 15:46 UTC (permalink / raw)
  To: jens.axboe; +Cc: linux-fsdevel, linux-kernel, linux-mm, torvalds

[-- Attachment #1: splice_remove_confirm.patch --]
[-- Type: text/plain, Size: 9353 bytes --]

From: Miklos Szeredi <mszeredi@suse.cz>

The 'confirm' operation was only used for splicing from page cache, to
wait for read on a page to finish.  But generic_file_splice_read()
already blocks on readahead reads, so it seems logical to block on the
rare and slow single page reads too.

So wait for readpage to finish inside __generic_file_splice_read() and
remove the 'confirm' method.

This also fixes short return counts when the filesystem (e.g. fuse)
invalidates the page between insertation and removal.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---
 drivers/block/loop.c      |    5 ---
 fs/nfsd/vfs.c             |    9 ------
 fs/pipe.c                 |   27 ------------------
 fs/splice.c               |   69 +++++-----------------------------------------
 include/linux/pipe_fs_i.h |   22 +-------------
 kernel/relay.c            |    1 
 net/core/skbuff.c         |    1 
 7 files changed, 11 insertions(+), 123 deletions(-)

Index: linux-2.6/drivers/block/loop.c
===================================================================
--- linux-2.6.orig/drivers/block/loop.c	2008-06-21 11:46:52.000000000 +0200
+++ linux-2.6/drivers/block/loop.c	2008-06-21 11:51:39.000000000 +0200
@@ -395,11 +395,6 @@ lo_splice_actor(struct pipe_inode_info *
 	struct page *page = buf->page;
 	sector_t IV;
 	size_t size;
-	int ret;
-
-	ret = buf->ops->confirm(pipe, buf);
-	if (unlikely(ret))
-		return ret;
 
 	IV = ((sector_t) page->index << (PAGE_CACHE_SHIFT - 9)) +
 							(buf->offset >> 9);
Index: linux-2.6/fs/pipe.c
===================================================================
--- linux-2.6.orig/fs/pipe.c	2008-06-21 11:46:52.000000000 +0200
+++ linux-2.6/fs/pipe.c	2008-06-21 11:47:09.000000000 +0200
@@ -223,26 +223,10 @@ void generic_pipe_buf_get(struct pipe_in
 	page_cache_get(buf->page);
 }
 
-/**
- * generic_pipe_buf_confirm - verify contents of the pipe buffer
- * @info:	the pipe that the buffer belongs to
- * @buf:	the buffer to confirm
- *
- * Description:
- *	This function does nothing, because the generic pipe code uses
- *	pages that are always good when inserted into the pipe.
- */
-int generic_pipe_buf_confirm(struct pipe_inode_info *info,
-			     struct pipe_buffer *buf)
-{
-	return 0;
-}
-
 static const struct pipe_buf_operations anon_pipe_buf_ops = {
 	.can_merge = 1,
 	.map = generic_pipe_buf_map,
 	.unmap = generic_pipe_buf_unmap,
-	.confirm = generic_pipe_buf_confirm,
 	.release = anon_pipe_buf_release,
 	.get = generic_pipe_buf_get,
 };
@@ -281,13 +265,6 @@ pipe_read(struct kiocb *iocb, const stru
 			if (chars > total_len)
 				chars = total_len;
 
-			error = ops->confirm(pipe, buf);
-			if (error) {
-				if (!ret)
-					error = ret;
-				break;
-			}
-
 			atomic = !iov_fault_in_pages_write(iov, chars);
 redo:
 			addr = ops->map(pipe, buf, atomic);
@@ -402,10 +379,6 @@ pipe_write(struct kiocb *iocb, const str
 			int error, atomic = 1;
 			void *addr;
 
-			error = ops->confirm(pipe, buf);
-			if (error)
-				goto out;
-
 			iov_fault_in_pages_read(iov, chars);
 redo1:
 			addr = ops->map(pipe, buf, atomic);
Index: linux-2.6/fs/splice.c
===================================================================
--- linux-2.6.orig/fs/splice.c	2008-06-21 11:46:52.000000000 +0200
+++ linux-2.6/fs/splice.c	2008-06-21 11:49:52.000000000 +0200
@@ -37,53 +37,10 @@ static void page_cache_pipe_buf_release(
 	buf->flags &= ~PIPE_BUF_FLAG_LRU;
 }
 
-/*
- * Check whether the contents of buf is OK to access. Since the content
- * is a page cache page, IO may be in flight.
- */
-static int page_cache_pipe_buf_confirm(struct pipe_inode_info *pipe,
-				       struct pipe_buffer *buf)
-{
-	struct page *page = buf->page;
-	int err;
-
-	if (!PageUptodate(page)) {
-		lock_page(page);
-
-		/*
-		 * Page got truncated/unhashed. This will cause a 0-byte
-		 * splice, if this is the first page.
-		 */
-		if (!page->mapping) {
-			err = -ENODATA;
-			goto error;
-		}
-
-		/*
-		 * Uh oh, read-error from disk.
-		 */
-		if (!PageUptodate(page)) {
-			err = -EIO;
-			goto error;
-		}
-
-		/*
-		 * Page is ok afterall, we are done.
-		 */
-		unlock_page(page);
-	}
-
-	return 0;
-error:
-	unlock_page(page);
-	return err;
-}
-
 static const struct pipe_buf_operations page_cache_pipe_buf_ops = {
 	.can_merge = 0,
 	.map = generic_pipe_buf_map,
 	.unmap = generic_pipe_buf_unmap,
-	.confirm = page_cache_pipe_buf_confirm,
 	.release = page_cache_pipe_buf_release,
 	.get = generic_pipe_buf_get,
 };
@@ -92,7 +49,6 @@ static const struct pipe_buf_operations 
 	.can_merge = 0,
 	.map = generic_pipe_buf_map,
 	.unmap = generic_pipe_buf_unmap,
-	.confirm = generic_pipe_buf_confirm,
 	.release = page_cache_pipe_buf_release,
 	.get = generic_pipe_buf_get,
 };
@@ -349,6 +305,11 @@ __generic_file_splice_read(struct file *
 
 				break;
 			}
+			wait_on_page_locked(page);
+			if (!PageUptodate(page)) {
+				error = -EIO;
+				break;
+			}
 		}
 fill_it:
 		/*
@@ -451,13 +412,10 @@ static int pipe_to_sendpage(struct pipe_
 	loff_t pos = sd->pos;
 	int ret, more;
 
-	ret = buf->ops->confirm(pipe, buf);
-	if (!ret) {
-		more = (sd->flags & SPLICE_F_MORE) || sd->len < sd->total_len;
+	more = (sd->flags & SPLICE_F_MORE) || sd->len < sd->total_len;
 
-		ret = file->f_op->sendpage(file, buf->page, buf->offset,
-					   sd->len, &pos, more);
-	}
+	ret = file->f_op->sendpage(file, buf->page, buf->offset,
+				   sd->len, &pos, more);
 
 	return ret;
 }
@@ -492,13 +450,6 @@ static int pipe_to_file(struct pipe_inod
 	void *fsdata;
 	int ret;
 
-	/*
-	 * make sure the data in this buffer is uptodate
-	 */
-	ret = buf->ops->confirm(pipe, buf);
-	if (unlikely(ret))
-		return ret;
-
 	offset = sd->pos & ~PAGE_CACHE_MASK;
 
 	this_len = sd->len;
@@ -1231,10 +1182,6 @@ static int pipe_to_user(struct pipe_inod
 	char *src;
 	int ret;
 
-	ret = buf->ops->confirm(pipe, buf);
-	if (unlikely(ret))
-		return ret;
-
 	/*
 	 * See if we can use the atomic maps, by prefaulting in the
 	 * pages and doing an atomic copy
Index: linux-2.6/include/linux/pipe_fs_i.h
===================================================================
--- linux-2.6.orig/include/linux/pipe_fs_i.h	2008-06-21 11:46:52.000000000 +0200
+++ linux-2.6/include/linux/pipe_fs_i.h	2008-06-21 11:47:09.000000000 +0200
@@ -58,16 +58,8 @@ struct pipe_inode_info {
 };
 
 /*
- * Note on the nesting of these functions:
- *
- * ->confirm()
- *	->map()
- *	...
- *	->unmap()
- *
- * That is, ->map() must be called on a confirmed buffer. See below
- * for the meaning of each operation. Also see kerneldoc in fs/pipe.c
- * for the pipe and generic variants of these hooks.
+ * Also see kerneldoc in fs/pipe.c for the pipe and generic variants
+ * of these hooks.
  */
 struct pipe_buf_operations {
 	/*
@@ -97,15 +89,6 @@ struct pipe_buf_operations {
 	void (*unmap)(struct pipe_inode_info *, struct pipe_buffer *, void *);
 
 	/*
-	 * ->confirm() verifies that the data in the pipe buffer is there
-	 * and that the contents are good. If the pages in the pipe belong
-	 * to a file system, we may need to wait for IO completion in this
-	 * hook. Returns 0 for good, or a negative error value in case of
-	 * error.
-	 */
-	int (*confirm)(struct pipe_inode_info *, struct pipe_buffer *);
-
-	/*
 	 * When the contents of this pipe buffer has been completely
 	 * consumed by a reader, ->release() is called.
 	 */
@@ -132,6 +115,5 @@ void __free_pipe_info(struct pipe_inode_
 void *generic_pipe_buf_map(struct pipe_inode_info *, struct pipe_buffer *, int);
 void generic_pipe_buf_unmap(struct pipe_inode_info *, struct pipe_buffer *, void *);
 void generic_pipe_buf_get(struct pipe_inode_info *, struct pipe_buffer *);
-int generic_pipe_buf_confirm(struct pipe_inode_info *, struct pipe_buffer *);
 
 #endif
Index: linux-2.6/kernel/relay.c
===================================================================
--- linux-2.6.orig/kernel/relay.c	2008-06-21 11:47:05.000000000 +0200
+++ linux-2.6/kernel/relay.c	2008-06-21 11:47:09.000000000 +0200
@@ -1079,7 +1079,6 @@ static struct pipe_buf_operations relay_
 	.can_merge = 0,
 	.map = generic_pipe_buf_map,
 	.unmap = generic_pipe_buf_unmap,
-	.confirm = generic_pipe_buf_confirm,
 	.release = relay_pipe_buf_release,
 	.get = generic_pipe_buf_get,
 };
Index: linux-2.6/net/core/skbuff.c
===================================================================
--- linux-2.6.orig/net/core/skbuff.c	2008-06-21 11:46:52.000000000 +0200
+++ linux-2.6/net/core/skbuff.c	2008-06-21 11:47:09.000000000 +0200
@@ -93,7 +93,6 @@ static struct pipe_buf_operations sock_p
 	.can_merge = 0,
 	.map = generic_pipe_buf_map,
 	.unmap = generic_pipe_buf_unmap,
-	.confirm = generic_pipe_buf_confirm,
 	.release = sock_pipe_buf_release,
 	.get = sock_pipe_buf_get,
 };
Index: linux-2.6/fs/nfsd/vfs.c
===================================================================
--- linux-2.6.orig/fs/nfsd/vfs.c	2008-06-19 14:58:10.000000000 +0200
+++ linux-2.6/fs/nfsd/vfs.c	2008-06-21 11:50:57.000000000 +0200
@@ -839,14 +839,7 @@ nfsd_splice_actor(struct pipe_inode_info
 	struct svc_rqst *rqstp = sd->u.data;
 	struct page **pp = rqstp->rq_respages + rqstp->rq_resused;
 	struct page *page = buf->page;
-	size_t size;
-	int ret;
-
-	ret = buf->ops->confirm(pipe, buf);
-	if (unlikely(ret))
-		return ret;
-
-	size = sd->len;
+	size_t size = sd->len;
 
 	if (rqstp->rq_res.page_len == 0) {
 		get_page(page);

--

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [rfc patch 4/4] splice: use do_generic_file_read()
  2008-06-21 15:46 [rfc patch 0/4] splice: cleanups and fixes Miklos Szeredi
                   ` (2 preceding siblings ...)
  2008-06-21 15:46 ` [rfc patch 3/4] splice: remove confirm " Miklos Szeredi
@ 2008-06-21 15:46 ` Miklos Szeredi
  2008-06-24  8:05   ` Jens Axboe
  2008-06-21 17:20 ` [rfc patch 0/4] splice: cleanups and fixes Subrata Modak
  4 siblings, 1 reply; 38+ messages in thread
From: Miklos Szeredi @ 2008-06-21 15:46 UTC (permalink / raw)
  To: jens.axboe; +Cc: linux-fsdevel, linux-kernel, linux-mm, torvalds

[-- Attachment #1: splice_generic_file_splice_read_cleanup.patch --]
[-- Type: text/plain, Size: 9031 bytes --]

From: Miklos Szeredi <mszeredi@suse.cz>

__generic_file_splice_read() duplicates a lot of stuff common with the
generic page cache reading.  So reuse that code instead to simplify
the page cache splice code.

This also fixes some corner cases which weren't properly handled in
the splice code because of complexity issues.  In particular it fixes
a problem when the filesystem (e.g. fuse) invalidates pages during the
splice operation.

There might be some slight fall in performance due to the removal of
the gang lookup for pages.  However I'm not sure if this is
significant enough to warrant the extra complication.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---
 fs/splice.c        |  248 +++++++----------------------------------------------
 include/linux/fs.h |    2 
 mm/filemap.c       |    2 
 3 files changed, 40 insertions(+), 212 deletions(-)

Index: linux-2.6/fs/splice.c
===================================================================
--- linux-2.6.orig/fs/splice.c	2008-06-21 15:19:49.000000000 +0200
+++ linux-2.6/fs/splice.c	2008-06-21 16:47:07.000000000 +0200
@@ -159,208 +159,25 @@ static void spd_release_page(struct spli
 	page_cache_release(spd->pages[i]);
 }
 
-static int
-__generic_file_splice_read(struct file *in, loff_t *ppos,
-			   struct pipe_inode_info *pipe, size_t len,
-			   unsigned int flags)
+static int file_splice_read_actor(read_descriptor_t *desc, struct page *page,
+				  unsigned long offset, unsigned long size)
 {
-	struct address_space *mapping = in->f_mapping;
-	unsigned int loff, nr_pages, req_pages;
-	struct page *pages[PIPE_BUFFERS];
-	struct partial_page partial[PIPE_BUFFERS];
-	struct page *page;
-	pgoff_t index, end_index;
-	loff_t isize;
-	int error, page_nr;
-	struct splice_pipe_desc spd = {
-		.pages = pages,
-		.partial = partial,
-		.flags = flags,
-		.ops = &page_cache_pipe_buf_ops,
-		.spd_release = spd_release_page,
-	};
-
-	index = *ppos >> PAGE_CACHE_SHIFT;
-	loff = *ppos & ~PAGE_CACHE_MASK;
-	req_pages = (len + loff + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
-	nr_pages = min(req_pages, (unsigned)PIPE_BUFFERS);
-
-	/*
-	 * Lookup the (hopefully) full range of pages we need.
-	 */
-	spd.nr_pages = find_get_pages_contig(mapping, index, nr_pages, pages);
-	index += spd.nr_pages;
-
-	/*
-	 * If find_get_pages_contig() returned fewer pages than we needed,
-	 * readahead/allocate the rest and fill in the holes.
-	 */
-	if (spd.nr_pages < nr_pages)
-		page_cache_sync_readahead(mapping, &in->f_ra, in,
-				index, req_pages - spd.nr_pages);
-
-	error = 0;
-	while (spd.nr_pages < nr_pages) {
-		/*
-		 * Page could be there, find_get_pages_contig() breaks on
-		 * the first hole.
-		 */
-		page = find_get_page(mapping, index);
-		if (!page) {
-			/*
-			 * page didn't exist, allocate one.
-			 */
-			page = page_cache_alloc_cold(mapping);
-			if (!page)
-				break;
+	struct splice_pipe_desc *spd = desc->arg.data;
+	unsigned long count = desc->count;
 
-			error = add_to_page_cache_lru(page, mapping, index,
-						mapping_gfp_mask(mapping));
-			if (unlikely(error)) {
-				page_cache_release(page);
-				if (error == -EEXIST)
-					continue;
-				break;
-			}
-			/*
-			 * add_to_page_cache() locks the page, unlock it
-			 * to avoid convoluting the logic below even more.
-			 */
-			unlock_page(page);
-		}
+	if (size > count)
+		size = count;
 
-		pages[spd.nr_pages++] = page;
-		index++;
-	}
+	page_cache_get(page);
+	spd->pages[spd->nr_pages] = page;
+	spd->partial[spd->nr_pages].offset = offset;
+	spd->partial[spd->nr_pages].len = size;
+	spd->nr_pages++;
 
-	/*
-	 * Now loop over the map and see if we need to start IO on any
-	 * pages, fill in the partial map, etc.
-	 */
-	index = *ppos >> PAGE_CACHE_SHIFT;
-	nr_pages = spd.nr_pages;
-	spd.nr_pages = 0;
-	for (page_nr = 0; page_nr < nr_pages; page_nr++) {
-		unsigned int this_len;
-
-		if (!len)
-			break;
-
-		/*
-		 * this_len is the max we'll use from this page
-		 */
-		this_len = min_t(unsigned long, len, PAGE_CACHE_SIZE - loff);
-		page = pages[page_nr];
+	desc->count = count - size;
+	desc->written += size;
 
-		if (PageReadahead(page))
-			page_cache_async_readahead(mapping, &in->f_ra, in,
-					page, index, req_pages - page_nr);
-
-		/*
-		 * If the page isn't uptodate, we may need to start io on it
-		 */
-		if (!PageUptodate(page)) {
-			/*
-			 * If in nonblock mode then dont block on waiting
-			 * for an in-flight io page
-			 */
-			if (flags & SPLICE_F_NONBLOCK) {
-				if (TestSetPageLocked(page)) {
-					error = -EAGAIN;
-					break;
-				}
-			} else
-				lock_page(page);
-
-			/*
-			 * page was truncated, stop here. if this isn't the
-			 * first page, we'll just complete what we already
-			 * added
-			 */
-			if (!page->mapping) {
-				unlock_page(page);
-				break;
-			}
-			/*
-			 * page was already under io and is now done, great
-			 */
-			if (PageUptodate(page)) {
-				unlock_page(page);
-				goto fill_it;
-			}
-
-			/*
-			 * need to read in the page
-			 */
-			error = mapping->a_ops->readpage(in, page);
-			if (unlikely(error)) {
-				/*
-				 * We really should re-lookup the page here,
-				 * but it complicates things a lot. Instead
-				 * lets just do what we already stored, and
-				 * we'll get it the next time we are called.
-				 */
-				if (error == AOP_TRUNCATED_PAGE)
-					error = 0;
-
-				break;
-			}
-			wait_on_page_locked(page);
-			if (!PageUptodate(page)) {
-				error = -EIO;
-				break;
-			}
-		}
-fill_it:
-		/*
-		 * i_size must be checked after PageUptodate.
-		 */
-		isize = i_size_read(mapping->host);
-		end_index = (isize - 1) >> PAGE_CACHE_SHIFT;
-		if (unlikely(!isize || index > end_index))
-			break;
-
-		/*
-		 * if this is the last page, see if we need to shrink
-		 * the length and stop
-		 */
-		if (end_index == index) {
-			unsigned int plen;
-
-			/*
-			 * max good bytes in this page
-			 */
-			plen = ((isize - 1) & ~PAGE_CACHE_MASK) + 1;
-			if (plen <= loff)
-				break;
-
-			/*
-			 * force quit after adding this page
-			 */
-			this_len = min(this_len, plen - loff);
-			len = this_len;
-		}
-
-		partial[page_nr].offset = loff;
-		partial[page_nr].len = this_len;
-		len -= this_len;
-		loff = 0;
-		spd.nr_pages++;
-		index++;
-	}
-
-	/*
-	 * Release any pages at the end, if we quit early. 'page_nr' is how far
-	 * we got, 'nr_pages' is how many pages are in the map.
-	 */
-	while (page_nr < nr_pages)
-		page_cache_release(pages[page_nr++]);
-	in->f_ra.prev_pos = (loff_t)index << PAGE_CACHE_SHIFT;
-
-	if (spd.nr_pages)
-		return splice_to_pipe(pipe, &spd);
-
-	return error;
+	return size;
 }
 
 /**
@@ -381,24 +198,33 @@ ssize_t generic_file_splice_read(struct 
 				 struct pipe_inode_info *pipe, size_t len,
 				 unsigned int flags)
 {
-	loff_t isize, left;
-	int ret;
-
-	isize = i_size_read(in->f_mapping->host);
-	if (unlikely(*ppos >= isize))
-		return 0;
-
-	left = isize - *ppos;
-	if (unlikely(left < len))
-		len = left;
+	ssize_t ret;
+	loff_t pos;
+	struct page *pages[PIPE_BUFFERS];
+	struct partial_page partial[PIPE_BUFFERS];
+	struct splice_pipe_desc spd = {
+		.pages = pages,
+		.partial = partial,
+		.flags = flags,
+		.ops = &page_cache_pipe_buf_ops,
+		.spd_release = spd_release_page,
+	};
+	read_descriptor_t desc = {
+		.count = len,
+		.arg.data = &spd,
+	};
 
-	ret = __generic_file_splice_read(in, ppos, pipe, len, flags);
-	if (ret > 0)
-		*ppos += ret;
+	pos = *ppos;
+	do_generic_file_read(in, &pos, &desc, file_splice_read_actor);
+	ret = desc.error;
+	if (spd.nr_pages) {
+		ret = splice_to_pipe(pipe, &spd);
+		if (ret > 0)
+			*ppos += ret;
+	}
 
 	return ret;
 }
-
 EXPORT_SYMBOL(generic_file_splice_read);
 
 /*
Index: linux-2.6/include/linux/fs.h
===================================================================
--- linux-2.6.orig/include/linux/fs.h	2008-06-21 15:14:26.000000000 +0200
+++ linux-2.6/include/linux/fs.h	2008-06-21 15:19:49.000000000 +0200
@@ -1854,6 +1854,8 @@ extern ssize_t do_sync_read(struct file 
 extern ssize_t do_sync_write(struct file *filp, const char __user *buf, size_t len, loff_t *ppos);
 extern int generic_segment_checks(const struct iovec *iov,
 		unsigned long *nr_segs, size_t *count, int access_flags);
+extern void do_generic_file_read(struct file *filp, loff_t *ppos,
+				 read_descriptor_t *desc, read_actor_t actor);
 
 /* fs/splice.c */
 extern ssize_t generic_file_splice_read(struct file *, loff_t *,
Index: linux-2.6/mm/filemap.c
===================================================================
--- linux-2.6.orig/mm/filemap.c	2008-06-21 15:14:26.000000000 +0200
+++ linux-2.6/mm/filemap.c	2008-06-21 15:19:49.000000000 +0200
@@ -891,7 +891,7 @@ static void shrink_readahead_size_eio(st
  * This is really ugly. But the goto's actually try to clarify some
  * of the logic when it comes to error handling etc.
  */
-static void do_generic_file_read(struct file *filp, loff_t *ppos,
+void do_generic_file_read(struct file *filp, loff_t *ppos,
 		read_descriptor_t *desc, read_actor_t actor)
 {
 	struct address_space *mapping = filp->f_mapping;

--

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [rfc patch 0/4] splice: cleanups and fixes
  2008-06-21 15:46 [rfc patch 0/4] splice: cleanups and fixes Miklos Szeredi
                   ` (3 preceding siblings ...)
  2008-06-21 15:46 ` [rfc patch 4/4] splice: use do_generic_file_read() Miklos Szeredi
@ 2008-06-21 17:20 ` Subrata Modak
  2008-06-22  6:16   ` Miklos Szeredi
  4 siblings, 1 reply; 38+ messages in thread
From: Subrata Modak @ 2008-06-21 17:20 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: jens.axboe, linux-fsdevel, ltp-list

Hi Miklos,

In the backdrop of the changes in splice(), would you also like to help
us in reviewing the existing testcase for splice() in LTP:

http://ltp.cvs.sourceforge.net/ltp/ltp/testcases/kernel/syscalls/splice/,

It would also be great if you can send us some patch as well in the
regard.

Regards--
Subrata

On Sat, 2008-06-21 at 17:46 +0200, Miklos Szeredi wrote:
> Brian Wang reported some problems with NFS export of fuse filesystems,
> which turned out to be bad interaction between splice (used by nfsd)
> and page cache invalidation.
> 
> I looked at the splice code, and found quite a bit of dead code,
> duplication, and unnecessary complication.  This patchset attempts to
> resolve those, in addition to fixing the invalidation issues.  Some
> optimizations are lost in the process like the gang page lookup for
> the fully cached case, and I can't really tell if these are important
> enough to warrant the extra complexity.
> 
> I did minimal testing to verify that splice(2) on regular files still
> works.  And since generic_file_splice_read() now shares most of its
> code with generic_file_aio_read(), there's not much to go wrong in
> there.  That said, it needs more testing...
> 
> Comments?
> 
> Thanks,
> Miklos
> --
> 
>  drivers/block/loop.c      |    5 
>  fs/nfsd/vfs.c             |    9 -
>  fs/pipe.c                 |   58 -------
>  fs/splice.c               |  371 +++++-----------------------------------------
>  include/linux/fs.h        |    2 
>  include/linux/pipe_fs_i.h |   36 ----
>  kernel/relay.c            |    2 
>  mm/filemap.c              |    2 
>  net/core/skbuff.c         |    9 -
>  9 files changed, 47 insertions(+), 447 deletions(-)
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [rfc patch 0/4] splice: cleanups and fixes
  2008-06-21 17:20 ` [rfc patch 0/4] splice: cleanups and fixes Subrata Modak
@ 2008-06-22  6:16   ` Miklos Szeredi
  2008-06-23 15:26     ` Subrata Modak
  0 siblings, 1 reply; 38+ messages in thread
From: Miklos Szeredi @ 2008-06-22  6:16 UTC (permalink / raw)
  To: subrata; +Cc: miklos, jens.axboe, linux-fsdevel, ltp-list

> In the backdrop of the changes in splice(), would you also like to help
> us in reviewing the existing testcase for splice() in LTP:
> 
> http://ltp.cvs.sourceforge.net/ltp/ltp/testcases/kernel/syscalls/splice/,

Well, there's probably lot more different tests that could be done:

 - splicing from various sources (socket, pipe, regular file)
 - splicing to various sources (...)
 - splicing in full page(s)
 - splicing in partial page(s)
 - filling the destination pipe (blocking/nonblocking)
 - splicing in a big buffer (bigger than the pipe buffer)
 - splicing out full buffers
 - splicing out partial buffers
 - splicing out multiple buffers
 - splicing out more data then is in the pipe (blocking/nonblocking)
 - parallel splicing (in and/or out)

And then there's vmsplice(), which I haven't even looked at.

> It would also be great if you can send us some patch as well in the
> regard.

Unfortunately this would be a much bigger project than the cleanup I
did on the kernel splice code, and I don't really have the time to do
it :(

Miklos

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [rfc patch 0/4] splice: cleanups and fixes
  2008-06-22  6:16   ` Miklos Szeredi
@ 2008-06-23 15:26     ` Subrata Modak
  2008-06-25 13:17       ` [LTP] " Subrata Modak
  0 siblings, 1 reply; 38+ messages in thread
From: Subrata Modak @ 2008-06-23 15:26 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: jens.axboe, linux-fsdevel, ltp-list

Hi Miklos ,

On Sun, 2008-06-22 at 08:16 +0200, Miklos Szeredi wrote:
> > In the backdrop of the changes in splice(), would you also like to help
> > us in reviewing the existing testcase for splice() in LTP:
> > 
> > http://ltp.cvs.sourceforge.net/ltp/ltp/testcases/kernel/syscalls/splice/,
> 
> Well, there's probably lot more different tests that could be done:
> 
>  - splicing from various sources (socket, pipe, regular file)
>  - splicing to various sources (...)
>  - splicing in full page(s)
>  - splicing in partial page(s)
>  - filling the destination pipe (blocking/nonblocking)
>  - splicing in a big buffer (bigger than the pipe buffer)
>  - splicing out full buffers
>  - splicing out partial buffers
>  - splicing out multiple buffers
>  - splicing out more data then is in the pipe (blocking/nonblocking)
>  - parallel splicing (in and/or out)
> 

Thanks for pointing out the areas that can be tested in addition to what
is existing. The list is very comprehensive. I will see if i can get
somebody to work on this. Will put you in CC for review. Hope you do not
mind that.

> And then there's vmsplice(), which I haven't even looked at.

Yes, please do look into this and give us more feedback like the splice
above. You can find it here:

http://ltp.cvs.sourceforge.net/ltp/ltp/testcases/kernel/syscalls/vmsplice/.

> 
> > It would also be great if you can send us some patch as well in the
> > regard.
> 
> Unfortunately this would be a much bigger project than the cleanup I
> did on the kernel splice code, and I don't really have the time to do
> it :(

We appreciate the already feedback you have given us. Hope you will also
help in future review of the testcases.

Regards--
Subrata

> 
> Miklos


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [rfc patch 2/4] splice: remove steal from pipe_buf_operations
  2008-06-21 15:46 ` [rfc patch 2/4] splice: remove steal from pipe_buf_operations Miklos Szeredi
@ 2008-06-24  8:01   ` Jens Axboe
  2008-06-24  8:50     ` Miklos Szeredi
  0 siblings, 1 reply; 38+ messages in thread
From: Jens Axboe @ 2008-06-24  8:01 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: linux-fsdevel, linux-kernel, linux-mm, torvalds

On Sat, Jun 21 2008, Miklos Szeredi wrote:
> From: Miklos Szeredi <mszeredi@suse.cz>
> 
> The 'steal' operation hasn't been used for some time.  Remove it and
> the associated dead code.  If it's needed in the future, it can always
> be easily restored.

I'd rather not just remove this, it's basically waiting for Nick to make
good on his promise to make stealing work again (he disabled it).

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [rfc patch 3/4] splice: remove confirm from pipe_buf_operations
  2008-06-21 15:46 ` [rfc patch 3/4] splice: remove confirm " Miklos Szeredi
@ 2008-06-24  8:04   ` Jens Axboe
  2008-06-24  8:54     ` Miklos Szeredi
  0 siblings, 1 reply; 38+ messages in thread
From: Jens Axboe @ 2008-06-24  8:04 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: linux-fsdevel, linux-kernel, linux-mm, torvalds

On Sat, Jun 21 2008, Miklos Szeredi wrote:
> From: Miklos Szeredi <mszeredi@suse.cz>
> 
> The 'confirm' operation was only used for splicing from page cache, to
> wait for read on a page to finish.  But generic_file_splice_read()
> already blocks on readahead reads, so it seems logical to block on the
> rare and slow single page reads too.
> 
> So wait for readpage to finish inside __generic_file_splice_read() and
> remove the 'confirm' method.
> 
> This also fixes short return counts when the filesystem (e.g. fuse)
> invalidates the page between insertation and removal.

One of the basic goals of splice is to allow the pipe buffer to only be
consisten when a consumer asks for it, otherwise the filling will always
be sync. There should be no blocking on reads in the splice-in path,
only on consumption for splice-out.

So nack on this one, too.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [rfc patch 4/4] splice: use do_generic_file_read()
  2008-06-21 15:46 ` [rfc patch 4/4] splice: use do_generic_file_read() Miklos Szeredi
@ 2008-06-24  8:05   ` Jens Axboe
  2008-06-24 11:11     ` Miklos Szeredi
  0 siblings, 1 reply; 38+ messages in thread
From: Jens Axboe @ 2008-06-24  8:05 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: linux-fsdevel, linux-kernel, linux-mm, torvalds

On Sat, Jun 21 2008, Miklos Szeredi wrote:
> From: Miklos Szeredi <mszeredi@suse.cz>
> 
> __generic_file_splice_read() duplicates a lot of stuff common with the
> generic page cache reading.  So reuse that code instead to simplify
> the page cache splice code.
> 
> This also fixes some corner cases which weren't properly handled in
> the splice code because of complexity issues.  In particular it fixes
> a problem when the filesystem (e.g. fuse) invalidates pages during the
> splice operation.
> 
> There might be some slight fall in performance due to the removal of
> the gang lookup for pages.  However I'm not sure if this is
> significant enough to warrant the extra complication.

This makes everything sync, it's a no go. Why don't we look into fixing
the invalidate problem that fuse sees (can you elaborate on that?)?

I tried to do a quick performance test with your patches for comparison
in the cached case, but it crashes immediately in spd_release_page()


-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [rfc patch 2/4] splice: remove steal from pipe_buf_operations
  2008-06-24  8:01   ` Jens Axboe
@ 2008-06-24  8:50     ` Miklos Szeredi
  2008-06-24 12:21       ` Nick Piggin
  0 siblings, 1 reply; 38+ messages in thread
From: Miklos Szeredi @ 2008-06-24  8:50 UTC (permalink / raw)
  To: jens.axboe
  Cc: miklos, linux-fsdevel, linux-kernel, linux-mm, torvalds, npiggin

> On Sat, Jun 21 2008, Miklos Szeredi wrote:
> > From: Miklos Szeredi <mszeredi@suse.cz>
> > 
> > The 'steal' operation hasn't been used for some time.  Remove it and
> > the associated dead code.  If it's needed in the future, it can always
> > be easily restored.
> 
> I'd rather not just remove this, it's basically waiting for Nick to make
> good on his promise to make stealing work again (he disabled it).

Yes, I read the commit log.  He disabled it with the intent that we
may or may not want this interface back, and possibly in some other
form.

Jens, does somebody need this feature?  What for?  If not, then I
guess nobody will bother implementing it.

(Nick CC-d)

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [rfc patch 3/4] splice: remove confirm from pipe_buf_operations
  2008-06-24  8:04   ` Jens Axboe
@ 2008-06-24  8:54     ` Miklos Szeredi
  2008-06-24 11:19       ` Jens Axboe
  0 siblings, 1 reply; 38+ messages in thread
From: Miklos Szeredi @ 2008-06-24  8:54 UTC (permalink / raw)
  To: jens.axboe; +Cc: miklos, linux-fsdevel, linux-kernel, linux-mm, torvalds

> > The 'confirm' operation was only used for splicing from page cache, to
> > wait for read on a page to finish.  But generic_file_splice_read()
> > already blocks on readahead reads, so it seems logical to block on the
> > rare and slow single page reads too.
> > 
> > So wait for readpage to finish inside __generic_file_splice_read() and
> > remove the 'confirm' method.
> > 
> > This also fixes short return counts when the filesystem (e.g. fuse)
> > invalidates the page between insertation and removal.
> 
> One of the basic goals of splice is to allow the pipe buffer to only be
> consisten when a consumer asks for it, otherwise the filling will always
> be sync. There should be no blocking on reads in the splice-in path,
> only on consumption for splice-out.

What you are ignoring (and I've mentioned in the changelog) is that it
is *already* sync.  Look at the code: this starts I/O:

		page_cache_sync_readahead(mapping, &in->f_ra, in,
				index, req_pages - spd.nr_pages);

And this waits for it to finish:

		if (!PageUptodate(page)) {
			...
				lock_page(page);

The only way it will be async, is if there's no readahead.  But do we
want to optmize that case?

Miklos

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [rfc patch 4/4] splice: use do_generic_file_read()
  2008-06-24  8:05   ` Jens Axboe
@ 2008-06-24 11:11     ` Miklos Szeredi
  0 siblings, 0 replies; 38+ messages in thread
From: Miklos Szeredi @ 2008-06-24 11:11 UTC (permalink / raw)
  To: jens.axboe; +Cc: miklos, linux-fsdevel, linux-kernel, linux-mm, torvalds

> On Sat, Jun 21 2008, Miklos Szeredi wrote:
> > From: Miklos Szeredi <mszeredi@suse.cz>
> > 
> > __generic_file_splice_read() duplicates a lot of stuff common with the
> > generic page cache reading.  So reuse that code instead to simplify
> > the page cache splice code.
> > 
> > This also fixes some corner cases which weren't properly handled in
> > the splice code because of complexity issues.  In particular it fixes
> > a problem when the filesystem (e.g. fuse) invalidates pages during the
> > splice operation.
> > 
> > There might be some slight fall in performance due to the removal of
> > the gang lookup for pages.  However I'm not sure if this is
> > significant enough to warrant the extra complication.
> 
> This makes everything sync, it's a no go. Why don't we look into fixing
> the invalidate problem that fuse sees (can you elaborate on that?)?

Fuse does invalidate_inode_pages2() if it detects stale cached data.
It does this in open() and getattr() for example.

If the invalidation happened while the page was sitting in the pipe,
then splice-out will return a zero count.  Nfsd will happily send this
zero length data to the client, which will interpret it as I/O error
(page isn't uptodate).

Or the invalidation can happen during splice as well, which can also
return a short or zero count:

			/*
			 * page was truncated, stop here. if this isn't the
			 * first page, we'll just complete what we already
			 * added
			 */
			if (!page->mapping) {
				unlock_page(page);
				break;
			}


> I tried to do a quick performance test with your patches for comparison
> in the cached case, but it crashes immediately in spd_release_page()

Argh, yeah I didn't test properly.  It didn't stop filling the buffers
after PIPE_BUFFERS pages.  Thanks for noticing!

Here's an updated 4/4 patch:

Miklos
--
Subject: splice: use do_generic_file_read()

From: Miklos Szeredi <mszeredi@suse.cz>

__generic_file_splice_read() duplicates a lot of stuff common with the
generic page cache reading.  So reuse that code instead to simplify
the page cache splice code.

This also fixes some corner cases which weren't properly handled in
the splice code because of complexity issues.  In particular it fixes
a problem when the filesystem (e.g. fuse) invalidates pages during the
splice operation.

There might be some slight fall in performance due to the removal of
the gang lookup for pages.  However I'm not sure if this is
significant enough to warrant the extra complication.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---
 fs/splice.c        |  247 +++++++----------------------------------------------
 include/linux/fs.h |    2 
 mm/filemap.c       |    2 
 3 files changed, 40 insertions(+), 211 deletions(-)

Index: linux-2.6/fs/splice.c
===================================================================
--- linux-2.6.orig/fs/splice.c	2008-06-21 17:36:47.000000000 +0200
+++ linux-2.6/fs/splice.c	2008-06-22 11:33:34.000000000 +0200
@@ -159,208 +159,26 @@ static void spd_release_page(struct spli
 	page_cache_release(spd->pages[i]);
 }
 
-static int
-__generic_file_splice_read(struct file *in, loff_t *ppos,
-			   struct pipe_inode_info *pipe, size_t len,
-			   unsigned int flags)
+static int file_splice_read_actor(read_descriptor_t *desc, struct page *page,
+				  unsigned long offset, unsigned long size)
 {
-	struct address_space *mapping = in->f_mapping;
-	unsigned int loff, nr_pages, req_pages;
-	struct page *pages[PIPE_BUFFERS];
-	struct partial_page partial[PIPE_BUFFERS];
-	struct page *page;
-	pgoff_t index, end_index;
-	loff_t isize;
-	int error, page_nr;
-	struct splice_pipe_desc spd = {
-		.pages = pages,
-		.partial = partial,
-		.flags = flags,
-		.ops = &page_cache_pipe_buf_ops,
-		.spd_release = spd_release_page,
-	};
-
-	index = *ppos >> PAGE_CACHE_SHIFT;
-	loff = *ppos & ~PAGE_CACHE_MASK;
-	req_pages = (len + loff + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
-	nr_pages = min(req_pages, (unsigned)PIPE_BUFFERS);
-
-	/*
-	 * Lookup the (hopefully) full range of pages we need.
-	 */
-	spd.nr_pages = find_get_pages_contig(mapping, index, nr_pages, pages);
-	index += spd.nr_pages;
-
-	/*
-	 * If find_get_pages_contig() returned fewer pages than we needed,
-	 * readahead/allocate the rest and fill in the holes.
-	 */
-	if (spd.nr_pages < nr_pages)
-		page_cache_sync_readahead(mapping, &in->f_ra, in,
-				index, req_pages - spd.nr_pages);
-
-	error = 0;
-	while (spd.nr_pages < nr_pages) {
-		/*
-		 * Page could be there, find_get_pages_contig() breaks on
-		 * the first hole.
-		 */
-		page = find_get_page(mapping, index);
-		if (!page) {
-			/*
-			 * page didn't exist, allocate one.
-			 */
-			page = page_cache_alloc_cold(mapping);
-			if (!page)
-				break;
-
-			error = add_to_page_cache_lru(page, mapping, index,
-						mapping_gfp_mask(mapping));
-			if (unlikely(error)) {
-				page_cache_release(page);
-				if (error == -EEXIST)
-					continue;
-				break;
-			}
-			/*
-			 * add_to_page_cache() locks the page, unlock it
-			 * to avoid convoluting the logic below even more.
-			 */
-			unlock_page(page);
-		}
-
-		pages[spd.nr_pages++] = page;
-		index++;
-	}
-
-	/*
-	 * Now loop over the map and see if we need to start IO on any
-	 * pages, fill in the partial map, etc.
-	 */
-	index = *ppos >> PAGE_CACHE_SHIFT;
-	nr_pages = spd.nr_pages;
-	spd.nr_pages = 0;
-	for (page_nr = 0; page_nr < nr_pages; page_nr++) {
-		unsigned int this_len;
-
-		if (!len)
-			break;
-
-		/*
-		 * this_len is the max we'll use from this page
-		 */
-		this_len = min_t(unsigned long, len, PAGE_CACHE_SIZE - loff);
-		page = pages[page_nr];
-
-		if (PageReadahead(page))
-			page_cache_async_readahead(mapping, &in->f_ra, in,
-					page, index, req_pages - page_nr);
-
-		/*
-		 * If the page isn't uptodate, we may need to start io on it
-		 */
-		if (!PageUptodate(page)) {
-			/*
-			 * If in nonblock mode then dont block on waiting
-			 * for an in-flight io page
-			 */
-			if (flags & SPLICE_F_NONBLOCK) {
-				if (TestSetPageLocked(page)) {
-					error = -EAGAIN;
-					break;
-				}
-			} else
-				lock_page(page);
-
-			/*
-			 * page was truncated, stop here. if this isn't the
-			 * first page, we'll just complete what we already
-			 * added
-			 */
-			if (!page->mapping) {
-				unlock_page(page);
-				break;
-			}
-			/*
-			 * page was already under io and is now done, great
-			 */
-			if (PageUptodate(page)) {
-				unlock_page(page);
-				goto fill_it;
-			}
+	struct splice_pipe_desc *spd = desc->arg.data;
+	unsigned long count = desc->count;
 
-			/*
-			 * need to read in the page
-			 */
-			error = mapping->a_ops->readpage(in, page);
-			if (unlikely(error)) {
-				/*
-				 * We really should re-lookup the page here,
-				 * but it complicates things a lot. Instead
-				 * lets just do what we already stored, and
-				 * we'll get it the next time we are called.
-				 */
-				if (error == AOP_TRUNCATED_PAGE)
-					error = 0;
+	BUG_ON(spd->nr_pages >= PIPE_BUFFERS);
 
-				break;
-			}
-			wait_on_page_locked(page);
-			if (!PageUptodate(page)) {
-				error = -EIO;
-				break;
-			}
-		}
-fill_it:
-		/*
-		 * i_size must be checked after PageUptodate.
-		 */
-		isize = i_size_read(mapping->host);
-		end_index = (isize - 1) >> PAGE_CACHE_SHIFT;
-		if (unlikely(!isize || index > end_index))
-			break;
+	if (size > count)
+		size = count;
 
-		/*
-		 * if this is the last page, see if we need to shrink
-		 * the length and stop
-		 */
-		if (end_index == index) {
-			unsigned int plen;
-
-			/*
-			 * max good bytes in this page
-			 */
-			plen = ((isize - 1) & ~PAGE_CACHE_MASK) + 1;
-			if (plen <= loff)
-				break;
-
-			/*
-			 * force quit after adding this page
-			 */
-			this_len = min(this_len, plen - loff);
-			len = this_len;
-		}
-
-		partial[page_nr].offset = loff;
-		partial[page_nr].len = this_len;
-		len -= this_len;
-		loff = 0;
-		spd.nr_pages++;
-		index++;
-	}
-
-	/*
-	 * Release any pages at the end, if we quit early. 'page_nr' is how far
-	 * we got, 'nr_pages' is how many pages are in the map.
-	 */
-	while (page_nr < nr_pages)
-		page_cache_release(pages[page_nr++]);
-	in->f_ra.prev_pos = (loff_t)index << PAGE_CACHE_SHIFT;
+	page_cache_get(page);
+	spd->pages[spd->nr_pages] = page;
+	spd->partial[spd->nr_pages].offset = offset;
+	spd->partial[spd->nr_pages].len = size;
+	spd->nr_pages++;
 
-	if (spd.nr_pages)
-		return splice_to_pipe(pipe, &spd);
+	desc->count = count - size;
 
-	return error;
+	return size;
 }
 
 /**
@@ -381,24 +199,33 @@ ssize_t generic_file_splice_read(struct 
 				 struct pipe_inode_info *pipe, size_t len,
 				 unsigned int flags)
 {
-	loff_t isize, left;
-	int ret;
-
-	isize = i_size_read(in->f_mapping->host);
-	if (unlikely(*ppos >= isize))
-		return 0;
-
-	left = isize - *ppos;
-	if (unlikely(left < len))
-		len = left;
+	ssize_t ret;
+	loff_t pos = *ppos;
+	size_t offset = pos & ~PAGE_CACHE_MASK;
+	struct page *pages[PIPE_BUFFERS];
+	struct partial_page partial[PIPE_BUFFERS];
+	struct splice_pipe_desc spd = {
+		.pages = pages,
+		.partial = partial,
+		.flags = flags,
+		.ops = &page_cache_pipe_buf_ops,
+		.spd_release = spd_release_page,
+	};
+	read_descriptor_t desc = {
+		.count = min(len, (PIPE_BUFFERS << PAGE_CACHE_SHIFT) - offset),
+		.arg.data = &spd,
+	};
 
-	ret = __generic_file_splice_read(in, ppos, pipe, len, flags);
-	if (ret > 0)
-		*ppos += ret;
+	do_generic_file_read(in, &pos, &desc, file_splice_read_actor);
+	ret = desc.error;
+	if (spd.nr_pages) {
+		ret = splice_to_pipe(pipe, &spd);
+		if (ret > 0)
+			*ppos += ret;
+	}
 
 	return ret;
 }
-
 EXPORT_SYMBOL(generic_file_splice_read);
 
 /*
Index: linux-2.6/include/linux/fs.h
===================================================================
--- linux-2.6.orig/include/linux/fs.h	2008-06-21 17:17:41.000000000 +0200
+++ linux-2.6/include/linux/fs.h	2008-06-21 17:36:47.000000000 +0200
@@ -1854,6 +1854,8 @@ extern ssize_t do_sync_read(struct file 
 extern ssize_t do_sync_write(struct file *filp, const char __user *buf, size_t len, loff_t *ppos);
 extern int generic_segment_checks(const struct iovec *iov,
 		unsigned long *nr_segs, size_t *count, int access_flags);
+extern void do_generic_file_read(struct file *filp, loff_t *ppos,
+				 read_descriptor_t *desc, read_actor_t actor);
 
 /* fs/splice.c */
 extern ssize_t generic_file_splice_read(struct file *, loff_t *,
Index: linux-2.6/mm/filemap.c
===================================================================
--- linux-2.6.orig/mm/filemap.c	2008-06-21 17:17:41.000000000 +0200
+++ linux-2.6/mm/filemap.c	2008-06-21 17:36:47.000000000 +0200
@@ -891,7 +891,7 @@ static void shrink_readahead_size_eio(st
  * This is really ugly. But the goto's actually try to clarify some
  * of the logic when it comes to error handling etc.
  */
-static void do_generic_file_read(struct file *filp, loff_t *ppos,
+void do_generic_file_read(struct file *filp, loff_t *ppos,
 		read_descriptor_t *desc, read_actor_t actor)
 {
 	struct address_space *mapping = filp->f_mapping;

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [rfc patch 3/4] splice: remove confirm from pipe_buf_operations
  2008-06-24  8:54     ` Miklos Szeredi
@ 2008-06-24 11:19       ` Jens Axboe
  2008-06-24 11:36         ` Miklos Szeredi
  0 siblings, 1 reply; 38+ messages in thread
From: Jens Axboe @ 2008-06-24 11:19 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: linux-fsdevel, linux-kernel, linux-mm, torvalds

On Tue, Jun 24 2008, Miklos Szeredi wrote:
> > > The 'confirm' operation was only used for splicing from page cache, to
> > > wait for read on a page to finish.  But generic_file_splice_read()
> > > already blocks on readahead reads, so it seems logical to block on the
> > > rare and slow single page reads too.
> > > 
> > > So wait for readpage to finish inside __generic_file_splice_read() and
> > > remove the 'confirm' method.
> > > 
> > > This also fixes short return counts when the filesystem (e.g. fuse)
> > > invalidates the page between insertation and removal.
> > 
> > One of the basic goals of splice is to allow the pipe buffer to only be
> > consisten when a consumer asks for it, otherwise the filling will always
> > be sync. There should be no blocking on reads in the splice-in path,
> > only on consumption for splice-out.
> 
> What you are ignoring (and I've mentioned in the changelog) is that it
> is *already* sync.  Look at the code: this starts I/O:
> 
> 		page_cache_sync_readahead(mapping, &in->f_ra, in,
> 				index, req_pages - spd.nr_pages);
> 
> And this waits for it to finish:
> 
> 		if (!PageUptodate(page)) {
> 			...
> 				lock_page(page);
> 
> The only way it will be async, is if there's no readahead.  But do we
> want to optmize that case?

It's an unfortunate side effect of the read-ahead, I'd much rather just
get rid of that. It _should_ behave like the non-ra case, when a page is
added it merely has IO started on it. So we want to have that be
something like

        if (!PageUptodate(page) && !PageInFlight(page))
                ...

basically like PageWriteback(), but for read-in.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [rfc patch 3/4] splice: remove confirm from pipe_buf_operations
  2008-06-24 11:19       ` Jens Axboe
@ 2008-06-24 11:36         ` Miklos Szeredi
  2008-06-24 11:46           ` Evgeniy Polyakov
                             ` (2 more replies)
  0 siblings, 3 replies; 38+ messages in thread
From: Miklos Szeredi @ 2008-06-24 11:36 UTC (permalink / raw)
  To: jens.axboe; +Cc: miklos, linux-fsdevel, linux-kernel, linux-mm, torvalds

> It's an unfortunate side effect of the read-ahead, I'd much rather just
> get rid of that. It _should_ behave like the non-ra case, when a page is
> added it merely has IO started on it. So we want to have that be
> something like
> 
>         if (!PageUptodate(page) && !PageInFlight(page))
>                 ...
> 
> basically like PageWriteback(), but for read-in.

OK it could be done, possibly at great pain.  But why is it important?
What's the use case where it matters that splice-in should not block
on the read?

And note, after the pipe is full it will block no matter what, since
the consumer will have to wait until the page is brought uptodate, and
can only then commence with getting the data out from the pipe.

Miklos

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [rfc patch 3/4] splice: remove confirm from pipe_buf_operations
  2008-06-24 11:36         ` Miklos Szeredi
@ 2008-06-24 11:46           ` Evgeniy Polyakov
  2008-06-24 12:02             ` Miklos Szeredi
  2008-06-24 12:16           ` Nick Piggin
  2008-06-24 17:30           ` Linus Torvalds
  2 siblings, 1 reply; 38+ messages in thread
From: Evgeniy Polyakov @ 2008-06-24 11:46 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: jens.axboe, linux-fsdevel, linux-kernel, linux-mm, torvalds

On Tue, Jun 24, 2008 at 01:36:35PM +0200, Miklos Szeredi (miklos@szeredi.hu) wrote:
> > basically like PageWriteback(), but for read-in.
> 
> OK it could be done, possibly at great pain.  But why is it important?

Maybe not that great if mark all readahead pages as, well, readahead,
and do the same for readpage (essnetially it is the same).

> What's the use case where it matters that splice-in should not block
> on the read?

To be able to transfer what was already read?

-- 
	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [rfc patch 3/4] splice: remove confirm from pipe_buf_operations
  2008-06-24 11:46           ` Evgeniy Polyakov
@ 2008-06-24 12:02             ` Miklos Szeredi
  2008-06-24 12:15               ` Evgeniy Polyakov
  0 siblings, 1 reply; 38+ messages in thread
From: Miklos Szeredi @ 2008-06-24 12:02 UTC (permalink / raw)
  To: johnpol; +Cc: miklos, jens.axboe, linux-fsdevel, linux-kernel, linux-mm,
	torvalds

> > > basically like PageWriteback(), but for read-in.
> > 
> > OK it could be done, possibly at great pain.  But why is it important?
> 
> Maybe not that great if mark all readahead pages as, well, readahead,
> and do the same for readpage (essnetially it is the same).

It isn't that easy.  Readahead (->readpages()) is best effort, and is
allowed to not bring the page uptodate, since it will be retried with
->readpage().  I don't know whether any filesystems actually do that,
but it's allowed nonetheless.

> > What's the use case where it matters that splice-in should not block
> > on the read?
> 
> To be able to transfer what was already read?

That needs the consumer to be non-blocking...

Umm, one more reason why the ->confirm() stuff is currently busted:
pipe_read() will block on such a buffer even if pipe file is marked
O_NONBLOCK.  Fixing that would take a hell of a lot of added
complexity in pipe_poll(), etc...

Miklos

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [rfc patch 3/4] splice: remove confirm from pipe_buf_operations
  2008-06-24 12:02             ` Miklos Szeredi
@ 2008-06-24 12:15               ` Evgeniy Polyakov
  0 siblings, 0 replies; 38+ messages in thread
From: Evgeniy Polyakov @ 2008-06-24 12:15 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: jens.axboe, linux-fsdevel, linux-kernel, linux-mm, torvalds

On Tue, Jun 24, 2008 at 02:02:19PM +0200, Miklos Szeredi (miklos@szeredi.hu) wrote:
> > Maybe not that great if mark all readahead pages as, well, readahead,
> > and do the same for readpage (essnetially it is the same).
> 
> It isn't that easy.  Readahead (->readpages()) is best effort, and is
> allowed to not bring the page uptodate, since it will be retried with
> ->readpage().  I don't know whether any filesystems actually do that,
> but it's allowed nonetheless.

Yes, there is such filesystem :)
It is quite useful for network FS, since it does not bother to wait until
pages are in the cache and can try next request. Anyone who scheduled
readahead has full control over that pages and is allowed to set/clear
whatever flags it want (pages are locked), so it would be a great win to
set page as being read and unlocked. It can be a policy to clear read
bit when page is evicted from the cache by failed readahead/readpage(s).

> > > What's the use case where it matters that splice-in should not block
> > > on the read?
> > 
> > To be able to transfer what was already read?
> 
> That needs the consumer to be non-blocking...
> 
> Umm, one more reason why the ->confirm() stuff is currently busted:
> pipe_read() will block on such a buffer even if pipe file is marked
> O_NONBLOCK.  Fixing that would take a hell of a lot of added
> complexity in pipe_poll(), etc...

Yes, nonblocking splice is tricky and it covers only half of the users.

-- 
	Evgeniy Polyakov

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [rfc patch 3/4] splice: remove confirm from pipe_buf_operations
  2008-06-24 11:36         ` Miklos Szeredi
  2008-06-24 11:46           ` Evgeniy Polyakov
@ 2008-06-24 12:16           ` Nick Piggin
  2008-06-24 12:22             ` Jens Axboe
  2008-06-24 13:00             ` Miklos Szeredi
  2008-06-24 17:30           ` Linus Torvalds
  2 siblings, 2 replies; 38+ messages in thread
From: Nick Piggin @ 2008-06-24 12:16 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: jens.axboe, linux-fsdevel, linux-kernel, linux-mm, torvalds

On Tuesday 24 June 2008 21:36, Miklos Szeredi wrote:
> > It's an unfortunate side effect of the read-ahead, I'd much rather just
> > get rid of that. It _should_ behave like the non-ra case, when a page is
> > added it merely has IO started on it. So we want to have that be
> > something like
> >
> >         if (!PageUptodate(page) && !PageInFlight(page))
> >                 ...
> >
> > basically like PageWriteback(), but for read-in.
>
> OK it could be done, possibly at great pain.  But why is it important?

It has been considered, but adding atomic operations on these paths
always really hurts. Adding something like this would basically be
another at least 2 atomic operations that can never be removed again...

Provided that you've done the sync readahead earlier, it presumably
should be a very rare case to have to start new IO in the loop
below, right? In which case, I wonder if we couldn't move that 2nd
loop out of generic_file_splice_read and into
page_cache_pipe_buf_confirm. 


> What's the use case where it matters that splice-in should not block
> on the read?

It just makes it generally less able to pipeline IO and computation,
doesn't it?


> And note, after the pipe is full it will block no matter what, since
> the consumer will have to wait until the page is brought uptodate, and
> can only then commence with getting the data out from the pipe.

True, but (especially with patches to variably size the pipe buffer)
I imagine programs could be designed fairly carefully to the size of
the buffer (and not just things that blast bulk data down the pipe...)

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [rfc patch 2/4] splice: remove steal from pipe_buf_operations
  2008-06-24  8:50     ` Miklos Szeredi
@ 2008-06-24 12:21       ` Nick Piggin
  0 siblings, 0 replies; 38+ messages in thread
From: Nick Piggin @ 2008-06-24 12:21 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: jens.axboe, linux-fsdevel, linux-kernel, linux-mm, torvalds,
	npiggin

On Tuesday 24 June 2008 18:50, Miklos Szeredi wrote:
> > On Sat, Jun 21 2008, Miklos Szeredi wrote:
> > > From: Miklos Szeredi <mszeredi@suse.cz>
> > >
> > > The 'steal' operation hasn't been used for some time.  Remove it and
> > > the associated dead code.  If it's needed in the future, it can always
> > > be easily restored.
> >
> > I'd rather not just remove this, it's basically waiting for Nick to make
> > good on his promise to make stealing work again (he disabled it).
>
> Yes, I read the commit log.  He disabled it with the intent that we
> may or may not want this interface back, and possibly in some other
> form.
>
> Jens, does somebody need this feature?  What for?  If not, then I
> guess nobody will bother implementing it.
>
> (Nick CC-d)

I've kind of wanted to implement it "because it is cool", but also
because I hoped to ensure the new write_begin/write_end APIs would
be general enough to support it.

Unfortunately I'm not aware of a killer feature so it's kept going
down the todo pile. However it really is one of those things where
you need the API to work before you get programs emerging that make
interesting uses out of it. I think if it can be done without adding
complication to pagecache/filesystem code, then it would be nice.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [rfc patch 3/4] splice: remove confirm from pipe_buf_operations
  2008-06-24 12:16           ` Nick Piggin
@ 2008-06-24 12:22             ` Jens Axboe
  2008-06-24 13:00             ` Miklos Szeredi
  1 sibling, 0 replies; 38+ messages in thread
From: Jens Axboe @ 2008-06-24 12:22 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Miklos Szeredi, linux-fsdevel, linux-kernel, linux-mm, torvalds

On Tue, Jun 24 2008, Nick Piggin wrote:
> On Tuesday 24 June 2008 21:36, Miklos Szeredi wrote:
> > > It's an unfortunate side effect of the read-ahead, I'd much rather just
> > > get rid of that. It _should_ behave like the non-ra case, when a page is
> > > added it merely has IO started on it. So we want to have that be
> > > something like
> > >
> > >         if (!PageUptodate(page) && !PageInFlight(page))
> > >                 ...
> > >
> > > basically like PageWriteback(), but for read-in.
> >
> > OK it could be done, possibly at great pain.  But why is it important?
> 
> It has been considered, but adding atomic operations on these paths
> always really hurts. Adding something like this would basically be
> another at least 2 atomic operations that can never be removed again...
> 
> Provided that you've done the sync readahead earlier, it presumably
> should be a very rare case to have to start new IO in the loop
> below, right? In which case, I wonder if we couldn't move that 2nd
> loop out of generic_file_splice_read and into
> page_cache_pipe_buf_confirm. 

That's a good point, moving those blocks of code to the other end makes
a lot of sense. Or just kill the read-ahead, or at least do it
differently. It's definitely an oversight/bug having splice from file
block on the pages it just issued read-ahead for.

> > What's the use case where it matters that splice-in should not block
> > on the read?
> 
> It just makes it generally less able to pipeline IO and computation,
> doesn't it?

Precisely!

> > And note, after the pipe is full it will block no matter what, since
> > the consumer will have to wait until the page is brought uptodate, and
> > can only then commence with getting the data out from the pipe.
> 
> True, but (especially with patches to variably size the pipe buffer)
> I imagine programs could be designed fairly carefully to the size of
> the buffer (and not just things that blast bulk data down the pipe...)

Yep, that's the whole premise for the dynpipe branch I've been carrying
around for some time.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [rfc patch 3/4] splice: remove confirm from pipe_buf_operations
  2008-06-24 12:16           ` Nick Piggin
  2008-06-24 12:22             ` Jens Axboe
@ 2008-06-24 13:00             ` Miklos Szeredi
  1 sibling, 0 replies; 38+ messages in thread
From: Miklos Szeredi @ 2008-06-24 13:00 UTC (permalink / raw)
  To: nickpiggin
  Cc: miklos, jens.axboe, linux-fsdevel, linux-kernel, linux-mm,
	torvalds

> > > It's an unfortunate side effect of the read-ahead, I'd much rather just
> > > get rid of that. It _should_ behave like the non-ra case, when a page is
> > > added it merely has IO started on it. So we want to have that be
> > > something like
> > >
> > >         if (!PageUptodate(page) && !PageInFlight(page))
> > >                 ...
> > >
> > > basically like PageWriteback(), but for read-in.
> >
> > OK it could be done, possibly at great pain.  But why is it important?
> 
> It has been considered, but adding atomic operations on these paths
> always really hurts. Adding something like this would basically be
> another at least 2 atomic operations that can never be removed again...
> 
> Provided that you've done the sync readahead earlier, it presumably
> should be a very rare case to have to start new IO in the loop
> below, right? In which case, I wonder if we couldn't move that 2nd
> loop out of generic_file_splice_read and into
> page_cache_pipe_buf_confirm. 

The problem with that second loop (which started this thing) is that
if a page is invalidated by the filesystem, then it doesn't redo the
lookup/read like the plain cached read does.

And that can't be done in page_cache_pipe_buf_confirm() at all.

> > What's the use case where it matters that splice-in should not block
> > on the read?
> 
> It just makes it generally less able to pipeline IO and computation,
> doesn't it?

Maybe.  I don't really see how splice might be used that would be
helped by this.  Do you have a concrete example?

In fact I don't really know at all what splice is being used for
(other than the in kernel uses: nfsd, sendfile).  

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [rfc patch 3/4] splice: remove confirm from pipe_buf_operations
  2008-06-24 11:36         ` Miklos Szeredi
  2008-06-24 11:46           ` Evgeniy Polyakov
  2008-06-24 12:16           ` Nick Piggin
@ 2008-06-24 17:30           ` Linus Torvalds
  2008-06-24 18:24             ` Miklos Szeredi
  2 siblings, 1 reply; 38+ messages in thread
From: Linus Torvalds @ 2008-06-24 17:30 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: jens.axboe, linux-fsdevel, linux-kernel, linux-mm



On Tue, 24 Jun 2008, Miklos Szeredi wrote:
> 
> OK it could be done, possibly at great pain.  But why is it important?
> What's the use case where it matters that splice-in should not block
> on the read?

If you're splicing from one file to another, the _goal_ you should have is 
that you want to have a mode where you can literally steal the page, and 
never _ever_ be IO-synchronous (well, meta-data accesses will be, you 
can't really avoid that sanely).

IOW, it should be possible to do a

 - splice() file->pipe with SPLICE_STEAL
	don't even wait for the read to finish!

 - splice() pipe->file
	insert the page into the destination page cache, mark it dirty

an no, we probably do not support that yet (for example, I wouldn't be 
surprised if "dirty + !uptodate" is considered an error for the VM even 
though the page should still be locked from the read), but it really was a 
design goal.

Also, asynchronous is important even when you "just" want to overlap IO 
with CPU, so even if it's going to the network, then if you can delay the 
"wait for IO to complete" until the last possible moment (ie the _second_ 
splice, when you end up copying it into an SKB, then both your throughput 
and your latency are likely going to be noticeably better, because you've 
now been able to do a lot of the costly CPU work (system exit + entry at 
the least, but hopefully a noticeable portion of the TCP stack too) 
overlapped with the disk seeking.

So asynchronous ops was really one of the big goals for splice. 

			Linus

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [rfc patch 3/4] splice: remove confirm from pipe_buf_operations
  2008-06-24 17:30           ` Linus Torvalds
@ 2008-06-24 18:24             ` Miklos Szeredi
  2008-06-24 18:31               ` Linus Torvalds
  0 siblings, 1 reply; 38+ messages in thread
From: Miklos Szeredi @ 2008-06-24 18:24 UTC (permalink / raw)
  To: torvalds; +Cc: miklos, jens.axboe, linux-fsdevel, linux-kernel, linux-mm

> > 
> > OK it could be done, possibly at great pain.  But why is it important?
> > What's the use case where it matters that splice-in should not block
> > on the read?
> 
> If you're splicing from one file to another, the _goal_ you should have is 
> that you want to have a mode where you can literally steal the page, and 
> never _ever_ be IO-synchronous (well, meta-data accesses will be, you 
> can't really avoid that sanely).
> 
> IOW, it should be possible to do a
> 
>  - splice() file->pipe with SPLICE_STEAL
> 	don't even wait for the read to finish!
> 
>  - splice() pipe->file
> 	insert the page into the destination page cache, mark it dirty
> 
> an no, we probably do not support that yet (for example, I wouldn't be 
> surprised if "dirty + !uptodate" is considered an error for the VM even 
> though the page should still be locked from the read), but it really was a 
> design goal.

OK.  But currently we have an implementation that

 1) doesn't do any of this, unless readahead is disabled

 2) if readhead is disabled, it does the async splice-in (file->pipe),
    but blocks on splice-out (pipe->any)

 3) it blocks on read(pipefd, ...) even if pipefd is set to O_NONBLOCK

And in addition, splice-in and splice-out can return a short count or
even zero count if the filesystem invalidates the cached pages during
the splicing (data became stale for example).  Are these the right
semantics?  I'm not sure.

> Also, asynchronous is important even when you "just" want to overlap IO 
> with CPU, so even if it's going to the network, then if you can delay the 
> "wait for IO to complete" until the last possible moment (ie the _second_ 
> splice, when you end up copying it into an SKB, then both your throughput 
> and your latency are likely going to be noticeably better, because you've 
> now been able to do a lot of the costly CPU work (system exit + entry at 
> the least, but hopefully a noticeable portion of the TCP stack too) 
> overlapped with the disk seeking.

My feeling is (and I'm not an expert in this area at all) is that disk
seeking will be many orders of magnitude slower than any CPU work
associated with getting the data out to the network.

> So asynchronous ops was really one of the big goals for splice. 

Well, if it can be implemented right, I have nothing against that.
But what we currently have is very far from that, and it seems to me
there are very big hurdles to overcome yet.

Miklos

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [rfc patch 3/4] splice: remove confirm from pipe_buf_operations
  2008-06-24 18:24             ` Miklos Szeredi
@ 2008-06-24 18:31               ` Linus Torvalds
  2008-06-24 19:05                 ` Miklos Szeredi
  0 siblings, 1 reply; 38+ messages in thread
From: Linus Torvalds @ 2008-06-24 18:31 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: jens.axboe, linux-fsdevel, linux-kernel, linux-mm



On Tue, 24 Jun 2008, Miklos Szeredi wrote:
> 
> OK.  But currently we have an implementation that
> 
>  1) doesn't do any of this, unless readahead is disabled

Sure. But removing even the conceptual support? Not a good idea.

> And in addition, splice-in and splice-out can return a short count or
> even zero count if the filesystem invalidates the cached pages during
> the splicing (data became stale for example).  Are these the right
> semantics?  I'm not sure.

What does that really have with splice() and removing the features? Why 
don't you just fix that issue? 

		Linus

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [rfc patch 3/4] splice: remove confirm from pipe_buf_operations
  2008-06-24 18:31               ` Linus Torvalds
@ 2008-06-24 19:05                 ` Miklos Szeredi
  2008-06-24 19:17                   ` Linus Torvalds
  0 siblings, 1 reply; 38+ messages in thread
From: Miklos Szeredi @ 2008-06-24 19:05 UTC (permalink / raw)
  To: torvalds; +Cc: miklos, jens.axboe, linux-fsdevel, linux-kernel, linux-mm

> On Tue, 24 Jun 2008, Miklos Szeredi wrote:
> > 
> > OK.  But currently we have an implementation that
> > 
> >  1) doesn't do any of this, unless readahead is disabled
> 
> Sure. But removing even the conceptual support? Not a good idea.
> 
> > And in addition, splice-in and splice-out can return a short count or
> > even zero count if the filesystem invalidates the cached pages during
> > the splicing (data became stale for example).  Are these the right
> > semantics?  I'm not sure.
> 
> What does that really have with splice() and removing the features? Why 
> don't you just fix that issue? 

Because it's freakin' difficult, and I'm lazy, that's why :)

Let's start with page_cache_pipe_buf_confirm().  How should we deal
with finding an invalidated page (!PageUptodate(page) &&
!page->mapping)?

We could return zero to use the contents even though it was
invalidated, not good, but if the page was originally uptodate, then
it should be OK to use the stale data.  But it could have been
invalidated before becoming uptodate, so the contents could be total
crap, and that's not good.  So now we have to tweak page invalidation
to differentiate between was-uptodate and was-never-uptodate pages.

The other is __generic_file_splice_read().  Currently it just bails
out if it finds an invalidated page.  That could be rewritten to throw
away the page, look it up again in the radix tree, etc, etc...  Lots
of added complexity in an already not-too-simple function.

All for what?  To be able to keep the async-on-no-readahead behavior
of generic_file_splice_read()?  The current implementation is not even
close to what would be required to do the async splicing properly.

Conclusion: I think we are better off with a simple
do_generic_file_read() based implementation until someone gives this
the proper thought and effort, than to leave all the complex and dead
code to rot and cause people (me) headaches... :)

Miklos

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [rfc patch 3/4] splice: remove confirm from pipe_buf_operations
  2008-06-24 19:05                 ` Miklos Szeredi
@ 2008-06-24 19:17                   ` Linus Torvalds
  2008-06-24 19:24                     ` Miklos Szeredi
  0 siblings, 1 reply; 38+ messages in thread
From: Linus Torvalds @ 2008-06-24 19:17 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: jens.axboe, linux-fsdevel, linux-kernel, linux-mm



On Tue, 24 Jun 2008, Miklos Szeredi wrote:
> 
> Let's start with page_cache_pipe_buf_confirm().  How should we deal
> with finding an invalidated page (!PageUptodate(page) &&
> !page->mapping)?

I suspect we just have to use it. After all, it was valid when the read 
was done. The fact that it got invalidated later is kind of immaterial.

splice() is an optimized read+write. The read reads it into a temporary 
buffer. The fact that it's a zero-copy buffer and basically just re-uses 
the source doesn't really change that.

		Linus

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [rfc patch 3/4] splice: remove confirm from pipe_buf_operations
  2008-06-24 19:17                   ` Linus Torvalds
@ 2008-06-24 19:24                     ` Miklos Szeredi
  2008-06-24 19:26                       ` Miklos Szeredi
  2008-06-24 19:45                       ` Linus Torvalds
  0 siblings, 2 replies; 38+ messages in thread
From: Miklos Szeredi @ 2008-06-24 19:24 UTC (permalink / raw)
  To: torvalds; +Cc: miklos, jens.axboe, linux-fsdevel, linux-kernel, linux-mm

> > 
> > Let's start with page_cache_pipe_buf_confirm().  How should we deal
> > with finding an invalidated page (!PageUptodate(page) &&
> > !page->mapping)?
> 
> I suspect we just have to use it. After all, it was valid when the read 
> was done. The fact that it got invalidated later is kind of immaterial.

Right.  But what if it's invalidated *before* becoming uptodate (if
you'd read my mail further, I discussed this).

Why does invalidate_complete_page2() do ClearPageUptodate()?  Dunno,
maybe it shoulnd't.  But that would need a rather thorough audit of
all code checking PageUptodate()...

> splice() is an optimized read+write. The read reads it into a temporary 
> buffer. The fact that it's a zero-copy buffer and basically just re-uses 
> the source doesn't really change that.

Agreed.

Miklos

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [rfc patch 3/4] splice: remove confirm from pipe_buf_operations
  2008-06-24 19:24                     ` Miklos Szeredi
@ 2008-06-24 19:26                       ` Miklos Szeredi
  2008-06-24 19:32                         ` Miklos Szeredi
  2008-06-24 19:45                       ` Linus Torvalds
  1 sibling, 1 reply; 38+ messages in thread
From: Miklos Szeredi @ 2008-06-24 19:26 UTC (permalink / raw)
  To: torvalds; +Cc: miklos, jens.axboe, linux-fsdevel, linux-kernel, linux-mm

> > > 
> > > Let's start with page_cache_pipe_buf_confirm().  How should we deal
> > > with finding an invalidated page (!PageUptodate(page) &&
> > > !page->mapping)?
> > 
> > I suspect we just have to use it. After all, it was valid when the read 
> > was done. The fact that it got invalidated later is kind of immaterial.
> 
> Right.  But what if it's invalidated *before* becoming uptodate (if
> you'd read my mail further, I discussed this).

Please ignore, this can't happen of course due to page locking...

Miklos

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [rfc patch 3/4] splice: remove confirm from pipe_buf_operations
  2008-06-24 19:26                       ` Miklos Szeredi
@ 2008-06-24 19:32                         ` Miklos Szeredi
  2008-06-24 19:47                           ` Linus Torvalds
  0 siblings, 1 reply; 38+ messages in thread
From: Miklos Szeredi @ 2008-06-24 19:32 UTC (permalink / raw)
  To: torvalds; +Cc: miklos, jens.axboe, linux-fsdevel, linux-kernel, linux-mm

> > > > 
> > > > Let's start with page_cache_pipe_buf_confirm().  How should we deal
> > > > with finding an invalidated page (!PageUptodate(page) &&
> > > > !page->mapping)?
> > > 
> > > I suspect we just have to use it. After all, it was valid when the read 
> > > was done. The fact that it got invalidated later is kind of immaterial.
> > 
> > Right.  But what if it's invalidated *before* becoming uptodate (if
> > you'd read my mail further, I discussed this).
> 
> Please ignore, this can't happen of course due to page locking...

Or it can only happen if there was an I/O error on reading the page.

So it's an issue after all...

Miklos

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [rfc patch 3/4] splice: remove confirm from pipe_buf_operations
  2008-06-24 19:24                     ` Miklos Szeredi
  2008-06-24 19:26                       ` Miklos Szeredi
@ 2008-06-24 19:45                       ` Linus Torvalds
  1 sibling, 0 replies; 38+ messages in thread
From: Linus Torvalds @ 2008-06-24 19:45 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: jens.axboe, linux-fsdevel, linux-kernel, linux-mm



On Tue, 24 Jun 2008, Miklos Szeredi wrote:
> 
> Right.  But what if it's invalidated *before* becoming uptodate (if
> you'd read my mail further, I discussed this).
> 
> Why does invalidate_complete_page2() do ClearPageUptodate()?  Dunno,
> maybe it shoulnd't.  But that would need a rather thorough audit of
> all code checking PageUptodate()...

Quite frankly, that shouldn't happen in the first place. Nothing should 
clear up-to-date on a page that is locked. Doesn't 
invalidate_complete_page() wait for the page to be unlocked already?

That said, the VM people already discussed (I think for other reasons), 
removing the ClearPageUptodate(), because it has problems. There's talk 
about just unhashing the page or moving it to another radix tree or 
something.

		Linus

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [rfc patch 3/4] splice: remove confirm from pipe_buf_operations
  2008-06-24 19:32                         ` Miklos Szeredi
@ 2008-06-24 19:47                           ` Linus Torvalds
  2008-06-24 20:06                             ` Miklos Szeredi
  0 siblings, 1 reply; 38+ messages in thread
From: Linus Torvalds @ 2008-06-24 19:47 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: jens.axboe, linux-fsdevel, linux-kernel, linux-mm



On Tue, 24 Jun 2008, Miklos Szeredi wrote:
> 
> Or it can only happen if there was an I/O error on reading the page.

Now, IO errors are something else. They should have the PG_error bit set, 
and we should just return EIO or something.

That said, I'd not be surprised at all if error handling is broken. It's 
often been broken even in the _normal_ paths (ie totally normal "read()" 
etc).

		Linus

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [rfc patch 3/4] splice: remove confirm from pipe_buf_operations
  2008-06-24 19:47                           ` Linus Torvalds
@ 2008-06-24 20:06                             ` Miklos Szeredi
  0 siblings, 0 replies; 38+ messages in thread
From: Miklos Szeredi @ 2008-06-24 20:06 UTC (permalink / raw)
  To: torvalds; +Cc: miklos, jens.axboe, linux-fsdevel, linux-kernel, linux-mm

> > 
> > Or it can only happen if there was an I/O error on reading the page.
> 
> Now, IO errors are something else. They should have the PG_error bit set, 
> and we should just return EIO or something.

Linus, you're right (as always), but see where this is going?  A rare
problem (splice() returning short count because of an invalidated
page) is becoming an even more rare problem (splice() returning
rubbish instead of an error, if ->readpage() failed, and filesystem
forgot to set PG_error).  And it won't show up in any other paths,
because the generic_file_aio_read() path will just check
PageUptodate(), and return -EIO if not.

OK, maybe we should add a WARN_ON(!PageError()) for the
!PageUptodate() case in generic_file_aio_read(), but that could still
leave some filesystems broken for a long time which experience I/O
errors rarely.

So I think the only sane solution here is to remove
ClearPageUptodate().  But that's a VM people's call, I don't have
enough insight into that.

Miklos

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [LTP] [rfc patch 0/4] splice: cleanups and fixes
  2008-06-23 15:26     ` Subrata Modak
@ 2008-06-25 13:17       ` Subrata Modak
  2008-06-25 14:00         ` Miklos Szeredi
  0 siblings, 1 reply; 38+ messages in thread
From: Subrata Modak @ 2008-06-25 13:17 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: linux-fsdevel, ltp-list, jens.axboe

On Mon, 2008-06-23 at 20:56 +0530, Subrata Modak wrote:
> Hi Miklos ,
> 
> On Sun, 2008-06-22 at 08:16 +0200, Miklos Szeredi wrote:
> > > In the backdrop of the changes in splice(), would you also like to help
> > > us in reviewing the existing testcase for splice() in LTP:
> > > 
> > > http://ltp.cvs.sourceforge.net/ltp/ltp/testcases/kernel/syscalls/splice/,
> > 
> > Well, there's probably lot more different tests that could be done:
> > 
> >  - splicing from various sources (socket, pipe, regular file)
> >  - splicing to various sources (...)
> >  - splicing in full page(s)
> >  - splicing in partial page(s)
> >  - filling the destination pipe (blocking/nonblocking)
> >  - splicing in a big buffer (bigger than the pipe buffer)
> >  - splicing out full buffers
> >  - splicing out partial buffers
> >  - splicing out multiple buffers
> >  - splicing out more data then is in the pipe (blocking/nonblocking)
> >  - parallel splicing (in and/or out)
> > 
> 
> Thanks for pointing out the areas that can be tested in addition to what
> is existing. The list is very comprehensive. I will see if i can get
> somebody to work on this. Will put you in CC for review. Hope you do not
> mind that.
> 
> > And then there's vmsplice(), which I haven't even looked at.
> 
> Yes, please do look into this and give us more feedback like the splice
> above. You can find it here:
> 
> http://ltp.cvs.sourceforge.net/ltp/ltp/testcases/kernel/syscalls/vmsplice/.

Hi Miklos,

Would you also like to provide some wonderful feedback for the
vmsplice() syscall test cases like the way you provided for the splice()
test cases?

Regards--
Subrata

> 
> > 
> > > It would also be great if you can send us some patch as well in the
> > > regard.
> > 
> > Unfortunately this would be a much bigger project than the cleanup I
> > did on the kernel splice code, and I don't really have the time to do
> > it :(
> 
> We appreciate the already feedback you have given us. Hope you will also
> help in future review of the testcases.
> 
> Regards--
> Subrata
> 
> > 
> > Miklos
> 
> 
> -------------------------------------------------------------------------
> Check out the new SourceForge.net Marketplace.
> It's the best place to buy or sell services for
> just about anything Open Source.
> http://sourceforge.net/services/buy/index.php
> _______________________________________________
> Ltp-list mailing list
> Ltp-list@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/ltp-list


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [LTP] [rfc patch 0/4] splice: cleanups and fixes
  2008-06-25 13:17       ` [LTP] " Subrata Modak
@ 2008-06-25 14:00         ` Miklos Szeredi
       [not found]           ` <E1KBVXV-000618-C9-8f8m9JG5TPIdUIPVzhDTVZP2KDSNp7ea@public.gmane.org>
  0 siblings, 1 reply; 38+ messages in thread
From: Miklos Szeredi @ 2008-06-25 14:00 UTC (permalink / raw)
  To: subrata; +Cc: miklos, linux-fsdevel, ltp-list, jens.axboe

> Would you also like to provide some wonderful feedback for the
> vmsplice() syscall test cases like the way you provided for the splice()
> test cases?

I haven't looked at vmsplice() yet.  I think Jens would be in a better
position to provide such feedback.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [rfc patch 0/4] splice: cleanups and fixes
       [not found]           ` <E1KBVXV-000618-C9-8f8m9JG5TPIdUIPVzhDTVZP2KDSNp7ea@public.gmane.org>
@ 2008-06-26  7:34             ` Subrata Modak
  2008-10-22 12:41               ` [LTP] " Subrata Modak
  0 siblings, 1 reply; 38+ messages in thread
From: Subrata Modak @ 2008-06-26  7:34 UTC (permalink / raw)
  To: Miklos Szeredi, jens.axboe
  Cc: linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	ltp-list-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

On Wed, 2008-06-25 at 16:00 +0200, Miklos Szeredi wrote:
> > Would you also like to provide some wonderful feedback for the
> > vmsplice() syscall test cases like the way you provided for the splice()
> > test cases?
> 
> I haven't looked at vmsplice() yet.  I think Jens would be in a better
> position to provide such feedback.

Thanks Miklos. I would also request feedback from Jens as well.

Hi Jens,

Miklos has recently helped us review the spilce() syscall test cases in
LTP, available at:
http://ltp.cvs.sourceforge.net/ltp/ltp/testcases/kernel/syscalls/splice/
and has provided us valuable feedback on improvements of these test
cases on the following lines:

> > Well, there's probably lot more different tests that could be done:
> > 
> >  - splicing from various sources (socket, pipe, regular file)
> >  - splicing to various sources (...)
> >  - splicing in full page(s)
> >  - splicing in partial page(s)
> >  - filling the destination pipe (blocking/nonblocking)
> >  - splicing in a big buffer (bigger than the pipe buffer)
> >  - splicing out full buffers
> >  - splicing out partial buffers
> >  - splicing out multiple buffers
> >  - splicing out more data then is in the pipe (blocking/nonblocking)
> >  - parallel splicing (in and/or out)
> > 

I would request you to kindly help us review the vmspilce() syscall test
cases in LTP, and also give us review feedback and improvements on the
same:
http://ltp.cvs.sourceforge.net/ltp/ltp/testcases/kernel/syscalls/vmsplice/

Regards--
Subrata

> 
> Thanks,
> Miklos


-------------------------------------------------------------------------
Check out the new SourceForge.net Marketplace.
It's the best place to buy or sell services for
just about anything Open Source.
http://sourceforge.net/services/buy/index.php

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [LTP] [rfc patch 0/4] splice: cleanups and fixes
  2008-06-26  7:34             ` Subrata Modak
@ 2008-10-22 12:41               ` Subrata Modak
  0 siblings, 0 replies; 38+ messages in thread
From: Subrata Modak @ 2008-10-22 12:41 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: jens.axboe, linux-fsdevel, ltp-list

Hi Miklos,

On Thu, 2008-06-26 at 13:04 +0530, Subrata Modak wrote:
> On Wed, 2008-06-25 at 16:00 +0200, Miklos Szeredi wrote:
> > > Would you also like to provide some wonderful feedback for the
> > > vmsplice() syscall test cases like the way you provided for the splice()
> > > test cases?
> > 
> > I haven't looked at vmsplice() yet.  I think Jens would be in a better
> > position to provide such feedback.
> 
> Thanks Miklos. I would also request feedback from Jens as well.
> 
> Hi Jens,
> 
> Miklos has recently helped us review the spilce() syscall test cases in
> LTP, available at:
> http://ltp.cvs.sourceforge.net/ltp/ltp/testcases/kernel/syscalls/splice/
> and has provided us valuable feedback on improvements of these test
> cases on the following lines:
> 
> > > Well, there's probably lot more different tests that could be done:
> > > 
> > >  - splicing from various sources (socket, pipe, regular file)
> > >  - splicing to various sources (...)
> > >  - splicing in full page(s)
> > >  - splicing in partial page(s)
> > >  - filling the destination pipe (blocking/nonblocking)
> > >  - splicing in a big buffer (bigger than the pipe buffer)
> > >  - splicing out full buffers
> > >  - splicing out partial buffers
> > >  - splicing out multiple buffers
> > >  - splicing out more data then is in the pipe (blocking/nonblocking)
> > >  - parallel splicing (in and/or out)
> > > 
> 
> I would request you to kindly help us review the vmspilce() syscall test
> cases in LTP, and also give us review feedback and improvements on the
> same:
> http://ltp.cvs.sourceforge.net/ltp/ltp/testcases/kernel/syscalls/vmsplice/

Would you like to provide us some feedback here.

Regards--
Subrata

> 
> Regards--
> Subrata
> 
> > 
> > Thanks,
> > Miklos
> 
> 
> -------------------------------------------------------------------------
> Check out the new SourceForge.net Marketplace.
> It's the best place to buy or sell services for
> just about anything Open Source.
> http://sourceforge.net/services/buy/index.php
> _______________________________________________
> Ltp-list mailing list
> Ltp-list@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/ltp-list


^ permalink raw reply	[flat|nested] 38+ messages in thread

end of thread, other threads:[~2008-10-22 12:41 UTC | newest]

Thread overview: 38+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-06-21 15:46 [rfc patch 0/4] splice: cleanups and fixes Miklos Szeredi
2008-06-21 15:46 ` [rfc patch 1/4] splice: fix comment Miklos Szeredi
2008-06-21 15:46 ` [rfc patch 2/4] splice: remove steal from pipe_buf_operations Miklos Szeredi
2008-06-24  8:01   ` Jens Axboe
2008-06-24  8:50     ` Miklos Szeredi
2008-06-24 12:21       ` Nick Piggin
2008-06-21 15:46 ` [rfc patch 3/4] splice: remove confirm " Miklos Szeredi
2008-06-24  8:04   ` Jens Axboe
2008-06-24  8:54     ` Miklos Szeredi
2008-06-24 11:19       ` Jens Axboe
2008-06-24 11:36         ` Miklos Szeredi
2008-06-24 11:46           ` Evgeniy Polyakov
2008-06-24 12:02             ` Miklos Szeredi
2008-06-24 12:15               ` Evgeniy Polyakov
2008-06-24 12:16           ` Nick Piggin
2008-06-24 12:22             ` Jens Axboe
2008-06-24 13:00             ` Miklos Szeredi
2008-06-24 17:30           ` Linus Torvalds
2008-06-24 18:24             ` Miklos Szeredi
2008-06-24 18:31               ` Linus Torvalds
2008-06-24 19:05                 ` Miklos Szeredi
2008-06-24 19:17                   ` Linus Torvalds
2008-06-24 19:24                     ` Miklos Szeredi
2008-06-24 19:26                       ` Miklos Szeredi
2008-06-24 19:32                         ` Miklos Szeredi
2008-06-24 19:47                           ` Linus Torvalds
2008-06-24 20:06                             ` Miklos Szeredi
2008-06-24 19:45                       ` Linus Torvalds
2008-06-21 15:46 ` [rfc patch 4/4] splice: use do_generic_file_read() Miklos Szeredi
2008-06-24  8:05   ` Jens Axboe
2008-06-24 11:11     ` Miklos Szeredi
2008-06-21 17:20 ` [rfc patch 0/4] splice: cleanups and fixes Subrata Modak
2008-06-22  6:16   ` Miklos Szeredi
2008-06-23 15:26     ` Subrata Modak
2008-06-25 13:17       ` [LTP] " Subrata Modak
2008-06-25 14:00         ` Miklos Szeredi
     [not found]           ` <E1KBVXV-000618-C9-8f8m9JG5TPIdUIPVzhDTVZP2KDSNp7ea@public.gmane.org>
2008-06-26  7:34             ` Subrata Modak
2008-10-22 12:41               ` [LTP] " Subrata Modak

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).