All of lore.kernel.org
 help / color / mirror / Atom feed
From: Fengguang Wu <fengguang.wu@gmail.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jens Axboe <jens.axboe@oracle.com>,
	"H. Peter Anvin" <hpa@zytor.com>,
	Eric Dumazet <dada1@cosmosbay.com>,
	linux-kernel@vger.kernel.org, cotte@de.ibm.com, hugh@veritas.com,
	neilb@suse.de, zanussi@us.ibm.com, hch@infradead.org
Subject: Re: [PATCH] sendfile removal
Date: Sun, 3 Jun 2007 21:05:08 +0800	[thread overview]
Message-ID: <380875906.01702@ustc.edu.cn> (raw)
Message-ID: <20070603130507.GA11170@mail.ustc.edu.cn> (raw)
In-Reply-To: <alpine.LFD.0.98.0706020836360.23741@woody.linux-foundation.org>

On Sat, Jun 02, 2007 at 08:40:04AM -0700, Linus Torvalds wrote:
> On Sat, 2 Jun 2007, Jens Axboe wrote:
> Your suggested:
> 
> >       if ((flags & SPLICE_F_NONBLOCK) && spd.nr_pages) {
> >               if (TestSetPageLocked(page))
> >                       break;
> >       } else
> >               lock_page(page);
> > 
> > should do that - always block for the first page and potentially return
> > a partial results for the remaining pages that read-ahead kicked into
> > gear.
> 
> would work, but I suspect that for a server, returning EAGAIN once is 
> actually the best option - if it has a select() loop, and something else 
> is running, the "return EAGAIN once" actually makes tons of sense (it 
> would basically boil down to the kernel effectively saying "ok, try 
> anything else you might have pending in your queues first, if you get back 
> to me, I'll block then").

May be we can settle with "return EAGAIN once" for *all* possible
waits, including I/O submitted by others:

          if ((flags & SPLICE_F_NONBLOCK) &&
                  TestSetPageLocked(page) &&
                  in->f_ra.prev_index != index)
                  break;
          else
                  lock_page(page);

However, the upper layer generic_file_splice_read() may keep
on calling __generic_file_splice_read(), so we need more code to
stop it with flag SPLICE_INTERNAL_WILLBLOCK:

---
(not tested yet; still not sure if it's the best solution.)

In non-block splicing, return EAGAIN once for all possible I/O waits.

It works by checking (ra.prev_index != index) in
__generic_file_splice_read().  Another reader on the same fd will at
worst cause a few more splice syscalls.

If keep calling generic_file_splice_read() in non-block mode, it will
return partial data before the first hole; then return EAGAIN at
second call; at last it will block on the I/O page.

Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
---
 fs/splice.c               |   38 ++++++++++++++++++++++--------------
 include/linux/pipe_fs_i.h |    2 +
 2 files changed, 26 insertions(+), 14 deletions(-)

--- linux-2.6.22-rc3-mm1.orig/fs/splice.c
+++ linux-2.6.22-rc3-mm1/fs/splice.c
@@ -264,10 +264,11 @@ static ssize_t splice_to_pipe(struct pip
 static int
 __generic_file_splice_read(struct file *in, loff_t *ppos,
 			   struct pipe_inode_info *pipe, size_t len,
-			   unsigned int flags)
+			   unsigned int *flags)
 {
 	struct address_space *mapping = in->f_mapping;
 	unsigned int loff, nr_pages;
+	unsigned int first_hole;
 	struct page *pages[PIPE_BUFFERS];
 	struct partial_page partial[PIPE_BUFFERS];
 	struct page *page;
@@ -278,7 +279,7 @@ __generic_file_splice_read(struct file *
 	struct splice_pipe_desc spd = {
 		.pages = pages,
 		.partial = partial,
-		.flags = flags,
+		.flags = *flags,
 		.ops = &page_cache_pipe_buf_ops,
 	};
 
@@ -299,12 +300,17 @@ __generic_file_splice_read(struct file *
 	 * Lookup the (hopefully) full range of pages we need.
 	 */
 	spd.nr_pages = find_get_pages_contig(mapping, index, nr_pages, pages);
+	first_hole = spd.nr_pages;
+	index += spd.nr_pages;
 
 	/*
 	 * If find_get_pages_contig() returned fewer pages than we needed,
-	 * allocate the rest.
+	 * readahead/allocate the rest.
 	 */
-	index += spd.nr_pages;
+	if (spd.nr_pages < nr_pages)
+		page_cache_readahead_ondemand(mapping, &in->f_ra, in,
+				NULL, index, nr_pages - spd.nr_pages);
+
 	while (spd.nr_pages < nr_pages) {
 		/*
 		 * Page could be there, find_get_pages_contig() breaks on
@@ -312,9 +318,6 @@ __generic_file_splice_read(struct file *
 		 */
 		page = find_get_page(mapping, index);
 		if (!page) {
-			page_cache_readahead_ondemand(mapping, &in->f_ra, in,
-					NULL, index, nr_pages - spd.nr_pages);
-
 			/*
 			 * page didn't exist, allocate one.
 			 */
@@ -369,13 +372,13 @@ __generic_file_splice_read(struct file *
 		 */
 		if (!PageUptodate(page)) {
 			/*
-			 * If in nonblock mode then dont block on waiting
-			 * for an in-flight io page
+			 * In non-block mode, only block at the second try.
 			 */
-			if (flags & SPLICE_F_NONBLOCK) {
-				if (TestSetPageLocked(page))
-					break;
-			} else
+			if ((*flags & SPLICE_F_NONBLOCK) &&
+				TestSetPageLocked(page) &&
+				in->f_ra.prev_index != index)
+				break;
+			else
 				lock_page(page);
 
 			/*
@@ -454,6 +457,10 @@ fill_it:
 		page_cache_release(pages[page_nr++]);
 	in->f_ra.prev_index = index;
 
+	if ((*flags & SPLICE_F_NONBLOCK) && (spd.nr_pages > first_hole)) {
+		*flags |= SPLICE_INTERNAL_WILLBLOCK;
+		spd.nr_pages = first_hole;
+	}
 	if (spd.nr_pages)
 		return splice_to_pipe(pipe, &spd);
 
@@ -480,7 +487,7 @@ ssize_t generic_file_splice_read(struct 
 	spliced = 0;
 
 	while (len) {
-		ret = __generic_file_splice_read(in, ppos, pipe, len, flags);
+		ret = __generic_file_splice_read(in, ppos, pipe, len, &flags);
 
 		if (ret < 0)
 			break;
@@ -496,6 +503,9 @@ ssize_t generic_file_splice_read(struct 
 		*ppos += ret;
 		len -= ret;
 		spliced += ret;
+
+		if (flags & SPLICE_INTERNAL_WILLBLOCK)
+			break;
 	}
 
 	if (spliced)
--- linux-2.6.22-rc3-mm1.orig/include/linux/pipe_fs_i.h
+++ linux-2.6.22-rc3-mm1/include/linux/pipe_fs_i.h
@@ -82,6 +82,8 @@ int generic_pipe_buf_steal(struct pipe_i
 #define SPLICE_F_MORE	(0x04)	/* expect more data */
 #define SPLICE_F_GIFT	(0x08)	/* pages passed in are a gift */
 
+#define SPLICE_INTERNAL_WILLBLOCK  (0x100) /* read on next page will block */
+
 /*
  * Passed to the actors
  */


  parent reply	other threads:[~2007-06-03 13:05 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-05-31 10:33 [PATCH] sendfile removal Jens Axboe
2007-05-31 10:47 ` Jens Axboe
2007-05-31 10:47 ` Eric Dumazet
2007-05-31 10:53   ` Jens Axboe
2007-06-01  4:09     ` H. Peter Anvin
2007-06-01  5:41       ` Jens Axboe
2007-06-01  5:50         ` H. Peter Anvin
2007-06-01  7:22           ` Eric Dumazet
2007-06-01 15:52             ` H. Peter Anvin
2007-06-01 16:18               ` Linus Torvalds
2007-06-01 16:47                 ` Eric Dumazet
2007-06-01 16:53                 ` H. Peter Anvin
2007-06-02 15:02                   ` Jens Axboe
2007-06-02 15:01                 ` Jens Axboe
2007-06-02 15:40                   ` Linus Torvalds
2007-06-02 16:35                     ` Jens Axboe
2007-06-03 13:05                     ` Fengguang Wu [this message]
2007-06-03 13:05                       ` Fengguang Wu
2007-06-03 14:29                       ` Fengguang Wu
2007-06-03 14:29                         ` Fengguang Wu
2007-06-04  0:46                         ` Fengguang Wu
2007-06-04  0:46                           ` Fengguang Wu
2007-06-04  8:05                           ` Jens Axboe
2007-06-04 11:22                             ` Fengguang Wu
2007-06-04 11:22                               ` Fengguang Wu
2007-06-01 16:22               ` Pádraig Brady
2007-05-31 10:55 ` Christoph Hellwig
2007-05-31 11:05   ` Jens Axboe
2007-05-31 12:26     ` Neil Brown
2007-05-31 12:27       ` Jens Axboe
2007-06-01  2:44         ` [PATCH] sendfile removal (nfsd update) Neil Brown
2007-06-01  5:44           ` Jens Axboe
2007-06-01  8:01             ` Jens Axboe
2007-06-01  8:15     ` [PATCH] sendfile removal Jens Axboe
2007-05-31 11:04 ` Carsten Otte
2007-05-31 11:06   ` Jens Axboe
2007-05-31 15:33 ` Tom Zanussi
2007-05-31 19:01   ` Jens Axboe
2007-05-31 17:06 ` Hugh Dickins
2007-05-31 17:31   ` Christoph Hellwig
2007-05-31 19:03   ` Jens Axboe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=380875906.01702@ustc.edu.cn \
    --to=fengguang.wu@gmail.com \
    --cc=cotte@de.ibm.com \
    --cc=dada1@cosmosbay.com \
    --cc=hch@infradead.org \
    --cc=hpa@zytor.com \
    --cc=hugh@veritas.com \
    --cc=jens.axboe@oracle.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=neilb@suse.de \
    --cc=torvalds@linux-foundation.org \
    --cc=zanussi@us.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.