public inbox for linux-kernel@vger.kernel.org
From: Eric Dumazet <dada1@cosmosbay.com>
To: Fengguang Wu <fengguang.wu@gmail.com>
Cc: Andi Kleen <andi@firstfloor.org>, Andrew Morton <akpm@osdl.org>,
	Oleg Nesterov <oleg@tv-sign.ru>,
	Steven Pratt <slpratt@austin.ibm.com>,
	Ram Pai <linuxram@us.ibm.com>,
	linux-kernel@vger.kernel.org, Ingo Molnar <mingo@elte.hu>
Subject: Re: [RFC] splice() and readahead interaction
Date: Thu, 10 May 2007 21:53:59 +0200
Message-ID: <46437857.9070403@cosmosbay.com>
In-Reply-To: <f6b15c890705050204l11045ba3w66c8c4ae0ac3407f@mail.gmail.com>

Fengguang Wu wrote:
> 2007/5/2, Eric Dumazet <dada1@cosmosbay.com>:
> 
>     Since you work on readahead, could you please find out why the
>     following program triggers a problem in the splice() syscall?
> 
>     Description :
> 
>     I tried to use splice(SPLICE_F_NONBLOCK) in a non-blocking
>     environment, in an attempt to implement cheap AIO and use the
>     zero-copy splice() feature.
> 
>     I quickly found that readahead in splice() is not really working.
> 
>     To demonstrate the problem, just compile the attached program, and
>     use it to pipe a big file (not yet in cache) to /dev/null :
> 
>     $ gcc -o spliceout spliceout.c
>     $ spliceout -d BIGFILE | cat >/dev/null
>     offset=49152 ret=49152
>     offset=65536 ret=16384
>     offset=131072 ret=65536
>     ...no more progress...   (splice() returns -1 and EAGAIN)
> 
>     Reading the splice(SPLICE_F_NONBLOCK) syscall implementation, I
>     expected to exploit its ability to call readahead() and make some
>     progress if pages are ready in cache.
> 
>     But apparently, even on an idle machine, it is not working as expected.
> 
> 
> 
> Eric Dumazet, thank you for disclosing this bug.
> 
> Readahead logic somehow fails to populate the page range with data.
> It can be because
> 1) the readahead routine is not always called in the following lines of
> fs/splice.c:
>         if (!loff || nr_pages > 1)
>                 page_cache_readahead(mapping, &in->f_ra, in, index, nr_pages);
> 2) even when called, page_cache_readahead() won't guarantee the pages are there.
> It won't submit readahead I/O for pages already in the radix tree, or
> when (ra_pages == 0), or after 256 cache hits.
> 
> In your case, it should be because of the retried reads, which lead to
> excessive cache hits and eventually disable readahead.
> 
> And that _one_ failure of readahead blocks the whole read process.
> The application receives EAGAIN and retries the read, but
> __generic_file_splice_read() refuses to make progress:
> - in the previous invocation, it allocated a blank page and inserted
> it into the radix tree, but never had the chance to start I/O for it:
> the test of SPLICE_F_NONBLOCK comes before that.
> - in the retried invocation, the readahead code will neither get out of
> the cache hit mode, nor will it submit I/O for an already existing page.
> 
> The attached patch should fix the critical splice bug. Sorry for not 
> being able to test it locally for now - I'm at home and running knoppix. 
> And the readahead bug will be fixed by the upcoming on-demand readahead 
> patch. I should be back and submit it after a week.
> 
> Thank you,
> Fengguang Wu
> 
> 
> ------------------------------------------------------------------------
> 
> --- linux-2.6.21.1/fs/splice.c.old	2007-05-05 04:40:38.000000000 -0400
> +++ linux-2.6.21.1/fs/splice.c	2007-05-05 04:41:59.000000000 -0400
> @@ -378,10 +378,11 @@
>  			 * If in nonblock mode then dont block on waiting
>  			 * for an in-flight io page
>  			 */
> -			if (flags & SPLICE_F_NONBLOCK)
> -				break;
> -
> -			lock_page(page);
> +			if (flags & SPLICE_F_NONBLOCK) {
> +				if (TestSetPageLocked(page))
> +					break;
> +			} else
> +				lock_page(page);
>  
>  			/*
>  			 * page was truncated, stop here. if this isn't the
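
For anyone following the thread, the stall described in the quoted
explanation can be reproduced with a small userspace toy model. This is
only a sketch under stated assumptions: struct toy_page,
toy_io_complete(), toy_splice_read() and toy_attempts_until_data() are
hypothetical names, not kernel code, and readahead is assumed to be
stuck in cache-hit mode so that it never submits I/O for the page. The
real code path is more involved than this.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a page cache page; NOT the kernel's struct page. */
struct toy_page {
	bool in_cache;  /* present in the radix tree */
	bool locked;    /* page lock held, i.e. read I/O in flight */
	bool uptodate;  /* data has arrived */
};

/* Pretend the disk finishes any in-flight read between two attempts. */
static void toy_io_complete(struct toy_page *p)
{
	if (p->locked) {
		p->uptodate = true;
		p->locked = false;
	}
}

/*
 * One SPLICE_F_NONBLOCK read attempt on a single page.
 * Returns true if data was delivered, false for "EAGAIN".
 * Assumption: readahead is in cache-hit mode and submits nothing.
 */
static bool toy_splice_read(struct toy_page *p, bool patched)
{
	/* The first attempt inserts a blank page; later attempts find it. */
	p->in_cache = true;

	if (p->uptodate)
		return true;
	if (!patched)
		return false;	/* old code: break before lock_page(), no I/O started */
	if (p->locked)
		return false;	/* patched: trylock fails, I/O already in flight */
	p->locked = true;	/* patched: trylock succeeds, read I/O gets started */
	return false;		/* data not there yet, caller retries */
}

/* Number of attempts until data arrives, or -1 if it never does. */
static int toy_attempts_until_data(bool patched, int max_attempts)
{
	struct toy_page page = { false, false, false };

	for (int i = 1; i <= max_attempts; i++) {
		assert(!(page.locked && page.uptodate));
		if (toy_splice_read(&page, patched))
			return i;
		toy_io_complete(&page);	/* time passes before the retry */
	}
	return -1;
}
```

In this model, toy_attempts_until_data(false, 100) returns -1 because
nobody ever starts I/O on the blank page, while
toy_attempts_until_data(true, 100) returns 2: the first attempt takes
the page lock and starts the read, and the retry finds the data.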

Sorry for the delay.

This patch solves the problem, thank you!
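
For reference, the kind of non-blocking loop discussed in this thread
can be sketched as below. This is not the spliceout.c attachment from
the original report, which is not reproduced here; the function name
splice_file_nonblock(), the CHUNK size and the 10 ms poll timeout are
all assumptions made for illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <sys/types.h>
#include <unistd.h>

#define CHUNK 65536	/* assumed request size, not taken from the report */

/*
 * Move a whole file into a pipe using non-blocking splice() calls.
 * Returns the number of bytes moved, or -1 on a hard error (errno set).
 */
static long long splice_file_nonblock(int file_fd, int pipe_wr_fd)
{
	loff_t offset = 0;
	long long total = 0;

	assert(file_fd >= 0 && pipe_wr_fd >= 0);

	for (;;) {
		ssize_t ret = splice(file_fd, &offset, pipe_wr_fd, NULL,
				     CHUNK, SPLICE_F_NONBLOCK);
		if (ret > 0) {
			total += ret;	/* the kernel advanced offset for us */
			continue;
		}
		if (ret == 0)
			return total;	/* end of file */
		if (errno == EINTR)
			continue;
		if (errno != EAGAIN)
			return -1;

		/*
		 * EAGAIN: the pipe may be full or the data may not be
		 * ready. Wait briefly for room in the pipe, then retry.
		 */
		struct pollfd pfd = { .fd = pipe_wr_fd, .events = POLLOUT };
		if (poll(&pfd, 1, 10) < 0 && errno != EINTR)
			return -1;
	}
}
```

A caller would open the file with O_RDONLY and pass STDOUT_FILENO as
the second argument when stdout is a pipe. On a kernel with the bug
discussed in this thread, the EAGAIN branch is the one that repeats
without the offset ever advancing.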



