From: Eric Dumazet <dada1@cosmosbay.com>
To: Fengguang Wu <fengguang.wu@gmail.com>
Cc: Andi Kleen <andi@firstfloor.org>, Andrew Morton <akpm@osdl.org>,
Oleg Nesterov <oleg@tv-sign.ru>,
Steven Pratt <slpratt@austin.ibm.com>,
Ram Pai <linuxram@us.ibm.com>,
linux-kernel@vger.kernel.org, Ingo Molnar <mingo@elte.hu>
Subject: Re: [RFC] splice() and readahead interaction
Date: Thu, 10 May 2007 21:53:59 +0200
Message-ID: <46437857.9070403@cosmosbay.com>
In-Reply-To: <f6b15c890705050204l11045ba3w66c8c4ae0ac3407f@mail.gmail.com>
Fengguang Wu wrote:
> 2007/5/2, Eric Dumazet <dada1@cosmosbay.com>:
>
> Since you work on readahead, could you please find out why the
> following program triggers a problem in the splice() syscall?
>
> Description :
>
> I tried to use splice(SPLICE_F_NONBLOCK) in a non-blocking
> environment, in an attempt to implement cheap AIO on top of the
> zero-copy splice() feature.
>
> I quickly found that readahead in splice() is not really working.
>
> To demonstrate the problem, just compile the attached program, and
> use it to pipe a big file (not yet in cache) to /dev/null :
>
> $ gcc -o spliceout spliceout.c
> $ spliceout -d BIGFILE | cat >/dev/null
> offset=49152 ret=49152
> offset=65536 ret=16384
> offset=131072 ret=65536
> ...no more progress... (splice() returns -1 and EAGAIN)
>
> Reading the splice(SPLICE_F_NONBLOCK) syscall implementation, I expected
> to exploit its ability to call readahead(), and to make some progress if
> pages are ready in cache.
>
> But apparently, even on an idle machine, it is not working as expected.
>
>
>
> Eric Dumazet, thank you for disclosing this bug.
>
> Readahead logic somehow fails to populate the page range with data.
> It can be because
> 1) the readahead routine is not always called in the following lines of
> fs/splice.c:
> 	if (!loff || nr_pages > 1)
> 		page_cache_readahead(mapping, &in->f_ra, in, index,
> 				     nr_pages);
> 2) even when called, page_cache_readahead() won't guarantee the pages are
> there. It won't submit readahead I/O for pages already in the radix tree,
> or when (ra_pages == 0), or after 256 cache hits.
>
> In your case, it is probably because of the retried reads, which lead to
> excessive cache hits and eventually disable readahead.
>
> And that _one_ failure of readahead blocks the whole read process.
> The application receives EAGAIN and retries the read, but
> __generic_file_splice_read() refuses to make progress:
> - in the previous invocation, it allocated a blank page and inserted
> it into the radix tree, but never had the chance to start I/O for it:
> the test of SPLICE_F_NONBLOCK comes before that.
> - in the retried invocation, the readahead code will neither get out of
> the cache hit mode, nor will it submit I/O for an already existing page.
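
The stall described in the two points above can be illustrated with a toy user-space model. This is a sketch under stated assumptions, not the code in mm/readahead.c: the names `toy_ra_state`, `toy_readahead` and `TOY_MAX_CACHE_HIT` are hypothetical, and only the three "no I/O" conditions quoted above (ra_pages == 0, pages already in the tree, 256 cache hits) are modeled. How the real code leaves cache hit mode is deliberately left out.

```c
#include <assert.h>

#define TOY_MAX_CACHE_HIT 256	/* the "256 cache hits" quoted above */

/* Hypothetical, heavily simplified per-file readahead state. */
struct toy_ra_state {
	unsigned long ra_pages;		/* 0 means readahead is disabled */
	unsigned long cache_hit;	/* pages recently found already in the tree */
};

/*
 * Model one readahead call over a window of nr_pages pages, of which
 * nr_missing are absent from the radix tree.
 * Returns the number of pages for which I/O would be submitted.
 */
static unsigned long toy_readahead(struct toy_ra_state *ra,
				   unsigned long nr_pages,
				   unsigned long nr_missing)
{
	if (ra->ra_pages == 0)
		return 0;		/* readahead turned off */
	if (ra->cache_hit >= TOY_MAX_CACHE_HIT)
		return 0;		/* cache hit mode: window is not inspected */
	if (nr_missing == 0) {
		/* Every page already in the tree: count hits, no I/O. */
		ra->cache_hit += nr_pages;
		return 0;
	}
	ra->cache_hit = 0;		/* genuine miss: reset and read */
	return nr_missing;
}
```

In this model, a blank page that was inserted into the tree but never read counts as "present", so each retry of the same window only increases cache_hit. After 16 retries of a 16-page window the counter reaches 256 and no later call submits I/O.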
>
> The attached patch should fix the critical splice bug. Sorry for not
> being able to test it locally for now - I'm at home and running knoppix.
> And the readahead bug will be fixed by the upcoming on-demand readahead
> patch. I should be back and submit it after a week.
>
> Thank you,
> Fengguang Wu
>
>
> ------------------------------------------------------------------------
>
> --- linux-2.6.21.1/fs/splice.c.old 2007-05-05 04:40:38.000000000 -0400
> +++ linux-2.6.21.1/fs/splice.c 2007-05-05 04:41:59.000000000 -0400
> @@ -378,10 +378,11 @@
>  			 * If in nonblock mode then dont block on waiting
>  			 * for an in-flight io page
>  			 */
> -			if (flags & SPLICE_F_NONBLOCK)
> -				break;
> -
> -			lock_page(page);
> +			if (flags & SPLICE_F_NONBLOCK) {
> +				if (TestSetPageLocked(page))
> +					break;
> +			} else
> +				lock_page(page);
>
>  			/*
>  			 * page was truncated, stop here. if this isn't the
Sorry for the delay.
This patch solves the problem, thank you!
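
For the archive, the locking pattern the quoted patch introduces can be sketched in user space as follows. This is only an analogy under stated assumptions: `toy_page`, `toy_acquire_page` and `toy_release_page` are hypothetical names, a C11 `atomic_flag` stands in for the page lock bit, and a `sched_yield()` loop stands in for the sleeping `lock_page()`. As I understand the patch, the point is that nonblocking mode now gives up only when the page is actually locked by someone else, instead of giving up unconditionally before any I/O can be started.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a page with a lock bit. */
struct toy_page {
	atomic_flag locked;
};

/*
 * Returns true with the page "locked", or false if nonblock is set and
 * the page is already locked (where the patched kernel code does `break`).
 */
static bool toy_acquire_page(struct toy_page *page, bool nonblock)
{
	if (nonblock) {
		/* Test-and-set: a true result means it was already held. */
		if (atomic_flag_test_and_set(&page->locked))
			return false;
	} else {
		/* Blocking mode: wait until the holder releases the page. */
		while (atomic_flag_test_and_set(&page->locked))
			sched_yield();
	}
	return true;
}

static void toy_release_page(struct toy_page *page)
{
	atomic_flag_clear(&page->locked);
}
```

With this shape, an unlocked page that has no I/O in flight is acquired immediately even in nonblocking mode, so the caller can go on to start the read for it.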