From: Eric Dumazet <dada1@cosmosbay.com>
To: Fengguang Wu <fengguang.wu@gmail.com>
Cc: Andi Kleen <andi@firstfloor.org>, Andrew Morton <akpm@osdl.org>,
Oleg Nesterov <oleg@tv-sign.ru>,
Steven Pratt <slpratt@austin.ibm.com>,
Ram Pai <linuxram@us.ibm.com>,
linux-kernel@vger.kernel.org, Ingo Molnar <mingo@elte.hu>
Subject: Re: [RFC] splice() and readahead interaction
Date: Thu, 10 May 2007 21:53:59 +0200
Message-ID: <46437857.9070403@cosmosbay.com>
In-Reply-To: <f6b15c890705050204l11045ba3w66c8c4ae0ac3407f@mail.gmail.com>
Fengguang Wu wrote:
> 2007/5/2, Eric Dumazet <dada1@cosmosbay.com>:
>
> Since you work on readahead, could you please find out why the
> following program triggers a problem in the splice() syscall?
>
> Description :
>
> I tried to use splice(SPLICE_F_NONBLOCK) in a non-blocking
> environment, in an attempt to implement cheap AIO and the zero-copy
> splice() feature.
>
> I quickly found that readahead in splice() is not really working.
>
> To demonstrate the problem, just compile the attached program, and
> use it to pipe a big file (not yet in cache) to /dev/null :
>
> $ gcc -o spliceout spliceout.c
> $ spliceout -d BIGFILE | cat >/dev/null
> offset=49152 ret=49152
> offset=65536 ret=16384
> offset=131072 ret=65536
> ...no more progress... (splice() returns -1 and EAGAIN)
>
> Reading the splice(SPLICE_F_NONBLOCK) syscall implementation, I expected
> to exploit its ability to call readahead(), and to make some progress if
> pages are ready in cache.
>
> But apparently, even on an idle machine, it is not working as expected.
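The attached spliceout.c is not reproduced in this message. As a rough illustration only, a non-blocking splice loop of the kind described could look like the sketch below. The function name pump_nonblock() and its parameters are hypothetical and are not taken from the original program; only splice(2) and SPLICE_F_NONBLOCK come from the report.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not the original spliceout.c): move a regular
 * file into a pipe, `chunk` bytes at a time, with SPLICE_F_NONBLOCK.
 * Returns the number of bytes moved once EOF is reached, or -1 with
 * errno set when splice() fails; errno == EAGAIN is the "no more
 * progress" case described in the report.
 */
static off_t pump_nonblock(int fd_in, int pipe_wr, size_t chunk)
{
	loff_t off = 0;

	for (;;) {
		ssize_t ret = splice(fd_in, &off, pipe_wr, NULL, chunk,
				     SPLICE_F_NONBLOCK);
		if (ret == 0)
			break;		/* end of file */
		if (ret < 0)
			return -1;	/* caller inspects errno */
		/* splice() has already advanced `off` by `ret` bytes */
		fprintf(stderr, "offset=%lld ret=%zd\n", (long long)off, ret);
	}
	return (off_t)off;
}
```

A caller would normally wait for the pipe to drain (for example with poll()) before retrying after EAGAIN. The report is about the case where retrying never helps, because the page being waited on never has I/O started for it.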
>
>
>
> Eric Dumazet, thank you for disclosing this bug.
>
> Readahead logic somehow fails to populate the page range with data.
> It can be because
> 1) the readahead routine is not always called in the following lines of
>    fs/splice.c:
>         if (!loff || nr_pages > 1)
>                 page_cache_readahead(mapping, &in->f_ra, in, index,
>                                      nr_pages);
> 2) even when called, page_cache_readahead() won't guarantee the pages are
>    there. It won't submit readahead I/O for pages already in the radix
>    tree, or when (ra_pages == 0), or after 256 cache hits.
>
> In your case, it should be because of the retried reads, which lead to
> excessive cache hits and eventually disable readahead.
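The "256 cache hits" rule can be pictured with a small userspace model. This is a sketch under stated assumptions, not the kernel's readahead code: struct ra_model, ra_model_cache_hit() and MAX_CACHE_HIT are hypothetical names, and only the threshold of 256 is taken from the message.

```c
#include <stdbool.h>

#define MAX_CACHE_HIT 256	/* threshold quoted in the message */

/* Hypothetical, simplified stand-in for the per-file readahead state. */
struct ra_model {
	unsigned int cache_hit;	/* lookups that found the page already cached */
	bool enabled;		/* false once readahead has switched itself off */
};

/*
 * Account for one lookup that found its page already in the page cache.
 * Returns whether readahead is still enabled afterwards. The model covers
 * only the hit path; how readahead gets re-enabled is left out on purpose.
 */
static bool ra_model_cache_hit(struct ra_model *ra)
{
	if (ra->enabled && ++ra->cache_hit >= MAX_CACHE_HIT)
		ra->enabled = false;
	return ra->enabled;
}
```

In this model every retried read that lands on already-inserted pages counts as a hit, so a tight EAGAIN retry loop reaches the threshold quickly even though no data has been read.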
>
> And that _one_ failure of readahead blocks the whole read process.
> The application receives EAGAIN and retries the read, but
> __generic_file_splice_read() refuses to make progress:
> - in the previous invocation, it allocated a blank page and inserted
>   it into the radix tree, but never had the chance to start I/O for it:
>   the test of SPLICE_F_NONBLOCK comes before that.
> - in the retried invocation, the readahead code will neither get out of
>   the cache hit mode, nor will it submit I/O for an already existing page.
>
> The attached patch should fix the critical splice bug. Sorry for not
> being able to test it locally for now - I'm at home and running knoppix.
> And the readahead bug will be fixed by the upcoming on-demand readahead
> patch. I should be back and submit it after a week.
>
> Thank you,
> Fengguang Wu
>
>
> ------------------------------------------------------------------------
>
> --- linux-2.6.21.1/fs/splice.c.old 2007-05-05 04:40:38.000000000 -0400
> +++ linux-2.6.21.1/fs/splice.c 2007-05-05 04:41:59.000000000 -0400
> @@ -378,10 +378,11 @@
>  			 * If in nonblock mode then dont block on waiting
>  			 * for an in-flight io page
>  			 */
> -			if (flags & SPLICE_F_NONBLOCK)
> -				break;
> -
> -			lock_page(page);
> +			if (flags & SPLICE_F_NONBLOCK) {
> +				if (TestSetPageLocked(page))
> +					break;
> +			} else
> +				lock_page(page);
>  
>  			/*
>  			 * page was truncated, stop here. if this isn't the
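The behavioural change in the patch can be modelled in plain userspace C. The sketch below is an illustration, not kernel code: struct page_model, the two functions and the enum are hypothetical, and model_test_set_locked() only imitates the "set the bit, return the old value" contract of TestSetPageLocked().

```c
#include <stdbool.h>

/* Hypothetical model of a page that is in the radix tree but not uptodate. */
struct page_model {
	bool locked;
};

enum action { ACT_STOP, ACT_START_IO };

/* Imitates TestSetPageLocked(): set the lock bit, return its old value. */
static bool model_test_set_locked(struct page_model *p)
{
	bool was_locked = p->locked;

	p->locked = true;
	return was_locked;
}

/* Before the patch: non-blocking callers always give up here. */
static enum action before_patch(struct page_model *p, bool nonblock)
{
	if (nonblock)
		return ACT_STOP;
	p->locked = true;	/* stands in for the sleeping lock_page() */
	return ACT_START_IO;
}

/* After the patch: non-blocking callers give up only if the lock is held. */
static enum action after_patch(struct page_model *p, bool nonblock)
{
	if (nonblock) {
		if (model_test_set_locked(p))
			return ACT_STOP;
	} else {
		p->locked = true;
	}
	return ACT_START_IO;
}
```

For a blank, unlocked page in non-blocking mode, before_patch() stops without ever starting I/O, which is the stall described above, while after_patch() takes the lock and proceeds. A page that is already locked (I/O in flight) still makes the non-blocking caller stop, so the call never sleeps.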
Sorry for the delay.
This patch solves the problem, thank you!