All of lore.kernel.org
 help / color / mirror / Atom feed
From: Wu Fengguang <fengguang.wu@intel.com>
To: Jens Axboe <jens.axboe@oracle.com>
Cc: Linux Kernel <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] readahead even for FMODE_RANDOM
Date: Fri, 2 Apr 2010 09:23:38 +0800	[thread overview]
Message-ID: <20100402012338.GA6393@localhost> (raw)
In-Reply-To: <20100401183151.GO23510@kernel.dk>

Hi Jens,

On Fri, Apr 02, 2010 at 02:31:51AM +0800, Jens Axboe wrote:
> Hi,
> 
> I got a problem report with fio where larger block size random reads
> where markedly slower with buffered IO than with O_DIRECT, and the
> initial thought was that perhaps this was some fio oddity. The reporter
> eventually discovered that turning off the fadvise hint made it work
> fine. So I took a look, and it seems we never do readahead for
> FMODE_RANDOM even if the request size is larger than 1 page. That seems
> like a bug, if an application is doing eg 16kb random reads, you want to
> readahead the 12kb remaining data. On devices where smaller transfer
> sizes are slower than larger ones, this can make a large difference.
> 
> This patch makes us readahead even for FMODE_RANDOM, iff we'll be
> reading more pages in that single read. I ran a quick test here, and it
> appears to fix the problem (no difference with fadvise POSIX_FADV_RANDOM
> being passed in or not).
 
I guess the application is doing (at least partial) sequential reads,
while at the same time tell kernel with POSIX_FADV_RANDOM that it's
doing random reads.

If so, it's mainly the application's fault.

However the kernel can behave more smart and less "dumb" :)
It can inherit the current good behavior of "submit one single 16kb io
request for a 16kb random read() syscall", while still be able to
start _larger sized_ readahead if it's actually a sequential one.

> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
> 
> diff --git a/mm/readahead.c b/mm/readahead.c
> index 337b20e..d4b201c 100644
> --- a/mm/readahead.c
> +++ b/mm/readahead.c
> @@ -501,8 +501,11 @@ void page_cache_sync_readahead(struct address_space *mapping,
>  	if (!ra->ra_pages)
>  		return;
>  
> -	/* be dumb */
> -	if (filp->f_mode & FMODE_RANDOM) {
> +	/*
> +	 * Be dumb for files marked as randomly accessed, but do readahead
> +	 * inside the original request (req_size > 1).
> +	 */
> +	if ((filp->f_mode & FMODE_RANDOM) && req_size == 1) {
>  		force_page_cache_readahead(mapping, filp, offset, req_size);
>  		return;
>  	}

The patch only fixes the (req_size != 1) case that exposed by your
application. A complete fix would be 

@@ -820,12 +825,6 @@ void page_cache_sync_readahead(struct ad
 	if (!ra->ra_pages)
 		return;
 
-	/* be dumb */
-	if (filp->f_mode & FMODE_RANDOM) {
-		force_page_cache_readahead(mapping, filp, offset, req_size);
-		return;
-	}
-
 	/* do read-ahead */
 	ondemand_readahead(mapping, ra, filp, false, offset, req_size);
 }

And a more optimized patch would look like this.  Note that only the
last chunk is necessary for bug fixing, and only this chunk can be
applied to vanilla 2.6.34-rc3.

If no problem, I'll submit a patch with only the last chunk for
2.6.34, and submit the remaining chunks for 2.6.35.

Thanks,
Fengguang
---
Subject: readahead: more smart readahead on POSIX_FADV_RANDOM
From: Wu Fengguang <fengguang.wu@intel.com>
Date: Fri Apr 02 08:52:42 CST 2010

Some times user space applications will tell POSIX_FADV_RANDOM
while still doing some sequential reads.

The kernel can behave a bit smarter in this case, by letting the
readahead heuristics handle the POSIX_FADV_RANDOM case, but with
less aggressive assumption on sequential reads.

CC: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 mm/readahead.c |   17 ++++++++---------
 1 file changed, 8 insertions(+), 9 deletions(-)

--- linux.orig/mm/readahead.c	2010-04-02 08:43:53.000000000 +0800
+++ linux/mm/readahead.c	2010-04-02 09:00:51.000000000 +0800
@@ -664,6 +664,7 @@ ondemand_readahead(struct address_space 
 	unsigned long max = max_sane_readahead(ra->ra_pages);
 	unsigned long tt;  /* thrashing shreshold */
 	pgoff_t start;
+	bool random_hint = (filp && (filp->f_mode & FMODE_RANDOM));
 
 	/*
 	 * start of file
@@ -671,7 +672,8 @@ ondemand_readahead(struct address_space 
 	if (!offset) {
 		ra_set_pattern(ra, RA_PATTERN_INITIAL);
 		ra->start = offset;
-		if ((ra->ra_flags & READAHEAD_LSEEK) && req_size <= max) {
+		if ((random_hint || (ra->ra_flags & READAHEAD_LSEEK)) &&
+		    req_size <= max) {
 			ra->size = req_size;
 			ra->async_size = 0;
 			goto readit;
@@ -743,8 +745,11 @@ context_readahead:
 	} else
 		start = offset;
 
-	tt = count_history_pages(mapping, ra, offset,
-				 READAHEAD_ASYNC_RATIO * max);
+	if (unlikely(random_hint))
+		tt = 0;
+	else
+		tt = count_history_pages(mapping, ra, offset,
+					 READAHEAD_ASYNC_RATIO * max);
 	/*
 	 * no history pages cached, could be
 	 * 	- a random read
@@ -820,12 +825,6 @@ void page_cache_sync_readahead(struct ad
 	if (!ra->ra_pages)
 		return;
 
-	/* be dumb */
-	if (filp->f_mode & FMODE_RANDOM) {
-		force_page_cache_readahead(mapping, filp, offset, req_size);
-		return;
-	}
-
 	/* do read-ahead */
 	ondemand_readahead(mapping, ra, filp, false, offset, req_size);
 }

  reply	other threads:[~2010-04-02  1:23 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-04-01 18:31 [PATCH] readahead even for FMODE_RANDOM Jens Axboe
2010-04-02  1:23 ` Wu Fengguang [this message]
2010-04-02  6:38   ` Jens Axboe
2010-04-02  6:52     ` Wu Fengguang
2010-04-02  6:59       ` Jens Axboe
2010-04-02  7:21         ` Wu Fengguang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100402012338.GA6393@localhost \
    --to=fengguang.wu@intel.com \
    --cc=jens.axboe@oracle.com \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.