From: Wu Fengguang <fengguang.wu@intel.com>
To: Jens Axboe <jens.axboe@oracle.com>
Cc: Linux Kernel <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] readahead even for FMODE_RANDOM
Date: Fri, 2 Apr 2010 15:21:08 +0800
Message-ID: <20100402072108.GA21772@localhost>
In-Reply-To: <20100402065917.GX23510@kernel.dk>

On Fri, Apr 02, 2010 at 02:59:17PM +0800, Jens Axboe wrote:
> On Fri, Apr 02 2010, Wu Fengguang wrote:
> > On Fri, Apr 02, 2010 at 02:38:30PM +0800, Jens Axboe wrote:
> > > On Fri, Apr 02 2010, Wu Fengguang wrote:
> > > > Hi Jens,
> > > > 
> > > > On Fri, Apr 02, 2010 at 02:31:51AM +0800, Jens Axboe wrote:
> > > > > Hi,
> > > > > 
> > > > > I got a problem report with fio where larger block size random reads
> > > > > > were markedly slower with buffered IO than with O_DIRECT, and the
> > > > > initial thought was that perhaps this was some fio oddity. The reporter
> > > > > eventually discovered that turning off the fadvise hint made it work
> > > > > fine. So I took a look, and it seems we never do readahead for
> > > > > FMODE_RANDOM even if the request size is larger than 1 page. That seems
> > > > > like a bug, if an application is doing eg 16kb random reads, you want to
> > > > > readahead the 12kb remaining data. On devices where smaller transfer
> > > > > sizes are slower than larger ones, this can make a large difference.
> > > > > 
> > > > > This patch makes us readahead even for FMODE_RANDOM, iff we'll be
> > > > > reading more pages in that single read. I ran a quick test here, and it
> > > > > appears to fix the problem (no difference with fadvise POSIX_FADV_RANDOM
> > > > > being passed in or not).
> > > >  
> > > > I guess the application is doing (at least partial) sequential reads,
> > > > > while at the same time telling the kernel with POSIX_FADV_RANDOM that it's
> > > > doing random reads.
> > > > 
> > > > If so, it's mainly the application's fault.
> > > 
> > > The application is doing large random reads. It's purely random, so
> > > the POSIX_FADV_RANDOM hint is correct. However, thinking about it, it
> > 
> > How large is it? For random reads > read_ahead_kb,
> > ondemand_readahead() will break it into read_ahead_kb sized IOs, while
> > force_page_cache_readahead() won't. That may impact IO performance.
> 
> The test case was 128kb random reads. So should still be within the
> normal read_ahead_kb. I suspect the reporter would not have noticed if

Yeah. 128kb random reads won't trigger readahead.

However, each 129kb random read will trigger 2*128kb readahead IOs
if we let ondemand_readahead() handle these random reads.
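
To make that arithmetic concrete, here is a rough sketch in C. It is not
kernel code: ra_windows() and ra_total_kb() are hypothetical helpers, and
the model simply assumes that a request is served by whole
read_ahead_kb-sized windows (128kb here) until it is covered.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Simplified model, NOT kernel code. Both helpers are hypothetical.
 * Assumption: a request is served by whole readahead windows of
 * ra_kb each, as many as are needed to cover req_kb.
 */

/* Number of readahead windows needed to cover a req_kb request. */
size_t ra_windows(size_t req_kb, size_t ra_kb)
{
	assert(ra_kb > 0);
	return (req_kb + ra_kb - 1) / ra_kb;	/* ceiling division */
}

/* Total data read from the device under the same assumption. */
size_t ra_total_kb(size_t req_kb, size_t ra_kb)
{
	return ra_windows(req_kb, ra_kb) * ra_kb;
}
```

Under that model, ra_windows(129, 128) is 2 and ra_total_kb(129, 128) is
256, so a 129kb read would pull in 127kb more than the application asked
for, while a 128kb request fits in a single window.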

> the issue size was as large as read_ahead_kb even if the request size
> was larger, the problem was that he ended up seeing 4kb ios only.
> 
> > > may be that we later hit a random "block" that has now been cached due
> > > to this read-ahead. Let me try and rule that out completely and see if
> > > there's still the difference. The original reporter observed 4kb reads
> > > hitting the driver, where 128kb was expected.
> > 
> > 4kb reads hit the disk (on POSIX_FADV_RANDOM)? That sounds like
> > behavior in pre .34 kernels that is fixed by commit 0141450f66c:
> > 
> >     readahead: introduce FMODE_RANDOM for POSIX_FADV_RANDOM
> 
> Could explain why I'm not reproducing when doing a quick test on the
> laptop. It is an older kernel. So it could be that I'm just imagining the
> issue on the current kernel, I don't have hard data to back it up on
> that version.
> 
> So disregard the patch for now, part-sequential behaviour on
> POSIX_FADV_RANDOM isn't the issue here.

OK.

Thanks,
Fengguang

Thread overview: 6+ messages
2010-04-01 18:31 [PATCH] readahead even for FMODE_RANDOM Jens Axboe
2010-04-02  1:23 ` Wu Fengguang
2010-04-02  6:38   ` Jens Axboe
2010-04-02  6:52     ` Wu Fengguang
2010-04-02  6:59       ` Jens Axboe
2010-04-02  7:21         ` Wu Fengguang [this message]
