Re: Poor read performance when sequential write presents

From: Jens Axboe <axboe@suse.de>
To: Andrew Morton <akpm@zip.com.au>
Cc: William Lee Irwin III <wli@holomorphy.com>,
	Giuliano Pochini <pochini@shiny.it>,
	linux-kernel@vger.kernel.org, "chen,
	xiangping" <chen_xiangping@emc.com>
Subject: Re: Poor read performance when sequential write presents
Date: Tue, 28 May 2002 11:25:03 +0200
Message-ID: <20020528092503.GJ17674@suse.de>
In-Reply-To: <3CED4843.2783B568@zip.com.au> <XFMail.20020524105942.pochini@shiny.it> <3CEE0758.27110CAD@zip.com.au> <20020524094606.GH14918@holomorphy.com> <3CEE1035.1E67E1B8@zip.com.au> <20020527080632.GC17674@suse.de> <3CF1ECD1.A1BB2CF1@zip.com.au> <20020527085414.GD17674@suse.de> <3CF1FDF8.B775DF44@zip.com.au>

On Mon, May 27 2002, Andrew Morton wrote:
> > On Mon, May 27 2002, Andrew Morton wrote:
> > > Jens Axboe wrote:
> > > >
> > > > ...
> > > > > But in 2.5, head-activeness went away and as far as I know, IDE and SCSI are
> > > > > treated the same.  Odd.
> > > >
> > > > It didn't really go away, it just gets handled automatically now.
> > > > elv_next_request() marks the request as started, in which case the i/o
> > > > scheduler won't consider it for merging etc. SCSI removes the request
> > > > directly after it has been marked started, while IDE leaves it on the
> > > > queue until it completes. For IDE TCQ, the behaviour is the same as with
> > > > SCSI.
> > >
> > > It won't consider the active request at the head of the queue for
> > > merging (making the request larger).  But it _could_ consider the
> > > request when making decisions about insertion (adding a new request
> > > at the head of the queue because it's close-on-disk to the active
> > > one).   Does it do that?
> > 
> > Only when the front request isn't active is it safe to consider
> > insertion in front of it. 2.5 does that exactly because it knows if the
> > request has been started, while 2.4 has to guess by looking at the
> > head-active flag and the plug status.
> > 
> > If the request is started, we will only consider placing in front of the
> > 2nd request not after the 1st. We could consider in between 1st and 2nd,
> > that should be safe. In fact that should be perfectly safe, just move
> > the barrier and started test down after the insert test. *req is the
> > insert-after point.
> 
> Makes sense.  I suspect it may even worsen the problem I observed
> with the mpage code.  Set the readahead to 256k with `blockdev --setra 512'
> and then run tiobench.  The read latencies are massive - one thread
> gets hold of the disk head and hogs it for 30-60 seconds.
> 
> The readahead code has a sort of double-window design.  The idea is that
> if the disk does 50 megs/sec and your application processes data at
> 49 megs/sec, the application will never block on I/O.  At 256k readahead,
> the readahead code will be laying out four BIOs at a time.  It's probable
> that the application is actually submitting BIOs for a new readahead
> window before all of the BIOs for the old one are complete.  So it's
> performing merging against its own reads.
> 
> Given all this, what I would expect to see is for thread "A" to capture
> the disk head for some period of time, until eventually one of thread "B"'s
> requests expires its latency.  Then thread "B" gets to hog the disk head.
> That's reasonable behaviour,  but the latencies are *enormous*.  Almost
> like the latency stuff isn't working.  But it sure looks OK.

I'm still waiting for some time to implement some nicer i/o scheduling
algorithms, I'd be sad to see elevator_linus be the default for 2.6. For
now it's just receiving the odd fixes here and there which do make small
improvements.

> Not super-high priority at this time.  I'll play with it some more.
> (Some userspace tunables for the elevator would be nice.  Hint. ;))

Agreed :-)

> hmm.  Actually the code looks a bit odd:
> 
>                 if (elv_linus_sequence(__rq)-- <= 0)
>                         break;
>                 if (!(__rq->flags & REQ_CMD))
>                         continue;
>                 if (elv_linus_sequence(__rq) < bio_sectors(bio))
>                         break;
> 
> The first decrement is saying that elv_linus_sequence is in units of
> requests, but the comparison (and the later `-= bio_sectors()') seems
> to be saying it's in units of sectors.

Well, it really is in units of sectors in 2.5, the first decrement is a
scan aging measure.
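
To make the two roles of the counter concrete, here is a minimal userspace
sketch of the checks quoted above. It is not the kernel code: struct
toy_request, its fields and toy_scan() are hypothetical stand-ins, and the
merge/insert work the real loop does between the checks is left out. The
budget is in sectors; the post-decrement only charges one unit per visit so
that a request which keeps getting scanned eventually stops the scan.

```c
/* Hypothetical stand-in for a queued request; not the kernel's struct request. */
struct toy_request {
    int is_cmd;     /* models (__rq->flags & REQ_CMD) */
    int sequence;   /* models elv_linus_sequence(__rq): latency budget in sectors */
};

/*
 * Walk the queue from the tail (index n - 1) toward the head (index 0),
 * applying the same three checks in the same order as the quoted snippet.
 * Returns the index of the request that stopped the scan, or -1 if the
 * scan reached the head of the queue.
 */
int toy_scan(struct toy_request *q, int n, int bio_sectors)
{
    for (int i = n - 1; i >= 0; i--) {
        struct toy_request *rq = &q[i];

        /* Scan aging: compare the old value, then charge one unit per visit. */
        if (rq->sequence-- <= 0)
            return i;

        /* Non-command requests are aged but otherwise skipped. */
        if (!rq->is_cmd)
            continue;

        /* Sector budget: passing this request would cost bio_sectors. */
        if (rq->sequence < bio_sectors)
            return i;
    }
    return -1;
}
```

With a three-entry queue whose tail request has a budget of 8 sectors, an
8-sector bio is stopped at the tail: the visit ages the budget to 7 first,
and 7 < 8. A 4-sector bio gets past all three.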

> I think calculating the latency in terms of requests makes more sense - just
> ignore the actual size of those requests (or weight it down in some manner).
> But I don't immediately see what the above code is up to?

That might make more sense, but again it's not likely to make
elevator_linus too tolerable anyways. You can easily change the
read/write initial sequences to be >> 2 what they are now, and just
account seeks. The end result would be very similar, though :-)
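
As a rough illustration of that alternative, the sketch below compares how
often a waiting request can be passed under the two accounting schemes. It
rests on assumptions: "account seeks" is read here as charging one unit per
request placed in front, the function names are hypothetical, the scan aging
decrement is ignored, and the numbers in the note after the block are made up
rather than taken from the real read/write initial sequences.

```c
/*
 * Sector accounting: each request placed in front of a waiting request
 * costs bio_sectors from its budget, and passing stops once the remaining
 * budget is smaller than the incoming bio.
 */
unsigned int passes_by_sectors(unsigned int initial, unsigned int bio_sectors)
{
    unsigned int budget = initial;
    unsigned int passes = 0;

    if (bio_sectors == 0)
        return 0;   /* avoid looping forever on a zero-sized bio */

    while (budget >= bio_sectors) {
        budget -= bio_sectors;
        passes++;
    }
    return passes;
}

/*
 * Seek accounting (assumed reading): start from a quarter of the old
 * budget and charge exactly one unit per request placed in front,
 * whatever its size.
 */
unsigned int passes_by_seeks(unsigned int initial)
{
    return initial >> 2;
}
```

For a made-up initial sequence of 1024 and 8-sector bios, the first scheme
allows 128 passes and the second 256, so the bound stays in the same ballpark
but no longer depends on request size.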

-- 
Jens Axboe

Thread overview: 12+ messages
2002-05-23 14:20 Poor read performance when sequential write presents chen, xiangping
2002-05-23 19:51 ` Andrew Morton
2002-05-24  8:59   ` Giuliano Pochini
2002-05-24  9:26     ` Andrew Morton
2002-05-24  9:46       ` William Lee Irwin III
2002-05-24 10:04         ` Andrew Morton
2002-05-27  8:06           ` Jens Axboe
2002-05-27  8:22             ` Andrew Morton
2002-05-27  8:54               ` Jens Axboe
2002-05-27  9:35                 ` Andrew Morton
2002-05-28  9:25                   ` Jens Axboe [this message]
2002-05-28  9:36                     ` Jens Axboe