Re: Poor read performance when sequential write presents


From: Andrew Morton <akpm@zip.com.au>
To: Jens Axboe <axboe@suse.de>
Cc: William Lee Irwin III <wli@holomorphy.com>,
	Giuliano Pochini <pochini@shiny.it>,
	linux-kernel@vger.kernel.org,
	"chen, xiangping" <chen_xiangping@emc.com>
Subject: Re: Poor read performance when sequential write presents
Date: Mon, 27 May 2002 02:35:52 -0700
Message-ID: <3CF1FDF8.B775DF44@zip.com.au>
In-Reply-To: <3CED4843.2783B568@zip.com.au> <XFMail.20020524105942.pochini@shiny.it> <3CEE0758.27110CAD@zip.com.au> <20020524094606.GH14918@holomorphy.com> <3CEE1035.1E67E1B8@zip.com.au> <20020527080632.GC17674@suse.de> <3CF1ECD1.A1BB2CF1@zip.com.au> <20020527085414.GD17674@suse.de>

Jens Axboe wrote:
> 
> On Mon, May 27 2002, Andrew Morton wrote:
> > Jens Axboe wrote:
> > >
> > > ...
> > > > But in 2.5, head-activeness went away and as far as I know, IDE and SCSI are
> > > > treated the same.  Odd.
> > >
> > > It didn't really go away, it just gets handled automatically now.
> > > elv_next_request() marks the request as started, in which case the i/o
> > > scheduler won't consider it for merging etc. SCSI removes the request
> > > directly after it has been marked started, while IDE leaves it on the
> > > queue until it completes. For IDE TCQ, the behaviour is the same as with
> > > SCSI.
> >
> > It won't consider the active request at the head of the queue for
> > merging (making the request larger).  But it _could_ consider the
> > request when making decisions about insertion (adding a new request
> > at the head of the queue because it's close-on-disk to the active
> > one).   Does it do that?
> 
> Only when the front request isn't active is it safe to consider
> insertion in front of it. 2.5 does that exactly because it knows if the
> request has been started, while 2.4 has to guess by looking at the
> head-active flag and the plug status.
> 
> If the request is started, we will only consider placing in front of the
> 2nd request not after the 1st. We could consider in between 1st and 2nd,
> that should be safe. In fact that should be perfectly safe, just move
> the barrier and started test down after the insert test. *req is the
> insert-after point.

Makes sense.  I suspect it may even worsen the problem I observed
with the mpage code.  Set the readahead to 256k with `blockdev --setra 512'
and then run tiobench.  The read latencies are massive - one thread
gets hold of the disk head and hogs it for 30-60 seconds.

The readahead code has a sort of double-window design.  The idea is that
if the disk does 50 megs/sec and your application processes data at
49 megs/sec, the application will never block on I/O.  At 256k readahead,
the readahead code will be laying out four BIOs at a time.  It's probable
that the application is actually submitting BIOs for a new readahead
window before all of the BIOs for the old one are complete.  So it's performing
merging against its own reads.
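
The arithmetic behind those numbers can be sketched in a few lines of C.
This is only an illustration, not the readahead code itself: the 512-byte
sector unit is what `blockdev --setra' counts in, and the 64 KiB BIO size is
an assumption inferred from "four BIOs" per 256k window.  The names
ra_window_bytes() and bios_per_window() are made up for the example.

```c
/*
 * Illustrative only: relate a `blockdev --setra` value to the readahead
 * window size and the number of BIOs laid out per window.
 */
#define SECTOR_BYTES 512u            /* --setra counts 512-byte sectors */
#define ASSUMED_BIO_BYTES (64u * 1024u) /* assumption: 64 KiB per BIO */

/* Readahead window size in bytes for a given --setra value. */
static unsigned ra_window_bytes(unsigned setra_sectors)
{
	return setra_sectors * SECTOR_BYTES;
}

/* Number of BIOs needed to cover one window, rounding up. */
static unsigned bios_per_window(unsigned setra_sectors)
{
	unsigned bytes = ra_window_bytes(setra_sectors);

	return (bytes + ASSUMED_BIO_BYTES - 1) / ASSUMED_BIO_BYTES;
}
```

With --setra 512 this gives a 256 KiB window covered by four BIOs, so a
reader that starts the next window early can easily have eight of its own
BIOs in the queue at once.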

Given all this, what I would expect to see is for thread "A" to capture
the disk head for some period of time, until eventually one of thread "B"'s
requests expires its latency.  Then thread "B" gets to hog the disk head.
That's reasonable behaviour, but the latencies are *enormous*.  It's almost
as if the latency stuff isn't working.  But it sure looks OK.

Not super-high priority at this time.  I'll play with it some more.
(Some userspace tunables for the elevator would be nice.  Hint. ;))

hmm.  Actually the code looks a bit odd:

                if (elv_linus_sequence(__rq)-- <= 0)
                        break;
                if (!(__rq->flags & REQ_CMD))
                        continue;
                if (elv_linus_sequence(__rq) < bio_sectors(bio))
                        break;

The first decrement is saying that elv_linus_sequence is in units of
requests, but the comparison (and the later `-= bio_sectors()') seems
to be saying it's in units of sectors.

I think calculating the latency in terms of requests makes more sense - just
ignore the actual size of those requests (or weight it down in some manner).
But I don't immediately see what the above code is up to.
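
To make the "units of requests" idea concrete, here is a rough sketch of
what a consistent version might look like.  It is not the elevator code:
struct toy_rq and toy_scan_back() are hypothetical stand-ins, the sort-order
and REQ_CMD checks are left out, and the only point is that each request
passed over is charged exactly one unit, whatever its size.

```c
#include <stddef.h>

/* Hypothetical stand-in for a queued request; not the kernel's struct request. */
struct toy_rq {
	int sequence;     /* remaining latency budget, in units of requests */
	unsigned sectors; /* request size; deliberately ignored below */
};

/*
 * Walk the queue from the tail (q[n-1]) towards the head (q[0]) and count
 * how many requests a new request may be inserted in front of.  A request
 * whose budget is exhausted acts as a barrier and stops the walk.
 */
static size_t toy_scan_back(struct toy_rq *q, size_t n)
{
	size_t passed = 0;

	while (passed < n) {
		struct toy_rq *rq = &q[n - 1 - passed];

		if (rq->sequence <= 0)
			break;          /* budget used up: may not be passed again */
		rq->sequence--;         /* charge one request, regardless of size */
		passed++;
	}
	return passed;
}
```

Weighting by size, if wanted, would then be a separate and explicit step
rather than a second interpretation of the same counter.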

-
