From: Jens Axboe <axboe@suse.de>
To: Andrew Morton <akpm@zip.com.au>
Cc: William Lee Irwin III <wli@holomorphy.com>,
Giuliano Pochini <pochini@shiny.it>,
linux-kernel@vger.kernel.org,
"chen, xiangping" <chen_xiangping@emc.com>
Subject: Re: Poor read performance when sequential write presents
Date: Tue, 28 May 2002 11:25:03 +0200
Message-ID: <20020528092503.GJ17674@suse.de>
In-Reply-To: <3CED4843.2783B568@zip.com.au> <XFMail.20020524105942.pochini@shiny.it> <3CEE0758.27110CAD@zip.com.au> <20020524094606.GH14918@holomorphy.com> <3CEE1035.1E67E1B8@zip.com.au> <20020527080632.GC17674@suse.de> <3CF1ECD1.A1BB2CF1@zip.com.au> <20020527085414.GD17674@suse.de> <3CF1FDF8.B775DF44@zip.com.au>
On Mon, May 27 2002, Andrew Morton wrote:
> > On Mon, May 27 2002, Andrew Morton wrote:
> > > Jens Axboe wrote:
> > > >
> > > > ...
> > > > > But in 2.5, head-activeness went away and as far as I know, IDE and SCSI are
> > > > > treated the same. Odd.
> > > >
> > > > It didn't really go away, it just gets handled automatically now.
> > > > elv_next_request() marks the request as started, in which case the i/o
> > > > scheduler won't consider it for merging etc. SCSI removes the request
> > > > directly after it has been marked started, while IDE leaves it on the
> > > > queue until it completes. For IDE TCQ, the behaviour is the same as with
> > > > SCSI.
> > >
> > > It won't consider the active request at the head of the queue for
> > > merging (making the request larger). But it _could_ consider the
> > > request when making decisions about insertion (adding a new request
> > > at the head of the queue because it's close-on-disk to the active
> > > one). Does it do that?
> >
> > Only when the front request isn't active is it safe to consider
> > insertion in front of it. 2.5 does that exactly because it knows if the
> > request has been started, while 2.4 has to guess by looking at the
> > head-active flag and the plug status.
> >
> > If the request is started, we will only consider placing in front of the
> > 2nd request not after the 1st. We could consider in between 1st and 2nd,
> > that should be safe. In fact that should be perfectly safe, just move
> > the barrier and started test down after the insert test. *req is the
> > insert-after point.
>
> Makes sense. I suspect it may even worsen the problem I observed
> with the mpage code. Set the readahead to 256k with `blockdev --setra 512'
> and then run tiobench. The read latencies are massive - one thread
> gets hold of the disk head and hogs it for 30-60 seconds.
>
> The readahead code has a sort of double-window design. The idea is that
> if the disk does 50 megs/sec and your application processes data at
> 49 megs/sec, the application will never block on I/O. At 256k readahead,
> the readahead code will be laying out four BIOs at a time. It's probable
> that the application is actually submitting BIOs for a new readahead
> window before all of the BIOs for the old one are complete. So it's
> performing merging against its own reads.
>
> Given all this, what I would expect to see is for thread "A" to capture
> the disk head for some period of time, until eventually one of thread "B"'s
> requests expires its latency. Then thread "B" gets to hog the disk head.
> That's reasonable behaviour, but the latencies are *enormous*. Almost
> like the latency stuff isn't working. But it sure looks OK.
I'm still waiting for some time to implement some nicer i/o scheduling
algorithms, I'd be sad to see elevator_linus be the default for 2.6. For
now it's just receiving the odd fixes here and there which do make small
improvements.
> Not super-high priority at this time. I'll play with it some more.
> (Some userspace tunables for the elevator would be nice. Hint. ;))
Agreed :-)
> hmm. Actually the code looks a bit odd:
>
> 	if (elv_linus_sequence(__rq)-- <= 0)
> 		break;
> 	if (!(__rq->flags & REQ_CMD))
> 		continue;
> 	if (elv_linus_sequence(__rq) < bio_sectors(bio))
> 		break;
>
> The first decrement is saying that elv_linus_sequence is in units of
> requests, but the comparison (and the later `-= bio_sectors()') seems
> to be saying it's in units of sectors.
Well, it really is in units of sectors in 2.5, the first decrement is a
scan aging measure.
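To make the two roles of the counter concrete, here is a minimal user-space sketch of just the three quoted checks. It is not the kernel code: struct sketch_rq, sketch_scan() and the array-based queue are hypothetical stand-ins, and the merge and insert steps that follow these checks in the real loop are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct request: only what the scan needs. */
struct sketch_rq {
	int is_cmd;	/* models (flags & REQ_CMD) */
	int sequence;	/* remaining latency budget, in sectors */
};

/*
 * Walk the queue from the tail (index n - 1) towards the head (index 0).
 * Returns the index of the request that stopped the scan, or -1 if the
 * scan passed every request and reached the head.
 */
static int sketch_scan(struct sketch_rq *q, size_t n, int bio_sectors)
{
	assert(q != NULL || n == 0);

	for (size_t i = n; i-- > 0; ) {
		struct sketch_rq *rq = &q[i];

		/* Scan aging: every visit costs one unit, even without an insert. */
		if (rq->sequence-- <= 0)
			return (int)i;
		if (!rq->is_cmd)
			continue;
		/* Sector accounting: the budget must still cover the whole bio. */
		if (rq->sequence < bio_sectors)
			return (int)i;
	}
	return -1;
}
```

With a three-entry queue whose budgets are 100, 8 and 100 and a 16-sector bio, the scan ages the tail request to 99, ages the middle one to 7, and stops there because 7 sectors cannot cover the bio.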
> I think calculating the latency in terms of requests makes more sense - just
> ignore the actual size of those requests (or weight it down in some manner).
> But I don't immediately see what the above code is up to?
That might make more sense, but again it's not likely to make
elevator_linus too tolerable anyways. You can easily change the
read/write initial sequences to be >> 2 what they are now, and just
account seeks. The end result would be very similar, though :-)
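Roughly what that would look like, as a user-space sketch only. The names seek_rq, seek_initial_budget() and seek_charge() are made up for illustration, and the budget passed in is whatever the current sector-based initial sequence happens to be (the actual defaults are not quoted in this thread).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical request carrying a budget counted in seeks, not sectors. */
struct seek_rq {
	int sequence;	/* how many more times this request may be passed */
};

/* Scale an existing sector-based initial sequence down by >> 2. */
static int seek_initial_budget(int sector_budget)
{
	return sector_budget >> 2;
}

/*
 * Charge one unit when a new request is placed ahead of rq, ignoring
 * the size of the newcomer. Returns 1 if the pass is allowed, 0 if
 * rq's budget is already used up.
 */
static int seek_charge(struct seek_rq *rq)
{
	assert(rq != NULL);

	if (rq->sequence <= 0)
		return 0;
	rq->sequence--;
	return 1;
}
```

The only difference from the sector-based scheme is that a large bio and a small bio cost the same single unit, which is why the end result should come out very similar.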
--
Jens Axboe