From: Andrew Morton <akpm@zip.com.au>
To: Andrea Arcangeli <andrea@suse.de>
Cc: Rik van Riel <riel@conectiva.com.br>,
Marcelo Tosatti <marcelo@conectiva.com.br>,
lkml <linux-kernel@vger.kernel.org>
Subject: Re: 2.4.16 & OOM killer screw up (fwd)
Date: Wed, 12 Dec 2001 01:59:38 -0800
Message-ID: <3C172A8A.3760C553@zip.com.au>
In-Reply-To: <3C15B0B3.1399043B@zip.com.au> <Pine.LNX.4.33L.0112111130110.4079-100000@imladris.surriel.com> <20011211144634.F4801@athlon.random> <3C1718E1.C22141B3@zip.com.au> <20011212102141.Q4801@athlon.random>
Andrea Arcangeli wrote:
>
> ...
> > The swapstorm I agree is uninteresting. The slowdown with a heavy write
> > load impacts a very common usage, and I've told you how to mostly fix
> > it. You need to back out the change to bdflush.
>
> I guess I should drop the run_task_queue(&tq_disk) instead of replacing
> it back with a wait_for_some_buffers().
Hum.  Nope, it definitely wants the wait_for_locked_buffers() in there.
36 seconds versus 25.  (21 on stock kernel)

My theory is that balance_dirty() is directing heaps of wakeups
to bdflush, so bdflush just keeps on running.  I'll take a look
tomorrow.

(If we're sending that many wakeups, we should do a waitqueue_active()
test in wakeup_bdflush()...)
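To illustrate what such a test would buy, here is a user-space toy model in C.
It is not kernel code: the toy_* names and simulate() are made up for this
sketch and only imitate waitqueue_active(), wake_up_interruptible() and
wakeup_bdflush().  The assumption is that when bdflush is already running
(nobody asleep on its wait queue), the caller can skip the wakeup entirely
instead of paying for it each time.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a kernel wait queue head: it only counts sleepers. */
struct toy_waitqueue {
	int sleepers;      /* tasks currently asleep on the queue */
	int wakeups_sent;  /* how many times a real wakeup was issued */
};

/* Imitates waitqueue_active(): true only if somebody is actually asleep. */
static bool toy_waitqueue_active(const struct toy_waitqueue *q)
{
	return q->sleepers > 0;
}

/* Imitates wake_up_interruptible(): wakes every sleeper on the queue. */
static void toy_wake_up(struct toy_waitqueue *q)
{
	q->wakeups_sent++;
	q->sleepers = 0;
}

/* Hypothetical wakeup_bdflush() with the cheap test in front of the wakeup. */
static void toy_wakeup_bdflush(struct toy_waitqueue *q)
{
	if (toy_waitqueue_active(q))
		toy_wake_up(q);
}

/*
 * Issue `calls` wakeup requests against a queue that starts with
 * `sleepers` sleeping tasks, and return how many real wakeups were sent.
 */
static int simulate(int sleepers, int calls)
{
	struct toy_waitqueue q = { .sleepers = sleepers, .wakeups_sent = 0 };

	assert(calls >= 0);
	for (int i = 0; i < calls; i++)
		toy_wakeup_bdflush(&q);
	return q.wakeups_sent;
}
```

In this model, simulate(0, 1000) sends no wakeups at all (bdflush is already
running), and simulate(1, 1000) sends exactly one.  Whether that is a
measurable win in the real kernel is a separate question that needs testing.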
> ...
>
> Note that the first elevator (not elevator_linus) could handle this
> case, however it was too complicated and I've been told it was hurting
> the performance of things like dbench etc. too much. But it was allowing
> you to take a few seconds for your test number 2 for example. Quite
> frankly all my benchmarks were latency oriented, and I couldn't notice
> a huge drop of performance, but OTOH at that time my test box had a
> 10mbyte/sec HD, and I know from experience that on such an HD numbers tend
> to be very different than on fast SCSI and my current test hd IDE
> 33mbyte/sec so I think they were right.
OK, well I think I'll make it so the feature defaults to "off" - no
change in behaviour.  People need to run `elvtune -b non-zero-value'
to turn it on.

So what is then needed is testing to determine the latency-versus-throughput
tradeoff.  Andries takes manpage patches :)
> ...
> > - Your attempt to address read latencies didn't work out, and should
> > be dropped (hopefully Marcelo and Jens are OK with an elevator hack :))
>
> It should not be dropped. And it's not a hack, I only enabled the code
> that was basically disabled due to the huge numbers. It will work as 2.2.20.
Sorry, I was referring to the elevator-bypass patch. Jens called
it a hack ;)
> Now what you want to add is a hack to move the read to the top of the
> request_queue and if you go back to 2.3.5x you'll see I was doing this,
> that's the first thing I did while playing with the elevator. And
> latency-wise it was working great. I'm sure somebody remembers the kind
> of latency you could get with such an elevator.
>
> Then I got flames from Linus and Ingo claiming that I screwed up the
> elevator and that I was the source of the 2.3.x bad I/O performance and
> so they required to nearly rewrite the elevator in a way that was
> obvious that couldn't hurt the benchmarks and so Jens dropped part of my
> latency-capable elevator and he did the elevator_linus that of course
> cannot hurt performance of benchmarks, but that has the usual problem
> that you need to wait 1 minute for xterm to be started under a write flood.
>
> However my objective was to avoid nearly infinite starvation and the
> elevator_linus avoids it (you can start the xterm in 1 minute,
> previously in early 2.3 and 2.2 you'd need to wait for the disk to be
> full, and that could take some days with some terabytes of data). So I was
> pretty much fine with elevator_linus too but we knew very well that reads
> would be starved again significantly (even if not indefinitely).
>
OK, thanks.

As long as the elevator-bypass tunable gives a good range of
latency-versus-throughput tuning then I'll be happy.  It's a
bit sad that in even the best case, reads are penalised by a
factor of ten when there are writes happening.

But fixing that would require major readahead surgery, and perhaps
implementation of anticipatory scheduling, as described in
http://www.cse.ucsc.edu/~sbrandt/290S/anticipatoryscheduling.pdf
which is out of scope.