From: Andrea Arcangeli <andrea@suse.de>
To: Andrew Morton <akpm@zip.com.au>
Cc: Rik van Riel <riel@conectiva.com.br>,
Marcelo Tosatti <marcelo@conectiva.com.br>,
lkml <linux-kernel@vger.kernel.org>
Subject: Re: 2.4.16 & OOM killer screw up (fwd)
Date: Wed, 12 Dec 2001 11:15:58 +0100 [thread overview]
Message-ID: <20011212111558.Y4801@athlon.random> (raw)
In-Reply-To: <3C15B0B3.1399043B@zip.com.au> <Pine.LNX.4.33L.0112111130110.4079-100000@imladris.surriel.com> <20011211144634.F4801@athlon.random> <3C1718E1.C22141B3@zip.com.au> <20011212102141.Q4801@athlon.random> <3C172A8A.3760C553@zip.com.au>
In-Reply-To: <3C172A8A.3760C553@zip.com.au>; from akpm@zip.com.au on Wed, Dec 12, 2001 at 01:59:38AM -0800
On Wed, Dec 12, 2001 at 01:59:38AM -0800, Andrew Morton wrote:
> Andrea Arcangeli wrote:
> >
> > ...
> > > The swapstorm I agree is uninteresting. The slowdown with a heavy write
> > > load impacts a very common usage, and I've told you how to mostly fix
> > > it. You need to back out the change to bdflush.
> >
> > I guess I should drop the run_task_queue(&tq_disk) instead of replacing
> > it back with a wait_for_some_buffers().
>
> hum. Nope, it definitely wants the wait_for_locked_buffers() in there.
> 36 seconds versus 25. (21 on stock kernel)
Please try without the wait_for_locked_buffers() and without the
run_task_queue(); just delete that line.
>
> My theory is that balance_dirty() is directing heaps of wakeups
> to bdflush, so bdflush just keeps on running. I'll take a look
> tomorrow.
Please delete the wait_on_buffers from balance_dirty() too; it's totally
broken there as well.
wait_on_something _does_ wake up the queue just like a run_task_queue()
would; otherwise it would be a noop.
However, I need to take a closer look at the refiling of clean buffers from
the locked list to the clean list; we should make sure not to spend too much
time there the first time a wait_on_buffers is called again...
> (If we're sending that many wakeups, we should do a waitqueue_active
> test in wakeup_bdflush...)
>
> > ...
> >
> > Note that the first elevator (not elevator_linus) could handle this
> > case; however, it was too complicated and I've been told it was hurting
> > the performance of things like dbench etc. too much. But it was allowing
> > you to take a few seconds for your test number 2, for example. Quite
> > frankly, all my benchmarks were latency oriented, and I couldn't notice
> > a huge drop of performance, but OTOH at that time my test box had a
> > 10 Mbyte/sec HD, and I know from experience that on such an HD numbers
> > tend to be very different than on fast SCSI, and my current test HD is
> > a 33 Mbyte/sec IDE, so I think they were right.
>
> OK, well I think I'll make it so the feature defaults to "off" - no
> change in behaviour. People need to run `elvtune -b non-zero-value'
> to turn it on.
Ok. BTW, I guess on this side it's worth working only on 2.5. We know
latency isn't very good in 2.4 and in 2.2; we're more throughput oriented.
Ah, and of course to make the latency better we could also reduce the
size of the I/O queue; I bet the queues are way oversized for a normal
desktop.
>
> So what is then needed is testing to determine the latency-versus-throughput
> tradeoff. Andries takes manpage patches :)
>
> > ...
> > > - Your attempt to address read latencies didn't work out, and should
> > > be dropped (hopefully Marcelo and Jens are OK with an elevator hack :))
> >
> > It should not be dropped. And it's not a hack; I only enabled the code
> > that was basically disabled due to the huge numbers. It will work as in 2.2.20.
>
> Sorry, I was referring to the elevator-bypass patch. Jens called
> it a hack ;)
Oh yes, that's a "hack" :), and it definitely works well for the latency.
>
> > Now what you want to add is a hack to move the read to the top of the
> > request_queue, and if you go back to 2.3.5x you'll see I was doing this;
> > that's the first thing I did while playing with the elevator. And
> > latency-wise it was working great. I'm sure somebody remembers the kind
> > of latency you could get with such an elevator.
> >
> > Then I got flames from Linus and Ingo claiming that I screwed up the
> > elevator and that I was the source of the 2.3.x bad I/O performance, and
> > so they required a near rewrite of the elevator, in a way that obviously
> > couldn't hurt the benchmarks, and so Jens dropped part of my
> > latency-capable elevator and did the elevator_linus, which of course
> > cannot hurt performance of benchmarks, but has the usual problem:
> > you need to wait 1 minute for an xterm to be started under a write flood.
> >
> > However, my objective was to avoid nearly infinite starvation, and
> > elevator_linus avoids it (you can start the xterm in 1 minute;
> > previously, in early 2.3 and 2.2, you'd need to wait for the disk to be
> > full, and that could take some days with some terabytes of data). So I was
> > pretty much fine with elevator_linus too, but we knew very well reads
> > would be starved again significantly (even if not indefinitely).
> >
>
> OK, thanks.
>
> As long as the elevator-bypass tunable gives a good range of
> latency-versus-throughput tuning then I'll be happy. It's a
> bit sad that in even the best case, reads are penalised by a
> factor of ten when there are writes happening.
>
> But fixing that would require major readahead surgery, and perhaps
> implementation of anticipatory scheduling, as described in
> http://www.cse.ucsc.edu/~sbrandt/290S/anticipatoryscheduling.pdf
> which is out of scope.
>
> -
Andrea
Thread overview: 43+ messages
2001-12-10 19:08 2.4.16 & OOM killer screw up (fwd) Marcelo Tosatti
2001-12-10 20:47 ` Andrew Morton
2001-12-10 19:42 ` Marcelo Tosatti
2001-12-11 0:11 ` Andrea Arcangeli
2001-12-11 7:07 ` Andrew Morton
2001-12-11 13:32 ` Rik van Riel
2001-12-11 13:46 ` Andrea Arcangeli
2001-12-12 8:44 ` Andrew Morton
2001-12-12 9:21 ` Andrea Arcangeli
2001-12-12 9:45 ` Rik van Riel
2001-12-12 10:09 ` Andrea Arcangeli
2001-12-12 9:59 ` Andrew Morton
2001-12-12 10:15 ` Andrea Arcangeli [this message]
2001-12-11 13:42 ` Andrea Arcangeli
2001-12-11 13:59 ` Rik van Riel
2001-12-11 14:23 ` Andrea Arcangeli
2001-12-11 15:27 ` Daniel Phillips
2001-12-12 11:16 ` Andrea Arcangeli
2001-12-12 20:03 ` Daniel Phillips
2001-12-12 21:25 ` Andrea Arcangeli
2001-12-11 13:59 ` Abraham vd Merwe
2001-12-11 14:01 ` Andrea Arcangeli
2001-12-11 17:30 ` Leigh Orf
2001-12-11 15:47 ` Henning P. Schmiedehausen
2001-12-11 16:01 ` Alan Cox
2001-12-11 16:37 ` Hubert Mantel
2001-12-11 17:09 ` Rik van Riel
2001-12-11 17:28 ` Alan Cox
2001-12-11 17:22 ` Rik van Riel
2001-12-11 17:23 ` Christoph Hellwig
2001-12-12 22:20 ` Rob Landley
2001-12-13 8:48 ` Alan Cox
2001-12-13 8:47 ` David S. Miller
2001-12-13 18:41 ` Matthias Andree
2001-12-13 10:22 ` [OT] " Rob Landley
2001-12-12 8:39 ` Andrew Morton
2001-12-11 0:43 ` Andrea Arcangeli
2001-12-11 15:46 ` Luigi Genoni
2001-12-12 22:05 ` Ken Brownfield
2001-12-12 22:30 ` Andrea Arcangeli
2001-12-12 23:23 ` Rik van Riel
[not found] <Pine.LNX.4.33L.0112102004490.1352-100000@duckman.distro.conectiva>
2001-12-11 16:45 ` Marcelo Tosatti
2001-12-11 18:51 ` Rik van Riel