public inbox for linux-kernel@vger.kernel.org
From: Andrew Morton <akpm@zip.com.au>
To: Marcelo Tosatti <marcelo@conectiva.com.br>
Cc: Mike Fedyk <mfedyk@matchmail.com>,
	Ahmed Masud <masud@googgun.com>,
	"'lkml'" <linux-kernel@vger.kernel.org>
Subject: Re: Unresponiveness of 2.4.16
Date: Wed, 28 Nov 2001 12:31:14 -0800
Message-ID: <3C054992.48F5C9E7@zip.com.au>
In-Reply-To: <3C03FE2F.63D7ACFD@zip.com.au> <Pine.LNX.4.21.0111281604390.15571-100000@freak.distro.conectiva>

Marcelo Tosatti wrote:
> 
> On Tue, 27 Nov 2001, Andrew Morton wrote:
> 
> > Mike Fedyk wrote:
> > >
> > > >   I'll send you a patch which makes the VM less inclined to page things
> > > >   out in the presence of heavy writes, and which decreases read
> > > >   latencies.
> > > >
> > > Is this patch posted anywhere?
> >
> > I sent it yesterday, in this thread.  Here it is again.
> >
> > Description:
> >
> > - Account for locked as well as dirty buffers when deciding
> >   to throttle writers.
> 
> Just one thing: If we have lots of locked buffers due to reads we are
> going to unnecessarily block writes, and that's not good.

True.  I believe this change makes balance_dirty() work as it was
originally intended to work.   But in so doing, lots of things change.
Various places which have been tuned for the broken balance_dirty()
behaviour may need to be retuned.  It needs testing, thought, and
a comment from Linus would be helpful.
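
For anyone following along, the accounting change can be sketched roughly
as below.  This is a standalone userspace illustration, not the code in
vm-fixes.patch: the struct, the enum, the percentage thresholds and the
count_locked switch are all made-up names for the example.

```c
#include <stdbool.h>

/* Hypothetical snapshot of buffer-cache state; names are illustrative. */
struct buf_stats {
	unsigned long dirty_pages;	/* modified, not yet queued for IO */
	unsigned long locked_pages;	/* IO in flight (reads or writes) */
	unsigned long total_pages;	/* pages the buffer cache may use */
};

enum throttle { THROTTLE_NONE = -1, THROTTLE_ASYNC = 0, THROTTLE_SYNC = 1 };

/*
 * Decide whether a writer should be throttled.  soft_pct and hard_pct
 * are assumed percentages of total_pages.  count_locked selects between
 * counting dirty buffers only and counting dirty plus locked buffers.
 */
static enum throttle balance_dirty_state(const struct buf_stats *s,
					 unsigned soft_pct, unsigned hard_pct,
					 bool count_locked)
{
	unsigned long pending = s->dirty_pages;

	if (count_locked)
		pending += s->locked_pages;

	if (pending * 100 > s->total_pages * hard_pct)
		return THROTTLE_SYNC;	/* writer must wait for IO */
	if (pending * 100 > s->total_pages * soft_pct)
		return THROTTLE_ASYNC;	/* start background writeout */
	return THROTTLE_NONE;
}
```

With 100 dirty and 500 locked pages out of 1000, and limits of 30% and
60%, the dirty-only count does not throttle at all while the combined
count crosses the soft limit.  Since locked_pages does not distinguish
reads from writes, this also shows how read traffic could end up
throttling writers.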

> But well, I prefer to fix interactivity than to care about that one kind
> of workload, so I'm ok with it.
> 
> > - Tweak VM to make it work the inactive list harder, before starting
> >   to evict pages or swap.
> 
> I would like to see the interactivity problems get fixed on the block
> layer side first: it's not a VM issue initially. Actually, the thing is
> that if you tweak VM this way you're going to break some workloads.

Possibly.  I have a feeling that the VM is a bit too swaphappy,
especially in the presence of heavy write() loads.  I'd rather
see more aggressive dropbehind on the write() data, than see
useful cache data dropped.  But I'm not sure yet.
 
> > - Change the elevator so that once a request's latency has
> >   expired, we can still perform merges in front of that
> >   request.  But we no longer will insert new requests in
> >   front of that request.
> 
> Sounds fine... I've received quite a few success reports already, right?

A few people have reported success.  Nathan Grennan didn't. 
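
To make the merge-versus-insert distinction concrete, here is a much
simplified sketch.  It is not the elevator.patch code: the request struct,
the latency field and both function names are invented for illustration,
the queue is a plain array with the head at index 0, and the real elevator
also charges latency for merges, which is left out here.

```c
#include <stddef.h>

/* Hypothetical queued request; field names are illustrative only. */
struct request {
	unsigned long sector;	/* start sector */
	unsigned long nr;	/* length in sectors */
	int latency;		/* remaining budget; <= 0 means expired */
};

/*
 * Pick the index at which a new request starting at `sector` would be
 * inserted into q[0..n).  Scan from the tail toward the head and stop
 * at the first expired request: nothing new may be placed ahead of it.
 */
static size_t insert_position(const struct request *q, size_t n,
			      unsigned long sector)
{
	size_t pos = n;

	while (pos > 0) {
		const struct request *prev = &q[pos - 1];

		if (prev->latency <= 0)
			break;	/* expired: do not pass it */
		if (prev->sector <= sector)
			break;	/* sorted position found */
		pos--;
	}
	return pos;
}

/*
 * Look for a request that the new IO could be appended to.  The whole
 * queue is searched, including requests ahead of an expired one.
 * Returns the index of the merge candidate, or -1 if there is none.
 */
static int find_back_merge(const struct request *q, size_t n,
			   unsigned long sector)
{
	for (size_t i = 0; i < n; i++)
		if (q[i].sector + q[i].nr == sector)
			return (int)i;
	return -1;
}
```

With requests at sectors 100, 500 (expired) and 900 queued, a new request
for sector 300 lands behind the expired one even though sector order would
put it ahead, while IO for sector 108 can still be merged onto the request
at the head of the queue.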

The elevator change also needs more testing and review.
There's a possibility that it could cause a seek-storm collapse
when interacting with readahead.   Currently, readahead does this:

	for (some pages) {
		alloc_page();		/* may block in shrink_cache(), doing IO */
		page_cache_read();	/* submits the read for this page */
	}

See the potential here for the alloc_page() to get abducted
by shrink_cache(), to perform IO, and to not return until after
the previous page_cache_read() has been submitted to the device?
Ouch.  Putting reads nearer the elevator head exposes this possibility.

It seems to not happen, due to the vagaries of the VM-of-the-minute,
and the workload.  But it could.

So the obvious change is to allocate all the readahead pages up-front
before issuing the reads.  I rewrote the readahead code to do this
(and dropped about 300 lines from filemap.c in the process), but given
that the condition doesn't trigger, it doesn't make much difference.
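
The two-phase shape is roughly the following.  This is a userspace sketch
only, not the rewritten filemap.c code: alloc_page_stub(),
submit_read_stub(), MAX_RA and the event log are stand-ins invented so the
ordering can be observed, and malloc() takes the place of the page
allocator.

```c
#include <stdlib.h>

#define MAX_RA	32	/* assumed upper bound on the readahead window */
#define PAGE_SZ	4096

/* Event log: 'A' = page allocated, 'R' = read submitted. */
static char ra_log[2 * MAX_RA + 1];
static size_t ra_log_len;

static void *alloc_page_stub(void)
{
	ra_log[ra_log_len++] = 'A';
	return malloc(PAGE_SZ);	/* in the kernel this may block in reclaim */
}

static void submit_read_stub(void *page, unsigned long index)
{
	(void)page;
	(void)index;
	ra_log[ra_log_len++] = 'R';
}

/* Allocate every page first, then issue the reads back to back. */
static size_t readahead_upfront(unsigned long start, size_t nr)
{
	void *pages[MAX_RA];
	size_t got, i;

	if (nr > MAX_RA)
		nr = MAX_RA;
	ra_log_len = 0;

	/* Phase 1: all allocations, so any reclaim stall happens here. */
	for (got = 0; got < nr; got++) {
		pages[got] = alloc_page_stub();
		if (!pages[got])
			break;
	}

	/* Phase 2: submit the reads with nothing that can block between them. */
	for (i = 0; i < got; i++)
		submit_read_stub(pages[i], start + i);
	ra_log[ra_log_len] = '\0';

	for (i = 0; i < got; i++)
		free(pages[i]);	/* sketch only: real pages stay in the cache */
	return got;
}
```

After a call for three pages the log holds all the allocations before
any of the reads, whereas the interleaved loop would alternate them.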

I've spent a week so far looking closely at various performance
and usability problems with 2.4. It's still a work-in-progress.
I don't feel ready to start offering anything for merging yet,
really.  Some of these things interact, and I'd prefer to get
more off-stream testing done, as well as code review.

Current patchset is at http://www.zip.com.au/~akpm/linux/2.4/2.4.17-pre1/

The list so far is:

vm-fixes.patch
	The balance_dirty() and less swap-happy changes
write-cluster.patch
	ext2 metadata prereading and various other hacks which
	prevent writes from stumbling over reads, and thus ruining
	write clustering.  This patch is in the early prototype stage
readhead.patch
	VM readahead rewrite.  Designed to avoid the above
	problem, and to make readahead growth more aggressive,
	and to make readahead shrinkth less aggressive.  I
	don't see why we should drop the readahead window on the
	floor if someone has read a few megs from a file and then
	seeks elsewhere within it.  Also uses common code for
	mmap readahead.  The madvise explicit dropbehind code
	accidentally died.  Oh well.
	Testing with paging-intensive workloads (start X11, staroffice6)
	indicates that we indeed do more IO, in fewer requests.  But
	walltime doesn't change.   I may not proceed with this.
mini-ll.patch
	A kinder, gentler low-latency patch, based on the one which
	Andrea is maintaining.  Doesn't drop any locks.  As far as
	I'm concerned, this can be merged today (six months ago, in
	fact).  It gives practically all the perceived benefit of
	the preemptive kernel patch and is clearly safe.
	A number of vendors are shipping kernels which are patched
	to add rescheduling points to copy_*_user(), which is
	much less effective than this patch.  They shouldn't
	be doing this.
elevator.patch
	The previously-described elevator changes
inline.patch
	Drops a large number of ill-chosen `inline' qualifiers
	from the kernel.  Removes a total of about 12,000 bytes
	of instructions, almost all from the very hottest parts of
	the kernel.  Should prove useful for computers which
	have an L1 cache which is faster than main memory.
block-alloc.patch
	My nemesis.  Fixing the long- and short-term fragmentation
	of ext2/ext3 blocks would be a more significant performance
	boost than anything else in the 2.4 series.  But it's just
	proving intractable.  I'll probably have to drop most of
	this, and look at online defrag.   There's potential for
	a 3x to 5x speedup here.

Also need to do something about the stalls which Nathan Grennan
has reported.  On ext3 it seems to be due to atime updates.
Not sure about ext2 yet.
