From: Helge Hafting <helgehaf@aitel.hist.no>
To: erik@hensema.net
Cc: linux-kernel@vger.kernel.org
Subject: Re: Delaying writes to disk when there's no need
Date: Mon, 31 Mar 2003 15:42:00 +0200
Message-ID: <3E8845A8.20107@aitel.hist.no>
In-Reply-To: slrnb8gbfp.1d6.erik@bender.home.hensema.net

Erik Hensema wrote:
[...]
> Helge Hafting already pointed out that writing out the data earlier isn't
> desirable. The problem isn't in the waiting: the problem is in the writing.
> I think the current kernel tries to write too much data too fast when
> there's absolutely no reason to do so. It should probably gently write out
> small amounts of data until there is a more pressing need for memory.
> 
I don't think the problem is "writing a large chunk", but rather that
this chunk is scheduled for writing a bit too late.  Memory fills up
and the process producing data is throttled while waiting for
the write to free up pages.  Then the "huge chunk" of pages is released,
and memory is allowed to fill up for too long again.

It seems to me the correct solution is to start writing out
things long before memory gets so full that we need to
throttle the producer.

This will result in somewhat smaller chunks and a somewhat
steadier stream of data.  It will work better, but not because
the chunks are smaller. (Block devices are supposed to handle
enormous chunks with no problems, and bandwidth utilization
is generally better the bigger the chunks you can get.  50M to
disk in one go isn't "pushing" anything - it is "nice".)
The reason an earlier start works better is that memory never fills up
to the point where the producer is throttled, assuming
the io system can keep up with the producer forever.
Throttling will _always_ happen when that isn't the case.
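
To make the argument concrete, here is a toy simulation.  It is only
a sketch, not kernel code, and it assumes a simplified model in which
the pages of a submitted chunk stay in memory until the whole chunk
has been written.  The function name, the parameters and the
unit-free "ticks" are all made up for illustration.

```python
def simulate(start_at, throttle_at, produce_rate, disk_rate, ticks):
    """Return the number of ticks in which the producer was throttled.

    Assumed toy model: memory holds `dirty` data plus the chunk being
    written; the chunk is released only once it is completely on disk.
    """
    dirty = 0       # dirtied, not yet submitted for writing
    chunk = 0       # size of the chunk in flight (still occupies memory)
    remaining = 0   # part of that chunk not yet written
    throttled = 0
    for _ in range(ticks):
        # The producer runs only while memory is below the throttle limit.
        if dirty + chunk < throttle_at:
            dirty += produce_rate
        else:
            throttled += 1
        # Submit everything dirty as one chunk once the start threshold
        # is reached and no other chunk is in flight.
        if chunk == 0 and dirty >= start_at:
            chunk, remaining, dirty = dirty, dirty, 0
        # The device makes progress; pages are freed on completion.
        if chunk > 0:
            remaining -= disk_rate
            if remaining <= 0:
                chunk = remaining = 0
    return throttled


late = simulate(start_at=100, throttle_at=100,
                produce_rate=10, disk_rate=12, ticks=100)
early = simulate(start_at=30, throttle_at=100,
                 produce_rate=10, disk_rate=12, ticks=100)
```

In this model the device is faster than the producer in both runs,
yet only the early start avoids throttling: `early` is zero while
`late` is not.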

The tricky part here is knowing the bandwidth of the output
device, and starting to write at such a time that memory
won't have time to fill up in the case where a
producer is almost as fast as the output device.

The problem is that this depends on several things:
1. How much more memory is there (varies a lot, but
    the kernel knows this one.)
2. How fast is the output device (varies a lot; different
    areas on a disk have different speeds, and different
    disks have different speeds.  The speed of nfs depends
    on network speed, network congestion,
    roundtrip time, server load, and server disk speed.)
    You probably cannot get good estimates for all cases,
    particularly not for nfs on a shared net.
    To get this right we need both bandwidth and latency.
3. How fast is data produced?  A global estimate may
    be possible, looking at how fast memory is dirtied.
    I have no idea if such an estimate is possible per
    block device.
4. The big problem is that there may be several unrelated
    processes dirtying memory to be written to several
    very different block devices.
    For this to work automatically we need a low estimate
    of the bandwidth for each block device/filesystem,
    and the memory dirtying rate for each.
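
For a single producer and a single device, the first three
quantities can be combined into a start threshold.  The sketch below
is one possible way to do that, not an established formula: it
assumes that a backlog of D bytes takes latency + D / bandwidth
seconds to write, and that the producer keeps dirtying memory at a
constant rate during that time.  The function and parameter names
are hypothetical.

```python
def max_dirty_before_writeback(available_bytes, produce_rate,
                               bandwidth, latency=0.0):
    """Largest backlog D (bytes) that can be written out before the
    producer fills the rest of the available memory.

    Solves  D + produce_rate * (latency + D / bandwidth) <= available_bytes
    for D.  Rates are in bytes per second, latency in seconds.
    Returns 0.0 if the latency alone already eats all the headroom.
    """
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    if produce_rate < 0 or latency < 0:
        raise ValueError("produce_rate and latency must not be negative")
    headroom = available_bytes - produce_rate * latency
    if headroom <= 0:
        return 0.0
    return headroom / (1.0 + produce_rate / bandwidth)


# 300 units of memory, producer at 10/s, device at 20/s, no latency:
# a backlog of 200 takes 10 s to write, during which 100 more arrive.
threshold = max_dirty_before_writeback(300, 10, 20)
```

A low (pessimistic) bandwidth estimate gives a lower threshold and
thus an earlier start, which is the safe direction to err in.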


This seems hard to solve automatically.  A specific
case of a realtime program writing near disk speed is solvable
by having an extra thread that issues an fsync whenever the
amount of written but unsynced data gets near the point
where the time needed to write it out is long enough for
the producer, at its current rate, to fill memory.
Of course one wants a substantial safety margin here,
perhaps an assumption that only one third or so of memory
actually will be available for caching the important stuff.
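
A minimal sketch of such a helper thread follows.  The class name,
its attributes and the `sync_limit_bytes` parameter are made up for
illustration; picking a good limit (including the safety margin) is
left to the caller.  Only `os.write`, `os.fsync` and the standard
`threading` module are used.

```python
import os
import threading


class BackgroundSyncer:
    """Write to a file descriptor; fsync from a separate thread once
    the amount of unsynced data reaches sync_limit_bytes."""

    def __init__(self, fd, sync_limit_bytes):
        self.fd = fd
        self.sync_limit = sync_limit_bytes
        self.unsynced = 0      # bytes written since the last fsync
        self.sync_count = 0    # number of fsyncs done by the thread
        self._cond = threading.Condition()
        self._stop = False
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def write(self, data):
        view = memoryview(data)
        while view:                      # os.write may write partially
            written = os.write(self.fd, view)
            view = view[written:]
        with self._cond:
            self.unsynced += len(data)
            if self.unsynced >= self.sync_limit:
                self._cond.notify()      # wake the syncing thread

    def _run(self):
        while True:
            with self._cond:
                while not self._stop and self.unsynced < self.sync_limit:
                    self._cond.wait()
                if self._stop:
                    return
                pending = self.unsynced
            os.fsync(self.fd)            # blocks this thread only
            with self._cond:
                # Data written during the fsync stays counted as
                # unsynced, which errs on the safe side.
                self.unsynced -= pending
                self.sync_count += 1

    def close(self):
        with self._cond:
            self._stop = True
            self._cond.notify()
        self._thread.join()
        os.fsync(self.fd)                # flush whatever is left
```

The producer only calls `write()` and never waits for the fsync
itself; it still gets throttled by the kernel if the disk cannot
keep up in the long run.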

A manual solution is possible if we can have two "knobs"
for this:
1. Threshold for when to start writing out stuff
2. Threshold for when to throttle processes.

The latter may or may not be necessary, the point is that the former
should kick in long before throttling is necessary.

This is usually expressed as the % of memory that is dirty, but
I'm not sure that is the right thing.  It assumes that 100% will be
available after cleaning, which may be way off.

Something like the % of memory that is still available (free,
or instantly freeable by reclaiming clean unpinned cache)
might be a better measure.
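
As a sketch of how the two knobs could work when expressed as a % of
memory still available, consider the following.  All names here are
hypothetical and this is not an existing kernel interface; the only
point is that writeout kicks in at a much higher level of available
memory than throttling does.

```python
from enum import Enum


class Action(Enum):
    NONE = "none"
    START_WRITEBACK = "start writeback"
    THROTTLE = "throttle producers"


def available_pct(total, free, clean_reclaimable):
    """Percentage of memory that is free or instantly freeable."""
    return 100.0 * (free + clean_reclaimable) / total


def decide(avail_pct, writeback_below_pct, throttle_below_pct):
    """Pick an action from the two knobs.

    Writeback starts when available memory drops below
    writeback_below_pct; producers are throttled only when it drops
    below the (much lower) throttle_below_pct.
    """
    if throttle_below_pct >= writeback_below_pct:
        raise ValueError("writeback must start before throttling")
    if avail_pct < throttle_below_pct:
        return Action.THROTTLE
    if avail_pct < writeback_below_pct:
        return Action.START_WRITEBACK
    return Action.NONE


# 1000 pages total, 100 free, 300 clean and unpinned: 40% available.
pct = available_pct(total=1000, free=100, clean_reclaimable=300)
action = decide(pct, writeback_below_pct=50, throttle_below_pct=10)
```

With these example settings, writeout starts at 50% available and
throttling only at 10%, so `action` is `Action.START_WRITEBACK`.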

Helge Hafting




