From: Daniel Phillips <phillips@phunq.net>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: davidsen@tmr.com, linux-kernel@vger.kernel.org, peterz@infradead.org
Subject: Re: [RFC] [PATCH] A clean approach to writeout throttling
Date: Mon, 10 Dec 2007 01:20:21 -0800
Message-ID: <200712100120.22657.phillips@phunq.net>
In-Reply-To: <200712062313.43383.phillips@phunq.net>

Hi Andrew,

Unfortunately, I agreed with your suggestion too hastily.  Not only
would it be complex to implement, it does not work.  It took me several
days to put my finger on exactly why.  Here it is in a nutshell:
resources may be consumed _after_ the gatekeeper runs the "go, no go" 
throttling decision.  To illustrate, throw 10,000 bios simultaneously 
at a block stack that is supposed to allow only about 1,000 in flight 
at a time.  If the block stack allocates memory somewhat late in its 
servicing scheme (for example, when it sends a network message) then it 
is possible that no actual resource consumption will have taken place 
before all 10,000 bios are allowed past the gatekeeper, and deadlock
is sure to occur sooner or later.
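
The burst scenario above can be sketched as a small user-space
simulation.  This is only an illustration, not kernel code: the names
measured_gate, gate_admit, bio_allocate_late and simulate_burst are
hypothetical, and it assumes each bio allocates exactly one page, and
only after every bio in the burst has already been gated.

```c
#define LIMIT_PAGES 1000	/* assumed: pages the stack may hold at once */

/* Hypothetical gate that decides on *measured* consumption. */
struct measured_gate {
	int pages_in_use;	/* pages actually allocated so far */
	int admitted;		/* bios let past the gate */
};

/* "Go, no go": admit while measured usage is under the limit. */
static int gate_admit(struct measured_gate *g)
{
	if (g->pages_in_use >= LIMIT_PAGES)
		return 0;
	g->admitted++;
	return 1;
}

/* Late allocation, e.g. when the network message is finally sent. */
static void bio_allocate_late(struct measured_gate *g)
{
	g->pages_in_use++;
}

/*
 * Submit nr_bios at once.  Every bio passes the gate before any of
 * them allocates, so the gate never sees consumption rise.
 * Returns the number of pages eventually demanded.
 */
static int simulate_burst(int nr_bios)
{
	struct measured_gate g = { 0, 0 };
	int i;

	for (i = 0; i < nr_bios; i++)
		gate_admit(&g);
	for (i = 0; i < g.admitted; i++)
		bio_allocate_late(&g);
	return g.pages_in_use;
}
```

Under these assumptions simulate_burst(10000) demands 10,000 pages
against a limit of 1,000, because the gate's decision ran before any
of the consumption it was meant to bound.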

In general, we must throttle against the maximum requirement of inflight 
bios rather than against the measured consumption.  This achieves the 
invariant I have touted, namely that memory consumption on the block 
writeout path must be bounded.  We could therefore possibly use your 
suggestion or something resembling it to implement a debug check that 
the programmer did in fact do their bounds arithmetic correctly, but
it is not useful for enforcing the bound itself.
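
Throttling against the maximum requirement can be sketched in the same
user-space style.  This is a minimal illustration under stated
assumptions, not the posted patch: struct throttle, throttle_submit,
throttle_complete and burst_admitted are hypothetical names, the
limit of 1,000 units is arbitrary, and a real implementation would
need atomic counters and would block the submitter rather than return
0.

```c
#define LIMIT_UNITS 1000	/* assumed capacity, in arbitrary units */

struct throttle {
	int available;		/* units not yet reserved by inflight bios */
};

/*
 * Reserve the bio's declared worst-case requirement at the gate,
 * before any real allocation happens.  Returns 1 if admitted, 0 if
 * the submitter would have to wait.
 */
static int throttle_submit(struct throttle *t, int worst_case)
{
	if (t->available < worst_case)
		return 0;
	t->available -= worst_case;
	return 1;
}

/* Release the reservation when the bio completes. */
static void throttle_complete(struct throttle *t, int worst_case)
{
	t->available += worst_case;
}

/* Throw nr_bios at the gate at once; return how many get past. */
static int burst_admitted(int nr_bios, int worst_case)
{
	struct throttle t = { LIMIT_UNITS };
	int i, admitted = 0;

	for (i = 0; i < nr_bios; i++)
		admitted += throttle_submit(&t, worst_case);
	return admitted;
}
```

Here the bound holds no matter how late the allocations happen: with
a worst case of one unit per bio, a burst of 10,000 bios admits
exactly 1,000, and the rest are admitted only as earlier bios call
throttle_complete.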

In case that coffin needs more nails in it, consider that we would not 
only need to account page allocations, but frees as well.  So what 
tells us that a page has returned to the reserve pool?  Oops, tough 
one.  The page may have been returned to a slab and thus not actually 
freed, though it remains available for satisfying new bio transactions.  
Because of such caching, your algorithm would quickly lose track of 
available resources and grind to a halt.
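
The caching problem can be illustrated with a toy model.  Everything
here is an assumption made for the sketch: the names tracker,
gate_admit, txn_start, txn_finish and admitted_after_drain are
hypothetical, each transaction uses one page, and the accounting
observes page allocations but sees no free when a page goes back to a
slab cache.

```c
#define LIMIT_PAGES 4		/* assumed reserve size for the toy model */

struct tracker {
	int accounted;		/* pages the gate believes are in use */
	int slab_cached;	/* pages idle in a slab cache, reusable */
};

/* Gate on measured consumption, as seen by the accounting. */
static int gate_admit(const struct tracker *t)
{
	return t->accounted < LIMIT_PAGES;
}

/* Start a transaction: reuse a cached page if any, else allocate. */
static int txn_start(struct tracker *t)
{
	if (!gate_admit(t))
		return 0;
	if (t->slab_cached > 0)
		t->slab_cached--;	/* reuse is invisible to accounting */
	else
		t->accounted++;		/* fresh page, accounted */
	return 1;
}

/* Finish: the page returns to the slab, so no page free is observed. */
static void txn_finish(struct tracker *t)
{
	t->slab_cached++;
}

/* Run LIMIT_PAGES transactions to completion, then try one more. */
static int admitted_after_drain(void)
{
	struct tracker t = { 0, 0 };
	int i, started = 0;

	for (i = 0; i < LIMIT_PAGES; i++)
		started += txn_start(&t);
	for (i = 0; i < started; i++)
		txn_finish(&t);
	/* Every page is idle in the slab, yet accounted == LIMIT_PAGES. */
	return txn_start(&t);
}
```

In this model admitted_after_drain() returns 0: all four pages are
sitting idle and reusable, but the accounting still counts them as
consumed, so the gate refuses new work indefinitely.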

Never mind that keeping track of page frees is a nasty problem in
itself.  They can occur in interrupt context, so forget the current->
idea.  Even keeping track of page allocations for bio transactions in
normal context will be a mess, and that is the easy part.  I can just
imagine the code attempting to implement this approach accreting into a
monster that gets confusingly close to working without ever actually
getting there.

We do have a simple, elegant solution posted at the head of this thread, 
which is known to work.

Regards,

Daniel


Thread overview: 40+ messages
2007-12-06  0:03 [RFC] [PATCH] A clean approach to writeout throttling Daniel Phillips
2007-12-06  1:24 ` Andrew Morton
2007-12-06  6:21   ` Daniel Phillips
2007-12-06  7:31     ` Andrew Morton
2007-12-06  9:48       ` Daniel Phillips
2007-12-06 11:55         ` Andrew Morton
2007-12-06 15:52           ` Rik van Riel
2007-12-06 17:34             ` Andrew Morton
2007-12-06 17:48               ` Rik van Riel
2007-12-06 20:04           ` Daniel Phillips
2007-12-06 20:27             ` Andrew Morton
2007-12-06 21:27               ` Daniel Phillips
2007-12-06 21:53     ` Bill Davidsen
2007-12-07  0:04       ` Daniel Phillips
2007-12-07  0:29         ` Andrew Morton
2007-12-07  7:13           ` Daniel Phillips
2007-12-10  9:20             ` Daniel Phillips [this message]
2007-12-10 10:47 ` Jens Axboe
2007-12-10 11:23   ` [RFC] [PATCH] A clean approach " Daniel Phillips
2007-12-10 11:41     ` Jens Axboe
2007-12-10 12:13       ` Daniel Phillips
2007-12-10 12:16         ` Jens Axboe
2007-12-10 12:27           ` Daniel Phillips
2007-12-10 12:32             ` Jens Axboe
2007-12-10 13:04               ` Daniel Phillips
2007-12-10 13:19                 ` Jens Axboe
2007-12-10 13:26                   ` Daniel Phillips
2007-12-10 13:30                     ` Jens Axboe
2007-12-10 13:43                       ` Daniel Phillips
2007-12-10 13:53                         ` Jens Axboe
2007-12-10 14:17                           ` Daniel Phillips
2007-12-11 13:15                             ` Jens Axboe
2007-12-11 19:38                               ` Daniel Phillips
2007-12-11 20:01                                 ` Jens Axboe
2007-12-11 20:11                                   ` Daniel Phillips
2007-12-11 20:07                               ` Daniel Phillips
2007-12-10 11:33   ` [RFC] [PATCH] A clean approach " Daniel Phillips
2007-12-10 21:31 ` Jonathan Corbet
2007-12-10 22:06   ` Pekka Enberg
2007-12-11  4:21   ` Daniel Phillips