From: Daniel Phillips <phillips@phunq.net>
To: Mike Snitzer <snitzer@gmail.com>
Cc: Christoph Lameter <clameter@sgi.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
akpm@linux-foundation.org, dkegel@google.com,
Peter Zijlstra <a.p.zijlstra@chello.nl>,
David Miller <davem@davemloft.net>, Nick Piggin <npiggin@suse.de>
Subject: Re: [RFC 0/3] Recursive reclaim (on __PF_MEMALLOC)
Date: Mon, 17 Sep 2007 17:28:24 -0700
Message-ID: <200709171728.26180.phillips@phunq.net>
In-Reply-To: <170fa0d20709072212m4563ce76sa83092640491e4f3@mail.gmail.com>
On Friday 07 September 2007 22:12, Mike Snitzer wrote:
> Can you be specific about which changes to existing mainline code
> were needed to make recursive reclaim "work" in your tests (albeit
> less ideally than peterz's patchset in your view)?
Sorry, I was incommunicado out on the high seas all last week. OK, the
measures that actually prevent our ddsnap driver from deadlocking are:
- Statically prove bounded memory use of all code in the writeout
path.
- Implement any special measures required to be able to make such a
proof.
- All allocations performed by the block driver must have access
to dedicated memory resources.
- Disable the congestion_wait mechanism for our code as much as
possible, at least enough to obtain the maximum memory resources
that can be used on the writeout path.
The specific measure we implement in order to prove a bound is:
- Throttle IO on our block device to a known amount of traffic for
which we are sure that the MEMALLOC reserve will always be
adequate.
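
To illustrate the throttling idea only: a minimal userspace sketch, not the ddsnap code or the bio throttling patch itself. The struct and function names (io_throttle, throttle_try_submit, throttle_complete) are hypothetical, and "units" stands for whatever quantity the driver chooses to count (bios, pages, sectors). A real driver would also need locking and a way to sleep until capacity frees up, both omitted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical throttle: caps the units of IO in flight on one device. */
struct io_throttle {
    size_t limit;     /* max units in flight; chosen so the reserve suffices */
    size_t inflight;  /* units submitted and not yet completed */
};

static void throttle_init(struct io_throttle *t, size_t limit)
{
    t->limit = limit;
    t->inflight = 0;
}

/*
 * Admit a request only if it fits under the limit. On false, the caller
 * must wait for completions before retrying (waiting is not shown).
 * inflight never exceeds limit, so the subtraction cannot underflow.
 */
static bool throttle_try_submit(struct io_throttle *t, size_t units)
{
    if (units > t->limit - t->inflight)
        return false;
    t->inflight += units;
    return true;
}

/* Called on IO completion; returns the units to the pool. */
static void throttle_complete(struct io_throttle *t, size_t units)
{
    assert(units <= t->inflight);
    t->inflight -= units;
}
```

The point of the fixed limit is that the worst-case memory needed on the writeout path becomes a function of a known constant rather than of whatever load the submitters generate.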
Note that the boundedness proof we use is somewhat loose at the moment.
It goes something like "we only need at most X kilobytes of reserve and
there are X megabytes available". Much of Peter's patch set is aimed
at getting more precise about this, but to be sure, handwaving just
like this has been part of the core kernel since day one without too many
ill effects.
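
The loose bound above amounts to a single comparison. The following is a sketch under stated assumptions, not anything from the driver: the function name and all three parameters are hypothetical, and it assumes the worst-case allocation per unit of throttled IO is known, which is exactly the part a real proof has to establish.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical check: does the reserve cover the worst case?
 *   max_inflight         - throttle limit, in units of IO
 *   worst_bytes_per_unit - assumed worst-case allocation per unit
 *   reserve_bytes        - memory available from the reserve
 */
static bool reserve_is_adequate(size_t max_inflight,
                                size_t worst_bytes_per_unit,
                                size_t reserve_bytes)
{
    /* If the product would overflow, treat the bound as unprovable. */
    if (worst_bytes_per_unit != 0 &&
        max_inflight > SIZE_MAX / worst_bytes_per_unit)
        return false;
    return max_inflight * worst_bytes_per_unit <= reserve_bytes;
}
```

For example, 64 units in flight at an assumed 4096 bytes each need 256 KiB, which fits comfortably in a 1 MiB reserve; 512 units at the same cost would not.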
The way we provide guaranteed access to memory resources is:
- Run critical daemons in PF_MEMALLOC mode, including
any userspace daemons that must execute in the block IO path
(cluster coders take note!)
Right now, all writeout submitted to ddsnap gets handed off to a daemon
running in PF_MEMALLOC mode. This is a needless inefficiency that we
want to remove in future by handling as many of those submissions as
possible entirely in the context of the submitter. To do this, further
measures are needed:
- Network writes performed by the block driver must have access to
dedicated memory resources.
We have not yet managed to trigger network read memory deadlock, but it
is just a matter of time, additional fancy virtual block devices, and
enough stress. So:
- Network reads need some fancy extra support because dedicated
memory resources must be consumed before knowing whether the
network traffic belongs to a block device or not.
Now, the interesting thing about this whole discussion is, none of the
measures that we are actually using at the moment are implemented in
either Peter's or Christoph's patch set. In other words, at present we
do not require either patch set in order to run under heavy load
without deadlocking. But in order to generalize our solution to a
wider range of virtual block devices and other problematic systems such
as userspace filesystems, we need to incorporate a number of elements
of Peter's patch set.
As far as Christoph's proposal goes, it is not required to prevent
deadlocks. Whether or not it is a good optimization is an open
question.
Of all the patches posted so far related to this work, the only
indispensable one is the bio throttling patch developed by Evgeniy and
me in a parallel thread. The other essential pieces are all implemented
in our block driver for now. Some of those can be generalized and
moved at least partially into core, and some cannot.
I do need to write some sort of primer on this, because there is no
fire-and-forget magic core kernel solution. There are helpful things
we can do in core, but some of it can only be implemented in the
drivers themselves.
Regards,
Daniel