From: Daniel Phillips <phillips@phunq.net>
To: Christoph Lameter <clameter@sgi.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
akpm@linux-foundation.org, dkegel@google.com,
Peter Zijlstra <a.p.zijlstra@chello.nl>,
David Miller <davem@davemloft.net>, Nick Piggin <npiggin@suse.de>
Subject: Re: [RFC 0/3] Recursive reclaim (on __PF_MEMALLOC)
Date: Wed, 5 Sep 2007 02:20:53 -0700
Message-ID: <200709050220.53801.phillips@phunq.net>
In-Reply-To: <20070814142103.204771292@sgi.com>
On Tuesday 14 August 2007 07:21, Christoph Lameter wrote:
> The following patchset implements recursive reclaim. Recursive
> reclaim is necessary if we run out of memory in the writeout path
> from reclaim.
>
> This is important, e.g., for stacked filesystems or anything that does
> complicated processing in the writeout path.
>
> Recursive reclaim works because it limits itself to only reclaim
> pages that do not require writeout. It will only remove clean pages
> from the LRU. The dirty throttling of the VM during regular reclaim
> ensures that the amount of dirty pages is limited. If recursive
> reclaim causes too many clean pages to be removed then regular
> reclaim will throttle all processes until the dirty ratio is
> restored. This means that the amount of memory that can be reclaimed
> via recursive reclaim is limited to clean memory. The default ratio
> is 10%. This means that recursive reclaim can reclaim 90% of memory
> before failing. Reclaiming excessive amounts of clean pages may have
> a significant performance impact because this means that executable
> pages will be removed. However, it ensures that we will no longer
> fail in the writeout path.
>
> A patch is included to test this functionality. The test involved
> allocating 12 Megabytes from the reclaim paths when __PF_MEMALLOC is
> set. This is enough to exhaust the reserves.
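As a rough illustration of the rule described in the quote, here is a toy C model. It is not code from the RFC patches: `struct toy_page`, `toy_recursive_reclaim()` and `toy_reclaimable_bound()` are hypothetical names, and real reclaim operates on `struct page` and the zone LRU lists with far more state. It shows only that recursive reclaim skips dirty pages (freeing them would need writeout), plus the arithmetic behind the 90% figure, under the simplifying assumption that all memory outside the dirty limit is reclaimable clean page cache.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified page; the real struct page carries much more. */
struct toy_page {
	bool on_lru;
	bool dirty;
};

/*
 * Toy recursive reclaim: called when the caller is already in reclaim
 * (PF_MEMALLOC set), so no writeout is allowed. Only clean pages are
 * dropped from the LRU; dirty pages are skipped. Returns pages freed.
 */
static size_t toy_recursive_reclaim(struct toy_page *lru, size_t n, size_t wanted)
{
	size_t freed = 0;

	for (size_t i = 0; i < n && freed < wanted; i++) {
		if (!lru[i].on_lru || lru[i].dirty)
			continue;	/* dirty page: would need writeout, skip it */
		lru[i].on_lru = false;	/* drop the clean page */
		freed++;
	}
	return freed;
}

/*
 * Upper bound on what recursive reclaim can free, assuming everything
 * not held dirty under the dirty ratio (in percent) is clean page cache.
 */
static size_t toy_reclaimable_bound(size_t total_pages, unsigned dirty_ratio_pct)
{
	assert(dirty_ratio_pct <= 100);
	return total_pages - total_pages * dirty_ratio_pct / 100;
}
```

For example, with 1000 pages and the default 10% dirty ratio, `toy_reclaimable_bound(1000, 10)` gives 900, matching the "90% of memory" estimate in the quote.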
Hi Christoph,
Over the last two weeks we have tested your patch set in the context of
ddsnap, which used to be prone to deadlock before we added a series of
anti-deadlock measures, including Peter's anti-deadlock patch set, our
own bio throttling code and judicious use of PF_MEMALLOC mode. This
cocktail of patches finally banished the deadlocks; none have been seen
during several months of heavy testing. The question that no doubt
interests you is whether your patch set also solves the same deadlocks.
The results are mixed. I will briefly describe the test setup now. If
you are interested in specific details for independent verification, we
can provide the full recipe separately. We used the patches here:
http://zumastor.googlecode.com/svn/trunk/ddsnap/patches/2.6.21.1/
driven by the scripted storage application here:
http://zumastor.googlecode.com/svn/trunk/zumastor/
If we remove our anti-deadlock measures, including the ddsnap.vm.fixes
(a roll-up of Peter's patch set) and the request throttling code in
dm-ddsnap.c, and apply your patch set instead, we hit deadlock on the
socket write path after a few hours (traceback tomorrow). So your
patch set by itself is a stability regression.
There is also some good news for you here. The combination of our
throttling code, plus your recursive reclaim patches and some fiddling
with PF_LESS_THROTTLE has so far survived testing without deadlocking.
In other words, as far as we have tested it, your patch set can
substitute for Peter's and produce the same effect, provided that we
throttle the block IO traffic.
Just to recap, we have identified two essential ingredients in the
recipe for writeout deadlock prevention:
1) Throttle block IO traffic to a bounded maximum memory use.
2) Guarantee availability of the required amount of memory.
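Ingredient (1) can be pictured as a simple admission counter. The sketch below is hypothetical and is not the dm-ddsnap.c throttling code: `struct toy_throttle`, `toy_throttle_admit()` and `toy_throttle_complete()` are illustrative names. Each in-flight request is charged an assumed worst-case memory cost, and new requests are refused once a fixed bound would be exceeded. A real driver would sleep on a wait queue until a completion returns budget, rather than returning false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical throttle state: a fixed memory budget for in-flight IO. */
struct toy_throttle {
	size_t limit;	/* most memory the IO path may pin at once */
	size_t in_use;	/* memory charged to requests still in flight */
};

/* Admit a request of the given worst-case cost only if it fits the budget. */
static bool toy_throttle_admit(struct toy_throttle *t, size_t cost)
{
	if (cost > t->limit - t->in_use)
		return false;	/* over budget: caller must wait for a completion */
	t->in_use += cost;
	return true;
}

/* Return a completed request's cost to the budget. */
static void toy_throttle_complete(struct toy_throttle *t, size_t cost)
{
	assert(cost <= t->in_use);
	t->in_use -= cost;
}
```

Because the budget is fixed in advance, the amount of memory that (2) must guarantee is known and bounded, which is presumably why (1) turns out to be required with either approach.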
Now we have learned that (1) is not optional with either the peterz or
the clameter approach, and we are wondering which is the better way to
handle (2).
If we accept for the moment that both approaches to (2) are equally
effective at preventing deadlock (this is debatable), then the next
criterion for deciding the winner would be efficiency. That is a slight
oversimplification, to be sure, since we also care about
maintainability, provability and general forward progress. However,
since none of these is directly measurable, efficiency is a good place
to start.
It is clear which approach is more efficient: Peter's. Popping a free
page off a reserve free list requires no scanning, whereas recursive
reclaim has to scan the LRU again for clean pages, duplicating scanning
work. How much more efficient is an open question; hopefully we will
measure that soon.
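The efficiency difference can be illustrated with two toy functions, again hypothetical and not taken from either patch set. `struct toy_reserve` models a reserve as a preallocated stack of free page indices, so taking a page is a constant-time pop; `toy_scan_for_clean()` walks an LRU-like array of dirty flags until it finds a clean entry, and its `scanned` count stands in for the scanning work that recursive reclaim repeats.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reserve: a preallocated stack of free page indices. */
struct toy_reserve {
	size_t free[64];
	size_t count;
};

/* Reserve approach: constant-time pop, no scanning. False if the reserve is empty. */
static bool toy_reserve_pop(struct toy_reserve *r, size_t *page)
{
	assert(r->count <= sizeof r->free / sizeof r->free[0]);
	if (r->count == 0)
		return false;
	*page = r->free[--r->count];
	return true;
}

/*
 * Scanning approach: examine LRU entries until a clean one is found.
 * *scanned receives the number of entries examined.
 * Returns the index of the first clean entry, or n if there is none.
 */
static size_t toy_scan_for_clean(const bool *dirty, size_t n, size_t *scanned)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (!dirty[i])
			break;
	}
	*scanned = (i < n) ? i + 1 : n;
	return i;
}
```

With a reserve holding pages 7 and 9, the pop returns page 9 immediately; with the dirty pattern {dirty, dirty, dirty, clean, dirty}, the scan examines four entries before returning index 3. The gap grows with the number of dirty or otherwise unreclaimable pages at the head of the LRU.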
Briefly touching on other factors:
* Peter's patch set is much bigger than yours. The active ingredients
need to be separated out from the other peterz bits such as reserve
management APIs so we can make a fairer comparison.
* Your patch set here does not address the question of atomic
allocation, though I see you have been busy with that elsewhere.
Adding code to take care of this means you will start catching up
with Peter in complexity.
* The questions Peter raised about how you will deal with loads
involving heavy anonymous allocations are still open. This looks
like more complexity on the way.
* You depend on maintaining a global dirty page limit while Peter's
approach does not. So we see the peterz approach as progress
towards eliminating one of the great thorns in our side:
congestion_wait deadlocks, which we currently hack around in a
thoroughly disgusting way (PF_LESS_THROTTLE abuse).
* Which approach allows us to run with a higher dirty page threshold?
More dirty page caching is better. We will test the two approaches
head to head on this issue pretty soon.
Regards,
Daniel