linux-mm.kvack.org archive mirror
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
To: Christoph Lameter <clameter@sgi.com>
Cc: Nick Piggin <npiggin@suse.de>,
	Daniel Phillips <phillips@phunq.net>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	akpm@linux-foundation.org, dkegel@google.com,
	David Miller <davem@davemloft.net>
Subject: Re: [RFC 0/3] Recursive reclaim (on __PF_MEMALLOC)
Date: Thu, 13 Sep 2007 10:19:12 +0200	[thread overview]
Message-ID: <1189671552.21778.158.camel@twins> (raw)
In-Reply-To: <Pine.LNX.4.64.0709121540370.4067@schroedinger.engr.sgi.com>


On Wed, 2007-09-12 at 15:47 -0700, Christoph Lameter wrote:
> On Wed, 12 Sep 2007, Peter Zijlstra wrote:
> 
> > > assumes single critical user of memory. There are other consumers of 
> > > memory and if you have a load that depends on other things than networking 
> > > then you should not kill the other things that want memory.
> > 
> > The VM is a _critical_ user of memory. And I dare say it is the _most_
> > important user. 
> 
> The users of memory are various subsystems. The VM itself of course also
> uses memory to manage memory, but the important thing is that the VM
> provides services to other subsystems.

Exactly, and because it services every other subsystem and userspace,
it's the most important one; if it doesn't work, nothing else will.

> > Every user of memory relies on the VM, and we only get into trouble if
> > the VM in turn relies on one of these users. Traditionally that has only
> > been the block layer, and we special cased that using mempools and
> > PF_MEMALLOC.
> > 
> > Why do you object to me doing a similar thing for networking?
> 
> I have not seen you using mempools for the networking layer. I would not 
> object to such a solution. It already exists for other subsystems.

Dude, listen, how often do I have to say this: I cannot use mempools
for the network subsystem because it's built on kmalloc! What I've done
is build a replacement for mempools - a reserve system - that works
much like mempools but also provides the flexibility of kmalloc.

That is all, no more, no less.
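
Roughly, in (simplified, invented-name) code the idea is this - not the
actual patch, just the shape of it:

/*
 * Sketch only: reserve_kmalloc() is a made-up name.  Like
 * mempool_alloc() it tries the normal allocator first and only then
 * falls back to a bounded emergency reserve; unlike a mempool the
 * objects are not fixed-size preallocated elements, they come from
 * kmalloc.
 */
static void *reserve_kmalloc(size_t size, gfp_t gfp)
{
	void *obj;

	/* normal path, explicitly forbidden to touch the emergency pool */
	obj = kmalloc(size, gfp | __GFP_NOWARN | __GFP_NOMEMALLOC);
	if (obj)
		return obj;

	/*
	 * Fallback path.  In the real patches this allocation may go
	 * below the watermarks and is charged against the subsystem's
	 * reserve (the bean counting further down); both details are
	 * elided here.
	 */
	return kmalloc(size, gfp);
}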

> > The problem of circular dependancies on and with the VM is rather
> > limited to kernel IO subsystems, and we only have a limited amount of
> > them. 
> 
> The kernel has to use the filesystems and other subsystems for I/O. These
> subsystems compete for memory in order to make progress. I would not
> strictly consider them part of the VM. The kernel reclaim may trigger I/O
> in multiple I/O subsystems simultaneously.

I'm confused by this; I've never claimed any such thing. All I'm
saying is that because of the circular dependency between the VM and
the IO subsystem used for swap (not file-backed paging [*], just swap)
you have to do something special to avoid deadlocks.

[*] the dirty limit along with 'atomic' swap ensures that file-backed
paging does not get into this tight spot.
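
For those joining the thread late, the cycle in question is roughly
this (schematic only, not literal call chains):

  a task needs a page
    -> the VM reclaims and picks an anonymous page
      -> that page must be written to the swap device
        -> swap sits on NBD/iSCSI/NFS, so the write goes through the
           network stack
          -> which must allocate skbs to send it
            -> which needs the very memory the VM is still trying to
               free

Unless something like PF_MEMALLOC plus a reserve covers those last
allocations, the box deadlocks.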

> > You talk about something generic, do you mean an approach that is
> > generic across all these subsystems?
> 
> Yes an approach that is fair and does not allow one single subsystem to 
> hog all of memory.

I do no such thing! My reserve system works much like mempools: you
reserve a certain number of pages and use no more.

> > If so, my approach would be it, I can replace mempools as we have them
> > with the reserve system I introduce.
> 
> Replacing the mempools for the block layer sounds pretty good. But how do 
> these various subsystems that may live in different portions of the system 
> for various devices avoid global serialization and livelock through your 
> system? 

The reserves are spread over all kernel-mapped zones, the slab
allocator is still per-CPU, and the page allocator tries to get pages
from the nearest node.

> And how is fairness addressed? I may want to run a fileserver on
> some nodes and a HPC application that relies on a fiberchannel connection 
> on other nodes. How do we guarantee that the HPC application is not 
> impacted if the network services of the fileserver flood the system with 
> messages and exhaust memory?

The network system reserves A pages and the block layer reserves B
pages; once they start getting pages from the reserves they start bean
counting, and once they reach their respective limits they stop.

The serialisation impact of the bean counting depends on how
fine-grained you place the counters. Currently I only have a
machine-wide network bean counter, because the network subsystem is
machine-wide - initially I tried to do something per net-device, but
that didn't work out. If someone more skilled in this area comes along
and sees a better way to place the bean counters, they are free to do
so.

But do note that the bean counting is only done once we hit the
reserves; the normal mode of operation is not penalised by its extra
overhead.
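
In equally rough, invented-name code, the slow-path accounting is no
more than this per subsystem:

/*
 * Sketch only, not the patch.  The counter is touched exclusively on
 * the fallback path, i.e. only after the normal allocation has already
 * failed, so the fast path pays nothing.
 */
struct reserve_charge {
	atomic_t	charged;	/* pages currently taken from the reserve */
	int		limit;		/* pages this subsystem reserved */
};

static int reserve_charge_page(struct reserve_charge *rc)
{
	if (atomic_add_return(1, &rc->charged) > rc->limit) {
		atomic_dec(&rc->charged);
		return 0;	/* reserve exhausted, caller must back off */
	}
	return 1;
}

static void reserve_uncharge_page(struct reserve_charge *rc)
{
	atomic_dec(&rc->charged);	/* when the reserve page is freed */
}

The network stack gets one such counter (machine-wide, as said above)
and the block layer another, so neither can eat into the other's
reserve.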

Note that mempools likewise serialise access once the backing
allocator fails, so I don't differ from them in that respect either.



Thread overview: 60+ messages
2007-08-14 14:21 [RFC 0/3] Recursive reclaim (on __PF_MEMALLOC) Christoph Lameter
2007-08-14 14:21 ` [RFC 1/3] Allow reclaim via __GFP_NOMEMALLOC reclaim Christoph Lameter
2007-08-14 14:21 ` [RFC 2/3] Use NOMEMALLOC reclaim to allow reclaim if PF_MEMALLOC is set Christoph Lameter
2007-08-14 14:21 ` [RFC 3/3] Test code for PF_MEMALLOC reclaim Christoph Lameter
2007-08-14 14:36 ` [RFC 0/3] Recursive reclaim (on __PF_MEMALLOC) Peter Zijlstra
2007-08-14 15:29   ` Christoph Lameter
2007-08-14 19:32     ` Peter Zijlstra
2007-08-14 19:41       ` Christoph Lameter
2007-08-15 12:22 ` Nick Piggin
2007-08-15 13:12   ` Peter Zijlstra
2007-08-15 14:15     ` Andi Kleen
2007-08-15 13:55       ` Peter Zijlstra
2007-08-15 14:34         ` Andi Kleen
2007-08-15 20:32         ` Christoph Lameter
2007-08-15 20:29     ` Christoph Lameter
2007-08-16  3:29     ` Nick Piggin
2007-08-16 20:27       ` Christoph Lameter
2007-08-20  3:51       ` Peter Zijlstra
2007-08-20 19:15         ` Christoph Lameter
2007-08-21  0:32           ` Nick Piggin
2007-08-21  0:28         ` Nick Piggin
2007-08-21 15:29           ` Peter Zijlstra
2007-08-23  3:02             ` Nick Piggin
2007-09-12 22:39           ` Christoph Lameter
2007-09-05  9:20 ` Daniel Phillips
2007-09-05 10:42   ` Christoph Lameter
2007-09-05 11:42     ` Nick Piggin
2007-09-05 12:14       ` Christoph Lameter
2007-09-05 12:19         ` Nick Piggin
2007-09-10 19:29           ` Christoph Lameter
2007-09-10 19:37             ` Peter Zijlstra
2007-09-10 19:41               ` Christoph Lameter
2007-09-10 19:55                 ` Peter Zijlstra
2007-09-10 20:17                   ` Christoph Lameter
2007-09-10 20:48                     ` Peter Zijlstra
2007-09-11  7:41             ` Nick Piggin
2007-09-12 10:52         ` Peter Zijlstra
2007-09-12 22:47           ` Christoph Lameter
2007-09-13  8:19             ` Peter Zijlstra [this message]
2007-09-13 18:32               ` Christoph Lameter
2007-09-13 19:24                 ` Peter Zijlstra
2007-09-05 16:16     ` Daniel Phillips
2007-09-08  5:12       ` Mike Snitzer
2007-09-18  0:28         ` Daniel Phillips
2007-09-18  3:27           ` Mike Snitzer
2007-09-18  9:30             ` Peter Zijlstra
     [not found]             ` <200709172211.26493.phillips@phunq.net>
2007-09-18  8:11               ` Wouter Verhelst
2007-09-18  9:58               ` Peter Zijlstra
2007-09-18 16:56                 ` Daniel Phillips
2007-09-18 19:16                   ` Peter Zijlstra
2007-09-18 18:40             ` Daniel Phillips
2007-09-18 20:13               ` Mike Snitzer
2007-09-10 19:25       ` Christoph Lameter
2007-09-10 19:55         ` Peter Zijlstra
2007-09-10 20:22           ` Christoph Lameter
2007-09-10 20:48             ` Peter Zijlstra
2007-10-26 17:44               ` Pavel Machek
2007-10-26 17:55                 ` Christoph Lameter
2007-10-27 22:58                   ` Daniel Phillips
2007-10-27 23:08                 ` Daniel Phillips
