All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jack Steiner <steiner@sgi.com>
To: Dave Hansen <haveblue@us.ibm.com>
Cc: linux-mm <linux-mm@kvack.org>, clameter@sgi.com
Subject: Re: Excessive memory trapped in pageset lists
Date: Thu, 7 Apr 2005 21:34:36 -0500	[thread overview]
Message-ID: <20050408023436.GA1927@sgi.com> (raw)
In-Reply-To: <1112923481.21749.88.camel@localhost>

On Thu, Apr 07, 2005 at 06:24:41PM -0700, Dave Hansen wrote:
> On Thu, 2005-04-07 at 16:11 -0500, Jack Steiner wrote:
> >    28 pages/node/cpu * 512 cpus * 256nodes * 16384 bytes/page = 60GB  (Yikes!!!)
> ...
> > I have a couple of ideas for fixing this but it looks like Christoph is
> > actively making changes in this area. Christoph do you want to address
> > this issue or should I wait for your patch to stabilize?
> 
> What about only keeping the page lists populated for cpus which can
> locally allocate from the zone?
> 
> 	cpu_to_node(cpu) == page_nid(pfn_to_page(zone->zone_start_pfn)) 

Exactly. That is at the top of my list. What I haven't decided is whether to:

	- leave the list_heads for offnode pages in the per_cpu_pages
	  struct. Offnode lists would be unused but the amount of wasted space
	  is small - probably 0 because of the cacheline alignment 
	  of the per_cpu_pageset. This is the simplest solution
	  but is not clean because of the unused fields. Unless some
	  architectures want to control whether offnode pages
	  are kept in the lists (???).

	  	OR

	- remove the list_heads from the per_cpu_pageset and make it
	  a standalone array in the zone struct. Array size would be
	  MAX_CPUS_PER_NODE. I don't recall any notion of MAX_CPUS_PER_NODE
	  or a relative cpu number on a node (have I overlooked this?). 
	  This solution is cleaner in the long run but may involve more 
	  infrastructure than I wanted to get into at this point.

	  	OR

	- sane as above but have a SINGLE list_head per zone. The list
	  would be used by all cpus on the node. Thsi avoids the page coloring
	  issues I ran into earlier (see prev posting). Obviously, this requires 
	  a lock. However, only on-node cpus would normally take the lock. 
	  Another advantage of this scheme is that an offnode shaker could 
	  acquire the lock & drain the lists if memory became low.

I haven't fully thought thru these ideas. Maybe other alternatives would
be even better.... Suggestions????


> 
> There certainly aren't a lot of cases where frequent, persistent
> single-page allocations are occurring off-node, unless a node is empty.

Hmmmm. True, but one of our popular configurations consists of memory-only nodes.
I know of one site that has 240 memory-only nodes & 16 nodes with
both cpus & memory. For this configuration, most memory if offnode 
to EVERY cpu. (But I still don't want to cache offnode pages).


> If you go to an off-node 'struct zone', you're probably bouncing so many
> cachelines that you don't get any benefit from per-cpu-pages anyway.

Agree, although on the SGI systems, we set a global policy to roundrobin
all file pages across all nodes. However, I'm not suggesting we cache
offnode pages in the per_cpu_pageset. That gets us back to where we 
started - too much memory in percpu page lists. Also, creating a file
page already bounces a lot of cachelines around.

> 
> Maybe there could be a per-cpu-pages miss rate that's required to occur
> before the lists are even populated.  That would probably account better
> for cases where nodes are disproportionately populated with memory.
> This, along with the occasional flushing of the pages back into the
> general allocator if the miss rate isn't satisfied should give some good
> self-tuning behavior.

Makes sense.

-- 
Thanks

Jack Steiner (steiner@sgi.com)          651-683-5302
Principal Engineer                      SGI - Silicon Graphics, Inc.



--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>

  reply	other threads:[~2005-04-08  2:34 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2005-04-07 21:11 Excessive memory trapped in pageset lists Jack Steiner
2005-04-08  1:24 ` Dave Hansen
2005-04-08  2:34   ` Jack Steiner [this message]
2005-04-08  5:18     ` Christoph Lameter

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20050408023436.GA1927@sgi.com \
    --to=steiner@sgi.com \
    --cc=clameter@sgi.com \
    --cc=haveblue@us.ibm.com \
    --cc=linux-mm@kvack.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.