From: Jack Steiner <steiner@sgi.com>
To: Dave Hansen <haveblue@us.ibm.com>
Cc: linux-mm <linux-mm@kvack.org>, clameter@sgi.com
Subject: Re: Excessive memory trapped in pageset lists
Date: Thu, 7 Apr 2005 21:34:36 -0500
Message-ID: <20050408023436.GA1927@sgi.com>
In-Reply-To: <1112923481.21749.88.camel@localhost>

On Thu, Apr 07, 2005 at 06:24:41PM -0700, Dave Hansen wrote:
> On Thu, 2005-04-07 at 16:11 -0500, Jack Steiner wrote:
> > 28 pages/node/cpu * 512 cpus * 256nodes * 16384 bytes/page = 60GB (Yikes!!!)
> ...
> > I have a couple of ideas for fixing this but it looks like Christoph is
> > actively making changes in this area. Christoph do you want to address
> > this issue or should I wait for your patch to stabilize?
>
> What about only keeping the page lists populated for cpus which can
> locally allocate from the zone?
>
> cpu_to_node(cpu) == page_nid(pfn_to_page(zone->zone_start_pfn))

Exactly. That is at the top of my list. What I haven't decided is whether to:

  - leave the list_heads for offnode pages in the per_cpu_pages
    struct. Offnode lists would be unused but the amount of wasted space
    is small - probably 0 because of the cacheline alignment
    of the per_cpu_pageset. This is the simplest solution
    but is not clean because of the unused fields. Unless some
    architectures want to control whether offnode pages
    are kept in the lists (???).

OR

  - remove the list_heads from the per_cpu_pageset and make them
    a standalone array in the zone struct. Array size would be
    MAX_CPUS_PER_NODE. I don't recall any notion of MAX_CPUS_PER_NODE
    or a relative cpu number on a node (have I overlooked this?).
    This solution is cleaner in the long run but may involve more
    infrastructure than I wanted to get into at this point.

OR

  - same as above but have a SINGLE list_head per zone. The list
    would be used by all cpus on the node. This avoids the page coloring
    issues I ran into earlier (see prev posting). Obviously, this requires
    a lock. However, only on-node cpus would normally take the lock.
    Another advantage of this scheme is that an offnode shaker could
    acquire the lock & drain the lists if memory became low.

I haven't fully thought thru these ideas. Maybe other alternatives would
be even better.... Suggestions????
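
To make the third option a bit more concrete, here is a rough userspace sketch of a single shared list per zone. Everything named here (struct zone_cache and the zc_* helpers) is made up for illustration and is not an existing kernel interface; the C11 atomic_flag is only a stand-in for whatever lock the zone would really use, and struct page is reduced to a list link.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page; only the list linkage matters here. */
struct page {
    struct page *next;
};

/* Hypothetical per-zone cache: ONE list shared by all cpus on the zone's node. */
struct zone_cache {
    atomic_flag lock;   /* stand-in for a real spinlock */
    struct page *head;  /* singly linked LIFO of cached pages */
    int count;          /* pages currently on the list */
    int node;           /* node that owns this zone */
};

static void zc_init(struct zone_cache *zc, int node)
{
    atomic_flag_clear(&zc->lock);
    zc->head = NULL;
    zc->count = 0;
    zc->node = node;
}

static void zc_lock(struct zone_cache *zc)
{
    while (atomic_flag_test_and_set_explicit(&zc->lock, memory_order_acquire))
        ; /* spin until the flag was previously clear */
}

static void zc_unlock(struct zone_cache *zc)
{
    atomic_flag_clear_explicit(&zc->lock, memory_order_release);
}

/* Free path: cache the page only if the freeing cpu is on the zone's node.
 * Returns 1 if cached, 0 if the caller should hand the page to the
 * general allocator instead. */
static int zc_free_page(struct zone_cache *zc, int cpu_node, struct page *pg)
{
    assert(pg != NULL);
    if (cpu_node != zc->node)
        return 0;
    zc_lock(zc);
    pg->next = zc->head;
    zc->head = pg;
    zc->count++;
    zc_unlock(zc);
    return 1;
}

/* Alloc path: offnode cpus never touch the list or its lock. */
static struct page *zc_alloc_page(struct zone_cache *zc, int cpu_node)
{
    struct page *pg;

    if (cpu_node != zc->node)
        return NULL;
    zc_lock(zc);
    pg = zc->head;
    if (pg) {
        zc->head = pg->next;
        zc->count--;
    }
    zc_unlock(zc);
    return pg;
}

/* Shaker path: any cpu, on-node or not, may detach the whole list.
 * Returns the number of pages detached; *out receives the old list head. */
static int zc_drain(struct zone_cache *zc, struct page **out)
{
    int n;

    zc_lock(zc);
    *out = zc->head;
    n = zc->count;
    zc->head = NULL;
    zc->count = 0;
    zc_unlock(zc);
    return n;
}
```

The point of the sketch is only that the alloc and free paths bail out before taking the lock when the cpu is offnode, while the drain path does not care where the caller runs.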
>
> There certainly aren't a lot of cases where frequent, persistent
> single-page allocations are occurring off-node, unless a node is empty.

Hmmmm. True, but one of our popular configurations consists of memory-only nodes.
I know of one site that has 240 memory-only nodes & 16 nodes with
both cpus & memory. For this configuration, most memory is offnode
to EVERY cpu. (But I still don't want to cache offnode pages).

> If you go to an off-node 'struct zone', you're probably bouncing so many
> cachelines that you don't get any benefit from per-cpu-pages anyway.

Agree, although on the SGI systems, we set a global policy to roundrobin
all file pages across all nodes. However, I'm not suggesting we cache
offnode pages in the per_cpu_pageset. That gets us back to where we
started - too much memory in percpu page lists. Also, creating a file
page already bounces a lot of cachelines around.
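
To put rough numbers on "too much memory", here is a small sketch that redoes the arithmetic from the start of this thread (28 pages/node/cpu, 512 cpus, 256 nodes, 16384 bytes/page) and compares it with the bound if each cpu kept lists only for its own node. The function names are made up, and the on-node bound assumes exactly one cached node per cpu, which is a simplification.

```c
#include <assert.h>
#include <stdint.h>

/* Worst-case bytes held in pageset lists when every cpu keeps a list
 * for every node (the situation described at the start of the thread). */
static uint64_t trapped_all_nodes(uint64_t pages_per_list, uint64_t cpus,
                                  uint64_t nodes, uint64_t page_size)
{
    assert(page_size > 0);
    return pages_per_list * cpus * nodes * page_size;
}

/* Same bound when each cpu keeps lists for its own node only
 * (assumption: one cached node per cpu). */
static uint64_t trapped_local_only(uint64_t pages_per_list, uint64_t cpus,
                                   uint64_t page_size)
{
    assert(page_size > 0);
    return pages_per_list * cpus * page_size;
}
```

With the thread's figures, trapped_all_nodes(28, 512, 256, 16384) is 60129542144 bytes (the ~60GB quoted earlier, or 56 GiB), while trapped_local_only(28, 512, 16384) is 234881024 bytes (224 MiB), i.e. smaller by the factor of 256 nodes.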
>
> Maybe there could be a per-cpu-pages miss rate that's required to occur
> before the lists are even populated. That would probably account better
> for cases where nodes are disproportionately populated with memory.
> This, along with the occasional flushing of the pages back into the
> general allocator if the miss rate isn't satisfied should give some good
> self-tuning behavior.
Makes sense.
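
A minimal sketch of how such a gate might look, purely to pin down the idea. The names (pcp_gate, pcp_gate_request, pcp_gate_tick), the sampling window and the threshold are all hypothetical. It counts every single-page request that reaches a given (cpu, zone) list, hit or miss, so that a populated list that is busy serving hits is not flushed by mistake; that reading of "miss rate" is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-(cpu, zone) gate deciding whether the list is worth keeping. */
struct pcp_gate {
    unsigned int requests_in_window; /* single-page requests seen this window */
    int populated;                   /* 1 if the list is currently in use */
};

/* Called for each single-page request this cpu makes against this zone. */
static void pcp_gate_request(struct pcp_gate *g)
{
    assert(g != NULL);
    g->requests_in_window++;
}

/* Called once per sampling window.
 * Returns +1 if the list should now be populated,
 *         -1 if it should be flushed back to the general allocator,
 *          0 if nothing changes. */
static int pcp_gate_tick(struct pcp_gate *g, unsigned int min_requests)
{
    int action = 0;

    assert(g != NULL);
    if (g->requests_in_window >= min_requests) {
        if (!g->populated) {
            g->populated = 1;
            action = 1;
        }
    } else if (g->populated) {
        g->populated = 0;
        action = -1;
    }
    g->requests_in_window = 0; /* start a fresh window */
    return action;
}
```

A cpu that rarely allocates from a given zone never crosses the threshold and so never holds pages for it; one that stops allocating gets its list flushed at the next tick.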
--
Thanks
Jack Steiner (steiner@sgi.com) 651-683-5302
Principal Engineer SGI - Silicon Graphics, Inc.