From: Paul Jackson <pj@sgi.com>
To: David Rientjes <rientjes@cs.washington.edu>
Cc: linux-mm@kvack.org, akpm@osdl.org, nickpiggin@yahoo.com.au,
ak@suse.de, mbligh@google.com, rohitseth@google.com,
menage@google.com, clameter@sgi.com
Subject: Re: [RFC] another way to speed up fake numa node page_alloc
Date: Wed, 4 Oct 2006 19:27:14 -0700
Message-ID: <20061004192714.20412e08.pj@sgi.com>
In-Reply-To: <Pine.LNX.4.64N.0610041456480.19080@attu2.cs.washington.edu>
> Isn't this the exact behavior that ordered zonelists are supposed to solve
> for real NUMA systems? Has there been an _observed_ case where the cost
> to scan the zonelists was considered excessive on real NUMA systems?
Well ... the good news is I understood your comments this time.
I guess I should be happy it only took about 3 iterations.
Historically the ordered zonelists addressed the situation where one
almost always found free memory near the front of the ordered zonelist.
Yes, you are correct that I originally didn't think we had a problem
with real numa zonelist scans.
Three days ago, when I introduced this alternative patch that started
this current thread, I changed my position, stating at that time:
>
> There are two reasons I pursued this alternative:
>
> 1) Contrary to what I said before, we (SGI, on large ia64 sn2 systems)
> have seen real customer loads where the cost to scan the zonelist
> was a problem, due to many nodes being full of memory before
> we got to a node we could use. Or at least, I think we have.
> This was related to me by another engineer, based on experiences
> from some time past. So this is not guaranteed. Most likely, though.
>
> The following approach should help such real numa systems just as
> much as it helps fake numa systems, or any combination thereof.
>
> 2) The effort to distinguish fake from real numa, using node_distance,
> so that we could cache a fake numa node and optimize choosing
> it over equivalent distance fake nodes, while continuing to
> properly scan all real nodes in distance order, was going to
> require a nasty blob of zonelist and node distance munging.
>
> The following approach has no new dependency on node distances or
> zone sorting.
David wrote:
> I was under the impression that there was nothing wrong with the way
> current real NUMA systems allocate pages. If not, please point me to the
> thread that _specifically_ discusses this with _data_ that shows it's
> inefficient.
See above. I don't have data, so cannot justify going far out of our
way.
If someone has a better way to skin this fake numa cat, that does not
benefit (or harm) real numa, that would still be worth careful
consideration.
> In fact, when this thread started you recommended as few
> changes as possible to the code to not interfere with what already works.
Yes, I did start with that recommendation. See above.
And see above for my current reasons for pursuing this patch.
Some more things I like about this patch:
* Conceptually, it is very localized, making no changes to the
larger code or data structure, just adding a cache of some
hot data.
* Further, it makes few assumptions about the larger scheme of
things.
* It has no dependencies on zonelist sorting, node distances,
fake vs real numa nodes or any of that.
* It makes no discernible difference in the memory placement
behaviour of a system.
Downside - it's still a linear zonelist scan, and it's a cache bolted on
the side of things, rather than an inherently fast algorithm and data
structure.
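To make the "cache bolted on the side" idea concrete, here is a rough
user-space sketch. It is not the patch itself. The names zonelist_sketch,
recently_full, zones_examined, cache_zap and alloc_page_sketch are all
hypothetical, as is the choice to drop the cache and rescan when a pass
finds nothing. It only illustrates the general shape: the scan stays
linear and in zonelist order, but zones recently seen full are skipped
with a cheap flag test instead of the costly per-zone check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NZONES 8	/* hypothetical zonelist length */

/* Hypothetical stand-in for a zone: only a free page count. */
struct zone {
	long free_pages;
};

/* A zonelist plus the side cache: one "recently full" flag per slot. */
struct zonelist_sketch {
	struct zone *zones[NZONES];
	bool recently_full[NZONES];
	long zones_examined;	/* counts the costly per-zone checks */
};

/* Forget all cached "full" hints. */
static void cache_zap(struct zonelist_sketch *zl)
{
	for (int i = 0; i < NZONES; i++)
		zl->recently_full[i] = false;
}

/*
 * One linear pass in zonelist order. Slots flagged as recently full
 * are skipped; *skipped reports whether any slot was skipped.
 * Returns the slot allocated from, or -1 if nothing was found.
 */
static int scan_once(struct zonelist_sketch *zl, bool *skipped)
{
	*skipped = false;
	for (int i = 0; i < NZONES; i++) {
		if (zl->recently_full[i]) {
			*skipped = true;	/* cheap skip, no zone check */
			continue;
		}
		zl->zones_examined++;		/* stands in for the costly check */
		if (zl->zones[i]->free_pages > 0) {
			zl->zones[i]->free_pages--;
			return i;
		}
		zl->recently_full[i] = true;	/* remember it was full */
	}
	return -1;
}

/*
 * Allocate one page. If the pass fails after skipping cached slots,
 * the hints may be stale, so zap the cache and scan once more.
 */
static int alloc_page_sketch(struct zonelist_sketch *zl)
{
	bool skipped;
	int slot;

	assert(zl != NULL);
	slot = scan_once(zl, &skipped);
	if (slot < 0 && skipped) {
		cache_zap(zl);
		slot = scan_once(zl, &skipped);
	}
	return slot;
}
```

With five full zones at the front of the list, the first allocation
examines six zones, and the next one examines only one. A stale flag can
hide a zone that has since freed memory, so a real version would also
need to expire the hints from time to time; the sketch zaps them only
when a pass comes up empty.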
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.925.600.0401