From: Lee Schermerhorn <lee.schermerhorn@hp.com>
To: linux-mm@kvack.org
Cc: akpm@linux-foundation.org, ak@suse.de, mtk-manpages@gmx.net,
clameter@sgi.com, solo@google.com,
Lee Schermerhorn <lee.schermerhorn@hp.com>,
eric.whitney@hp.com
Subject: [PATCH/RFC 2/5] Mem Policy: Use MPOL_PREFERRED for system-wide default policy
Date: Thu, 30 Aug 2007 14:51:07 -0400
Message-ID: <20070830185107.22619.43577.sendpatchset@localhost>
In-Reply-To: <20070830185053.22619.96398.sendpatchset@localhost>
PATCH/RFC 2/5 Use MPOL_PREFERRED for system-wide default policy
Against: 2.6.23-rc3-mm1
V1 -> V2:
+ restore BUG()s in switch(policy) default cases -- per
Christoph
+ eliminate unneeded re-init of struct mempolicy policy member
before freeing
Currently, when one specifies MPOL_DEFAULT via a NUMA memory
policy API [set_mempolicy(), mbind() and internal versions],
the kernel simply installs a NULL struct mempolicy pointer in
the appropriate context: task policy, vma policy, or shared
policy. This causes any use of that policy to "fall back" to
the next most specific policy scope. The only use of MPOL_DEFAULT
to mean "local allocation" is in the system default policy.
There is another, "preferred" way to specify local allocation via
the APIs. That is using the MPOL_PREFERRED policy mode with an
empty nodemask. Internally, the empty nodemask gets converted to
a preferred_node id of '-1'. All internal usage of MPOL_PREFERRED
will convert the '-1' to the id of the node local to the cpu
where the allocation occurs.
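
The conversion described above can be modeled outside the kernel. The following is a minimal sketch, not the kernel code: the sketch_* names are hypothetical, the nodemask is reduced to a single unsigned long, and picking the lowest set bit as the preferred node is an assumption of this sketch.

```c
#include <assert.h>

/* Illustrative stand-ins only; these are not the kernel's definitions. */
enum sketch_mode { SKETCH_PREFERRED, SKETCH_BIND, SKETCH_INTERLEAVE };

struct sketch_policy {
	enum sketch_mode mode;
	int preferred_node;	/* -1 means "local allocation" */
};

/*
 * Convert an API-level nodemask (bit n set => node n) to the internal
 * preferred node id.  An empty mask becomes -1; otherwise this sketch
 * assumes the lowest-numbered node in the mask is used.
 */
static int sketch_preferred_from_mask(unsigned long nodemask)
{
	int node;

	if (nodemask == 0)
		return -1;
	for (node = 0; !(nodemask & (1UL << node)); node++)
		;
	return node;
}

/*
 * Resolve the node to try first at allocation time.  local_node stands
 * in for the node of the cpu where the allocation occurs.
 */
static int sketch_resolve_preferred(const struct sketch_policy *pol,
				    int local_node)
{
	assert(pol->mode == SKETCH_PREFERRED);
	if (pol->preferred_node >= 0)
		return pol->preferred_node;
	return local_node;
}
```

With an empty mask the stored id is -1, so the same policy resolves to a different node depending on where the allocating task happens to be running.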
System default policy, except during boot, is hard-coded to
"local allocation". By using the MPOL_PREFERRED mode with a
negative value of preferred node for system default policy,
MPOL_DEFAULT will never occur in the 'policy' member of a
struct mempolicy. Thus, we can remove all checks for
MPOL_DEFAULT when converting policy to a node id/zonelist in
the allocation paths.
In slab_node(), return the local node id directly when the policy
pointer is NULL. There is no longer any need to set a 'pol' value
just to reach the switch default. Replace the switch default with
BUG(), because it should never be reached.
With this patch, MPOL_DEFAULT is used only in the APIs, including
internal calls to do_set_mempolicy(), and in the display of policy
in /proc/<pid>/numa_maps. It always means "fall back" to the
next most specific policy scope. This simplifies the description
of memory policies quite a bit, with no visible change in behavior.
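
The resulting "fall back" rule can be sketched as a small stand-alone model. This is an illustration under stated assumptions, not the kernel implementation: the model_* names are hypothetical, shared policies are left out, and a NULL pointer plays the role of an MPOL_DEFAULT policy at that scope.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-ins only; these are not the kernel's definitions. */
enum model_mode { MODEL_PREFERRED, MODEL_BIND, MODEL_INTERLEAVE };

struct model_policy {
	enum model_mode mode;
	int preferred_node;	/* -1 means "local allocation" */
};

/* Models the run-time system default: Preferred mode, local node. */
static const struct model_policy model_system_default = {
	MODEL_PREFERRED, -1
};

/*
 * Pick the policy that governs an allocation.  A NULL pointer at one
 * scope means "fall back to the next most specific scope", so no
 * caller ever sees a default mode inside a policy structure.
 */
static const struct model_policy *
model_effective_policy(const struct model_policy *vma_pol,
		       const struct model_policy *task_pol)
{
	if (vma_pol)
		return vma_pol;
	if (task_pol)
		return task_pol;
	return &model_system_default;
}
```

Because the last fallback is always a real Preferred policy, code that switches on the mode of the returned policy needs no case for a default mode.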
This patch updates Documentation to reflect this change.
Tested with set_mempolicy() using numactl with memtoy, and
tested mbind() with memtoy. All seems to work "as expected".
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Documentation/vm/numa_memory_policy.txt | 70 ++++++++++++--------------------
mm/mempolicy.c | 31 ++++++--------
2 files changed, 41 insertions(+), 60 deletions(-)
Index: Linux/mm/mempolicy.c
===================================================================
--- Linux.orig/mm/mempolicy.c 2007-08-29 11:43:06.000000000 -0400
+++ Linux/mm/mempolicy.c 2007-08-29 11:44:03.000000000 -0400
@@ -105,9 +105,13 @@ static struct kmem_cache *sn_cache;
policied. */
enum zone_type policy_zone = 0;
+/*
+ * run-time system-wide default policy => local allocation
+ */
struct mempolicy default_policy = {
.refcnt = ATOMIC_INIT(1), /* never free it */
- .policy = MPOL_DEFAULT,
+ .policy = MPOL_PREFERRED,
+ .v = { .preferred_node = -1 },
};
static void mpol_rebind_policy(struct mempolicy *pol,
@@ -180,7 +184,8 @@ static struct mempolicy *mpol_new(int mo
mode, nodes ? nodes_addr(*nodes)[0] : -1);
if (mode == MPOL_DEFAULT)
- return NULL;
+ return NULL; /* simply delete any existing policy */
+
policy = kmem_cache_alloc(policy_cache, GFP_KERNEL);
if (!policy)
return ERR_PTR(-ENOMEM);
@@ -493,8 +498,6 @@ static void get_zonemask(struct mempolic
node_set(zone_to_nid(p->v.zonelist->zones[i]),
*nodes);
break;
- case MPOL_DEFAULT:
- break;
case MPOL_INTERLEAVE:
*nodes = p->v.nodes;
break;
@@ -1106,8 +1109,7 @@ static struct mempolicy * get_vma_policy
if (vma->vm_ops && vma->vm_ops->get_policy) {
pol = vma->vm_ops->get_policy(vma, addr);
shared_pol = 1; /* if pol non-NULL, that is */
- } else if (vma->vm_policy &&
- vma->vm_policy->policy != MPOL_DEFAULT)
+ } else if (vma->vm_policy)
pol = vma->vm_policy;
}
if (!pol)
@@ -1136,7 +1138,6 @@ static struct zonelist *zonelist_policy(
return policy->v.zonelist;
/*FALL THROUGH*/
case MPOL_INTERLEAVE: /* should not happen */
- case MPOL_DEFAULT:
nd = numa_node_id();
break;
default:
@@ -1166,9 +1167,10 @@ static unsigned interleave_nodes(struct
*/
unsigned slab_node(struct mempolicy *policy)
{
- int pol = policy ? policy->policy : MPOL_DEFAULT;
+ if (!policy)
+ return numa_node_id();
- switch (pol) {
+ switch (policy->policy) {
case MPOL_INTERLEAVE:
return interleave_nodes(policy);
@@ -1182,10 +1184,10 @@ unsigned slab_node(struct mempolicy *pol
case MPOL_PREFERRED:
if (policy->v.preferred_node >= 0)
return policy->v.preferred_node;
- /* Fall through */
+ return numa_node_id();
default:
- return numa_node_id();
+ BUG();
}
}
@@ -1410,8 +1412,6 @@ int __mpol_equal(struct mempolicy *a, st
if (a->policy != b->policy)
return 0;
switch (a->policy) {
- case MPOL_DEFAULT:
- return 1;
case MPOL_INTERLEAVE:
return nodes_equal(a->v.nodes, b->v.nodes);
case MPOL_PREFERRED:
@@ -1436,7 +1436,6 @@ void __mpol_free(struct mempolicy *p)
return;
if (p->policy == MPOL_BIND)
kfree(p->v.zonelist);
- p->policy = MPOL_DEFAULT;
kmem_cache_free(policy_cache, p);
}
@@ -1603,7 +1602,7 @@ void mpol_shared_policy_init(struct shar
if (policy != MPOL_DEFAULT) {
struct mempolicy *newpol;
- /* Falls back to MPOL_DEFAULT on any error */
+ /* Falls back to NULL policy [MPOL_DEFAULT] on any error */
newpol = mpol_new(policy, policy_nodes);
if (!IS_ERR(newpol)) {
/* Create pseudo-vma that contains just the policy */
@@ -1724,8 +1723,6 @@ static void mpol_rebind_policy(struct me
return;
switch (pol->policy) {
- case MPOL_DEFAULT:
- break;
case MPOL_INTERLEAVE:
nodes_remap(tmp, pol->v.nodes, *mpolmask, *newmask);
pol->v.nodes = tmp;
Index: Linux/Documentation/vm/numa_memory_policy.txt
===================================================================
--- Linux.orig/Documentation/vm/numa_memory_policy.txt 2007-08-29 11:23:56.000000000 -0400
+++ Linux/Documentation/vm/numa_memory_policy.txt 2007-08-29 11:43:10.000000000 -0400
@@ -149,63 +149,47 @@ Components of Memory Policies
Linux memory policy supports the following 4 behavioral modes:
- Default Mode--MPOL_DEFAULT: The behavior specified by this mode is
- context or scope dependent.
+ Default Mode--MPOL_DEFAULT: This mode is only used in the memory
+ policy APIs. Internally, MPOL_DEFAULT is converted to the NULL
+ memory policy in all policy scopes. Any existing non-default policy
+ will simply be removed when MPOL_DEFAULT is specified. As a result,
+ MPOL_DEFAULT means "fall back to the next most specific policy scope."
+
+ For example, a NULL or default task policy will fall back to the
+ system default policy. A NULL or default vma policy will fall
+ back to the task policy.
- As mentioned in the Policy Scope section above, during normal
- system operation, the System Default Policy is hard coded to
- contain the Default mode.
-
- In this context, default mode means "local" allocation--that is
- attempt to allocate the page from the node associated with the cpu
- where the fault occurs. If the "local" node has no memory, or the
- node's memory can be exhausted [no free pages available], local
- allocation will "fallback to"--attempt to allocate pages from--
- "nearby" nodes, in order of increasing "distance".
-
- Implementation detail -- subject to change: "Fallback" uses
- a per node list of sibling nodes--called zonelists--built at
- boot time, or when nodes or memory are added or removed from
- the system [memory hotplug]. These per node zonelist are
- constructed with nodes in order of increasing distance based
- on information provided by the platform firmware.
-
- When a task/process policy or a shared policy contains the Default
- mode, this also means "local allocation", as described above.
-
- In the context of a VMA, Default mode means "fall back to task
- policy"--which may or may not specify Default mode. Thus, Default
- mode can not be counted on to mean local allocation when used
- on a non-shared region of the address space. However, see
- MPOL_PREFERRED below.
-
- The Default mode does not use the optional set of nodes.
+ When specified in one of the memory policy APIs, the Default mode
+ does not use the optional set of nodes.
MPOL_BIND: This mode specifies that memory must come from the
set of nodes specified by the policy.
The memory policy APIs do not specify an order in which the nodes
- will be searched. However, unlike "local allocation", the Bind
- policy does not consider the distance between the nodes. Rather,
- allocations will fallback to the nodes specified by the policy in
- order of numeric node id. Like everything in Linux, this is subject
- to change.
+ will be searched. However, unlike "local allocation" discussed
+ below, the Bind policy does not consider the distance between the
+ nodes. Rather, allocations will fallback to the nodes specified
+ by the policy in order of numeric node id. Like everything in
+ Linux, this is subject to change.
MPOL_PREFERRED: This mode specifies that the allocation should be
attempted from the single node specified in the policy. If that
- allocation fails, the kernel will search other nodes, exactly as
- it would for a local allocation that started at the preferred node
- in increasing distance from the preferred node. "Local" allocation
- policy can be viewed as a Preferred policy that starts at the node
- containing the cpu where the allocation takes place.
+ allocation fails, the kernel will search other nodes, in order of
+ increasing distance from the preferred node based on information
+ provided by the platform firmware.
Internally, the Preferred policy uses a single node--the
preferred_node member of struct mempolicy. A "distinguished
value of this preferred_node, currently '-1', is interpreted
as "the node containing the cpu where the allocation takes
- place"--local allocation. This is the way to specify
- local allocation for a specific range of addresses--i.e. for
- VMA policies.
+ place"--local allocation. "Local" allocation policy can be
+ viewed as a Preferred policy that starts at the node containing
+ the cpu where the allocation takes place.
+
+ As mentioned in the Policy Scope section above, during normal
+ system operation, the System Default Policy is hard coded to
+ specify "local allocation". This policy uses the Preferred
+ policy with the special negative value of preferred_node.
MPOL_INTERLEAVED: This mode specifies that page allocations be
interleaved, on a page granularity, across the nodes specified in