From: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: ak@suse.de, gregkh@suse.de, linux-mm@kvack.org, mel@skynet.ie,
eric.whitney@hp.com, Christoph Lameter <clameter@sgi.com>
Subject: [PATCH 2/2] Mem Policy: Fixup Shm and Interleave Policy Reference Counting - V2
Date: Thu, 11 Oct 2007 15:07:08 -0400 [thread overview]
Message-ID: <1192129628.5036.23.camel@localhost> (raw)
In-Reply-To: <Pine.LNX.4.64.0710101415470.32488@schroedinger.engr.sgi.com>
Here's V2 of the second patch of the series, modified as suggested by
Christoph vis a vis include of <linux/mm.h> in mempolicy.h. I did not
change the hugetlbfs to add *_policy() ops as I think the current
mechanisms work as required now. I.e., even if I added the policy
vm_ops to hugetlbfs, I think the changes below to mempolicy and
shm_get_policy() would still be required in the general case.
I have removed the 'RFC'. Please review for possible merge.
Lee
====
PATCH 2/2 Mempolicy: Fixup Shm and Interleave Policy Reference Counting
Against: 2.6.23-rc8-mm2
V1 -> V2:
+ remove include of <linux/mm.h> from mempolicy.h and use
BUG_ON(), conditional on CONFIG_DEBUG_VM, in mpol_get()
In the memory policy reference counting cleanup patch, I missed
one path that needs to unreference the memory policy. After
computing the target node for interleave policy, we need to
drop the reference if the policy is not the system default nor
the current task's policy.
In huge_zonelist(), I was unconditionally unref'ing the policy
in the interleave path, even when it was a policy that didn't
need it. Fix this!
Note: I investigated moving the check for "policy_needs_unref"
to the mpol_free() wrapper, but this led to nasty circular header
dependencies. If we wanted to make mpol_free() an external
function, rather than a static inline, I could do this and
remove several checks. I'd still need to keep an explicit
check in alloc_page_vma() if we want to use a tail-call for
the fast path.
get_vma_policy() assumes that shared policies are referenced by
the get_policy() vm_op, if any. This is true for shmem_get_policy()
but not for shm_get_policy() when the "backing file" does not
support a get_policy() vm_op. The latter is the case for SHM_HUGETLB
segments. Because get_vma_policy() expects the get_policy() op to
have added a ref, it doesn't do so itself. This results in
premature freeing of the policy. Add the mpol_get() to the
shm_get_policy() op when the backing file doesn't support shared
policies.
Further, shm_get_policy() was falling back to current task's task
policy if the backing file did not support get_policy() vm_op and
the vma policy was null. This is not valid when get_vma_policy() is
called from show_numa_map() as task != current. Also, this did
not match the behavior of the shmem_get_policy() vm_op which did
NOT fall back to task policy. So, modify shm_get_policy() NOT to
fall back to current->mempolicy.
Now, turns out that get_vma_policy() was not handling fallback to
task policy correctly when the get_policy() vm_op returns NULL.
Rather, it was falling back directly to system default policy.
So, fix get_vma_policy() to use only non-NULL policy returned from
the vma get_policy op and indicate that this policy does not need
another ref count.
Add [VM_]BUG_ON()s to __mpol_free() and mpol_get() to trap attempts
to ref/unref the system default policy. Should no longer occur.
Document mempolicy return value reference semantics assumed by
the changes discussed above for the set_ and get_policy vm_ops
in <linux/mm.h>--where the prototypes are defined.
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
include/linux/mempolicy.h | 6 +++++-
include/linux/mm.h | 14 ++++++++++++++
ipc/shm.c | 6 +++---
mm/mempolicy.c | 32 ++++++++++++++++++++++++--------
4 files changed, 46 insertions(+), 12 deletions(-)
Index: Linux/mm/mempolicy.c
===================================================================
--- Linux.orig/mm/mempolicy.c 2007-10-10 14:58:12.000000000 -0400
+++ Linux/mm/mempolicy.c 2007-10-11 14:07:42.000000000 -0400
@@ -1112,19 +1112,25 @@ static struct mempolicy * get_vma_policy
struct vm_area_struct *vma, unsigned long addr)
{
struct mempolicy *pol = task->mempolicy;
- int shared_pol = 0;
+ int pol_needs_ref = (task != current);
if (vma) {
if (vma->vm_ops && vma->vm_ops->get_policy) {
- pol = vma->vm_ops->get_policy(vma, addr);
- shared_pol = 1; /* if pol non-NULL, add ref below */
+ struct mempolicy *vpol = vma->vm_ops->get_policy(vma,
+ addr);
+ if (vpol) {
+ pol = vpol;
+ pol_needs_ref = 0; /* get_policy() added ref */
+ }
} else if (vma->vm_policy &&
- vma->vm_policy->policy != MPOL_DEFAULT)
+ vma->vm_policy->policy != MPOL_DEFAULT) {
pol = vma->vm_policy;
+ pol_needs_ref++;
+ }
}
if (!pol)
pol = &default_policy;
- else if (!shared_pol && pol != current->mempolicy)
+ else if (pol_needs_ref)
mpol_get(pol); /* vma or other task's policy */
return pol;
}
@@ -1262,18 +1268,21 @@ struct zonelist *huge_zonelist(struct vm
{
struct mempolicy *pol = get_vma_policy(current, vma, addr);
struct zonelist *zl;
+ int policy_needs_unref = (pol != &default_policy && \
+ pol != current->mempolicy);
*mpol = NULL; /* probably no unref needed */
if (pol->policy == MPOL_INTERLEAVE) {
unsigned nid;
nid = interleave_nid(pol, vma, addr, HPAGE_SHIFT);
- __mpol_free(pol); /* finished with pol */
+ if (unlikely(policy_needs_unref))
+ __mpol_free(pol); /* finished with pol */
return NODE_DATA(nid)->node_zonelists + gfp_zone(gfp_flags);
}
zl = zonelist_policy(GFP_HIGHUSER, pol);
- if (unlikely(pol != &default_policy && pol != current->mempolicy)) {
+ if (unlikely(policy_needs_unref)) {
if (pol->policy != MPOL_BIND)
__mpol_free(pol); /* finished with pol */
else
@@ -1325,6 +1334,9 @@ alloc_page_vma(gfp_t gfp, struct vm_area
{
struct mempolicy *pol = get_vma_policy(current, vma, addr);
struct zonelist *zl;
+ int policy_needs_unref = (pol != &default_policy && \
+ pol != current->mempolicy);
+
cpuset_update_task_memory_state();
@@ -1332,10 +1344,12 @@ alloc_page_vma(gfp_t gfp, struct vm_area
unsigned nid;
nid = interleave_nid(pol, vma, addr, PAGE_SHIFT);
+ if (unlikely(policy_needs_unref))
+ __mpol_free(pol);
return alloc_page_interleave(gfp, 0, nid);
}
zl = zonelist_policy(gfp, pol);
- if (pol != &default_policy && pol != current->mempolicy) {
+ if (unlikely(policy_needs_unref)) {
/*
* slow path: ref counted policy -- shared or vma
*/
@@ -1444,6 +1458,8 @@ int __mpol_equal(struct mempolicy *a, st
/* Slow path of a mpol destructor. */
void __mpol_free(struct mempolicy *p)
{
+ VM_BUG_ON(p == &default_policy);
+
if (!atomic_dec_and_test(&p->refcnt))
return;
if (p->policy == MPOL_BIND)
Index: Linux/ipc/shm.c
===================================================================
--- Linux.orig/ipc/shm.c 2007-10-10 14:58:12.000000000 -0400
+++ Linux/ipc/shm.c 2007-10-10 14:59:13.000000000 -0400
@@ -263,10 +263,10 @@ static struct mempolicy *shm_get_policy(
if (sfd->vm_ops->get_policy)
pol = sfd->vm_ops->get_policy(vma, addr);
- else if (vma->vm_policy)
+ else if (vma->vm_policy) {
pol = vma->vm_policy;
- else
- pol = current->mempolicy;
+ mpol_get(pol); /* get_vma_policy() assumes this */
+ }
return pol;
}
#endif
Index: Linux/include/linux/mempolicy.h
===================================================================
--- Linux.orig/include/linux/mempolicy.h 2007-10-10 14:58:12.000000000 -0400
+++ Linux/include/linux/mempolicy.h 2007-10-11 14:09:19.000000000 -0400
@@ -72,6 +72,8 @@ struct mempolicy {
nodemask_t cpuset_mems_allowed; /* mempolicy relative to these nodes */
};
+extern struct mempolicy default_policy;
+
/*
* Support for managing mempolicy data objects (clone, copy, destroy)
* The default fast path of a NULL MPOL_DEFAULT policy is always inlined.
@@ -97,6 +99,9 @@ static inline struct mempolicy *mpol_cop
static inline void mpol_get(struct mempolicy *pol)
{
+#ifdef CONFIG_DEBUG_VM
+ BUG_ON(pol == &default_policy);
+#endif
if (pol)
atomic_inc(&pol->refcnt);
}
@@ -149,7 +154,6 @@ extern void mpol_rebind_task(struct task
extern void mpol_rebind_mm(struct mm_struct *mm, nodemask_t *new);
extern void mpol_fix_fork_child_flag(struct task_struct *p);
-extern struct mempolicy default_policy;
extern struct zonelist *huge_zonelist(struct vm_area_struct *vma,
unsigned long addr, gfp_t gfp_flags, struct mempolicy **mpol);
extern unsigned slab_node(struct mempolicy *policy);
Index: Linux/include/linux/mm.h
===================================================================
--- Linux.orig/include/linux/mm.h 2007-10-10 14:58:12.000000000 -0400
+++ Linux/include/linux/mm.h 2007-10-11 14:07:43.000000000 -0400
@@ -173,7 +173,21 @@ struct vm_operations_struct {
* writable, if an error is returned it will cause a SIGBUS */
int (*page_mkwrite)(struct vm_area_struct *vma, struct page *page);
#ifdef CONFIG_NUMA
+ /*
+ * set_policy() op must add a reference to any non-NULL @new mempolicy
+ * to hold the policy upon return. Caller should pass NULL @new to
+ * remove a policy and fall back to surrounding context--i.e. do not
+ * install a MPOL_DEFAULT policy, nor the task or system default
+ * mempolicy.
+ */
int (*set_policy)(struct vm_area_struct *vma, struct mempolicy *new);
+
+ /*
+ * get_policy() op must add reference [mpol_get()] to any mempolicy
+ * at (vma,addr). If no [shared/vma] mempolicy exists at that addr,
+ * get_policy() op must return NULL--i.e., do not "fallback" to task
+ * or system default policy.
+ */
struct mempolicy *(*get_policy)(struct vm_area_struct *vma,
unsigned long addr);
int (*migrate)(struct vm_area_struct *vma, const nodemask_t *from,
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2007-10-11 19:07 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2007-10-10 20:58 [PATCH/RFC 0/2] More Mempolicy Reference Counting Fixes Lee Schermerhorn
2007-10-10 20:58 ` [PATCH/RFC 1/2] Mem Policy: fix mempolicy usage in pci driver Lee Schermerhorn
2007-10-10 21:12 ` Christoph Lameter
2007-10-11 19:11 ` [PATCH " Lee Schermerhorn
2007-10-12 1:43 ` Christoph Lameter
2007-11-06 18:09 ` [PATCH ] " Lee Schermerhorn
2007-10-15 11:19 ` [PATCH 1/2] " Mel Gorman
2007-10-10 20:58 ` [PATCH/RFC 2/2] Mem Policy: Fixup Shm and Interleave Policy Reference Counting Lee Schermerhorn
2007-10-10 21:22 ` Christoph Lameter
2007-10-11 13:41 ` Lee Schermerhorn
2007-10-11 19:07 ` Lee Schermerhorn [this message]
2007-10-12 1:42 ` [PATCH 2/2] Mem Policy: Fixup Shm and Interleave Policy Reference Counting - V2 Christoph Lameter
2007-10-12 14:35 ` Lee Schermerhorn
2007-10-12 17:27 ` Christoph Lameter
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1192129628.5036.23.camel@localhost \
--to=lee.schermerhorn@hp.com \
--cc=ak@suse.de \
--cc=akpm@linux-foundation.org \
--cc=clameter@sgi.com \
--cc=eric.whitney@hp.com \
--cc=gregkh@suse.de \
--cc=linux-mm@kvack.org \
--cc=mel@skynet.ie \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).