* [PATCH/RFC 0/2] More Mempolicy Reference Counting Fixes
@ 2007-10-10 20:58 Lee Schermerhorn
2007-10-10 20:58 ` [PATCH/RFC 1/2] Mem Policy: fix mempolicy usage in pci driver Lee Schermerhorn
2007-10-10 20:58 ` [PATCH/RFC 2/2] Mem Policy: Fixup Shm and Interleave Policy Reference Counting Lee Schermerhorn
0 siblings, 2 replies; 14+ messages in thread
From: Lee Schermerhorn @ 2007-10-10 20:58 UTC (permalink / raw)
To: akpm; +Cc: ak, clameter, gregkh, linux-mm, mel, eric.whitney
PATCH 0/2 Memory Policy: More Reference Counting Fixes
While testing huge pages using shm segments created with the SHM_HUGETLB flag,
I came across additional problems with memory policy reference counting. While
tracking these down, I found even more... Some of the problems were introduced
by myself in my previous mempolicy ref counting patch, and some were paths that
I missed in my inspection and testing.
These 2 patches are both based against 2.6.23-rc8-mm2, but should also probably
go to the .23-stable tree when it opens. They have been tested on ia64 and
x86_64 platforms.
Lee
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH/RFC 1/2] Mem Policy: fix mempolicy usage in pci driver
2007-10-10 20:58 [PATCH/RFC 0/2] More Mempolicy Reference Counting Fixes Lee Schermerhorn
@ 2007-10-10 20:58 ` Lee Schermerhorn
2007-10-10 21:12 ` Christoph Lameter
2007-10-10 20:58 ` [PATCH/RFC 2/2] Mem Policy: Fixup Shm and Interleave Policy Reference Counting Lee Schermerhorn
1 sibling, 1 reply; 14+ messages in thread
From: Lee Schermerhorn @ 2007-10-10 20:58 UTC (permalink / raw)
To: akpm; +Cc: ak, clameter, gregkh, linux-mm, mel, eric.whitney
PATCH 1/2 Fix memory policy usage in pci driver
Against: 2.6.23-rc8-mm2
In an attempt to ensure memory allocation from the proper node,
the pci driver temporarily replaces the current task's memory
policy with the system default policy. Trying to be a good
citizen, the driver then call's mpol_get() on the new policy.
When it's finished probing, it undoes the '_get by calling
mpol_free() [on the system default policy] and then restores
the current task's saved mempolicy.
A couple of issues here:
1) it's never necessary to set a task's mempolicy to the
system default policy in order to get system default
allocation behavior. Simply set the current task's
mempolicy to NULL and allocations will fall back to
system default policy.
2) we should never [need to] call mpol_free() on the system
default policy. [I plan on trapping this with a VM_BUG_ON()
in a subsequent patch.]
This patch removes the calls to mpol_get() and mpol_free()
and uses NULL for the temporary task mempolicy to effect
default allocation behavior.
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
drivers/pci/pci-driver.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
Index: Linux/drivers/pci/pci-driver.c
===================================================================
--- Linux.orig/drivers/pci/pci-driver.c 2007-10-09 14:31:57.000000000 -0400
+++ Linux/drivers/pci/pci-driver.c 2007-10-09 14:43:57.000000000 -0400
@@ -177,13 +177,11 @@ static int pci_call_probe(struct pci_dri
set_cpus_allowed(current, node_to_cpumask(node));
/* And set default memory allocation policy */
oldpol = current->mempolicy;
- current->mempolicy = &default_policy;
- mpol_get(current->mempolicy);
+ current->mempolicy = NULL; /* fall back to system default policy */
#endif
error = drv->probe(dev, id);
#ifdef CONFIG_NUMA
set_cpus_allowed(current, oldmask);
- mpol_free(current->mempolicy);
current->mempolicy = oldpol;
#endif
return error;
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH/RFC 2/2] Mem Policy: Fixup Shm and Interleave Policy Reference Counting
2007-10-10 20:58 [PATCH/RFC 0/2] More Mempolicy Reference Counting Fixes Lee Schermerhorn
2007-10-10 20:58 ` [PATCH/RFC 1/2] Mem Policy: fix mempolicy usage in pci driver Lee Schermerhorn
@ 2007-10-10 20:58 ` Lee Schermerhorn
2007-10-10 21:22 ` Christoph Lameter
1 sibling, 1 reply; 14+ messages in thread
From: Lee Schermerhorn @ 2007-10-10 20:58 UTC (permalink / raw)
To: akpm; +Cc: ak, clameter, gregkh, linux-mm, mel, eric.whitney
PATCH 2/2 Mempolicy: Fixup Shm and Interleave Policy Reference Counting
Against: 2.6.23-rc8-mm2
* In the memory policy reference counting cleanup patch, I
missed one path that needs to unreference the memory policy.
After computing the target node for interleave policy, we need
to drop the reference if the policy is not the system default
nor the current task's policy.
* In huge_zonelist(), I was unconditionally unref'ing the policy
in the interleave path, even when it was a policy that didn't
need it. Fix this!
Note: I investigated moving the check for "policy_needs_unref"
to the mpol_free() wrapper, but this led to nasty circular header
dependencies. If we wanted to make mpol_free() an external
function, rather than a static inline, I could do this and
remove several checks. I'd still need to keep an explicit
check in alloc_page_vma() if we want to use a tail-call for
the fast path.
* get_vma_policy() assumes that shared policies are referenced by
the get_policy() vm_op, if any. This is true for shmem_get_policy()
but not for shm_get_policy() when the "backing file" does not
support a get_policy() vm_op. The latter is the case for SHM_HUGETLB
segments. Because get_vma_policy() expects the get_policy() op to
have added a ref, it doesn't do so itself. This results in
premature freeing of the policy. Add the mpol_get() to the
shm_get_policy() op when the backing file doesn't support shared
policies.
* Further, shm_get_policy() was falling back to current task's task
policy if the backing file did not support get_policy() vm_op and
the vma policy was null. This is not valid when get_vma_policy() is
called from show_numa_map() as task != current. Also, this did
not match the behavior of the shmem_get_policy() vm_op which did
NOT fall back to task policy. So, modify shm_get_policy() NOT to
fall back to current->mempolicy.
* Now, turns out that get_vma_policy() was not handling fallback to
task policy correctly when the get_policy() vm_op returns NULL.
Rather, it was falling back directly to system default policy.
So, fix get_vma_policy() to use only non-NULL policy returned from
the vma get_policy op and indicate that this policy does not need
another ref count.
* Document mempolicy return value reference semantics assumed by
the changes discussed above for the set_ and get_policy vm_ops
in <linux/mm.h>--where the prototypes are defined.
* Add VM_BUG_ON()s to __mpol_free() and mpol_get() to trap attempts
to ref/unref the system default policy. Should no longer occur.
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
include/linux/mempolicy.h | 5 ++++-
include/linux/mm.h | 14 ++++++++++++++
ipc/shm.c | 6 +++---
mm/mempolicy.c | 32 ++++++++++++++++++++++++--------
4 files changed, 45 insertions(+), 12 deletions(-)
Index: Linux/mm/mempolicy.c
===================================================================
--- Linux.orig/mm/mempolicy.c 2007-10-10 13:36:44.000000000 -0400
+++ Linux/mm/mempolicy.c 2007-10-10 14:25:52.000000000 -0400
@@ -1112,19 +1112,25 @@ static struct mempolicy * get_vma_policy
struct vm_area_struct *vma, unsigned long addr)
{
struct mempolicy *pol = task->mempolicy;
- int shared_pol = 0;
+ int pol_needs_ref = (task != current);
if (vma) {
if (vma->vm_ops && vma->vm_ops->get_policy) {
- pol = vma->vm_ops->get_policy(vma, addr);
- shared_pol = 1; /* if pol non-NULL, add ref below */
+ struct mempolicy *vpol = vma->vm_ops->get_policy(vma,
+ addr);
+ if (vpol) {
+ pol = vpol;
+ pol_needs_ref = 0; /* get_policy() added ref */
+ }
} else if (vma->vm_policy &&
- vma->vm_policy->policy != MPOL_DEFAULT)
+ vma->vm_policy->policy != MPOL_DEFAULT) {
pol = vma->vm_policy;
+ pol_needs_ref++;
+ }
}
if (!pol)
pol = &default_policy;
- else if (!shared_pol && pol != current->mempolicy)
+ else if (pol_needs_ref)
mpol_get(pol); /* vma or other task's policy */
return pol;
}
@@ -1262,18 +1268,21 @@ struct zonelist *huge_zonelist(struct vm
{
struct mempolicy *pol = get_vma_policy(current, vma, addr);
struct zonelist *zl;
+ int policy_needs_unref = (pol != &default_policy && \
+ pol != current->mempolicy);
*mpol = NULL; /* probably no unref needed */
if (pol->policy == MPOL_INTERLEAVE) {
unsigned nid;
nid = interleave_nid(pol, vma, addr, HPAGE_SHIFT);
- __mpol_free(pol); /* finished with pol */
+ if (unlikely(policy_needs_unref))
+ __mpol_free(pol); /* finished with pol */
return NODE_DATA(nid)->node_zonelists + gfp_zone(gfp_flags);
}
zl = zonelist_policy(GFP_HIGHUSER, pol);
- if (unlikely(pol != &default_policy && pol != current->mempolicy)) {
+ if (unlikely(policy_needs_unref)) {
if (pol->policy != MPOL_BIND)
__mpol_free(pol); /* finished with pol */
else
@@ -1325,6 +1334,9 @@ alloc_page_vma(gfp_t gfp, struct vm_area
{
struct mempolicy *pol = get_vma_policy(current, vma, addr);
struct zonelist *zl;
+ int policy_needs_unref = (pol != &default_policy && \
+ pol != current->mempolicy);
+
cpuset_update_task_memory_state();
@@ -1332,10 +1344,12 @@ alloc_page_vma(gfp_t gfp, struct vm_area
unsigned nid;
nid = interleave_nid(pol, vma, addr, PAGE_SHIFT);
+ if (unlikely(policy_needs_unref))
+ __mpol_free(pol);
return alloc_page_interleave(gfp, 0, nid);
}
zl = zonelist_policy(gfp, pol);
- if (pol != &default_policy && pol != current->mempolicy) {
+ if (unlikely(policy_needs_unref)) {
/*
* slow path: ref counted policy -- shared or vma
*/
@@ -1444,6 +1458,8 @@ int __mpol_equal(struct mempolicy *a, st
/* Slow path of a mpol destructor. */
void __mpol_free(struct mempolicy *p)
{
+ VM_BUG_ON(p == &default_policy);
+
if (!atomic_dec_and_test(&p->refcnt))
return;
if (p->policy == MPOL_BIND)
Index: Linux/ipc/shm.c
===================================================================
--- Linux.orig/ipc/shm.c 2007-10-10 13:36:44.000000000 -0400
+++ Linux/ipc/shm.c 2007-10-10 14:21:59.000000000 -0400
@@ -263,10 +263,10 @@ static struct mempolicy *shm_get_policy(
if (sfd->vm_ops->get_policy)
pol = sfd->vm_ops->get_policy(vma, addr);
- else if (vma->vm_policy)
+ else if (vma->vm_policy) {
pol = vma->vm_policy;
- else
- pol = current->mempolicy;
+ mpol_get(pol); /* get_vma_policy() assumes this */
+ }
return pol;
}
#endif
Index: Linux/include/linux/mempolicy.h
===================================================================
--- Linux.orig/include/linux/mempolicy.h 2007-10-10 13:36:44.000000000 -0400
+++ Linux/include/linux/mempolicy.h 2007-10-10 14:20:28.000000000 -0400
@@ -2,6 +2,7 @@
#define _LINUX_MEMPOLICY_H 1
#include <linux/errno.h>
+#include <linux/mm.h>
/*
* NUMA memory policies for Linux.
@@ -72,6 +73,8 @@ struct mempolicy {
nodemask_t cpuset_mems_allowed; /* mempolicy relative to these nodes */
};
+extern struct mempolicy default_policy;
+
/*
* Support for managing mempolicy data objects (clone, copy, destroy)
* The default fast path of a NULL MPOL_DEFAULT policy is always inlined.
@@ -97,6 +100,7 @@ static inline struct mempolicy *mpol_cop
static inline void mpol_get(struct mempolicy *pol)
{
+ VM_BUG_ON(pol == &default_policy);
if (pol)
atomic_inc(&pol->refcnt);
}
@@ -149,7 +153,6 @@ extern void mpol_rebind_task(struct task
extern void mpol_rebind_mm(struct mm_struct *mm, nodemask_t *new);
extern void mpol_fix_fork_child_flag(struct task_struct *p);
-extern struct mempolicy default_policy;
extern struct zonelist *huge_zonelist(struct vm_area_struct *vma,
unsigned long addr, gfp_t gfp_flags, struct mempolicy **mpol);
extern unsigned slab_node(struct mempolicy *policy);
Index: Linux/include/linux/mm.h
===================================================================
--- Linux.orig/include/linux/mm.h 2007-10-10 13:36:44.000000000 -0400
+++ Linux/include/linux/mm.h 2007-10-10 14:20:28.000000000 -0400
@@ -173,7 +173,21 @@ struct vm_operations_struct {
* writable, if an error is returned it will cause a SIGBUS */
int (*page_mkwrite)(struct vm_area_struct *vma, struct page *page);
#ifdef CONFIG_NUMA
+ /*
+ * set_policy() op must add a reference to any non-NULL @new mempolicy
+ * to hold the policy upon return. Caller should pass NULL @new to
+ * remove a policy and fall back to surrounding context--i.e. do not
+ * install a MPOL_DEFAULT policy, nor the task or system default
+ * mempolicy.
+ */
int (*set_policy)(struct vm_area_struct *vma, struct mempolicy *new);
+
+ /*
+ * get_policy() op must add reference [mpol_get()] to any mempolicy
+ * at (vma,addr). If no [shared/vma] mempolicy exists at that addr,
+ * get_policy() op must return NULL--i.e., do not "fallback" to task
+ * or system default policy.
+ */
struct mempolicy *(*get_policy)(struct vm_area_struct *vma,
unsigned long addr);
int (*migrate)(struct vm_area_struct *vma, const nodemask_t *from,
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH/RFC 1/2] Mem Policy: fix mempolicy usage in pci driver
2007-10-10 20:58 ` [PATCH/RFC 1/2] Mem Policy: fix mempolicy usage in pci driver Lee Schermerhorn
@ 2007-10-10 21:12 ` Christoph Lameter
2007-10-11 19:11 ` [PATCH " Lee Schermerhorn
0 siblings, 1 reply; 14+ messages in thread
From: Christoph Lameter @ 2007-10-10 21:12 UTC (permalink / raw)
To: Lee Schermerhorn; +Cc: akpm, ak, gregkh, linux-mm, mel, eric.whitney
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH/RFC 2/2] Mem Policy: Fixup Shm and Interleave Policy Reference Counting
2007-10-10 20:58 ` [PATCH/RFC 2/2] Mem Policy: Fixup Shm and Interleave Policy Reference Counting Lee Schermerhorn
@ 2007-10-10 21:22 ` Christoph Lameter
2007-10-11 13:41 ` Lee Schermerhorn
2007-10-11 19:07 ` [PATCH 2/2] Mem Policy: Fixup Shm and Interleave Policy Reference Counting - V2 Lee Schermerhorn
0 siblings, 2 replies; 14+ messages in thread
From: Christoph Lameter @ 2007-10-10 21:22 UTC (permalink / raw)
To: Lee Schermerhorn; +Cc: akpm, ak, gregkh, linux-mm, mel, eric.whitney
On Wed, 10 Oct 2007, Lee Schermerhorn wrote:
> * get_vma_policy() assumes that shared policies are referenced by
> the get_policy() vm_op, if any. This is true for shmem_get_policy()
> but not for shm_get_policy() when the "backing file" does not
> support a get_policy() vm_op. The latter is the case for SHM_HUGETLB
> segments. Because get_vma_policy() expects the get_policy() op to
> have added a ref, it doesn't do so itself. This results in
> premature freeing of the policy. Add the mpol_get() to the
> shm_get_policy() op when the backing file doesn't support shared
> policies.
Could you add support for SHM_HUGETLB segments instead to make this
consistent so that shared policies always use a get_policy function?
> * Further, shm_get_policy() was falling back to current task's task
> policy if the backing file did not support get_policy() vm_op and
> the vma policy was null. This is not valid when get_vma_policy() is
> called from show_numa_map() as task != current. Also, this did
> not match the behavior of the shmem_get_policy() vm_op which did
> NOT fall back to task policy. So, modify shm_get_policy() NOT to
> fall back to current->mempolicy.
Hmmm..... The show_numa_map issue is special. Maybe fix that one instead?
> Index: Linux/include/linux/mempolicy.h
> ===================================================================
> --- Linux.orig/include/linux/mempolicy.h 2007-10-10 13:36:44.000000000 -0400
> +++ Linux/include/linux/mempolicy.h 2007-10-10 14:20:28.000000000 -0400
> @@ -2,6 +2,7 @@
> #define _LINUX_MEMPOLICY_H 1
>
> #include <linux/errno.h>
> +#include <linux/mm.h>
I think we tried to avoid a heavy include here. mm.h is huge and draws in
lots of other include files. The additional include is only needed for the
VM_BUG_ON it seems? BUG_ON would also work.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH/RFC 2/2] Mem Policy: Fixup Shm and Interleave Policy Reference Counting
2007-10-10 21:22 ` Christoph Lameter
@ 2007-10-11 13:41 ` Lee Schermerhorn
2007-10-11 19:07 ` [PATCH 2/2] Mem Policy: Fixup Shm and Interleave Policy Reference Counting - V2 Lee Schermerhorn
1 sibling, 0 replies; 14+ messages in thread
From: Lee Schermerhorn @ 2007-10-11 13:41 UTC (permalink / raw)
To: Christoph Lameter; +Cc: akpm, ak, gregkh, linux-mm, mel, eric.whitney
On Wed, 2007-10-10 at 14:22 -0700, Christoph Lameter wrote:
> On Wed, 10 Oct 2007, Lee Schermerhorn wrote:
>
> > * get_vma_policy() assumes that shared policies are referenced by
> > the get_policy() vm_op, if any. This is true for shmem_get_policy()
> > but not for shm_get_policy() when the "backing file" does not
> > support a get_policy() vm_op. The latter is the case for SHM_HUGETLB
> > segments. Because get_vma_policy() expects the get_policy() op to
> > have added a ref, it doesn't do so itself. This results in
> > premature freeing of the policy. Add the mpol_get() to the
> > shm_get_policy() op when the backing file doesn't support shared
> > policies.
>
> Could you add support for SHM_HUGETLB segments instead to make this
> consistent so that shared policies always use a get_policy function?
I have patches that do this as part of my shared policy series that is
currently "on hold" while we sort these other things out. SHM_HUGETLB
segments do use the shm_get_policy() vm_op. However, it detects that
the hugetlb shm segment does not support get_policy(), so it just uses
the vma policy at that address. You should like this behavior! :-). My
patches implement shared policy for SHM_HUGETLB, which you don't like.
So, I think we should leave this as is... for now.
>
> > * Further, shm_get_policy() was falling back to current task's task
> > policy if the backing file did not support get_policy() vm_op and
> > the vma policy was null. This is not valid when get_vma_policy() is
> > called from show_numa_map() as task != current. Also, this did
> > not match the behavior of the shmem_get_policy() vm_op which did
> > NOT fall back to task policy. So, modify shm_get_policy() NOT to
> > fall back to current->mempolicy.
>
> Hmmm..... The show_numa_map issue is special. Maybe fix that one instead?
>
> > Index: Linux/include/linux/mempolicy.h
> > ===================================================================
> > --- Linux.orig/include/linux/mempolicy.h 2007-10-10 13:36:44.000000000 -0400
> > +++ Linux/include/linux/mempolicy.h 2007-10-10 14:20:28.000000000 -0400
> > @@ -2,6 +2,7 @@
> > #define _LINUX_MEMPOLICY_H 1
> >
> > #include <linux/errno.h>
> > +#include <linux/mm.h>
>
> I think we tried to avoid a heavy include here. mm.h is huge and draws in
> lots of other include files. The additional include is only needed for the
> VM_BUG_ON it seems? BUG_ON would also work.
Yeah, I know. However, I like the idea of having a separately
configurable VM debug check. I will remove the include and the
VM_BUG_ON for now. But, what would [any one else?] think about moving
VM_BUG_ON() to asm-generic/bug.h in a separate patch?
Lee
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH 2/2] Mem Policy: Fixup Shm and Interleave Policy Reference Counting - V2
2007-10-10 21:22 ` Christoph Lameter
2007-10-11 13:41 ` Lee Schermerhorn
@ 2007-10-11 19:07 ` Lee Schermerhorn
2007-10-12 1:42 ` Christoph Lameter
1 sibling, 1 reply; 14+ messages in thread
From: Lee Schermerhorn @ 2007-10-11 19:07 UTC (permalink / raw)
To: Andrew Morton; +Cc: ak, gregkh, linux-mm, mel, eric.whitney, Christoph Lameter
Here's V2 of the second patch of the series, modified as suggested by
Christoph vis a vis include of <linux/mm.h> in mempolicy.h. I did not
change the hugetlbfs to add *_policy() ops as I think the current
mechanisms work as required now. I.e., even if I added the policy
vm_ops to hugetlbfs, I think the changes below to mempolicy and
shm_get_policy() would still be required in the general case.
I have removed the 'RFC'. Please review for possible merge.
Lee
====
PATCH 2/2 Mempolicy: Fixup Shm and Interleave Policy Reference Counting
Against: 2.6.23-rc8-mm2
V1 -> V2:
+ remove include of <linux/mm.h> from mempolicy.h and use
BUG_ON(), conditional on CONFIG_DEBUG_VM, in mpol_get()
In the memory policy reference counting cleanup patch, I missed
one path that needs to unreference the memory policy. After
computing the target node for interleave policy, we need to
drop the reference if the policy is not the system default nor
the current task's policy.
In huge_zonelist(), I was unconditionally unref'ing the policy
in the interleave path, even when it was a policy that didn't
need it. Fix this!
Note: I investigated moving the check for "policy_needs_unref"
to the mpol_free() wrapper, but this led to nasty circular header
dependencies. If we wanted to make mpol_free() an external
function, rather than a static inline, I could do this and
remove several checks. I'd still need to keep an explicit
check in alloc_page_vma() if we want to use a tail-call for
the fast path.
get_vma_policy() assumes that shared policies are referenced by
the get_policy() vm_op, if any. This is true for shmem_get_policy()
but not for shm_get_policy() when the "backing file" does not
support a get_policy() vm_op. The latter is the case for SHM_HUGETLB
segments. Because get_vma_policy() expects the get_policy() op to
have added a ref, it doesn't do so itself. This results in
premature freeing of the policy. Add the mpol_get() to the
shm_get_policy() op when the backing file doesn't support shared
policies.
Further, shm_get_policy() was falling back to current task's task
policy if the backing file did not support get_policy() vm_op and
the vma policy was null. This is not valid when get_vma_policy() is
called from show_numa_map() as task != current. Also, this did
not match the behavior of the shmem_get_policy() vm_op which did
NOT fall back to task policy. So, modify shm_get_policy() NOT to
fall back to current->mempolicy.
Now, turns out that get_vma_policy() was not handling fallback to
task policy correctly when the get_policy() vm_op returns NULL.
Rather, it was falling back directly to system default policy.
So, fix get_vma_policy() to use only non-NULL policy returned from
the vma get_policy op and indicate that this policy does not need
another ref count.
Add [VM_]BUG_ON()s to __mpol_free() and mpol_get() to trap attempts
to ref/unref the system default policy. Should no longer occur.
Document mempolicy return value reference semantics assumed by
the changes discussed above for the set_ and get_policy vm_ops
in <linux/mm.h>--where the prototypes are defined.
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
include/linux/mempolicy.h | 6 +++++-
include/linux/mm.h | 14 ++++++++++++++
ipc/shm.c | 6 +++---
mm/mempolicy.c | 32 ++++++++++++++++++++++++--------
4 files changed, 46 insertions(+), 12 deletions(-)
Index: Linux/mm/mempolicy.c
===================================================================
--- Linux.orig/mm/mempolicy.c 2007-10-10 14:58:12.000000000 -0400
+++ Linux/mm/mempolicy.c 2007-10-11 14:07:42.000000000 -0400
@@ -1112,19 +1112,25 @@ static struct mempolicy * get_vma_policy
struct vm_area_struct *vma, unsigned long addr)
{
struct mempolicy *pol = task->mempolicy;
- int shared_pol = 0;
+ int pol_needs_ref = (task != current);
if (vma) {
if (vma->vm_ops && vma->vm_ops->get_policy) {
- pol = vma->vm_ops->get_policy(vma, addr);
- shared_pol = 1; /* if pol non-NULL, add ref below */
+ struct mempolicy *vpol = vma->vm_ops->get_policy(vma,
+ addr);
+ if (vpol) {
+ pol = vpol;
+ pol_needs_ref = 0; /* get_policy() added ref */
+ }
} else if (vma->vm_policy &&
- vma->vm_policy->policy != MPOL_DEFAULT)
+ vma->vm_policy->policy != MPOL_DEFAULT) {
pol = vma->vm_policy;
+ pol_needs_ref++;
+ }
}
if (!pol)
pol = &default_policy;
- else if (!shared_pol && pol != current->mempolicy)
+ else if (pol_needs_ref)
mpol_get(pol); /* vma or other task's policy */
return pol;
}
@@ -1262,18 +1268,21 @@ struct zonelist *huge_zonelist(struct vm
{
struct mempolicy *pol = get_vma_policy(current, vma, addr);
struct zonelist *zl;
+ int policy_needs_unref = (pol != &default_policy && \
+ pol != current->mempolicy);
*mpol = NULL; /* probably no unref needed */
if (pol->policy == MPOL_INTERLEAVE) {
unsigned nid;
nid = interleave_nid(pol, vma, addr, HPAGE_SHIFT);
- __mpol_free(pol); /* finished with pol */
+ if (unlikely(policy_needs_unref))
+ __mpol_free(pol); /* finished with pol */
return NODE_DATA(nid)->node_zonelists + gfp_zone(gfp_flags);
}
zl = zonelist_policy(GFP_HIGHUSER, pol);
- if (unlikely(pol != &default_policy && pol != current->mempolicy)) {
+ if (unlikely(policy_needs_unref)) {
if (pol->policy != MPOL_BIND)
__mpol_free(pol); /* finished with pol */
else
@@ -1325,6 +1334,9 @@ alloc_page_vma(gfp_t gfp, struct vm_area
{
struct mempolicy *pol = get_vma_policy(current, vma, addr);
struct zonelist *zl;
+ int policy_needs_unref = (pol != &default_policy && \
+ pol != current->mempolicy);
+
cpuset_update_task_memory_state();
@@ -1332,10 +1344,12 @@ alloc_page_vma(gfp_t gfp, struct vm_area
unsigned nid;
nid = interleave_nid(pol, vma, addr, PAGE_SHIFT);
+ if (unlikely(policy_needs_unref))
+ __mpol_free(pol);
return alloc_page_interleave(gfp, 0, nid);
}
zl = zonelist_policy(gfp, pol);
- if (pol != &default_policy && pol != current->mempolicy) {
+ if (unlikely(policy_needs_unref)) {
/*
* slow path: ref counted policy -- shared or vma
*/
@@ -1444,6 +1458,8 @@ int __mpol_equal(struct mempolicy *a, st
/* Slow path of a mpol destructor. */
void __mpol_free(struct mempolicy *p)
{
+ VM_BUG_ON(p == &default_policy);
+
if (!atomic_dec_and_test(&p->refcnt))
return;
if (p->policy == MPOL_BIND)
Index: Linux/ipc/shm.c
===================================================================
--- Linux.orig/ipc/shm.c 2007-10-10 14:58:12.000000000 -0400
+++ Linux/ipc/shm.c 2007-10-10 14:59:13.000000000 -0400
@@ -263,10 +263,10 @@ static struct mempolicy *shm_get_policy(
if (sfd->vm_ops->get_policy)
pol = sfd->vm_ops->get_policy(vma, addr);
- else if (vma->vm_policy)
+ else if (vma->vm_policy) {
pol = vma->vm_policy;
- else
- pol = current->mempolicy;
+ mpol_get(pol); /* get_vma_policy() assumes this */
+ }
return pol;
}
#endif
Index: Linux/include/linux/mempolicy.h
===================================================================
--- Linux.orig/include/linux/mempolicy.h 2007-10-10 14:58:12.000000000 -0400
+++ Linux/include/linux/mempolicy.h 2007-10-11 14:09:19.000000000 -0400
@@ -72,6 +72,8 @@ struct mempolicy {
nodemask_t cpuset_mems_allowed; /* mempolicy relative to these nodes */
};
+extern struct mempolicy default_policy;
+
/*
* Support for managing mempolicy data objects (clone, copy, destroy)
* The default fast path of a NULL MPOL_DEFAULT policy is always inlined.
@@ -97,6 +99,9 @@ static inline struct mempolicy *mpol_cop
static inline void mpol_get(struct mempolicy *pol)
{
+#ifdef CONFIG_DEBUG_VM
+ BUG_ON(pol == &default_policy);
+#endif
if (pol)
atomic_inc(&pol->refcnt);
}
@@ -149,7 +154,6 @@ extern void mpol_rebind_task(struct task
extern void mpol_rebind_mm(struct mm_struct *mm, nodemask_t *new);
extern void mpol_fix_fork_child_flag(struct task_struct *p);
-extern struct mempolicy default_policy;
extern struct zonelist *huge_zonelist(struct vm_area_struct *vma,
unsigned long addr, gfp_t gfp_flags, struct mempolicy **mpol);
extern unsigned slab_node(struct mempolicy *policy);
Index: Linux/include/linux/mm.h
===================================================================
--- Linux.orig/include/linux/mm.h 2007-10-10 14:58:12.000000000 -0400
+++ Linux/include/linux/mm.h 2007-10-11 14:07:43.000000000 -0400
@@ -173,7 +173,21 @@ struct vm_operations_struct {
* writable, if an error is returned it will cause a SIGBUS */
int (*page_mkwrite)(struct vm_area_struct *vma, struct page *page);
#ifdef CONFIG_NUMA
+ /*
+ * set_policy() op must add a reference to any non-NULL @new mempolicy
+ * to hold the policy upon return. Caller should pass NULL @new to
+ * remove a policy and fall back to surrounding context--i.e. do not
+ * install a MPOL_DEFAULT policy, nor the task or system default
+ * mempolicy.
+ */
int (*set_policy)(struct vm_area_struct *vma, struct mempolicy *new);
+
+ /*
+ * get_policy() op must add reference [mpol_get()] to any mempolicy
+ * at (vma,addr). If no [shared/vma] mempolicy exists at that addr,
+ * get_policy() op must return NULL--i.e., do not "fallback" to task
+ * or system default policy.
+ */
struct mempolicy *(*get_policy)(struct vm_area_struct *vma,
unsigned long addr);
int (*migrate)(struct vm_area_struct *vma, const nodemask_t *from,
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 1/2] Mem Policy: fix mempolicy usage in pci driver
2007-10-10 21:12 ` Christoph Lameter
@ 2007-10-11 19:11 ` Lee Schermerhorn
2007-10-12 1:43 ` Christoph Lameter
2007-10-15 11:19 ` [PATCH 1/2] " Mel Gorman
0 siblings, 2 replies; 14+ messages in thread
From: Lee Schermerhorn @ 2007-10-11 19:11 UTC (permalink / raw)
To: Andrew Morton; +Cc: ak, gregkh, linux-mm, mel, eric.whitney, Christoph Lameter
On Wed, 2007-10-10 at 14:12 -0700, Christoph Lameter wrote:
> Acked-by: Christoph Lameter <clameter@sgi.com>
>
Resend with 'RFC' removed. Please review/consider for merge.
Note: this is required BEFORE patch 2 of this series to avoid hitting
the [VM_]BUG_ON()s added by the second patch.
Lee
====
PATCH 1/2 Fix memory policy usage in pci driver
Against: 2.6.23-rc8-mm2
In an attempt to ensure memory allocation from the local node,
the pci driver temporarily replaces the current task's memory
policy with the system default policy. Trying to be a good
citizen, the driver then call's mpol_get() on the new policy.
When it's finished probing, it undoes the '_get by calling
mpol_free() [on the system default policy] and then restores
the current task's saved mempolicy.
A couple of issues here:
1) it's never necessary to set a task's mempolicy to the
system default policy in order to get system default
allocation behavior. Simply set the current task's
mempolicy to NULL and allocations will fall back to
system default policy.
2) we should never [need to] call mpol_free() on the system
default policy. [I plan on trapping this with a VM_BUG_ON()
in a subsequent patch.]
This patch removes the calls to mpol_get() and mpol_free()
and uses NULL for the temporary task mempolicy to effect
default allocation behavior.
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
drivers/pci/pci-driver.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
Index: Linux/drivers/pci/pci-driver.c
===================================================================
--- Linux.orig/drivers/pci/pci-driver.c 2007-10-09 14:31:57.000000000
-0400
+++ Linux/drivers/pci/pci-driver.c 2007-10-09 14:43:57.000000000 -0400
@@ -177,13 +177,11 @@ static int pci_call_probe(struct pci_dri
set_cpus_allowed(current, node_to_cpumask(node));
/* And set default memory allocation policy */
oldpol = current->mempolicy;
- current->mempolicy = &default_policy;
- mpol_get(current->mempolicy);
+ current->mempolicy = NULL; /* fall back to system default policy */
#endif
error = drv->probe(dev, id);
#ifdef CONFIG_NUMA
set_cpus_allowed(current, oldmask);
- mpol_free(current->mempolicy);
current->mempolicy = oldpol;
#endif
return error;
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/2] Mem Policy: Fixup Shm and Interleave Policy Reference Counting - V2
2007-10-11 19:07 ` [PATCH 2/2] Mem Policy: Fixup Shm and Interleave Policy Reference Counting - V2 Lee Schermerhorn
@ 2007-10-12 1:42 ` Christoph Lameter
2007-10-12 14:35 ` Lee Schermerhorn
0 siblings, 1 reply; 14+ messages in thread
From: Christoph Lameter @ 2007-10-12 1:42 UTC (permalink / raw)
To: Lee Schermerhorn; +Cc: Andrew Morton, ak, gregkh, linux-mm, mel, eric.whitney
On Thu, 11 Oct 2007, Lee Schermerhorn wrote:
> I have removed the 'RFC'. Please review for possible merge.
I am still concerned with all this special casing which gets very
difficult to follow. Isnt there some way to simplify the refcount handling
here? It seems that the refcount fix introduced more bugs. One solution
would be to revert that patch instead.
> V1 -> V2:
> + remove include of <linux/mm.h> from mempolicy.h and use
> BUG_ON(), conditional on CONFIG_DEBUG_VM, in mpol_get()
Drop the BUG_ON completely? If this is a bug fix release then lets keep
this as minimal as possible.
Could you make this a series of separate patches. Each for one
issue?
> get_vma_policy() assumes that shared policies are referenced by
> the get_policy() vm_op, if any. This is true for shmem_get_policy()
> but not for shm_get_policy() when the "backing file" does not
> support a get_policy() vm_op. The latter is the case for SHM_HUGETLB
> segments. Because get_vma_policy() expects the get_policy() op to
> have added a ref, it doesn't do so itself. This results in
> premature freeing of the policy. Add the mpol_get() to the
> shm_get_policy() op when the backing file doesn't support shared
> policies.
Maybe get_vma_policy() should make no such assumption? Why is
get_vma_policy taking a refcount at all? The vma policies are guaranteed
based on the process that is running. But what keeps the shared
policies from being freed? Isnt there an inherent race here that cannot be
remedied by taking a refcount?
> Further, shm_get_policy() was falling back to current task's task
> policy if the backing file did not support get_policy() vm_op and
> the vma policy was null. This is not valid when get_vma_policy() is
> called from show_numa_map() as task != current. Also, this did
> not match the behavior of the shmem_get_policy() vm_op which did
> NOT fall back to task policy. So, modify shm_get_policy() NOT to
> fall back to current->mempolicy.
get_vma_policy() is passed a pointer to the task struct. It does *not*
fall back to the current tasks policy.
> Now, turns out that get_vma_policy() was not handling fallback to
> task policy correctly when the get_policy() vm_op returns NULL.
> Rather, it was falling back directly to system default policy.
> So, fix get_vma_policy() to use only non-NULL policy returned from
> the vma get_policy op and indicate that this policy does not need
> another ref count.
Nope. Its falling back to the task policy.
static struct mempolicy * get_vma_policy(struct task_struct *task,
struct vm_area_struct *vma, unsigned long addr)
{
--> struct mempolicy *pol = task->mempolicy;
int shared_pol = 0;
if (vma) {
if (vma->vm_ops && vma->vm_ops->get_policy) {
pol = vma->vm_ops->get_policy(vma, addr);
shared_pol = 1; /* if pol non-NULL, add ref below */
} else if (vma->vm_policy &&
vma->vm_policy->policy != MPOL_DEFAULT)
pol = vma->vm_policy;
}
if (!pol)
pol = &default_policy;
else if (!shared_pol && pol != current->mempolicy)
mpol_get(pol); /* vma or other task's policy */
return pol;
}
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 1/2] Mem Policy: fix mempolicy usage in pci driver
2007-10-11 19:11 ` [PATCH " Lee Schermerhorn
@ 2007-10-12 1:43 ` Christoph Lameter
2007-11-06 18:09 ` [PATCH ] " Lee Schermerhorn
2007-10-15 11:19 ` [PATCH 1/2] " Mel Gorman
1 sibling, 1 reply; 14+ messages in thread
From: Christoph Lameter @ 2007-10-12 1:43 UTC (permalink / raw)
To: Lee Schermerhorn; +Cc: Andrew Morton, ak, gregkh, linux-mm, mel, eric.whitney
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/2] Mem Policy: Fixup Shm and Interleave Policy Reference Counting - V2
2007-10-12 1:42 ` Christoph Lameter
@ 2007-10-12 14:35 ` Lee Schermerhorn
2007-10-12 17:27 ` Christoph Lameter
0 siblings, 1 reply; 14+ messages in thread
From: Lee Schermerhorn @ 2007-10-12 14:35 UTC (permalink / raw)
To: Christoph Lameter; +Cc: Andrew Morton, ak, gregkh, linux-mm, mel, eric.whitney
On Thu, 2007-10-11 at 18:42 -0700, Christoph Lameter wrote:
> On Thu, 11 Oct 2007, Lee Schermerhorn wrote:
>
> > I have removed the 'RFC'. Please review for possible merge.
>
> I am still concerned with all this special casing which gets very
> difficult to follow. Isnt there some way to simplify the refcount handling
> here? It seems that the refcount fix introduced more bugs. One solution
> would be to revert that patch instead.
I've tried to remove the special cases by making all [existing]
get_policy ops consistent and then documenting the rules on the
prototypes for any possible future ops. The basic rule is--don't fall
back to task/sysdefault policy--let get_vma_policy() handle that. This
is required to get the ref counting right.
If you apply the patch and look at get_vma_policy, you'll see that it
will only add a reference count for other task's policy or vma policy.
Now, the policy returned by the get_policy() op is essentially a vma
policy, but the policy op must add the ref itself 1) to prevent races
with other tasks, in the case of shared policy or 2) to be consistent
with #1 for SHM_HUGETLB segments which [currently] don't use shared
policy.
I try to avoid taking ref on current task's policy and sys default
policy because 1) it's not necessary and 2) these should be the vast
majority of the cases. In previous discussions with yourself and Andi
[back in June/July] you agreed this was the way to go. All my recent
changes have been an attempt to do this in the most consistent manner
possible.
>
> > V1 -> V2:
> > + remove include of <linux/mm.h> from mempolicy.h and use
> > BUG_ON(), conditional on CONFIG_DEBUG_VM, in mpol_get()
>
> Drop the BUG_ON completely? If this is a bug fix release then lets keep
> this as minimal as possible.
I could, I guess, and just add it into my tree for testing. But, I fear
that, in the future, someone might add some code like I removed from the
pci driver and break some other assumptions. The bug on is only
compiled when DEBUG_VM is configured. In that case, the overhead in
mpol_get() is the least of the additional overhead!
But if you REALLY don't like it [and I agree it's ugly, with the
explicit #ifdef], I guess I can remove it. I'd sure like to hear other
opinions, tho'
>
> Could you make this a series of separate patches. Each for one
> issue?
Yeah. I can do that. I'm outta here for a week+ after today and won't
get to it until later in the month, as I have a lot of other things to
tie off today.
In the meantime, however, if anyone tries to apply a policy [mbind] to a
SHM_HUGETLB segment, they will BUG-out on the 2nd page fault with the
current upstream [2.6.23] code. Kind of serious I think...
>
> > get_vma_policy() assumes that shared policies are referenced by
> > the get_policy() vm_op, if any. This is true for shmem_get_policy()
> > but not for shm_get_policy() when the "backing file" does not
> > support a get_policy() vm_op. The latter is the case for SHM_HUGETLB
> > segments. Because get_vma_policy() expects the get_policy() op to
> > have added a ref, it doesn't do so itself. This results in
> > premature freeing of the policy. Add the mpol_get() to the
> > shm_get_policy() op when the backing file doesn't support shared
> > policies.
>
> Maybe get_vma_policy() should make no such assumption? Why is
> get_vma_policy taking a refcount at all? The vma policies are guaranteed
> based on the process that is running. But what keeps the shared
> policies from being freed? Isnt there an inherent race here that cannot be
> remedied by taking a refcount?
>
> > Further, shm_get_policy() was falling back to current task's task
> > policy if the backing file did not support get_policy() vm_op and
> > the vma policy was null. This is not valid when get_vma_policy() is
> > called from show_numa_map() as task != current. Also, this did
> > not match the behavior of the shmem_get_policy() vm_op which did
> > NOT fall back to task policy. So, modify shm_get_policy() NOT to
> > fall back to current->mempolicy.
>
> get_vma_policy() is passed a pointer to the task struct. It does *not*
> fall back to the current tasks policy.
>
> > Now, turns out that get_vma_policy() was not handling fallback to
> > task policy correctly when the get_policy() vm_op returns NULL.
> > Rather, it was falling back directly to system default policy.
> > So, fix get_vma_policy() to use only non-NULL policy returned from
> > the vma get_policy op and indicate that this policy does not need
> > another ref count.
>
> Nope. Its falling back to the task policy.
But, the get_policy() vm_op can overwrite 'pol' with a NULL return
value. This can happen when you have a real shmem segment with default
policy == NULL/no policy. See below:
>
> static struct mempolicy * get_vma_policy(struct task_struct *task,
> struct vm_area_struct *vma, unsigned long addr)
> {
> --> struct mempolicy *pol = task->mempolicy;
> int shared_pol = 0;
>
> if (vma) {
> if (vma->vm_ops && vma->vm_ops->get_policy) {
> pol = vma->vm_ops->get_policy(vma, addr);
^^^ possibly NULL for shmem w/ default policy
> shared_pol = 1; /* if pol non-NULL, add ref below */
> } else if (vma->vm_policy &&
> vma->vm_policy->policy != MPOL_DEFAULT)
> pol = vma->vm_policy;
> }
> if (!pol)
> pol = &default_policy;
^^^ could get here w/ NULL shmem policy and
!NULL task policy. Incorrect fallback.
> else if (!shared_pol && pol != current->mempolicy)
> mpol_get(pol); /* vma or other task's policy */
> return pol;
> }
>
Patch 2/2 clears all of this up. I think. I did test it, but could
have missed something... again.
Lee
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/2] Mem Policy: Fixup Shm and Interleave Policy Reference Counting - V2
2007-10-12 14:35 ` Lee Schermerhorn
@ 2007-10-12 17:27 ` Christoph Lameter
0 siblings, 0 replies; 14+ messages in thread
From: Christoph Lameter @ 2007-10-12 17:27 UTC (permalink / raw)
To: Lee Schermerhorn; +Cc: Andrew Morton, ak, gregkh, linux-mm, mel, eric.whitney
On Fri, 12 Oct 2007, Lee Schermerhorn wrote:
> In the meantime, however, if anyone tries to apply a policy [mbind] to a
> SHM_HUGETLB segment, they will BUG-out on the 2nd page fault with the
> current upstream [2.6.23] code. Kind of serious I think...
And even after the fix they will have trouble with cpusets constraints
that do not match the mpol bind set?
> > Nope. Its falling back to the task policy.
>
> But, the get_policy() vm_op can overwrite 'pol' with a NULL return
> value. This can happen when you have a real shmem segment with default
> policy == NULL/no policy. See below:
Ah. The logical fix then is to define an additional temporary variable and
only overwrite the pol variable if a policy has been returned? That would
be consistent with how the vma policies are handled?
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 1/2] Mem Policy: fix mempolicy usage in pci driver
2007-10-11 19:11 ` [PATCH " Lee Schermerhorn
2007-10-12 1:43 ` Christoph Lameter
@ 2007-10-15 11:19 ` Mel Gorman
1 sibling, 0 replies; 14+ messages in thread
From: Mel Gorman @ 2007-10-15 11:19 UTC (permalink / raw)
To: Lee Schermerhorn
Cc: Andrew Morton, ak, gregkh, linux-mm, eric.whitney,
Christoph Lameter
On (11/10/07 15:11), Lee Schermerhorn didst pronounce:
> On Wed, 2007-10-10 at 14:12 -0700, Christoph Lameter wrote:
> > Acked-by: Christoph Lameter <clameter@sgi.com>
> >
>
> Resend with 'RFC' removed. Please review/consider for merge.
>
> Note: this is required BEFORE patch 2 of this series to avoid hitting
> the [VM_]BUG_ON()s added by the second patch.
>
> Lee
>
> ====
> PATCH 1/2 Fix memory policy usage in pci driver
>
> Against: 2.6.23-rc8-mm2
>
> In an attempt to ensure memory allocation from the local node,
> the pci driver temporarily replaces the current task's memory
> policy with the system default policy. Trying to be a good
> citizen, the driver then call's mpol_get() on the new policy.
> When it's finished probing, it undoes the '_get by calling
> mpol_free() [on the system default policy] and then restores
> the current task's saved mempolicy.
>
> A couple of issues here:
>
> 1) it's never necessary to set a task's mempolicy to the
> system default policy in order to get system default
> allocation behavior. Simply set the current task's
> mempolicy to NULL and allocations will fall back to
> system default policy.
>
> 2) we should never [need to] call mpol_free() on the system
> default policy. [I plan on trapping this with a VM_BUG_ON()
> in a subsequent patch.]
>
> This patch removes the calls to mpol_get() and mpol_free()
> and uses NULL for the temporary task mempolicy to effect
> default allocation behavior.
>
> Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
>
> drivers/pci/pci-driver.c | 4 +---
> 1 file changed, 1 insertion(+), 3 deletions(-)
>
> Index: Linux/drivers/pci/pci-driver.c
> ===================================================================
> --- Linux.orig/drivers/pci/pci-driver.c 2007-10-09 14:31:57.000000000
> -0400
> +++ Linux/drivers/pci/pci-driver.c 2007-10-09 14:43:57.000000000 -0400
> @@ -177,13 +177,11 @@ static int pci_call_probe(struct pci_dri
> set_cpus_allowed(current, node_to_cpumask(node));
> /* And set default memory allocation policy */
> oldpol = current->mempolicy;
> - current->mempolicy = &default_policy;
> - mpol_get(current->mempolicy);
> + current->mempolicy = NULL; /* fall back to system default policy */
> #endif
> error = drv->probe(dev, id);
> #ifdef CONFIG_NUMA
> set_cpus_allowed(current, oldmask);
> - mpol_free(current->mempolicy);
> current->mempolicy = oldpol;
> #endif
> return error;
>
>
--
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH ] Mem Policy: fix mempolicy usage in pci driver
2007-10-12 1:43 ` Christoph Lameter
@ 2007-11-06 18:09 ` Lee Schermerhorn
0 siblings, 0 replies; 14+ messages in thread
From: Lee Schermerhorn @ 2007-11-06 18:09 UTC (permalink / raw)
To: Andrew Morton; +Cc: Christoph Lameter, ak, gregkh, linux-mm, mel, eric.whitney
I see that you're starting to build up the next -mm tree. Would you
please apply the following patch? I posted it earlier [twice] and
Christoph Ack'd it [twice]. Haven't heard any objections.
Lee
--------------
PATCH Mem Policy: Fix memory policy usage in pci driver
Against: 2.6.23-mm1
In an attempt to ensure memory allocation from the local node,
the pci driver temporarily replaces the current task's memory
policy with the system default policy. Trying to be a good
citizen, the driver then call's mpol_get() on the new policy.
When it's finished probing, it undoes the '_get by calling
mpol_free() [on the system default policy] and then restores
the current task's saved mempolicy.
A couple of issues here:
1) it's never necessary to set a task's mempolicy to the
system default policy in order to get system default
allocation behavior. Simply set the current task's
mempolicy to NULL and allocations will fall back to
system default policy.
2) we should never [need to] call mpol_free() on the system
default policy. [I plan on trapping this with a VM_BUG_ON()
in a subsequent patch.]
This patch removes the calls to mpol_get() and mpol_free()
and uses NULL for the temporary task mempolicy to effect
default allocation behavior.
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Twice:
Acked-by: Christoph Lameter <clameter@sgi.com>
drivers/pci/pci-driver.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
Index: Linux/drivers/pci/pci-driver.c
===================================================================
--- Linux.orig/drivers/pci/pci-driver.c 2007-10-09 14:31:57.000000000 -0400
+++ Linux/drivers/pci/pci-driver.c 2007-10-09 14:43:57.000000000 -0400
@@ -177,13 +177,11 @@ static int pci_call_probe(struct pci_dri
set_cpus_allowed(current, node_to_cpumask(node));
/* And set default memory allocation policy */
oldpol = current->mempolicy;
- current->mempolicy = &default_policy;
- mpol_get(current->mempolicy);
+ current->mempolicy = NULL; /* fall back to system default policy */
#endif
error = drv->probe(dev, id);
#ifdef CONFIG_NUMA
set_cpus_allowed(current, oldmask);
- mpol_free(current->mempolicy);
current->mempolicy = oldpol;
#endif
return error;
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2007-11-06 18:09 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-10-10 20:58 [PATCH/RFC 0/2] More Mempolicy Reference Counting Fixes Lee Schermerhorn
2007-10-10 20:58 ` [PATCH/RFC 1/2] Mem Policy: fix mempolicy usage in pci driver Lee Schermerhorn
2007-10-10 21:12 ` Christoph Lameter
2007-10-11 19:11 ` [PATCH " Lee Schermerhorn
2007-10-12 1:43 ` Christoph Lameter
2007-11-06 18:09 ` [PATCH ] " Lee Schermerhorn
2007-10-15 11:19 ` [PATCH 1/2] " Mel Gorman
2007-10-10 20:58 ` [PATCH/RFC 2/2] Mem Policy: Fixup Shm and Interleave Policy Reference Counting Lee Schermerhorn
2007-10-10 21:22 ` Christoph Lameter
2007-10-11 13:41 ` Lee Schermerhorn
2007-10-11 19:07 ` [PATCH 2/2] Mem Policy: Fixup Shm and Interleave Policy Reference Counting - V2 Lee Schermerhorn
2007-10-12 1:42 ` Christoph Lameter
2007-10-12 14:35 ` Lee Schermerhorn
2007-10-12 17:27 ` Christoph Lameter
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).