* [PATCH/RFC 0/2] More Mempolicy Reference Counting Fixes @ 2007-10-10 20:58 Lee Schermerhorn 2007-10-10 20:58 ` [PATCH/RFC 1/2] Mem Policy: fix mempolicy usage in pci driver Lee Schermerhorn 2007-10-10 20:58 ` [PATCH/RFC 2/2] Mem Policy: Fixup Shm and Interleave Policy Reference Counting Lee Schermerhorn 0 siblings, 2 replies; 14+ messages in thread From: Lee Schermerhorn @ 2007-10-10 20:58 UTC (permalink / raw) To: akpm; +Cc: ak, clameter, gregkh, linux-mm, mel, eric.whitney PATCH 0/2 Memory Policy: More Reference Counting Fixes While testing huge pages using shm segments created with the SHM_HUGETLB flag, I came across additional problems with memory policy reference counting. While tracking these down, I found even more... Some of the problems were introduced by myself in my previous mempolicy ref counting patch, and some were paths that I missed in my inspection and testing. These 2 patches are both based against 2.6.23-rc8-mm2, but should also probably go to the .23-stable tree when it opens. They have been tested on ia64 and x86_64 platforms. Lee -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH/RFC 1/2] Mem Policy: fix mempolicy usage in pci driver 2007-10-10 20:58 [PATCH/RFC 0/2] More Mempolicy Reference Counting Fixes Lee Schermerhorn @ 2007-10-10 20:58 ` Lee Schermerhorn 2007-10-10 21:12 ` Christoph Lameter 2007-10-10 20:58 ` [PATCH/RFC 2/2] Mem Policy: Fixup Shm and Interleave Policy Reference Counting Lee Schermerhorn 1 sibling, 1 reply; 14+ messages in thread From: Lee Schermerhorn @ 2007-10-10 20:58 UTC (permalink / raw) To: akpm; +Cc: ak, clameter, gregkh, linux-mm, mel, eric.whitney PATCH 1/2 Fix memory policy usage in pci driver Against: 2.6.23-rc8-mm2 In an attempt to ensure memory allocation from the proper node, the pci driver temporarily replaces the current task's memory policy with the system default policy. Trying to be a good citizen, the driver then call's mpol_get() on the new policy. When it's finished probing, it undoes the '_get by calling mpol_free() [on the system default policy] and then restores the current task's saved mempolicy. A couple of issues here: 1) it's never necessary to set a task's mempolicy to the system default policy in order to get system default allocation behavior. Simply set the current task's mempolicy to NULL and allocations will fall back to system default policy. 2) we should never [need to] call mpol_free() on the system default policy. [I plan on trapping this with a VM_BUG_ON() in a subsequent patch.] This patch removes the calls to mpol_get() and mpol_free() and uses NULL for the temporary task mempolicy to effect default allocation behavior. Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> drivers/pci/pci-driver.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) Index: Linux/drivers/pci/pci-driver.c =================================================================== --- Linux.orig/drivers/pci/pci-driver.c 2007-10-09 14:31:57.000000000 -0400 +++ Linux/drivers/pci/pci-driver.c 2007-10-09 14:43:57.000000000 -0400 @@ -177,13 +177,11 @@ static int pci_call_probe(struct pci_dri set_cpus_allowed(current, node_to_cpumask(node)); /* And set default memory allocation policy */ oldpol = current->mempolicy; - current->mempolicy = &default_policy; - mpol_get(current->mempolicy); + current->mempolicy = NULL; /* fall back to system default policy */ #endif error = drv->probe(dev, id); #ifdef CONFIG_NUMA set_cpus_allowed(current, oldmask); - mpol_free(current->mempolicy); current->mempolicy = oldpol; #endif return error; -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH/RFC 1/2] Mem Policy: fix mempolicy usage in pci driver 2007-10-10 20:58 ` [PATCH/RFC 1/2] Mem Policy: fix mempolicy usage in pci driver Lee Schermerhorn @ 2007-10-10 21:12 ` Christoph Lameter 2007-10-11 19:11 ` [PATCH " Lee Schermerhorn 0 siblings, 1 reply; 14+ messages in thread From: Christoph Lameter @ 2007-10-10 21:12 UTC (permalink / raw) To: Lee Schermerhorn; +Cc: akpm, ak, gregkh, linux-mm, mel, eric.whitney -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 1/2] Mem Policy: fix mempolicy usage in pci driver 2007-10-10 21:12 ` Christoph Lameter @ 2007-10-11 19:11 ` Lee Schermerhorn 2007-10-12 1:43 ` Christoph Lameter 2007-10-15 11:19 ` [PATCH 1/2] " Mel Gorman 0 siblings, 2 replies; 14+ messages in thread From: Lee Schermerhorn @ 2007-10-11 19:11 UTC (permalink / raw) To: Andrew Morton; +Cc: ak, gregkh, linux-mm, mel, eric.whitney, Christoph Lameter On Wed, 2007-10-10 at 14:12 -0700, Christoph Lameter wrote: > Acked-by: Christoph Lameter <clameter@sgi.com> > Resend with 'RFC' removed. Please review/consider for merge. Note: this is required BEFORE patch 2 of this series to avoid hitting the [VM_]BUG_ON()s added by the second patch. Lee ==== PATCH 1/2 Fix memory policy usage in pci driver Against: 2.6.23-rc8-mm2 In an attempt to ensure memory allocation from the local node, the pci driver temporarily replaces the current task's memory policy with the system default policy. Trying to be a good citizen, the driver then call's mpol_get() on the new policy. When it's finished probing, it undoes the '_get by calling mpol_free() [on the system default policy] and then restores the current task's saved mempolicy. A couple of issues here: 1) it's never necessary to set a task's mempolicy to the system default policy in order to get system default allocation behavior. Simply set the current task's mempolicy to NULL and allocations will fall back to system default policy. 2) we should never [need to] call mpol_free() on the system default policy. [I plan on trapping this with a VM_BUG_ON() in a subsequent patch.] This patch removes the calls to mpol_get() and mpol_free() and uses NULL for the temporary task mempolicy to effect default allocation behavior. Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> drivers/pci/pci-driver.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) Index: Linux/drivers/pci/pci-driver.c =================================================================== --- Linux.orig/drivers/pci/pci-driver.c 2007-10-09 14:31:57.000000000 -0400 +++ Linux/drivers/pci/pci-driver.c 2007-10-09 14:43:57.000000000 -0400 @@ -177,13 +177,11 @@ static int pci_call_probe(struct pci_dri set_cpus_allowed(current, node_to_cpumask(node)); /* And set default memory allocation policy */ oldpol = current->mempolicy; - current->mempolicy = &default_policy; - mpol_get(current->mempolicy); + current->mempolicy = NULL; /* fall back to system default policy */ #endif error = drv->probe(dev, id); #ifdef CONFIG_NUMA set_cpus_allowed(current, oldmask); - mpol_free(current->mempolicy); current->mempolicy = oldpol; #endif return error; -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 1/2] Mem Policy: fix mempolicy usage in pci driver 2007-10-11 19:11 ` [PATCH " Lee Schermerhorn @ 2007-10-12 1:43 ` Christoph Lameter 2007-11-06 18:09 ` [PATCH ] " Lee Schermerhorn 2007-10-15 11:19 ` [PATCH 1/2] " Mel Gorman 1 sibling, 1 reply; 14+ messages in thread From: Christoph Lameter @ 2007-10-12 1:43 UTC (permalink / raw) To: Lee Schermerhorn; +Cc: Andrew Morton, ak, gregkh, linux-mm, mel, eric.whitney -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH ] Mem Policy: fix mempolicy usage in pci driver 2007-10-12 1:43 ` Christoph Lameter @ 2007-11-06 18:09 ` Lee Schermerhorn 0 siblings, 0 replies; 14+ messages in thread From: Lee Schermerhorn @ 2007-11-06 18:09 UTC (permalink / raw) To: Andrew Morton; +Cc: Christoph Lameter, ak, gregkh, linux-mm, mel, eric.whitney I see that you're starting to build up the next -mm tree. Would you please apply the following patch? I posted it earlier [twice] and Christoph Ack'd it [twice]. Haven't heard any objections. Lee -------------- PATCH Mem Policy: Fix memory policy usage in pci driver Against: 2.6.23-mm1 In an attempt to ensure memory allocation from the local node, the pci driver temporarily replaces the current task's memory policy with the system default policy. Trying to be a good citizen, the driver then call's mpol_get() on the new policy. When it's finished probing, it undoes the '_get by calling mpol_free() [on the system default policy] and then restores the current task's saved mempolicy. A couple of issues here: 1) it's never necessary to set a task's mempolicy to the system default policy in order to get system default allocation behavior. Simply set the current task's mempolicy to NULL and allocations will fall back to system default policy. 2) we should never [need to] call mpol_free() on the system default policy. [I plan on trapping this with a VM_BUG_ON() in a subsequent patch.] This patch removes the calls to mpol_get() and mpol_free() and uses NULL for the temporary task mempolicy to effect default allocation behavior. Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Twice: Acked-by: Christoph Lameter <clameter@sgi.com> drivers/pci/pci-driver.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) Index: Linux/drivers/pci/pci-driver.c =================================================================== --- Linux.orig/drivers/pci/pci-driver.c 2007-10-09 14:31:57.000000000 -0400 +++ Linux/drivers/pci/pci-driver.c 2007-10-09 14:43:57.000000000 -0400 @@ -177,13 +177,11 @@ static int pci_call_probe(struct pci_dri set_cpus_allowed(current, node_to_cpumask(node)); /* And set default memory allocation policy */ oldpol = current->mempolicy; - current->mempolicy = &default_policy; - mpol_get(current->mempolicy); + current->mempolicy = NULL; /* fall back to system default policy */ #endif error = drv->probe(dev, id); #ifdef CONFIG_NUMA set_cpus_allowed(current, oldmask); - mpol_free(current->mempolicy); current->mempolicy = oldpol; #endif return error; -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 1/2] Mem Policy: fix mempolicy usage in pci driver 2007-10-11 19:11 ` [PATCH " Lee Schermerhorn 2007-10-12 1:43 ` Christoph Lameter @ 2007-10-15 11:19 ` Mel Gorman 1 sibling, 0 replies; 14+ messages in thread From: Mel Gorman @ 2007-10-15 11:19 UTC (permalink / raw) To: Lee Schermerhorn Cc: Andrew Morton, ak, gregkh, linux-mm, eric.whitney, Christoph Lameter On (11/10/07 15:11), Lee Schermerhorn didst pronounce: > On Wed, 2007-10-10 at 14:12 -0700, Christoph Lameter wrote: > > Acked-by: Christoph Lameter <clameter@sgi.com> > > > > Resend with 'RFC' removed. Please review/consider for merge. > > Note: this is required BEFORE patch 2 of this series to avoid hitting > the [VM_]BUG_ON()s added by the second patch. > > Lee > > ==== > PATCH 1/2 Fix memory policy usage in pci driver > > Against: 2.6.23-rc8-mm2 > > In an attempt to ensure memory allocation from the local node, > the pci driver temporarily replaces the current task's memory > policy with the system default policy. Trying to be a good > citizen, the driver then call's mpol_get() on the new policy. > When it's finished probing, it undoes the '_get by calling > mpol_free() [on the system default policy] and then restores > the current task's saved mempolicy. > > A couple of issues here: > > 1) it's never necessary to set a task's mempolicy to the > system default policy in order to get system default > allocation behavior. Simply set the current task's > mempolicy to NULL and allocations will fall back to > system default policy. > > 2) we should never [need to] call mpol_free() on the system > default policy. [I plan on trapping this with a VM_BUG_ON() > in a subsequent patch.] > > This patch removes the calls to mpol_get() and mpol_free() > and uses NULL for the temporary task mempolicy to effect > default allocation behavior. > > Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Acked-by: Mel Gorman <mel@csn.ul.ie> > > drivers/pci/pci-driver.c | 4 +--- > 1 file changed, 1 insertion(+), 3 deletions(-) > > Index: Linux/drivers/pci/pci-driver.c > =================================================================== > --- Linux.orig/drivers/pci/pci-driver.c 2007-10-09 14:31:57.000000000 > -0400 > +++ Linux/drivers/pci/pci-driver.c 2007-10-09 14:43:57.000000000 -0400 > @@ -177,13 +177,11 @@ static int pci_call_probe(struct pci_dri > set_cpus_allowed(current, node_to_cpumask(node)); > /* And set default memory allocation policy */ > oldpol = current->mempolicy; > - current->mempolicy = &default_policy; > - mpol_get(current->mempolicy); > + current->mempolicy = NULL; /* fall back to system default policy */ > #endif > error = drv->probe(dev, id); > #ifdef CONFIG_NUMA > set_cpus_allowed(current, oldmask); > - mpol_free(current->mempolicy); > current->mempolicy = oldpol; > #endif > return error; > > -- -- Mel Gorman Part-time Phd Student Linux Technology Center University of Limerick IBM Dublin Software Lab -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH/RFC 2/2] Mem Policy: Fixup Shm and Interleave Policy Reference Counting 2007-10-10 20:58 [PATCH/RFC 0/2] More Mempolicy Reference Counting Fixes Lee Schermerhorn 2007-10-10 20:58 ` [PATCH/RFC 1/2] Mem Policy: fix mempolicy usage in pci driver Lee Schermerhorn @ 2007-10-10 20:58 ` Lee Schermerhorn 2007-10-10 21:22 ` Christoph Lameter 1 sibling, 1 reply; 14+ messages in thread From: Lee Schermerhorn @ 2007-10-10 20:58 UTC (permalink / raw) To: akpm; +Cc: ak, clameter, gregkh, linux-mm, mel, eric.whitney PATCH 2/2 Mempolicy: Fixup Shm and Interleave Policy Reference Counting Against: 2.6.23-rc8-mm2 * In the memory policy reference counting cleanup patch, I missed one path that needs to unreference the memory policy. After computing the target node for interleave policy, we need to drop the reference if the policy is not the system default nor the current task's policy. * In huge_zonelist(), I was unconditionally unref'ing the policy in the interleave path, even when it was a policy that didn't need it. Fix this! Note: I investigated moving the check for "policy_needs_unref" to the mpol_free() wrapper, but this led to nasty circular header dependencies. If we wanted to make mpol_free() an external function, rather than a static inline, I could do this and remove several checks. I'd still need to keep an explicit check in alloc_page_vma() if we want to use a tail-call for the fast path. * get_vma_policy() assumes that shared policies are referenced by the get_policy() vm_op, if any. This is true for shmem_get_policy() but not for shm_get_policy() when the "backing file" does not support a get_policy() vm_op. The latter is the case for SHM_HUGETLB segments. Because get_vma_policy() expects the get_policy() op to have added a ref, it doesn't do so itself. This results in premature freeing of the policy. Add the mpol_get() to the shm_get_policy() op when the backing file doesn't support shared policies. * Further, shm_get_policy() was falling back to current task's task policy if the backing file did not support get_policy() vm_op and the vma policy was null. This is not valid when get_vma_policy() is called from show_numa_map() as task != current. Also, this did not match the behavior of the shmem_get_policy() vm_op which did NOT fall back to task policy. So, modify shm_get_policy() NOT to fall back to current->mempolicy. * Now, turns out that get_vma_policy() was not handling fallback to task policy correctly when the get_policy() vm_op returns NULL. Rather, it was falling back directly to system default policy. So, fix get_vma_policy() to use only non-NULL policy returned from the vma get_policy op and indicate that this policy does not need another ref count. * Document mempolicy return value reference semantics assumed by the changes discussed above for the set_ and get_policy vm_ops in <linux/mm.h>--where the prototypes are defined. * Add VM_BUG_ON()s to __mpol_free() and mpol_get() to trap attempts to ref/unref the system default policy. Should no longer occur. Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> include/linux/mempolicy.h | 5 ++++- include/linux/mm.h | 14 ++++++++++++++ ipc/shm.c | 6 +++--- mm/mempolicy.c | 32 ++++++++++++++++++++++++-------- 4 files changed, 45 insertions(+), 12 deletions(-) Index: Linux/mm/mempolicy.c =================================================================== --- Linux.orig/mm/mempolicy.c 2007-10-10 13:36:44.000000000 -0400 +++ Linux/mm/mempolicy.c 2007-10-10 14:25:52.000000000 -0400 @@ -1112,19 +1112,25 @@ static struct mempolicy * get_vma_policy struct vm_area_struct *vma, unsigned long addr) { struct mempolicy *pol = task->mempolicy; - int shared_pol = 0; + int pol_needs_ref = (task != current); if (vma) { if (vma->vm_ops && vma->vm_ops->get_policy) { - pol = vma->vm_ops->get_policy(vma, addr); - shared_pol = 1; /* if pol non-NULL, add ref below */ + struct mempolicy *vpol = vma->vm_ops->get_policy(vma, + addr); + if (vpol) { + pol = vpol; + pol_needs_ref = 0; /* get_policy() added ref */ + } } else if (vma->vm_policy && - vma->vm_policy->policy != MPOL_DEFAULT) + vma->vm_policy->policy != MPOL_DEFAULT) { pol = vma->vm_policy; + pol_needs_ref++; + } } if (!pol) pol = &default_policy; - else if (!shared_pol && pol != current->mempolicy) + else if (pol_needs_ref) mpol_get(pol); /* vma or other task's policy */ return pol; } @@ -1262,18 +1268,21 @@ struct zonelist *huge_zonelist(struct vm { struct mempolicy *pol = get_vma_policy(current, vma, addr); struct zonelist *zl; + int policy_needs_unref = (pol != &default_policy && \ + pol != current->mempolicy); *mpol = NULL; /* probably no unref needed */ if (pol->policy == MPOL_INTERLEAVE) { unsigned nid; nid = interleave_nid(pol, vma, addr, HPAGE_SHIFT); - __mpol_free(pol); /* finished with pol */ + if (unlikely(policy_needs_unref)) + __mpol_free(pol); /* finished with pol */ return NODE_DATA(nid)->node_zonelists + gfp_zone(gfp_flags); } zl = zonelist_policy(GFP_HIGHUSER, pol); - if (unlikely(pol != &default_policy && pol != current->mempolicy)) { + if (unlikely(policy_needs_unref)) { if (pol->policy != MPOL_BIND) __mpol_free(pol); /* finished with pol */ else @@ -1325,6 +1334,9 @@ alloc_page_vma(gfp_t gfp, struct vm_area { struct mempolicy *pol = get_vma_policy(current, vma, addr); struct zonelist *zl; + int policy_needs_unref = (pol != &default_policy && \ + pol != current->mempolicy); + cpuset_update_task_memory_state(); @@ -1332,10 +1344,12 @@ alloc_page_vma(gfp_t gfp, struct vm_area unsigned nid; nid = interleave_nid(pol, vma, addr, PAGE_SHIFT); + if (unlikely(policy_needs_unref)) + __mpol_free(pol); return alloc_page_interleave(gfp, 0, nid); } zl = zonelist_policy(gfp, pol); - if (pol != &default_policy && pol != current->mempolicy) { + if (unlikely(policy_needs_unref)) { /* * slow path: ref counted policy -- shared or vma */ @@ -1444,6 +1458,8 @@ int __mpol_equal(struct mempolicy *a, st /* Slow path of a mpol destructor. */ void __mpol_free(struct mempolicy *p) { + VM_BUG_ON(p == &default_policy); + if (!atomic_dec_and_test(&p->refcnt)) return; if (p->policy == MPOL_BIND) Index: Linux/ipc/shm.c =================================================================== --- Linux.orig/ipc/shm.c 2007-10-10 13:36:44.000000000 -0400 +++ Linux/ipc/shm.c 2007-10-10 14:21:59.000000000 -0400 @@ -263,10 +263,10 @@ static struct mempolicy *shm_get_policy( if (sfd->vm_ops->get_policy) pol = sfd->vm_ops->get_policy(vma, addr); - else if (vma->vm_policy) + else if (vma->vm_policy) { pol = vma->vm_policy; - else - pol = current->mempolicy; + mpol_get(pol); /* get_vma_policy() assumes this */ + } return pol; } #endif Index: Linux/include/linux/mempolicy.h =================================================================== --- Linux.orig/include/linux/mempolicy.h 2007-10-10 13:36:44.000000000 -0400 +++ Linux/include/linux/mempolicy.h 2007-10-10 14:20:28.000000000 -0400 @@ -2,6 +2,7 @@ #define _LINUX_MEMPOLICY_H 1 #include <linux/errno.h> +#include <linux/mm.h> /* * NUMA memory policies for Linux. @@ -72,6 +73,8 @@ struct mempolicy { nodemask_t cpuset_mems_allowed; /* mempolicy relative to these nodes */ }; +extern struct mempolicy default_policy; + /* * Support for managing mempolicy data objects (clone, copy, destroy) * The default fast path of a NULL MPOL_DEFAULT policy is always inlined. @@ -97,6 +100,7 @@ static inline struct mempolicy *mpol_cop static inline void mpol_get(struct mempolicy *pol) { + VM_BUG_ON(pol == &default_policy); if (pol) atomic_inc(&pol->refcnt); } @@ -149,7 +153,6 @@ extern void mpol_rebind_task(struct task extern void mpol_rebind_mm(struct mm_struct *mm, nodemask_t *new); extern void mpol_fix_fork_child_flag(struct task_struct *p); -extern struct mempolicy default_policy; extern struct zonelist *huge_zonelist(struct vm_area_struct *vma, unsigned long addr, gfp_t gfp_flags, struct mempolicy **mpol); extern unsigned slab_node(struct mempolicy *policy); Index: Linux/include/linux/mm.h =================================================================== --- Linux.orig/include/linux/mm.h 2007-10-10 13:36:44.000000000 -0400 +++ Linux/include/linux/mm.h 2007-10-10 14:20:28.000000000 -0400 @@ -173,7 +173,21 @@ struct vm_operations_struct { * writable, if an error is returned it will cause a SIGBUS */ int (*page_mkwrite)(struct vm_area_struct *vma, struct page *page); #ifdef CONFIG_NUMA + /* + * set_policy() op must add a reference to any non-NULL @new mempolicy + * to hold the policy upon return. Caller should pass NULL @new to + * remove a policy and fall back to surrounding context--i.e. do not + * install a MPOL_DEFAULT policy, nor the task or system default + * mempolicy. + */ int (*set_policy)(struct vm_area_struct *vma, struct mempolicy *new); + + /* + * get_policy() op must add reference [mpol_get()] to any mempolicy + * at (vma,addr). If no [shared/vma] mempolicy exists at that addr, + * get_policy() op must return NULL--i.e., do not "fallback" to task + * or system default policy. + */ struct mempolicy *(*get_policy)(struct vm_area_struct *vma, unsigned long addr); int (*migrate)(struct vm_area_struct *vma, const nodemask_t *from, -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH/RFC 2/2] Mem Policy: Fixup Shm and Interleave Policy Reference Counting 2007-10-10 20:58 ` [PATCH/RFC 2/2] Mem Policy: Fixup Shm and Interleave Policy Reference Counting Lee Schermerhorn @ 2007-10-10 21:22 ` Christoph Lameter 2007-10-11 13:41 ` Lee Schermerhorn 2007-10-11 19:07 ` [PATCH 2/2] Mem Policy: Fixup Shm and Interleave Policy Reference Counting - V2 Lee Schermerhorn 0 siblings, 2 replies; 14+ messages in thread From: Christoph Lameter @ 2007-10-10 21:22 UTC (permalink / raw) To: Lee Schermerhorn; +Cc: akpm, ak, gregkh, linux-mm, mel, eric.whitney On Wed, 10 Oct 2007, Lee Schermerhorn wrote: > * get_vma_policy() assumes that shared policies are referenced by > the get_policy() vm_op, if any. This is true for shmem_get_policy() > but not for shm_get_policy() when the "backing file" does not > support a get_policy() vm_op. The latter is the case for SHM_HUGETLB > segments. Because get_vma_policy() expects the get_policy() op to > have added a ref, it doesn't do so itself. This results in > premature freeing of the policy. Add the mpol_get() to the > shm_get_policy() op when the backing file doesn't support shared > policies. Could you add support for SHM_HUGETLB segments instead to make this consistent so that shared policies always use a get_policy function? > * Further, shm_get_policy() was falling back to current task's task > policy if the backing file did not support get_policy() vm_op and > the vma policy was null. This is not valid when get_vma_policy() is > called from show_numa_map() as task != current. Also, this did > not match the behavior of the shmem_get_policy() vm_op which did > NOT fall back to task policy. So, modify shm_get_policy() NOT to > fall back to current->mempolicy. Hmmm..... The show_numa_map issue is special. Maybe fix that one instead? > Index: Linux/include/linux/mempolicy.h > =================================================================== > --- Linux.orig/include/linux/mempolicy.h 2007-10-10 13:36:44.000000000 -0400 > +++ Linux/include/linux/mempolicy.h 2007-10-10 14:20:28.000000000 -0400 > @@ -2,6 +2,7 @@ > #define _LINUX_MEMPOLICY_H 1 > > #include <linux/errno.h> > +#include <linux/mm.h> I think we tried to avoid a heavy include here. mm.h is huge and draws in lots of other include files. The additional include is only needed for the VM_BUG_ON it seems? BUG_ON would also work. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH/RFC 2/2] Mem Policy: Fixup Shm and Interleave Policy Reference Counting 2007-10-10 21:22 ` Christoph Lameter @ 2007-10-11 13:41 ` Lee Schermerhorn 2007-10-11 19:07 ` [PATCH 2/2] Mem Policy: Fixup Shm and Interleave Policy Reference Counting - V2 Lee Schermerhorn 1 sibling, 0 replies; 14+ messages in thread From: Lee Schermerhorn @ 2007-10-11 13:41 UTC (permalink / raw) To: Christoph Lameter; +Cc: akpm, ak, gregkh, linux-mm, mel, eric.whitney On Wed, 2007-10-10 at 14:22 -0700, Christoph Lameter wrote: > On Wed, 10 Oct 2007, Lee Schermerhorn wrote: > > > * get_vma_policy() assumes that shared policies are referenced by > > the get_policy() vm_op, if any. This is true for shmem_get_policy() > > but not for shm_get_policy() when the "backing file" does not > > support a get_policy() vm_op. The latter is the case for SHM_HUGETLB > > segments. Because get_vma_policy() expects the get_policy() op to > > have added a ref, it doesn't do so itself. This results in > > premature freeing of the policy. Add the mpol_get() to the > > shm_get_policy() op when the backing file doesn't support shared > > policies. > > Could you add support for SHM_HUGETLB segments instead to make this > consistent so that shared policies always use a get_policy function? I have patches that do this as part of my shared policy series that is currently "on hold" while we sort these other things out. SHM_HUGETLB segments do use the shm_get_policy() vm_op. However, it detects that the hugetlb shm segment does not support get_policy(), so it just uses the vma policy at that address. You should like this behavior! :-). My patches implement shared policy for SHM_HUGETLB, which you don't like. So, I think we should leave this as is... for now. > > > * Further, shm_get_policy() was falling back to current task's task > > policy if the backing file did not support get_policy() vm_op and > > the vma policy was null. This is not valid when get_vma_policy() is > > called from show_numa_map() as task != current. Also, this did > > not match the behavior of the shmem_get_policy() vm_op which did > > NOT fall back to task policy. So, modify shm_get_policy() NOT to > > fall back to current->mempolicy. > > Hmmm..... The show_numa_map issue is special. Maybe fix that one instead? > > > Index: Linux/include/linux/mempolicy.h > > =================================================================== > > --- Linux.orig/include/linux/mempolicy.h 2007-10-10 13:36:44.000000000 -0400 > > +++ Linux/include/linux/mempolicy.h 2007-10-10 14:20:28.000000000 -0400 > > @@ -2,6 +2,7 @@ > > #define _LINUX_MEMPOLICY_H 1 > > > > #include <linux/errno.h> > > +#include <linux/mm.h> > > I think we tried to avoid a heavy include here. mm.h is huge and draws in > lots of other include files. The additional include is only needed for the > VM_BUG_ON it seems? BUG_ON would also work. Yeah, I know. However, I like the idea of having a separately configurable VM debug check. I will remove the include and the VM_BUG_ON for now. But, what would [any one else?] think about moving VM_BUG_ON() to asm-generic/bug.h in a separate patch? Lee -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH 2/2] Mem Policy: Fixup Shm and Interleave Policy Reference Counting - V2 2007-10-10 21:22 ` Christoph Lameter 2007-10-11 13:41 ` Lee Schermerhorn @ 2007-10-11 19:07 ` Lee Schermerhorn 2007-10-12 1:42 ` Christoph Lameter 1 sibling, 1 reply; 14+ messages in thread From: Lee Schermerhorn @ 2007-10-11 19:07 UTC (permalink / raw) To: Andrew Morton; +Cc: ak, gregkh, linux-mm, mel, eric.whitney, Christoph Lameter Here's V2 of the second patch of the series, modified as suggested by Christoph vis a vis include of <linux/mm.h> in mempolicy.h. I did not change the hugetlbfs to add *_policy() ops as I think the current mechanisms work as required now. I.e., even if I added the policy vm_ops to hugetlbfs, I think the changes below to mempolicy and shm_get_policy() would still be required in the general case. I have removed the 'RFC'. Please review for possible merge. Lee ==== PATCH 2/2 Mempolicy: Fixup Shm and Interleave Policy Reference Counting Against: 2.6.23-rc8-mm2 V1 -> V2: + remove include of <linux/mm.h> from mempolicy.h and use BUG_ON(), conditional on CONFIG_DEBUG_VM, in mpol_get() In the memory policy reference counting cleanup patch, I missed one path that needs to unreference the memory policy. After computing the target node for interleave policy, we need to drop the reference if the policy is not the system default nor the current task's policy. In huge_zonelist(), I was unconditionally unref'ing the policy in the interleave path, even when it was a policy that didn't need it. Fix this! Note: I investigated moving the check for "policy_needs_unref" to the mpol_free() wrapper, but this led to nasty circular header dependencies. If we wanted to make mpol_free() an external function, rather than a static inline, I could do this and remove several checks. I'd still need to keep an explicit check in alloc_page_vma() if we want to use a tail-call for the fast path. get_vma_policy() assumes that shared policies are referenced by the get_policy() vm_op, if any. This is true for shmem_get_policy() but not for shm_get_policy() when the "backing file" does not support a get_policy() vm_op. The latter is the case for SHM_HUGETLB segments. Because get_vma_policy() expects the get_policy() op to have added a ref, it doesn't do so itself. This results in premature freeing of the policy. Add the mpol_get() to the shm_get_policy() op when the backing file doesn't support shared policies. Further, shm_get_policy() was falling back to current task's task policy if the backing file did not support get_policy() vm_op and the vma policy was null. This is not valid when get_vma_policy() is called from show_numa_map() as task != current. Also, this did not match the behavior of the shmem_get_policy() vm_op which did NOT fall back to task policy. So, modify shm_get_policy() NOT to fall back to current->mempolicy. Now, turns out that get_vma_policy() was not handling fallback to task policy correctly when the get_policy() vm_op returns NULL. Rather, it was falling back directly to system default policy. So, fix get_vma_policy() to use only non-NULL policy returned from the vma get_policy op and indicate that this policy does not need another ref count. Add [VM_]BUG_ON()s to __mpol_free() and mpol_get() to trap attempts to ref/unref the system default policy. Should no longer occur. Document mempolicy return value reference semantics assumed by the changes discussed above for the set_ and get_policy vm_ops in <linux/mm.h>--where the prototypes are defined. Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> include/linux/mempolicy.h | 6 +++++- include/linux/mm.h | 14 ++++++++++++++ ipc/shm.c | 6 +++--- mm/mempolicy.c | 32 ++++++++++++++++++++++++-------- 4 files changed, 46 insertions(+), 12 deletions(-) Index: Linux/mm/mempolicy.c =================================================================== --- Linux.orig/mm/mempolicy.c 2007-10-10 14:58:12.000000000 -0400 +++ Linux/mm/mempolicy.c 2007-10-11 14:07:42.000000000 -0400 @@ -1112,19 +1112,25 @@ static struct mempolicy * get_vma_policy struct vm_area_struct *vma, unsigned long addr) { struct mempolicy *pol = task->mempolicy; - int shared_pol = 0; + int pol_needs_ref = (task != current); if (vma) { if (vma->vm_ops && vma->vm_ops->get_policy) { - pol = vma->vm_ops->get_policy(vma, addr); - shared_pol = 1; /* if pol non-NULL, add ref below */ + struct mempolicy *vpol = vma->vm_ops->get_policy(vma, + addr); + if (vpol) { + pol = vpol; + pol_needs_ref = 0; /* get_policy() added ref */ + } } else if (vma->vm_policy && - vma->vm_policy->policy != MPOL_DEFAULT) + vma->vm_policy->policy != MPOL_DEFAULT) { pol = vma->vm_policy; + pol_needs_ref++; + } } if (!pol) pol = &default_policy; - else if (!shared_pol && pol != current->mempolicy) + else if (pol_needs_ref) mpol_get(pol); /* vma or other task's policy */ return pol; } @@ -1262,18 +1268,21 @@ struct zonelist *huge_zonelist(struct vm { struct mempolicy *pol = get_vma_policy(current, vma, addr); struct zonelist *zl; + int policy_needs_unref = (pol != &default_policy && \ + pol != current->mempolicy); *mpol = NULL; /* probably no unref needed */ if (pol->policy == MPOL_INTERLEAVE) { unsigned nid; nid = interleave_nid(pol, vma, addr, HPAGE_SHIFT); - __mpol_free(pol); /* finished with pol */ + if (unlikely(policy_needs_unref)) + __mpol_free(pol); /* finished with pol */ return NODE_DATA(nid)->node_zonelists + gfp_zone(gfp_flags); } zl = zonelist_policy(GFP_HIGHUSER, pol); - if (unlikely(pol != &default_policy && pol != current->mempolicy)) { + if (unlikely(policy_needs_unref)) { if (pol->policy != MPOL_BIND) __mpol_free(pol); /* finished with pol */ else @@ -1325,6 +1334,9 @@ alloc_page_vma(gfp_t gfp, struct vm_area { struct mempolicy *pol = get_vma_policy(current, vma, addr); struct zonelist *zl; + int policy_needs_unref = (pol != &default_policy && \ + pol != current->mempolicy); + cpuset_update_task_memory_state(); @@ -1332,10 +1344,12 @@ alloc_page_vma(gfp_t gfp, struct vm_area unsigned nid; nid = interleave_nid(pol, vma, addr, PAGE_SHIFT); + if (unlikely(policy_needs_unref)) + __mpol_free(pol); return alloc_page_interleave(gfp, 0, nid); } zl = zonelist_policy(gfp, pol); - if (pol != &default_policy && pol != current->mempolicy) { + if (unlikely(policy_needs_unref)) { /* * slow path: ref counted policy -- shared or vma */ @@ -1444,6 +1458,8 @@ int __mpol_equal(struct mempolicy *a, st /* Slow path of a mpol destructor. */ void __mpol_free(struct mempolicy *p) { + VM_BUG_ON(p == &default_policy); + if (!atomic_dec_and_test(&p->refcnt)) return; if (p->policy == MPOL_BIND) Index: Linux/ipc/shm.c =================================================================== --- Linux.orig/ipc/shm.c 2007-10-10 14:58:12.000000000 -0400 +++ Linux/ipc/shm.c 2007-10-10 14:59:13.000000000 -0400 @@ -263,10 +263,10 @@ static struct mempolicy *shm_get_policy( if (sfd->vm_ops->get_policy) pol = sfd->vm_ops->get_policy(vma, addr); - else if (vma->vm_policy) + else if (vma->vm_policy) { pol = vma->vm_policy; - else - pol = current->mempolicy; + mpol_get(pol); /* get_vma_policy() assumes this */ + } return pol; } #endif Index: Linux/include/linux/mempolicy.h =================================================================== --- Linux.orig/include/linux/mempolicy.h 2007-10-10 14:58:12.000000000 -0400 +++ Linux/include/linux/mempolicy.h 2007-10-11 14:09:19.000000000 -0400 @@ -72,6 +72,8 @@ struct mempolicy { nodemask_t cpuset_mems_allowed; /* mempolicy relative to these nodes */ }; +extern struct mempolicy default_policy; + /* * Support for managing mempolicy data objects (clone, copy, destroy) * The default fast path of a NULL MPOL_DEFAULT policy is always inlined. @@ -97,6 +99,9 @@ static inline struct mempolicy *mpol_cop static inline void mpol_get(struct mempolicy *pol) { +#ifdef CONFIG_DEBUG_VM + BUG_ON(pol == &default_policy); +#endif if (pol) atomic_inc(&pol->refcnt); } @@ -149,7 +154,6 @@ extern void mpol_rebind_task(struct task extern void mpol_rebind_mm(struct mm_struct *mm, nodemask_t *new); extern void mpol_fix_fork_child_flag(struct task_struct *p); -extern struct mempolicy default_policy; extern struct zonelist *huge_zonelist(struct vm_area_struct *vma, unsigned long addr, gfp_t gfp_flags, struct mempolicy **mpol); extern unsigned slab_node(struct mempolicy *policy); Index: Linux/include/linux/mm.h =================================================================== --- Linux.orig/include/linux/mm.h 2007-10-10 14:58:12.000000000 -0400 +++ Linux/include/linux/mm.h 2007-10-11 14:07:43.000000000 -0400 @@ -173,7 +173,21 @@ struct vm_operations_struct { * writable, if an error is returned it will cause a SIGBUS */ int (*page_mkwrite)(struct vm_area_struct *vma, struct page *page); #ifdef CONFIG_NUMA + /* + * set_policy() op must add a reference to any non-NULL @new mempolicy + * to hold the policy upon return. Caller should pass NULL @new to + * remove a policy and fall back to surrounding context--i.e. do not + * install a MPOL_DEFAULT policy, nor the task or system default + * mempolicy. + */ int (*set_policy)(struct vm_area_struct *vma, struct mempolicy *new); + + /* + * get_policy() op must add reference [mpol_get()] to any mempolicy + * at (vma,addr). If no [shared/vma] mempolicy exists at that addr, + * get_policy() op must return NULL--i.e., do not "fallback" to task + * or system default policy. + */ struct mempolicy *(*get_policy)(struct vm_area_struct *vma, unsigned long addr); int (*migrate)(struct vm_area_struct *vma, const nodemask_t *from, -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/2] Mem Policy: Fixup Shm and Interleave Policy Reference Counting - V2 2007-10-11 19:07 ` [PATCH 2/2] Mem Policy: Fixup Shm and Interleave Policy Reference Counting - V2 Lee Schermerhorn @ 2007-10-12 1:42 ` Christoph Lameter 2007-10-12 14:35 ` Lee Schermerhorn 0 siblings, 1 reply; 14+ messages in thread From: Christoph Lameter @ 2007-10-12 1:42 UTC (permalink / raw) To: Lee Schermerhorn; +Cc: Andrew Morton, ak, gregkh, linux-mm, mel, eric.whitney On Thu, 11 Oct 2007, Lee Schermerhorn wrote: > I have removed the 'RFC'. Please review for possible merge. I am still concerned with all this special casing which gets very difficult to follow. Isnt there some way to simplify the refcount handling here? It seems that the refcount fix introduced more bugs. One solution would be to revert that patch instead. > V1 -> V2: > + remove include of <linux/mm.h> from mempolicy.h and use > BUG_ON(), conditional on CONFIG_DEBUG_VM, in mpol_get() Drop the BUG_ON completely? If this is a bug fix release then lets keep this as minimal as possible. Could you make this a series of separate patches. Each for one issue? > get_vma_policy() assumes that shared policies are referenced by > the get_policy() vm_op, if any. This is true for shmem_get_policy() > but not for shm_get_policy() when the "backing file" does not > support a get_policy() vm_op. The latter is the case for SHM_HUGETLB > segments. Because get_vma_policy() expects the get_policy() op to > have added a ref, it doesn't do so itself. This results in > premature freeing of the policy. Add the mpol_get() to the > shm_get_policy() op when the backing file doesn't support shared > policies. Maybe get_vma_policy() should make no such assumption? Why is get_vma_policy taking a refcount at all? The vma policies are guaranteed based on the process that is running. But what keeps the shared policies from being freed? Isnt there an inherent race here that cannot be remedied by taking a refcount? > Further, shm_get_policy() was falling back to current task's task > policy if the backing file did not support get_policy() vm_op and > the vma policy was null. This is not valid when get_vma_policy() is > called from show_numa_map() as task != current. Also, this did > not match the behavior of the shmem_get_policy() vm_op which did > NOT fall back to task policy. So, modify shm_get_policy() NOT to > fall back to current->mempolicy. get_vma_policy() is passed a pointer to the task struct. It does *not* fall back to the current tasks policy. > Now, turns out that get_vma_policy() was not handling fallback to > task policy correctly when the get_policy() vm_op returns NULL. > Rather, it was falling back directly to system default policy. > So, fix get_vma_policy() to use only non-NULL policy returned from > the vma get_policy op and indicate that this policy does not need > another ref count. Nope. Its falling back to the task policy. static struct mempolicy * get_vma_policy(struct task_struct *task, struct vm_area_struct *vma, unsigned long addr) { --> struct mempolicy *pol = task->mempolicy; int shared_pol = 0; if (vma) { if (vma->vm_ops && vma->vm_ops->get_policy) { pol = vma->vm_ops->get_policy(vma, addr); shared_pol = 1; /* if pol non-NULL, add ref below */ } else if (vma->vm_policy && vma->vm_policy->policy != MPOL_DEFAULT) pol = vma->vm_policy; } if (!pol) pol = &default_policy; else if (!shared_pol && pol != current->mempolicy) mpol_get(pol); /* vma or other task's policy */ return pol; } -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/2] Mem Policy: Fixup Shm and Interleave Policy Reference Counting - V2 2007-10-12 1:42 ` Christoph Lameter @ 2007-10-12 14:35 ` Lee Schermerhorn 2007-10-12 17:27 ` Christoph Lameter 0 siblings, 1 reply; 14+ messages in thread From: Lee Schermerhorn @ 2007-10-12 14:35 UTC (permalink / raw) To: Christoph Lameter; +Cc: Andrew Morton, ak, gregkh, linux-mm, mel, eric.whitney On Thu, 2007-10-11 at 18:42 -0700, Christoph Lameter wrote: > On Thu, 11 Oct 2007, Lee Schermerhorn wrote: > > > I have removed the 'RFC'. Please review for possible merge. > > I am still concerned with all this special casing which gets very > difficult to follow. Isnt there some way to simplify the refcount handling > here? It seems that the refcount fix introduced more bugs. One solution > would be to revert that patch instead. I've tried to remove the special cases by making all [existing] get_policy ops consistent and then documenting the rules on the prototypes for any possible future ops. The basic rule is--don't fall back to task/sysdefault policy--let get_vma_policy() handle that. This is required to get the ref counting right. If you apply the patch and look at get_vma_policy, you'll see that it will only add a reference count for other task's policy or vma policy. Now, the policy returned by the get_policy() op is essentially a vma policy, but the policy op must add the ref itself 1) to prevent races with other tasks, in the case of shared policy or 2) to be consistent with #1 for SHM_HUGETLB segments which [currently] don't use shared policy. I try to avoid taking ref on current task's policy and sys default policy because 1) it's not necessary and 2) these should be the vast majority of the cases. In previous discussions with yourself and Andi [back in June/July] you agreed this was the way to go. All my recent changes have been an attempt to do this in the most consistent manner possible. > > > V1 -> V2: > > + remove include of <linux/mm.h> from mempolicy.h and use > > BUG_ON(), conditional on CONFIG_DEBUG_VM, in mpol_get() > > Drop the BUG_ON completely? If this is a bug fix release then lets keep > this as minimal as possible. I could, I guess, and just add it into my tree for testing. But, I fear that, in the future, someone might add some code like I removed from the pci driver and break some other assumptions. The bug on is only compiled when DEBUG_VM is configured. In that case, the overhead in mpol_get() is the least of the additional overhead! But if you REALLY don't like it [and I agree it's ugly, with the explicit #ifdef], I guess I can remove it. I'd sure like to hear other opinions, tho' > > Could you make this a series of separate patches. Each for one > issue? Yeah. I can do that. I'm outta here for a week+ after today and won't get to it until later in the month, as I have a lot of other things to tie off today. In the meantime, however, if anyone tries to apply a policy [mbind] to a SHM_HUGETLB segment, they will BUG-out on the 2nd page fault with the current upstream [2.6.23] code. Kind of serious I think... > > > get_vma_policy() assumes that shared policies are referenced by > > the get_policy() vm_op, if any. This is true for shmem_get_policy() > > but not for shm_get_policy() when the "backing file" does not > > support a get_policy() vm_op. The latter is the case for SHM_HUGETLB > > segments. Because get_vma_policy() expects the get_policy() op to > > have added a ref, it doesn't do so itself. This results in > > premature freeing of the policy. Add the mpol_get() to the > > shm_get_policy() op when the backing file doesn't support shared > > policies. > > Maybe get_vma_policy() should make no such assumption? Why is > get_vma_policy taking a refcount at all? The vma policies are guaranteed > based on the process that is running. But what keeps the shared > policies from being freed? Isnt there an inherent race here that cannot be > remedied by taking a refcount? > > > Further, shm_get_policy() was falling back to current task's task > > policy if the backing file did not support get_policy() vm_op and > > the vma policy was null. This is not valid when get_vma_policy() is > > called from show_numa_map() as task != current. Also, this did > > not match the behavior of the shmem_get_policy() vm_op which did > > NOT fall back to task policy. So, modify shm_get_policy() NOT to > > fall back to current->mempolicy. > > get_vma_policy() is passed a pointer to the task struct. It does *not* > fall back to the current tasks policy. > > > Now, turns out that get_vma_policy() was not handling fallback to > > task policy correctly when the get_policy() vm_op returns NULL. > > Rather, it was falling back directly to system default policy. > > So, fix get_vma_policy() to use only non-NULL policy returned from > > the vma get_policy op and indicate that this policy does not need > > another ref count. > > Nope. Its falling back to the task policy. But, the get_policy() vm_op can overwrite 'pol' with a NULL return value. This can happen when you have a real shmem segment with default policy == NULL/no policy. See below: > > static struct mempolicy * get_vma_policy(struct task_struct *task, > struct vm_area_struct *vma, unsigned long addr) > { > --> struct mempolicy *pol = task->mempolicy; > int shared_pol = 0; > > if (vma) { > if (vma->vm_ops && vma->vm_ops->get_policy) { > pol = vma->vm_ops->get_policy(vma, addr); ^^^ possibly NULL for shmem w/ default policy > shared_pol = 1; /* if pol non-NULL, add ref below */ > } else if (vma->vm_policy && > vma->vm_policy->policy != MPOL_DEFAULT) > pol = vma->vm_policy; > } > if (!pol) > pol = &default_policy; ^^^ could get here w/ NULL shmem policy and !NULL task policy. Incorrect fallback. > else if (!shared_pol && pol != current->mempolicy) > mpol_get(pol); /* vma or other task's policy */ > return pol; > } > Patch 2/2 clears all of this up. I think. I did test it, but could have missed something... again. Lee -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/2] Mem Policy: Fixup Shm and Interleave Policy Reference Counting - V2 2007-10-12 14:35 ` Lee Schermerhorn @ 2007-10-12 17:27 ` Christoph Lameter 0 siblings, 0 replies; 14+ messages in thread From: Christoph Lameter @ 2007-10-12 17:27 UTC (permalink / raw) To: Lee Schermerhorn; +Cc: Andrew Morton, ak, gregkh, linux-mm, mel, eric.whitney On Fri, 12 Oct 2007, Lee Schermerhorn wrote: > In the meantime, however, if anyone tries to apply a policy [mbind] to a > SHM_HUGETLB segment, they will BUG-out on the 2nd page fault with the > current upstream [2.6.23] code. Kind of serious I think... And even after the fix they will have trouble with cpusets constraints that do not match the mpol bind set? > > Nope. Its falling back to the task policy. > > But, the get_policy() vm_op can overwrite 'pol' with a NULL return > value. This can happen when you have a real shmem segment with default > policy == NULL/no policy. See below: Ah. The logical fix then is to define an additional temporary variable and only overwrite the pol variable if a policy has been returned? That would be consistent with how the vma policies are handled? -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2007-11-06 18:09 UTC | newest] Thread overview: 14+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2007-10-10 20:58 [PATCH/RFC 0/2] More Mempolicy Reference Counting Fixes Lee Schermerhorn 2007-10-10 20:58 ` [PATCH/RFC 1/2] Mem Policy: fix mempolicy usage in pci driver Lee Schermerhorn 2007-10-10 21:12 ` Christoph Lameter 2007-10-11 19:11 ` [PATCH " Lee Schermerhorn 2007-10-12 1:43 ` Christoph Lameter 2007-11-06 18:09 ` [PATCH ] " Lee Schermerhorn 2007-10-15 11:19 ` [PATCH 1/2] " Mel Gorman 2007-10-10 20:58 ` [PATCH/RFC 2/2] Mem Policy: Fixup Shm and Interleave Policy Reference Counting Lee Schermerhorn 2007-10-10 21:22 ` Christoph Lameter 2007-10-11 13:41 ` Lee Schermerhorn 2007-10-11 19:07 ` [PATCH 2/2] Mem Policy: Fixup Shm and Interleave Policy Reference Counting - V2 Lee Schermerhorn 2007-10-12 1:42 ` Christoph Lameter 2007-10-12 14:35 ` Lee Schermerhorn 2007-10-12 17:27 ` Christoph Lameter
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).