All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mel Gorman <mel@csn.ul.ie>
To: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: linux-mm@kvack.org, akpm@linux-foundation.org, ak@suse.de,
	mtk-manpages@gmx.net, clameter@sgi.com, solo@google.com,
	eric.whitney@hp.com
Subject: Re: [PATCH/RFC 1/5] Mem Policy:  fix reference counting
Date: Tue, 11 Sep 2007 19:48:22 +0100	[thread overview]
Message-ID: <1189536502.32731.83.camel@localhost> (raw)
In-Reply-To: <20070830185100.22619.197.sendpatchset@localhost>

You know this stuff better than I do. Take suggestions here with a large
grain of salt.

On Thu, 2007-08-30 at 14:51 -0400, Lee Schermerhorn wrote:
> PATCHRFC  1/5 Memory Policy: fix reference counting
> 
> Against 2.6.23-rc3-mm1
> 
> This patch proposes fixes to the reference counting of memory policy
> in the page allocation paths and in show_numa_map().
> 
> Shared policy lookup [shmem] has always added a reference to the
> policy, but this was never unrefed after page allocation or after
> formatting the numa map data.  
> 
> Default system policy should not require additional ref counting,
> nor should the current task's task policy.  However, show_numa_map()
> calls get_vma_policy() to examine what may be [likely is] another
> task's policy.  The latter case needs protection against freeing
> of the policy.
> 
> This patch adds a reference count to a mempolicy returned by
> get_vma_policy() when the policy is a vma policy or another
> task's mempolicy.  Again, shared policy is already reference
> counted on lookup.  A matching "unref" [__mpol_free()] is performed
> in alloc_page_vma() for shared and vma policies, and in
> show_numa_map() for shared and another task's mempolicy.
> We can call __mpol_free() directly, saving an admittedly
> inexpensive inline NULL test, because we know we have a non-NULL
> policy.
> 
> Handling policy ref counts for hugepages is a bit tricker.
> huge_zonelist() returns a zone list that might come from a 
> shared or vma 'BIND policy.  In this case, we should hold the
> reference until after the huge page allocation in 
> dequeue_hugepage().  The patch modifies huge_zonelist() to
> return a pointer to the mempolicy if it needs to be unref'd
> after allocation.
> 
> Signed-off-by:  Lee Schermerhorn <lee.schermerhorn@hp.com>
> 
>  include/linux/mempolicy.h |    4 +-
>  mm/hugetlb.c              |    4 +-
>  mm/mempolicy.c            |   79 ++++++++++++++++++++++++++++++++++++++++------
>  3 files changed, 75 insertions(+), 12 deletions(-)
> 
> Index: Linux/mm/mempolicy.c
> ===================================================================
> --- Linux.orig/mm/mempolicy.c	2007-08-29 10:05:19.000000000 -0400
> +++ Linux/mm/mempolicy.c	2007-08-29 13:31:42.000000000 -0400
> @@ -1083,21 +1083,37 @@ asmlinkage long compat_sys_mbind(compat_
>  
>  #endif
>  
> -/* Return effective policy for a VMA */
> +/*
> + * get_vma_policy(@task, @vma, @addr)
> + * @task - task for fallback if vma policy == default
> + * @vma   - virtual memory area whose policy is sought
> + * @addr  - address in @vma for shared policy lookup
> + *
> + * Returns effective policy for a VMA at specified address.
> + * Falls back to @task or system default policy, as necessary.
> + * Returned policy has extra reference count if shared, vma,
> + * or some other task's policy [show_numa_maps() can pass
> + * @task != current].  It is the caller's responsibility to
> + * free the reference in these cases.
> + */
>  static struct mempolicy * get_vma_policy(struct task_struct *task,
>  		struct vm_area_struct *vma, unsigned long addr)
>  {
>  	struct mempolicy *pol = task->mempolicy;
> +	int shared_pol = 0;
>  
>  	if (vma) {
> -		if (vma->vm_ops && vma->vm_ops->get_policy)
> +		if (vma->vm_ops && vma->vm_ops->get_policy) {
>  			pol = vma->vm_ops->get_policy(vma, addr);
> -		else if (vma->vm_policy &&
> +			shared_pol = 1;	/* if pol non-NULL, that is */

What do you mean here by "pol non-NULL, that is". Where do you check
that vm_ops->get_policy() returned a non-NULL value?

Should the comment be 

/* Policy if set is shared, check later */

and rename the variable to check_shared_pol?

> +		} else if (vma->vm_policy &&
>  				vma->vm_policy->policy != MPOL_DEFAULT)
>  			pol = vma->vm_policy;
>  	}
>  	if (!pol)
>  		pol = &default_policy;
> +	else if (!shared_pol && pol != current->mempolicy)
> +		mpol_get(pol);	/* vma or other task's policy */
>  	return pol;
>  }
>  
> @@ -1213,19 +1229,45 @@ static inline unsigned interleave_nid(st
>  }
>  
>  #ifdef CONFIG_HUGETLBFS
> -/* Return a zonelist suitable for a huge page allocation. */
> +/*
> + * huge_zonelist(@vma, @addr, @gfp_flags, @mpol)
> + * @vma = virtual memory area whose policy is sought
> + * @addr = address in @vma for shared policy lookup and interleave policy
> + * @gfp_flags = for requested zone
> + * @mpol = pointer to mempolicy pointer for reference counted 'BIND policy
> + *
> + * Returns a zonelist suitable for a huge page allocation.
> + * If the effective policy is 'BIND, returns pointer to policy's zonelist.

This comment here becomes redundant if applied on top of one-zonelist as
you suggest you will be doing later. The zonelist returned for MPOL_BIND
is the nodes zonelist but it is filtered based on a nodemask.

> + * If it is also a policy for which get_vma_policy() returns an extra
> + * reference, we must hold that reference until after allocation.
> + * In that case, return policy via @mpol so hugetlb allocation can drop
> + * the reference.  For non-'BIND referenced policies, we can/do drop the
> + * reference here, so the caller doesn't need to know about the special case
> + * for default and current task policy.
> + */
>  struct zonelist *huge_zonelist(struct vm_area_struct *vma, unsigned long addr,
> -							gfp_t gfp_flags)
> +				gfp_t gfp_flags, struct mempolicy **mpol)
>  {
>  	struct mempolicy *pol = get_vma_policy(current, vma, addr);
> +	struct zonelist *zl;
>  
> +	*mpol = NULL;		/* probably no unref needed */
>  	if (pol->policy == MPOL_INTERLEAVE) {
>  		unsigned nid;
>  
>  		nid = interleave_nid(pol, vma, addr, HPAGE_SHIFT);
> +		__mpol_free(pol);

So, __mpol_free() here acts as a put on the get_vma_policy() right?
Either that needs commenting or __mpol_free() needs to be renamed to
__mpol_put() assuming that when the count reaches 0, it really gets
free.

>  		return NODE_DATA(nid)->node_zonelists + gfp_zone(gfp_flags);
>  	}
> -	return zonelist_policy(GFP_HIGHUSER, pol);
> +
> +	zl = zonelist_policy(GFP_HIGHUSER, pol);
> +	if (unlikely(pol != &default_policy && pol != current->mempolicy)) {
> +		if (pol->policy != MPOL_BIND)
> +			__mpol_free(pol);	/* finished with pol */
> +		else
> +			*mpol = pol;	/* unref needed after allocation */
> +	}
> +	return zl;
>  }
>  #endif
>  
> @@ -1270,6 +1312,7 @@ struct page *
>  alloc_page_vma(gfp_t gfp, struct vm_area_struct *vma, unsigned long addr)
>  {
>  	struct mempolicy *pol = get_vma_policy(current, vma, addr);
> +	struct zonelist *zl;
>  
>  	cpuset_update_task_memory_state();
>  
> @@ -1279,7 +1322,19 @@ alloc_page_vma(gfp_t gfp, struct vm_area
>  		nid = interleave_nid(pol, vma, addr, PAGE_SHIFT);
>  		return alloc_page_interleave(gfp, 0, nid);
>  	}
> -	return __alloc_pages(gfp, 0, zonelist_policy(gfp, pol));
> +	zl = zonelist_policy(gfp, pol);
> +	if (pol != &default_policy && pol != current->mempolicy) {
> +		/*
> +		 * slow path: ref counted policy -- shared or vma
> +		 */
> +		struct page *page =  __alloc_pages(gfp, 0, zl);
> +		__mpol_free(pol);
> +		return page;
> +	}
> +	/*
> +	 * fast path:  default or task policy
> +	 */
> +	return __alloc_pages(gfp, 0, zl);
>  }
>  
>  /**
> @@ -1878,6 +1933,7 @@ int show_numa_map(struct seq_file *m, vo
>  	struct numa_maps *md;
>  	struct file *file = vma->vm_file;
>  	struct mm_struct *mm = vma->vm_mm;
> +	struct mempolicy *pol;
>  	int n;
>  	char buffer[50];
>  
> @@ -1888,8 +1944,13 @@ int show_numa_map(struct seq_file *m, vo
>  	if (!md)
>  		return 0;
>  
> -	mpol_to_str(buffer, sizeof(buffer),
> -			    get_vma_policy(priv->task, vma, vma->vm_start));
> +	pol = get_vma_policy(priv->task, vma, vma->vm_start);
> +	mpol_to_str(buffer, sizeof(buffer), pol);
> +	/*
> +	 * unref shared or other task's mempolicy
> +	 */
> +	if (pol != &default_policy && pol != current->mempolicy)
> +		__mpol_free(pol);
>  
>  	seq_printf(m, "%08lx %s", vma->vm_start, buffer);
>  
> Index: Linux/include/linux/mempolicy.h
> ===================================================================
> --- Linux.orig/include/linux/mempolicy.h	2007-08-29 10:56:14.000000000 -0400
> +++ Linux/include/linux/mempolicy.h	2007-08-29 13:32:05.000000000 -0400
> @@ -150,7 +150,7 @@ extern void mpol_fix_fork_child_flag(str
>  
>  extern struct mempolicy default_policy;
>  extern struct zonelist *huge_zonelist(struct vm_area_struct *vma,
> -		unsigned long addr, gfp_t gfp_flags);
> +		unsigned long addr, gfp_t gfp_flags, struct mempolicy **mpol);
>  extern unsigned slab_node(struct mempolicy *policy);
>  
>  extern enum zone_type policy_zone;
> @@ -238,7 +238,7 @@ static inline void mpol_fix_fork_child_f
>  }
>  
>  static inline struct zonelist *huge_zonelist(struct vm_area_struct *vma,
> -		unsigned long addr, gfp_t gfp_flags)
> +		unsigned long addr, gfp_t gfp_flags, struct mempolicy **mpol)
>  {
>  	return NODE_DATA(0)->node_zonelists + gfp_zone(gfp_flags);
>  }
> Index: Linux/mm/hugetlb.c
> ===================================================================
> --- Linux.orig/mm/hugetlb.c	2007-08-29 10:56:13.000000000 -0400
> +++ Linux/mm/hugetlb.c	2007-08-29 13:19:39.000000000 -0400
> @@ -71,8 +71,9 @@ static struct page *dequeue_huge_page(st
>  {
>  	int nid;
>  	struct page *page = NULL;
> +	struct mempolicy *mpol;
>  	struct zonelist *zonelist = huge_zonelist(vma, address,
> -						htlb_alloc_mask);
> +					htlb_alloc_mask, &mpol);
>  	struct zone **z;
>  
>  	for (z = zonelist->zones; *z; z++) {
> @@ -87,6 +88,7 @@ static struct page *dequeue_huge_page(st
>  			break;
>  		}
>  	}
> +	mpol_free(mpol);	/* maybe need unref */
>  	return page;
>  }


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2007-09-11 18:48 UTC|newest]

Thread overview: 83+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-08-30 18:50 [PATCH/RFC 0/5] Memory Policy Cleanups and Enhancements Lee Schermerhorn
2007-08-30 18:51 ` [PATCH/RFC 1/5] Mem Policy: fix reference counting Lee Schermerhorn
2007-09-11 18:48   ` Mel Gorman [this message]
2007-09-11 18:12     ` Lee Schermerhorn
2007-09-13  9:45       ` Mel Gorman
2007-08-30 18:51 ` [PATCH/RFC 2/5] Mem Policy: Use MPOL_PREFERRED for system-wide default policy Lee Schermerhorn
2007-09-11 18:54   ` Mel Gorman
2007-09-11 18:22     ` Lee Schermerhorn
2007-09-13  9:48       ` Mel Gorman
2007-08-30 18:51 ` [PATCH/RFC 3/5] Mem Policy: MPOL_PREFERRED fixups for "local allocation" Lee Schermerhorn
2007-09-11 18:58   ` Mel Gorman
2007-09-11 18:34     ` Lee Schermerhorn
2007-09-12 22:10       ` Christoph Lameter
2007-09-13 13:51         ` Lee Schermerhorn
2007-09-13 18:18           ` Christoph Lameter
2007-09-13  9:55       ` Mel Gorman
2007-09-12 22:06   ` Christoph Lameter
2007-09-13 13:35     ` Lee Schermerhorn
2007-09-13 18:21       ` Christoph Lameter
2007-08-30 18:51 ` [PATCH/RFC 4/5] Mem Policy: cpuset-independent interleave policy Lee Schermerhorn
2007-09-12 21:20   ` Ethan Solomita
2007-09-12 22:14     ` Christoph Lameter
2007-09-13 13:26     ` Lee Schermerhorn
2007-09-13 17:17       ` Ethan Solomita
2007-09-12 21:59   ` Ethan Solomita
2007-09-13 13:32     ` Lee Schermerhorn
2007-09-13 17:19       ` Ethan Solomita
2007-09-13 18:20       ` Christoph Lameter
2007-10-09  6:15       ` Ethan Solomita
2007-10-09 13:39         ` Lee Schermerhorn
2007-10-09 18:49         ` Christoph Lameter
2007-10-09 19:02           ` Lee Schermerhorn
2007-08-30 18:51 ` [PATCH/RFC 5/5] Mem Policy: add MPOL_F_MEMS_ALLOWED get_mempolicy() flag Lee Schermerhorn
2007-09-11 19:07   ` Mel Gorman
2007-09-11 18:42     ` Lee Schermerhorn
2007-09-12 22:14   ` Christoph Lameter
2007-09-14 20:24   ` [PATCH] " Lee Schermerhorn
2007-09-14 20:27     ` Christoph Lameter
2007-09-11 16:20 ` [PATCH/RFC 0/5] Memory Policy Cleanups and Enhancements Lee Schermerhorn
2007-09-11 19:12   ` Mel Gorman
2007-09-11 18:45     ` Lee Schermerhorn
2007-09-12 22:17   ` Christoph Lameter
2007-09-13 13:57     ` Lee Schermerhorn
2007-09-13 15:31       ` Mel Gorman
2007-09-13 15:01         ` Lee Schermerhorn
2007-09-13 18:55           ` Mel Gorman
2007-09-13 18:19       ` Christoph Lameter
2007-09-13 18:23         ` Mel Gorman
2007-09-13 18:26           ` Christoph Lameter
2007-09-13 21:17             ` Andrew Morton
2007-09-14  2:20               ` Christoph Lameter
2007-09-14  8:53               ` Mel Gorman
2007-09-14 15:06                 ` Lee Schermerhorn
2007-09-14 17:46                   ` Mel Gorman
2007-09-14 18:41                     ` Christoph Lameter
2007-09-16 18:02                       ` Mel Gorman
2007-09-17 18:12                         ` Christoph Lameter
2007-09-17 18:19                           ` Christoph Lameter
2007-09-17 20:14                             ` Mel Gorman
2007-09-17 19:16                               ` Christoph Lameter
2007-09-17 20:03                           ` Mel Gorman
2007-09-14 20:15                 ` Lee Schermerhorn
2007-09-16 18:05                   ` Mel Gorman
2007-09-16 19:34                     ` Andrew Morton
2007-09-16 21:22                       ` Mel Gorman
2007-09-17 13:29                     ` Lee Schermerhorn
2007-09-17 18:14                     ` Christoph Lameter
2007-09-13 15:49     ` Lee Schermerhorn
2007-09-13 18:22       ` Christoph Lameter
2007-09-17 19:00 ` [PATCH] Fix NUMA Memory Policy Reference Counting Lee Schermerhorn
2007-09-17 19:14   ` Christoph Lameter
2007-09-17 19:38     ` Lee Schermerhorn
2007-09-17 19:43       ` Christoph Lameter
2007-09-19 22:03         ` Lee Schermerhorn
2007-09-19 22:23           ` Christoph Lameter
2007-09-18 10:36   ` Mel Gorman
2007-09-17 19:32 ` [PATCH] 2.6.23-rc6: " Lee Schermerhorn
2007-09-17 19:37   ` Christoph Lameter
2007-09-17 20:19     ` Lee Schermerhorn
2007-09-17 21:23       ` Christoph Lameter
2007-09-17 22:25     ` Andi Kleen
2007-09-18 19:30       ` Christoph Lameter
2007-09-17 22:28   ` Andi Kleen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1189536502.32731.83.camel@localhost \
    --to=mel@csn.ul.ie \
    --cc=ak@suse.de \
    --cc=akpm@linux-foundation.org \
    --cc=clameter@sgi.com \
    --cc=eric.whitney@hp.com \
    --cc=lee.schermerhorn@hp.com \
    --cc=linux-mm@kvack.org \
    --cc=mtk-manpages@gmx.net \
    --cc=solo@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.