From: Lee Schermerhorn <lee.schermerhorn@hp.com>
To: linux-mm@kvack.org
Cc: akpm@linux-foundation.org, nacc@us.ibm.com, ak@suse.de,
	Lee Schermerhorn <lee.schermerhorn@hp.com>,
	clameter@sgi.com
Subject: [PATCH/RFC 6/11] Shared Policy:  Factor alloc_page_pol routine
Date: Mon, 25 Jun 2007 15:53:05 -0400
Message-ID: <20070625195305.21210.37682.sendpatchset@localhost>
In-Reply-To: <20070625195224.21210.89898.sendpatchset@localhost>

Shared Mapped File Policy 6/11 Factor alloc_page_pol routine

Against 2.6.22-rc4-mm2

Implement alloc_page_pol() to allocate a page given a policy and
an offset [for interleaving].  Neither a vma nor an address is
needed.  A subsequent patch will use this function to allocate
page cache pages, given the policy at a given page offset.

Revise alloc_page_vma() to just call alloc_page_pol() after looking
up the vma policy, to eliminate duplicate code.  This change rippled
into the interleaving functions.  I was able to eliminate
interleave_nid() by computing the offset at the call sites and
calling [modified] offset_il_node() directly.

	Removed the vma arg from offset_il_node(), as it wasn't
	used and is not available when called from
	alloc_page_pol().
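
For readers who want to try the offset arithmetic outside the kernel,
here is a minimal userspace sketch of the two steps described above:
computing the offset at the call site, then picking the interleave node
from that offset alone.  Everything prefixed sk_ is hypothetical and
not part of this patch.  The node mask is modelled as a plain unsigned
long rather than a nodemask_t, and the shift values (4 KiB base pages,
2 MiB huge pages) are assumptions for illustration only.

```c
#include <assert.h>

#define SK_PAGE_SHIFT  12	/* assumed 4 KiB base pages */
#define SK_HPAGE_SHIFT 21	/* assumed 2 MiB huge pages */

/* Hypothetical stand-in for the two vma fields the offset needs. */
struct sk_vma {
	unsigned long vm_start;	/* first address of the mapping */
	unsigned long vm_pgoff;	/* file offset, in base pages */
};

/* Same arithmetic as the patched vma_addr_to_pgoff(). */
static unsigned long sk_addr_to_pgoff(const struct sk_vma *vma,
		unsigned long addr, int shift)
{
	assert(shift >= SK_PAGE_SHIFT);
	return ((addr - vma->vm_start) >> shift) +
		(vma->vm_pgoff >> (shift - SK_PAGE_SHIFT));
}

/*
 * Static interleave: return the index of the (off % nnodes)-th set
 * bit in the node mask, or -1 if the mask is empty.
 */
static int sk_offset_il_node(unsigned long nodes, unsigned long off)
{
	unsigned nnodes = 0, target, seen = 0;
	unsigned long m;
	int nid;

	/* Count set bits, i.e. the number of allowed nodes. */
	for (m = nodes; m; m &= m - 1)
		nnodes++;
	if (nnodes == 0)
		return -1;

	target = (unsigned)off % nnodes;
	for (nid = 0; nid < (int)(8 * sizeof(nodes)); nid++) {
		if (nodes & (1UL << nid)) {
			if (seen == target)
				return nid;
			seen++;
		}
	}
	return -1;
}
```

With a mask of 0xd (nodes 0, 2 and 3), successive offsets map to nodes
0, 2, 3, 0, 2, and so on.  Because the node depends only on the mask
and the offset, the same helper works whether the offset came from a
vma,addr pair or directly from a file page offset.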

Note:  alloc_page_vma() can be called with vma == NULL via
read_swap_cache_async() from swapin_readahead().  No page offset
can be computed in that case, so pages read by swap readahead
cannot follow vma policy.  This is the current behavior.
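
The NULL-vma case can be sketched in userspace as follows.  The names
sk_vma and sk_policy_pgoff are hypothetical, and the 4 KiB page shift
is an assumption.  The sketch only models how the offset is chosen,
not the allocation itself.

```c
#include <assert.h>
#include <stddef.h>

#define SK_PAGE_SHIFT 12	/* assumed 4 KiB base pages */

/* Hypothetical stand-in for the two vma fields the offset needs. */
struct sk_vma {
	unsigned long vm_start;	/* first address of the mapping */
	unsigned long vm_pgoff;	/* file offset, in base pages */
};

/*
 * Offset handed to the policy-based allocator.  Without a vma there
 * is nothing to derive an offset from, so 0 is used, mirroring the
 * "pgoff_t pgoff = 0" default in the patched alloc_page_vma().
 */
static unsigned long sk_policy_pgoff(const struct sk_vma *vma,
		unsigned long addr, int shift)
{
	assert(shift >= SK_PAGE_SHIFT);
	if (vma == NULL)
		return 0;
	return ((addr - vma->vm_start) >> shift) +
		(vma->vm_pgoff >> (shift - SK_PAGE_SHIFT));
}
```

As far as this diff shows, an interleave policy reached with no vma
would then resolve offset 0 to the first node in its mask, whereas the
removed interleave_nid() fell back to interleave_nodes() in that case.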

Signed-off-by:  Lee Schermerhorn <lee.schermerhorn@hp.com>

 include/linux/gfp.h       |    3 +
 include/linux/hugetlb.h   |    9 ++++
 include/linux/mempolicy.h |    2 +
 include/linux/mm.h        |    6 ++-
 mm/mempolicy.c            |   89 ++++++++++++++++++++++++++--------------------
 5 files changed, 71 insertions(+), 38 deletions(-)

Index: Linux/include/linux/gfp.h
===================================================================
--- Linux.orig/include/linux/gfp.h	2007-06-25 14:58:25.000000000 -0400
+++ Linux/include/linux/gfp.h	2007-06-25 14:58:57.000000000 -0400
@@ -192,10 +192,13 @@ alloc_pages(gfp_t gfp_mask, unsigned int
 }
 extern struct page *alloc_page_vma(gfp_t gfp_mask,
 			struct vm_area_struct *vma, unsigned long addr);
+struct mempolicy;
+extern struct page *alloc_page_pol(gfp_t, struct mempolicy *, pgoff_t);
 #else
 #define alloc_pages(gfp_mask, order) \
 		alloc_pages_node(numa_node_id(), gfp_mask, order)
 #define alloc_page_vma(gfp_mask, vma, addr) alloc_pages(gfp_mask, 0)
+#define alloc_page_pol(gfp_mask, pol, off)  alloc_pages(gfp_mask, 0)
 #endif
 #define alloc_page(gfp_mask) alloc_pages(gfp_mask, 0)
 
Index: Linux/include/linux/hugetlb.h
===================================================================
--- Linux.orig/include/linux/hugetlb.h	2007-06-25 14:58:25.000000000 -0400
+++ Linux/include/linux/hugetlb.h	2007-06-25 14:58:57.000000000 -0400
@@ -14,6 +14,14 @@ static inline int is_vm_hugetlb_page(str
 	return vma->vm_flags & VM_HUGETLB;
 }
 
+static inline int vma_page_shift(struct vm_area_struct *vma)
+{
+	if (unlikely(is_vm_hugetlb_page(vma)))
+		return HPAGE_SHIFT;
+	else
+		return PAGE_SHIFT;
+}
+
 int hugetlb_sysctl_handler(struct ctl_table *, int, struct file *, void __user *, size_t *, loff_t *);
 int hugetlb_treat_movable_handler(struct ctl_table *, int, struct file *, void __user *, size_t *, loff_t *);
 int copy_hugetlb_page_range(struct mm_struct *, struct mm_struct *, struct vm_area_struct *);
@@ -127,6 +135,7 @@ static inline unsigned long hugetlb_tota
 #define HPAGE_MASK	PAGE_MASK		/* Keep the compiler happy */
 #define HPAGE_SIZE	PAGE_SIZE
 #endif
+#define vma_page_shift(VMA)		PAGE_SHIFT
 
 #endif /* !CONFIG_HUGETLB_PAGE */
 
Index: Linux/include/linux/mempolicy.h
===================================================================
--- Linux.orig/include/linux/mempolicy.h	2007-06-25 14:58:25.000000000 -0400
+++ Linux/include/linux/mempolicy.h	2007-06-25 14:58:57.000000000 -0400
@@ -124,6 +124,8 @@ extern int mpol_parse_options(char *valu
 			      nodemask_t *policy_nodes);
 
 extern struct mempolicy default_policy;
+extern struct mempolicy *get_file_policy(struct task_struct *,
+		struct address_space *, pgoff_t);
 extern struct zonelist *huge_zonelist(struct vm_area_struct *vma,
 		unsigned long addr, gfp_t gfp_flags);
 extern unsigned slab_node(struct mempolicy *policy);
Index: Linux/include/linux/mm.h
===================================================================
--- Linux.orig/include/linux/mm.h	2007-06-25 14:58:25.000000000 -0400
+++ Linux/include/linux/mm.h	2007-06-25 14:58:57.000000000 -0400
@@ -1058,11 +1058,15 @@ extern void setup_per_cpu_pageset(void);
 
 /*
  * Address to offset for shared mapping policy lookup.
+ * When used for interleaving hugepagefs pages [when shift
+ * == HPAGE_SHIFT], actually returns hugepage offset in
+ * mapping; NOT file page offset.
  */
 static inline pgoff_t vma_addr_to_pgoff(struct vm_area_struct *vma,
 		unsigned long addr, int shift)
 {
-	return ((addr - vma->vm_start) >> shift) + vma->vm_pgoff;
+	return ((addr - vma->vm_start) >> shift) +
+		(vma->vm_pgoff >> (shift - PAGE_SHIFT));
 }
 
 static inline pgoff_t vma_pgoff_to_addr(struct vm_area_struct *vma,
Index: Linux/mm/mempolicy.c
===================================================================
--- Linux.orig/mm/mempolicy.c	2007-06-25 14:58:25.000000000 -0400
+++ Linux/mm/mempolicy.c	2007-06-25 14:58:57.000000000 -0400
@@ -21,6 +21,7 @@
  *
  * bind           Only allocate memory on a specific set of nodes,
  *                no fallback.
+//TODO:  following still applicable?
  *                FIXME: memory is allocated starting with the first node
  *                to the last. It would be better if bind would truly restrict
  *                the allocation to memory nodes instead
@@ -35,6 +36,7 @@
  *                use the process policy. This is what Linux always did
  *		  in a NUMA aware kernel and still does by, ahem, default.
  *
+//TODO:  following needs paragraph rewording.  haven't figured out what to say.
  * The process policy is applied for most non interrupt memory allocations
  * in that process' context. Interrupts ignore the policies and always
  * try to allocate on the local CPU. The VMA policy is only applied for memory
@@ -50,15 +52,18 @@
  * Same with GFP_DMA allocations.
  *
  * For shmfs/tmpfs/hugetlbfs shared memory the policy is shared between
- * all users and remembered even when nobody has memory mapped.
+ * all users and remembered even when nobody has memory mapped. Shared
+ * policies handle sub-ranges of the object using a red/black tree.
+ *
+ * For mmap()ed files, the policy is shared between all 'SHARED' mappers
+ * and is remembered as long as the inode exists.  Private mappings
+ * still use vma policy for COWed pages, but use the shared policy
+ * [default, if none] for initial and read-only faults.
  */
 
 /* Notebook:
-   fix mmap readahead to honour policy and enable policy for any page cache
-   object
    statistics for bigpages
-   global policy for page cache? currently it uses process policy. Requires
-   first item above.
+   global policy for page cache?
    handle mremap for shared memory (currently ignored for the policy)
    grows down?
    make bind policy root only? It can trigger oom much faster and the
@@ -1135,6 +1140,22 @@ static struct mempolicy * get_vma_policy
 	return pol;
 }
 
+/*
+ * Return effective policy for file [address_space] at pgoff
+ */
+struct mempolicy *get_file_policy(struct task_struct *task,
+		struct address_space *x, pgoff_t pgoff)
+{
+	struct shared_policy *sp = x->spolicy;
+	struct mempolicy *pol = task->mempolicy;
+
+	if (sp)
+		pol = mpol_shared_policy_lookup(sp, pgoff);
+	if (!pol)
+		pol = &default_policy;
+	return pol;
+}
+
 /* Return a zonelist representing a mempolicy */
 static struct zonelist *zonelist_policy(gfp_t gfp, struct mempolicy *policy)
 {
@@ -1207,9 +1228,8 @@ unsigned slab_node(struct mempolicy *pol
 	}
 }
 
-/* Do static interleaving for a VMA with known offset. */
-static unsigned offset_il_node(struct mempolicy *pol,
-		struct vm_area_struct *vma, unsigned long off)
+/* Do static interleaving for a policy with known offset. */
+static unsigned offset_il_node(struct mempolicy *pol, pgoff_t off)
 {
 	unsigned nnodes = nodes_weight(pol->v.nodes);
 	unsigned target = (unsigned)off % nnodes;
@@ -1224,28 +1244,6 @@ static unsigned offset_il_node(struct me
 	return nid;
 }
 
-/* Determine a node number for interleave */
-static inline unsigned interleave_nid(struct mempolicy *pol,
-		 struct vm_area_struct *vma, unsigned long addr, int shift)
-{
-	if (vma) {
-		unsigned long off;
-
-		/*
-		 * for small pages, there is no difference between
-		 * shift and PAGE_SHIFT, so the bit-shift is safe.
-		 * for huge pages, since vm_pgoff is in units of small
-		 * pages, we need to shift off the always 0 bits to get
-		 * a useful offset.
-		 */
-		BUG_ON(shift < PAGE_SHIFT);
-		off = vma->vm_pgoff >> (shift - PAGE_SHIFT);
-		off += (addr - vma->vm_start) >> shift;
-		return offset_il_node(pol, vma, off);
-	} else
-		return interleave_nodes(pol);
-}
-
 #ifdef CONFIG_HUGETLBFS
 /* Return a zonelist suitable for a huge page allocation. */
 struct zonelist *huge_zonelist(struct vm_area_struct *vma, unsigned long addr,
@@ -1256,7 +1254,8 @@ struct zonelist *huge_zonelist(struct vm
 	if (pol->policy == MPOL_INTERLEAVE) {
 		unsigned nid;
 
-		nid = interleave_nid(pol, vma, addr, HPAGE_SHIFT);
+		nid = offset_il_node(pol,
+				vma_addr_to_pgoff(vma, addr, HPAGE_SHIFT));
 		return NODE_DATA(nid)->node_zonelists + gfp_zone(gfp_flags);
 	}
 	return zonelist_policy(GFP_HIGHUSER, pol);
@@ -1278,6 +1277,23 @@ static struct page *alloc_page_interleav
 	return page;
 }
 
+/*
+ * alloc_page_pol() -- allocate a page based on policy,offset.
+ * Used for mmap()ed file policy allocations where policy is based
+ * on file offset rather than a vma,addr pair
+ */
+struct page *alloc_page_pol(gfp_t gfp, struct mempolicy *pol, pgoff_t pgoff)
+{
+	if (unlikely(pol->policy == MPOL_INTERLEAVE)) {
+		unsigned nid;
+
+		nid = offset_il_node(pol, pgoff);
+		return alloc_page_interleave(gfp, 0, nid);
+	}
+	return __alloc_pages(gfp, 0, zonelist_policy(gfp, pol));
+}
+EXPORT_SYMBOL(alloc_page_pol);
+
 /**
  * 	alloc_page_vma	- Allocate a page for a VMA.
  *
@@ -1304,16 +1320,15 @@ struct page *
 alloc_page_vma(gfp_t gfp, struct vm_area_struct *vma, unsigned long addr)
 {
 	struct mempolicy *pol = get_vma_policy(current, vma, addr);
+	pgoff_t pgoff = 0;
 
 	cpuset_update_task_memory_state();
 
-	if (unlikely(pol->policy == MPOL_INTERLEAVE)) {
-		unsigned nid;
-
-		nid = interleave_nid(pol, vma, addr, PAGE_SHIFT);
-		return alloc_page_interleave(gfp, 0, nid);
+	if (likely(vma)) {
+		int shift = vma_page_shift(vma);
+		pgoff = vma_addr_to_pgoff(vma, addr, shift);
 	}
-	return __alloc_pages(gfp, 0, zonelist_policy(gfp, pol));
+	return alloc_page_pol(gfp, pol, pgoff);
 }
 
 /**

