* [RFC PATCH 01/10] mm: Introduce the memory regions data structure
2012-11-06 19:39 [RFC PATCH 00/10][Hierarchy] mm: Linux VM Infrastructure to support Memory Power Management Srivatsa S. Bhat
@ 2012-11-06 19:39 ` Srivatsa S. Bhat
2012-11-06 19:40 ` [RFC PATCH 02/10] mm: Helper routines Srivatsa S. Bhat
` (8 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: Srivatsa S. Bhat @ 2012-11-06 19:39 UTC (permalink / raw)
To: akpm, mgorman, mjg59, paulmck, dave, maxime.coquelin,
loic.pallardy, arjan, kmpark, kamezawa.hiroyu, lenb, rjw
Cc: gargankita, amit.kachhap, svaidy, thomas.abraham,
santosh.shilimkar, srivatsa.bhat, linux-pm, linux-mm,
linux-kernel
From: Ankita Garg <gargankita@gmail.com>
The memory region data structure is created under a NUMA node. Each NUMA node can
have multiple memory regions, depending on the platform's configuration for
power management. Each memory region in turn contains zones, which are the
entities from which the buddy allocator allocates memory.
             -------------
             | pg_data_t |
             -------------
              /         \
             v           v
    ----------------   ----------------
    | mem_region_t |   | mem_region_t |
    ----------------   ----------------
           |                  |
           v                  v
 -----------------------------   -------------
 | zone0 | zone1 | zone3 | ..|   | zone0 | ....
 -----------------------------   -------------
Each memory region contains a zone array for the zones belonging to that region,
in addition to other fields such as the node id, the index of the region within
the node, the start pfn of the pages in that region and the number of pages
spanned by the region. The zone array inside the regions is statically allocated
at this point.
ToDo:
Since the number of regions actually present on the system might be much smaller
than the maximum allowed, dynamic bootmem allocation could be used instead to
save memory.
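
To illustrate the resulting node -> region -> zone indexing, here is a minimal
standalone C sketch. It only models the hierarchy; the *_model structure names,
the reduced array sizes and the page counts are made up for the example, and the
real structures are the ones introduced in the diff below.

#include <stdio.h>

#define MAX_NR_ZONES_MODEL   4    /* simplified; the real value is config-dependent */
#define MAX_NR_REGIONS_MODEL 8    /* kept small for the example (the patch uses 256) */

struct zone_model {
	unsigned long start_pfn;
	unsigned long spanned_pages;
};

struct mem_region_model {
	struct zone_model region_zones[MAX_NR_ZONES_MODEL];
	int node;			/* node this region belongs to */
	int region;			/* index of this region within the node */
	unsigned long start_pfn;
	unsigned long spanned_pages;
};

struct pglist_data_model {
	struct mem_region_model node_regions[MAX_NR_REGIONS_MODEL];
	int nr_node_regions;
};

int main(void)
{
	/* One node with two 512MB regions (131072 pfns each with 4K pages) */
	struct pglist_data_model node0 = {
		.nr_node_regions = 2,
		.node_regions = {
			{ .node = 0, .region = 0, .start_pfn = 0,
			  .spanned_pages = 131072 },
			{ .node = 0, .region = 1, .start_pfn = 131072,
			  .spanned_pages = 131072 },
		},
	};

	/* A zone is now reached as node -> region -> zone, not node -> zone */
	struct zone_model *z = &node0.node_regions[1].region_zones[0];

	printf("zone at %p: node %d, region %d, region starts at pfn %lu\n",
	       (void *)z, node0.node_regions[1].node,
	       node0.node_regions[1].region, node0.node_regions[1].start_pfn);
	return 0;
}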
Signed-off-by: Ankita Garg <gargankita@gmail.com>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---
include/linux/mmzone.h | 24 +++++++++++++++++++++---
1 file changed, 21 insertions(+), 3 deletions(-)
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 50aaca8..3f9b106 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -86,6 +86,7 @@ struct free_area {
};
struct pglist_data;
+struct mem_region;
/*
* zone->lock and zone->lru_lock are two of the hottest locks in the kernel.
@@ -465,6 +466,8 @@ struct zone {
* Discontig memory support fields.
*/
struct pglist_data *zone_pgdat;
+ struct mem_region *zone_mem_region;
+
/* zone_start_pfn == zone_start_paddr >> PAGE_SHIFT */
unsigned long zone_start_pfn;
@@ -533,6 +536,8 @@ static inline int zone_is_oom_locked(const struct zone *zone)
return test_bit(ZONE_OOM_LOCKED, &zone->flags);
}
+#define MAX_NR_REGIONS 256
+
/*
* The "priority" of VM scanning is how much of the queues we will scan in one
* go. A value of 12 for DEF_PRIORITY implies that we will scan 1/4096th of the
@@ -541,7 +546,7 @@ static inline int zone_is_oom_locked(const struct zone *zone)
#define DEF_PRIORITY 12
/* Maximum number of zones on a zonelist */
-#define MAX_ZONES_PER_ZONELIST (MAX_NUMNODES * MAX_NR_ZONES)
+#define MAX_ZONES_PER_ZONELIST (MAX_NUMNODES * MAX_NR_REGIONS * MAX_NR_ZONES)
#ifdef CONFIG_NUMA
@@ -671,6 +676,18 @@ struct node_active_region {
extern struct page *mem_map;
#endif
+struct mem_region {
+ struct zone region_zones[MAX_NR_ZONES];
+ int nr_region_zones;
+
+ int node;
+ int region;
+
+ unsigned long start_pfn;
+ unsigned long spanned_pages;
+};
+
+
/*
* The pg_data_t structure is used in machines with CONFIG_DISCONTIGMEM
* (mostly NUMA machines?) to denote a higher-level memory zone than the
@@ -684,9 +701,10 @@ extern struct page *mem_map;
*/
struct bootmem_data;
typedef struct pglist_data {
- struct zone node_zones[MAX_NR_ZONES];
+ struct mem_region node_regions[MAX_NR_REGIONS];
+ int nr_node_regions;
struct zonelist node_zonelists[MAX_ZONELISTS];
- int nr_zones;
+ int nr_node_zone_types;
#ifdef CONFIG_FLAT_NODE_MEM_MAP /* means !SPARSEMEM */
struct page *node_mem_map;
#ifdef CONFIG_MEMCG
--
* [RFC PATCH 02/10] mm: Helper routines
2012-11-06 19:39 [RFC PATCH 00/10][Hierarchy] mm: Linux VM Infrastructure to support Memory Power Management Srivatsa S. Bhat
2012-11-06 19:39 ` [RFC PATCH 01/10] mm: Introduce the memory regions data structure Srivatsa S. Bhat
@ 2012-11-06 19:40 ` Srivatsa S. Bhat
2012-11-06 19:40 ` [RFC PATCH 03/10] mm: Init zones inside memory regions Srivatsa S. Bhat
` (7 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: Srivatsa S. Bhat @ 2012-11-06 19:40 UTC (permalink / raw)
To: akpm, mgorman, mjg59, paulmck, dave, maxime.coquelin,
loic.pallardy, arjan, kmpark, kamezawa.hiroyu, lenb, rjw
Cc: gargankita, amit.kachhap, svaidy, thomas.abraham,
santosh.shilimkar, srivatsa.bhat, linux-pm, linux-mm,
linux-kernel
From: Ankita Garg <gargankita@gmail.com>
With the introduction of regions, helper routines are needed to walk through
all the regions and zones inside a node. This patch adds these helper
routines.
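
For reference, the core of the new page_zone() below is a linear search for the
region whose pfn range contains the page, followed by the usual zone index
lookup. A simplified standalone sketch of that search (structure and function
names here are made up; the real code is in the diff):

#include <stdio.h>

struct region_model {
	unsigned long start_pfn;
	unsigned long spanned_pages;
};

/*
 * Return the index of the region whose [start_pfn, start_pfn + spanned_pages)
 * range contains pfn, or -1 if no region of the node covers it.
 */
static int pfn_to_region(const struct region_model *regions, int nr_regions,
			 unsigned long pfn)
{
	int i;

	for (i = 0; i < nr_regions; i++) {
		unsigned long end_pfn = regions[i].start_pfn +
					regions[i].spanned_pages;

		if (pfn >= regions[i].start_pfn && pfn < end_pfn)
			return i;
	}
	return -1;
}

int main(void)
{
	/* Two regions of 131072 pfns each (512MB with 4K pages) */
	struct region_model regions[] = {
		{ .start_pfn = 0,      .spanned_pages = 131072 },
		{ .start_pfn = 131072, .spanned_pages = 131072 },
	};

	printf("pfn 200000 falls in region %d\n",
	       pfn_to_region(regions, 2, 200000));
	return 0;
}

Once the region is found, page_zone() simply returns
&region->region_zones[page_zonenum(page)].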
Signed-off-by: Ankita Garg <gargankita@gmail.com>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---
include/linux/mm.h | 7 ++-----
include/linux/mmzone.h | 22 +++++++++++++++++++---
mm/mmzone.c | 48 ++++++++++++++++++++++++++++++++++++++++++++----
3 files changed, 65 insertions(+), 12 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index fa06804..70f1009 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -693,11 +693,6 @@ static inline int page_to_nid(const struct page *page)
}
#endif
-static inline struct zone *page_zone(const struct page *page)
-{
- return &NODE_DATA(page_to_nid(page))->node_zones[page_zonenum(page)];
-}
-
#if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
static inline void set_page_section(struct page *page, unsigned long section)
{
@@ -711,6 +706,8 @@ static inline unsigned long page_to_section(const struct page *page)
}
#endif
+struct zone *page_zone(struct page *page);
+
static inline void set_page_zone(struct page *page, enum zone_type zone)
{
page->flags &= ~(ZONES_MASK << ZONES_PGSHIFT);
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 3f9b106..6f5d533 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -800,7 +800,7 @@ unsigned long __init node_memmap_size_bytes(int, unsigned long, unsigned long);
/*
* zone_idx() returns 0 for the ZONE_DMA zone, 1 for the ZONE_NORMAL zone, etc.
*/
-#define zone_idx(zone) ((zone) - (zone)->zone_pgdat->node_zones)
+#define zone_idx(zone) ((zone) - (zone)->zone_mem_region->region_zones)
static inline int populated_zone(struct zone *zone)
{
@@ -907,7 +907,9 @@ extern struct pglist_data contig_page_data;
extern struct pglist_data *first_online_pgdat(void);
extern struct pglist_data *next_online_pgdat(struct pglist_data *pgdat);
+extern struct zone *first_zone(void);
extern struct zone *next_zone(struct zone *zone);
+extern struct mem_region *next_mem_region(struct mem_region *region);
/**
* for_each_online_pgdat - helper macro to iterate over all online nodes
@@ -917,6 +919,20 @@ extern struct zone *next_zone(struct zone *zone);
for (pgdat = first_online_pgdat(); \
pgdat; \
pgdat = next_online_pgdat(pgdat))
+
+
+/**
+ * for_each_mem_region_in_node - helper macro to iterate over all the memory
+ * regions in a node.
+ * @region - pointer to a struct mem_region variable
+ * @nid - node id of the node
+ */
+#define for_each_mem_region_in_node(region, nid) \
+ for (region = (NODE_DATA(nid))->node_regions; \
+ region; \
+ region = next_mem_region(region))
+
+
/**
* for_each_zone - helper macro to iterate over all memory zones
* @zone - pointer to struct zone variable
@@ -925,12 +941,12 @@ extern struct zone *next_zone(struct zone *zone);
* fills it in.
*/
#define for_each_zone(zone) \
- for (zone = (first_online_pgdat())->node_zones; \
+ for (zone = (first_zone()); \
zone; \
zone = next_zone(zone))
#define for_each_populated_zone(zone) \
- for (zone = (first_online_pgdat())->node_zones; \
+ for (zone = (first_zone()); \
zone; \
zone = next_zone(zone)) \
if (!populated_zone(zone)) \
diff --git a/mm/mmzone.c b/mm/mmzone.c
index 3cef80f..d32d10a 100644
--- a/mm/mmzone.c
+++ b/mm/mmzone.c
@@ -23,22 +23,62 @@ struct pglist_data *next_online_pgdat(struct pglist_data *pgdat)
return NODE_DATA(nid);
}
+struct mem_region *next_mem_region(struct mem_region *region)
+{
+ int next_region = region->region + 1;
+ pg_data_t *pgdat = NODE_DATA(region->node);
+
+ if (next_region == pgdat->nr_node_regions)
+ return NULL;
+ return &(pgdat->node_regions[next_region]);
+}
+
+struct zone *first_zone(void)
+{
+ return (first_online_pgdat())->node_regions[0].region_zones;
+}
+
+struct zone *page_zone(struct page *page)
+{
+ pg_data_t *pgdat = NODE_DATA(page_to_nid(page));
+ unsigned long pfn = page_to_pfn(page);
+ struct mem_region *region;
+
+ for_each_mem_region_in_node(region, pgdat->node_id) {
+ unsigned long end_pfn = region->start_pfn +
+ region->spanned_pages;
+
+ if ((pfn >= region->start_pfn) && (pfn < end_pfn))
+ return &region->region_zones[page_zonenum(page)];
+ }
+
+ return NULL;
+}
+
+
/*
* next_zone - helper magic for for_each_zone()
*/
struct zone *next_zone(struct zone *zone)
{
pg_data_t *pgdat = zone->zone_pgdat;
+ struct mem_region *region = zone->zone_mem_region;
+
+ if (zone < region->region_zones + MAX_NR_ZONES - 1)
+ return ++zone;
- if (zone < pgdat->node_zones + MAX_NR_ZONES - 1)
- zone++;
- else {
+ region = next_mem_region(region);
+
+ if (region) {
+ zone = region->region_zones;
+ } else {
pgdat = next_online_pgdat(pgdat);
if (pgdat)
- zone = pgdat->node_zones;
+ zone = pgdat->node_regions[0].region_zones;
else
zone = NULL;
}
+
return zone;
}
--
* [RFC PATCH 03/10] mm: Init zones inside memory regions
2012-11-06 19:39 [RFC PATCH 00/10][Hierarchy] mm: Linux VM Infrastructure to support Memory Power Management Srivatsa S. Bhat
2012-11-06 19:39 ` [RFC PATCH 01/10] mm: Introduce the memory regions data structure Srivatsa S. Bhat
2012-11-06 19:40 ` [RFC PATCH 02/10] mm: Helper routines Srivatsa S. Bhat
@ 2012-11-06 19:40 ` Srivatsa S. Bhat
2012-11-06 19:40 ` [RFC PATCH 04/10] mm: Refer to zones from " Srivatsa S. Bhat
` (6 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: Srivatsa S. Bhat @ 2012-11-06 19:40 UTC (permalink / raw)
To: akpm, mgorman, mjg59, paulmck, dave, maxime.coquelin,
loic.pallardy, arjan, kmpark, kamezawa.hiroyu, lenb, rjw
Cc: gargankita, amit.kachhap, svaidy, thomas.abraham,
santosh.shilimkar, srivatsa.bhat, linux-pm, linux-mm,
linux-kernel
From: Ankita Garg <gargankita@gmail.com>
This patch initializes the zones inside memory regions. Each memory region is
scanned for the pfns present in it, and the intersection of that range with a
zone's range is set up as the amount of memory present in that zone within the
region. Most of the other setup steps remain unmodified.
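
The per-region sizing reduces to clamping the zone's pfn range to the region's
pfn range; a minimal standalone sketch of that intersection (the helper name is
made up, the logic mirrors zone_spanned_pages_in_node_region() in the diff):

#include <stdio.h>

/*
 * Number of pages the zone [zone_start, zone_end) contributes to the
 * region [region_start, region_end); 0 if the two ranges do not intersect.
 */
static unsigned long pages_in_intersection(unsigned long zone_start,
					   unsigned long zone_end,
					   unsigned long region_start,
					   unsigned long region_end)
{
	unsigned long start = zone_start > region_start ? zone_start : region_start;
	unsigned long end   = zone_end   < region_end   ? zone_end   : region_end;

	return end > start ? end - start : 0;
}

int main(void)
{
	/* A zone spanning pfns [4096, 262144) vs. a region covering [131072, 262144) */
	printf("%lu pages of the zone fall inside the region\n",
	       pages_in_intersection(4096, 262144, 131072, 262144));
	return 0;
}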
Signed-off-by: Ankita Garg <gargankita@gmail.com>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---
include/linux/mm.h | 2 +
mm/page_alloc.c | 175 ++++++++++++++++++++++++++++++++++------------------
2 files changed, 118 insertions(+), 59 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 70f1009..f57eef0 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1320,6 +1320,8 @@ extern unsigned long absent_pages_in_range(unsigned long start_pfn,
unsigned long end_pfn);
extern void get_pfn_range_for_nid(unsigned int nid,
unsigned long *start_pfn, unsigned long *end_pfn);
+extern void get_pfn_range_for_region(int nid, int region,
+ unsigned long *start_pfn, unsigned long *end_pfn);
extern unsigned long find_min_pfn_with_active_regions(void);
extern void free_bootmem_with_active_regions(int nid,
unsigned long max_low_pfn);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index bb90971..c807272 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4321,6 +4321,7 @@ static unsigned long __meminit zone_absent_pages_in_node(int nid,
}
#else /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
+
static inline unsigned long __meminit zone_spanned_pages_in_node(int nid,
unsigned long zone_type,
unsigned long *zones_size)
@@ -4340,6 +4341,48 @@ static inline unsigned long __meminit zone_absent_pages_in_node(int nid,
#endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
+void __meminit get_pfn_range_for_region(int nid, int region,
+ unsigned long *start_pfn, unsigned long *end_pfn)
+{
+ struct mem_region *mem_region;
+
+ mem_region = &NODE_DATA(nid)->node_regions[region];
+ *start_pfn = mem_region->start_pfn;
+ *end_pfn = *start_pfn + mem_region->spanned_pages;
+}
+
+static inline unsigned long __meminit zone_spanned_pages_in_node_region(int nid,
+ int region,
+ unsigned long zone_start_pfn,
+ unsigned long zone_type,
+ unsigned long *zones_size)
+{
+ unsigned long start_pfn, end_pfn;
+ unsigned long zone_end_pfn, spanned_pages;
+
+ get_pfn_range_for_region(nid, region, &start_pfn, &end_pfn);
+
+ spanned_pages = zone_spanned_pages_in_node(nid, zone_type, zones_size);
+
+ zone_end_pfn = zone_start_pfn + spanned_pages;
+
+ zone_end_pfn = min(zone_end_pfn, end_pfn);
+ zone_start_pfn = max(start_pfn, zone_start_pfn);
+
+ /* Detect if region and zone don't intersect */
+ if (zone_end_pfn < zone_start_pfn)
+ return 0;
+
+ return zone_end_pfn - zone_start_pfn;
+}
+
+static inline unsigned long __meminit zone_absent_pages_in_node_region(int nid,
+ unsigned long zone_start_pfn,
+ unsigned long zone_end_pfn)
+{
+ return __absent_pages_in_range(nid, zone_start_pfn, zone_end_pfn);
+}
+
static void __meminit calculate_node_totalpages(struct pglist_data *pgdat,
unsigned long *zones_size, unsigned long *zholes_size)
{
@@ -4446,6 +4489,7 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat,
enum zone_type j;
int nid = pgdat->node_id;
unsigned long zone_start_pfn = pgdat->node_start_pfn;
+ struct mem_region *region;
int ret;
pgdat_resize_init(pgdat);
@@ -4454,68 +4498,77 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat,
pgdat_page_cgroup_init(pgdat);
for (j = 0; j < MAX_NR_ZONES; j++) {
- struct zone *zone = pgdat->node_zones + j;
- unsigned long size, realsize, memmap_pages;
+ for_each_mem_region_in_node(region, pgdat->node_id) {
+ struct zone *zone = region->region_zones + j;
+ unsigned long size, realsize = 0, memmap_pages;
- size = zone_spanned_pages_in_node(nid, j, zones_size);
- realsize = size - zone_absent_pages_in_node(nid, j,
- zholes_size);
+ size = zone_spanned_pages_in_node_region(nid,
+ region->region,
+ zone_start_pfn,
+ j, zones_size);
- /*
- * Adjust realsize so that it accounts for how much memory
- * is used by this zone for memmap. This affects the watermark
- * and per-cpu initialisations
- */
- memmap_pages =
- PAGE_ALIGN(size * sizeof(struct page)) >> PAGE_SHIFT;
- if (realsize >= memmap_pages) {
- realsize -= memmap_pages;
- if (memmap_pages)
- printk(KERN_DEBUG
- " %s zone: %lu pages used for memmap\n",
- zone_names[j], memmap_pages);
- } else
- printk(KERN_WARNING
- " %s zone: %lu pages exceeds realsize %lu\n",
- zone_names[j], memmap_pages, realsize);
-
- /* Account for reserved pages */
- if (j == 0 && realsize > dma_reserve) {
- realsize -= dma_reserve;
- printk(KERN_DEBUG " %s zone: %lu pages reserved\n",
- zone_names[0], dma_reserve);
- }
+ realsize = size -
+ zone_absent_pages_in_node_region(nid,
+ zone_start_pfn,
+ zone_start_pfn + size);
- if (!is_highmem_idx(j))
- nr_kernel_pages += realsize;
- nr_all_pages += realsize;
+ /*
+ * Adjust realsize so that it accounts for how much memory
+ * is used by this zone for memmap. This affects the watermark
+ * and per-cpu initialisations
+ */
+ memmap_pages =
+ PAGE_ALIGN(size * sizeof(struct page)) >> PAGE_SHIFT;
+ if (realsize >= memmap_pages) {
+ realsize -= memmap_pages;
+ if (memmap_pages)
+ printk(KERN_DEBUG
+ " %s zone: %lu pages used for memmap\n",
+ zone_names[j], memmap_pages);
+ } else
+ printk(KERN_WARNING
+ " %s zone: %lu pages exceeds realsize %lu\n",
+ zone_names[j], memmap_pages, realsize);
+
+ /* Account for reserved pages */
+ if (j == 0 && realsize > dma_reserve) {
+ realsize -= dma_reserve;
+ printk(KERN_DEBUG " %s zone: %lu pages reserved\n",
+ zone_names[0], dma_reserve);
+ }
- zone->spanned_pages = size;
- zone->present_pages = realsize;
+ if (!is_highmem_idx(j))
+ nr_kernel_pages += realsize;
+ nr_all_pages += realsize;
+
+ zone->spanned_pages = size;
+ zone->present_pages = realsize;
#ifdef CONFIG_NUMA
- zone->node = nid;
- zone->min_unmapped_pages = (realsize*sysctl_min_unmapped_ratio)
- / 100;
- zone->min_slab_pages = (realsize * sysctl_min_slab_ratio) / 100;
+ zone->node = nid;
+ zone->min_unmapped_pages = (realsize*sysctl_min_unmapped_ratio)
+ / 100;
+ zone->min_slab_pages = (realsize * sysctl_min_slab_ratio) / 100;
#endif
- zone->name = zone_names[j];
- spin_lock_init(&zone->lock);
- spin_lock_init(&zone->lru_lock);
- zone_seqlock_init(zone);
- zone->zone_pgdat = pgdat;
-
- zone_pcp_init(zone);
- lruvec_init(&zone->lruvec, zone);
- if (!size)
- continue;
+ zone->name = zone_names[j];
+ spin_lock_init(&zone->lock);
+ spin_lock_init(&zone->lru_lock);
+ zone_seqlock_init(zone);
+ zone->zone_pgdat = pgdat;
+ zone->zone_mem_region = region;
+
+ zone_pcp_init(zone);
+ lruvec_init(&zone->lruvec, zone);
+ if (!size)
+ continue;
- set_pageblock_order();
- setup_usemap(pgdat, zone, size);
- ret = init_currently_empty_zone(zone, zone_start_pfn,
- size, MEMMAP_EARLY);
- BUG_ON(ret);
- memmap_init(size, nid, j, zone_start_pfn);
- zone_start_pfn += size;
+ set_pageblock_order();
+ setup_usemap(pgdat, zone, size);
+ ret = init_currently_empty_zone(zone, zone_start_pfn,
+ size, MEMMAP_EARLY);
+ BUG_ON(ret);
+ memmap_init(size, nid, j, zone_start_pfn);
+ zone_start_pfn += size;
+ }
}
}
@@ -4854,12 +4907,16 @@ static void __init check_for_regular_memory(pg_data_t *pgdat)
{
#ifdef CONFIG_HIGHMEM
enum zone_type zone_type;
+ struct mem_region *region;
for (zone_type = 0; zone_type <= ZONE_NORMAL; zone_type++) {
- struct zone *zone = &pgdat->node_zones[zone_type];
- if (zone->present_pages) {
- node_set_state(zone_to_nid(zone), N_NORMAL_MEMORY);
- break;
+ for_each_mem_region_in_node(region, pgdat->node_id) {
+ struct zone *zone = &region->region_zones[zone_type];
+ if (zone->present_pages) {
+ node_set_state(zone_to_nid(zone),
+ N_NORMAL_MEMORY);
+ return;
+ }
}
}
#endif
--
* [RFC PATCH 04/10] mm: Refer to zones from memory regions
2012-11-06 19:39 [RFC PATCH 00/10][Hierarchy] mm: Linux VM Infrastructure to support Memory Power Management Srivatsa S. Bhat
` (2 preceding siblings ...)
2012-11-06 19:40 ` [RFC PATCH 03/10] mm: Init zones inside memory regions Srivatsa S. Bhat
@ 2012-11-06 19:40 ` Srivatsa S. Bhat
2012-11-06 19:40 ` [RFC PATCH 05/10] mm: Create zonelists Srivatsa S. Bhat
` (5 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: Srivatsa S. Bhat @ 2012-11-06 19:40 UTC (permalink / raw)
To: akpm, mgorman, mjg59, paulmck, dave, maxime.coquelin,
loic.pallardy, arjan, kmpark, kamezawa.hiroyu, lenb, rjw
Cc: gargankita, amit.kachhap, svaidy, thomas.abraham,
santosh.shilimkar, srivatsa.bhat, linux-pm, linux-mm,
linux-kernel
From: Ankita Garg <gargankita@gmail.com>
With the introduction of memory regions, the node_zones array inside the node
structure is removed. Hence, this patch modifies the VM code to refer to zones
from within memory regions instead of directly from nodes.
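
One consequence worth calling out: a zone's index is now its offset within its
region's zone array rather than within the node's, as in the updated zone_idx(),
is_highmem() and friends. A standalone sketch of that pointer arithmetic
(simplified, made-up types, not kernel code):

#include <stdio.h>

#define MAX_NR_ZONES_MODEL 4	/* simplified */

struct zone_model { unsigned long present_pages; };

struct region_model {
	struct zone_model region_zones[MAX_NR_ZONES_MODEL];
};

/* Mirrors the reworked zone_idx(): offset of the zone within its region's array */
static int zone_idx_in_region(const struct region_model *region,
			      const struct zone_model *zone)
{
	return (int)(zone - region->region_zones);
}

int main(void)
{
	struct region_model region;
	struct zone_model *z = &region.region_zones[2];

	printf("zone index within its region: %d\n",
	       zone_idx_in_region(&region, z));
	return 0;
}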
Signed-off-by: Ankita Garg <gargankita@gmail.com>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---
include/linux/mm.h | 2 -
include/linux/mmzone.h | 9 ++-
mm/page_alloc.c | 128 +++++++++++++++++++++++++++---------------------
3 files changed, 79 insertions(+), 60 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index f57eef0..27fc2d3 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1345,7 +1345,7 @@ extern int __meminit __early_pfn_to_nid(unsigned long pfn);
#endif
extern void set_dma_reserve(unsigned long new_dma_reserve);
-extern void memmap_init_zone(unsigned long, int, unsigned long,
+extern void memmap_init_zone(unsigned long, int, int, unsigned long,
unsigned long, enum memmap_context);
extern void setup_per_zone_wmarks(void);
extern int __meminit init_per_zone_wmark_min(void);
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 6f5d533..4abc7d5 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -842,7 +842,8 @@ static inline int is_normal_idx(enum zone_type idx)
static inline int is_highmem(struct zone *zone)
{
#ifdef CONFIG_HIGHMEM
- int zone_off = (char *)zone - (char *)zone->zone_pgdat->node_zones;
+ int zone_off = (char *)zone -
+ (char *)zone->zone_mem_region->region_zones;
return zone_off == ZONE_HIGHMEM * sizeof(*zone) ||
(zone_off == ZONE_MOVABLE * sizeof(*zone) &&
zone_movable_is_highmem());
@@ -853,13 +854,13 @@ static inline int is_highmem(struct zone *zone)
static inline int is_normal(struct zone *zone)
{
- return zone == zone->zone_pgdat->node_zones + ZONE_NORMAL;
+ return zone == zone->zone_mem_region->region_zones + ZONE_NORMAL;
}
static inline int is_dma32(struct zone *zone)
{
#ifdef CONFIG_ZONE_DMA32
- return zone == zone->zone_pgdat->node_zones + ZONE_DMA32;
+ return zone == zone->zone_mem_region->region_zones + ZONE_DMA32;
#else
return 0;
#endif
@@ -868,7 +869,7 @@ static inline int is_dma32(struct zone *zone)
static inline int is_dma(struct zone *zone)
{
#ifdef CONFIG_ZONE_DMA
- return zone == zone->zone_pgdat->node_zones + ZONE_DMA;
+ return zone == zone->zone_mem_region->region_zones + ZONE_DMA;
#else
return 0;
#endif
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c807272..a8e86b5 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3797,8 +3797,8 @@ static void setup_zone_migrate_reserve(struct zone *zone)
* up by free_all_bootmem() once the early boot process is
* done. Non-atomic initialization, single-pass.
*/
-void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
- unsigned long start_pfn, enum memmap_context context)
+void __meminit memmap_init_zone(unsigned long size, int nid, int region,
+ unsigned long zone, unsigned long start_pfn, enum memmap_context context)
{
struct page *page;
unsigned long end_pfn = start_pfn + size;
@@ -3808,7 +3808,7 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
if (highest_memmap_pfn < end_pfn - 1)
highest_memmap_pfn = end_pfn - 1;
- z = &NODE_DATA(nid)->node_zones[zone];
+ z = &NODE_DATA(nid)->node_regions[region].region_zones[zone];
for (pfn = start_pfn; pfn < end_pfn; pfn++) {
/*
* There can be holes in boot-time mem_map[]s
@@ -3865,8 +3865,8 @@ static void __meminit zone_init_free_lists(struct zone *zone)
}
#ifndef __HAVE_ARCH_MEMMAP_INIT
-#define memmap_init(size, nid, zone, start_pfn) \
- memmap_init_zone((size), (nid), (zone), (start_pfn), MEMMAP_EARLY)
+#define memmap_init(size, nid, region, zone, start_pfn) \
+ memmap_init_zone((size), (nid), (region), (zone), (start_pfn), MEMMAP_EARLY)
#endif
static int __meminit zone_batchsize(struct zone *zone)
@@ -4045,11 +4045,13 @@ int __meminit init_currently_empty_zone(struct zone *zone,
enum memmap_context context)
{
struct pglist_data *pgdat = zone->zone_pgdat;
+ struct mem_region *region = zone->zone_mem_region;
int ret;
ret = zone_wait_table_init(zone, size);
if (ret)
return ret;
- pgdat->nr_zones = zone_idx(zone) + 1;
+ pgdat->nr_node_zone_types = zone_idx(zone) + 1;
+ region->nr_region_zones = zone_idx(zone) + 1;
zone->zone_start_pfn = zone_start_pfn;
@@ -4058,7 +4060,6 @@ int __meminit init_currently_empty_zone(struct zone *zone,
pgdat->node_id,
(unsigned long)zone_idx(zone),
zone_start_pfn, (zone_start_pfn + size));
-
zone_init_free_lists(zone);
return 0;
@@ -4566,7 +4567,7 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat,
ret = init_currently_empty_zone(zone, zone_start_pfn,
size, MEMMAP_EARLY);
BUG_ON(ret);
- memmap_init(size, nid, j, zone_start_pfn);
+ memmap_init(size, nid, region->region, j, zone_start_pfn);
zone_start_pfn += size;
}
}
@@ -4613,13 +4614,17 @@ static void __init_refok alloc_node_mem_map(struct pglist_data *pgdat)
#endif /* CONFIG_FLAT_NODE_MEM_MAP */
}
+/*
+ * Todo: This routine needs more modifications, but not required for the
+ * minimalistic config options, to start with
+ */
void __paginginit free_area_init_node(int nid, unsigned long *zones_size,
unsigned long node_start_pfn, unsigned long *zholes_size)
{
pg_data_t *pgdat = NODE_DATA(nid);
/* pg_data_t should be reset to zero when it's allocated */
- WARN_ON(pgdat->nr_zones || pgdat->classzone_idx);
+ WARN_ON(pgdat->nr_node_zone_types || pgdat->classzone_idx);
pgdat->node_id = nid;
pgdat->node_start_pfn = node_start_pfn;
@@ -5109,35 +5114,38 @@ static void calculate_totalreserve_pages(void)
{
struct pglist_data *pgdat;
unsigned long reserve_pages = 0;
+ struct mem_region *region;
enum zone_type i, j;
for_each_online_pgdat(pgdat) {
for (i = 0; i < MAX_NR_ZONES; i++) {
- struct zone *zone = pgdat->node_zones + i;
- unsigned long max = 0;
-
- /* Find valid and maximum lowmem_reserve in the zone */
- for (j = i; j < MAX_NR_ZONES; j++) {
- if (zone->lowmem_reserve[j] > max)
- max = zone->lowmem_reserve[j];
- }
+ for_each_mem_region_in_node(region, pgdat->node_id) {
+ struct zone *zone = region->region_zones + i;
+ unsigned long max = 0;
+
+ /* Find valid and maximum lowmem_reserve in the zone */
+ for (j = i; j < MAX_NR_ZONES; j++) {
+ if (zone->lowmem_reserve[j] > max)
+ max = zone->lowmem_reserve[j];
+ }
- /* we treat the high watermark as reserved pages. */
- max += high_wmark_pages(zone);
+ /* we treat the high watermark as reserved pages. */
+ max += high_wmark_pages(zone);
- if (max > zone->present_pages)
- max = zone->present_pages;
- reserve_pages += max;
- /*
- * Lowmem reserves are not available to
- * GFP_HIGHUSER page cache allocations and
- * kswapd tries to balance zones to their high
- * watermark. As a result, neither should be
- * regarded as dirtyable memory, to prevent a
- * situation where reclaim has to clean pages
- * in order to balance the zones.
- */
- zone->dirty_balance_reserve = max;
+ if (max > zone->present_pages)
+ max = zone->present_pages;
+ reserve_pages += max;
+ /*
+ * Lowmem reserves are not available to
+ * GFP_HIGHUSER page cache allocations and
+ * kswapd tries to balance zones to their high
+ * watermark. As a result, neither should be
+ * regarded as dirtyable memory, to prevent a
+ * situation where reclaim has to clean pages
+ * in order to balance the zones.
+ */
+ zone->dirty_balance_reserve = max;
+ }
}
}
dirty_balance_reserve = reserve_pages;
@@ -5154,27 +5162,30 @@ static void setup_per_zone_lowmem_reserve(void)
{
struct pglist_data *pgdat;
enum zone_type j, idx;
+ struct mem_region *region;
for_each_online_pgdat(pgdat) {
for (j = 0; j < MAX_NR_ZONES; j++) {
- struct zone *zone = pgdat->node_zones + j;
- unsigned long present_pages = zone->present_pages;
+ for_each_mem_region_in_node(region, pgdat->node_id) {
+ struct zone *zone = region->region_zones + j;
+ unsigned long present_pages = zone->present_pages;
- zone->lowmem_reserve[j] = 0;
+ zone->lowmem_reserve[j] = 0;
- idx = j;
- while (idx) {
- struct zone *lower_zone;
+ idx = j;
+ while (idx) {
+ struct zone *lower_zone;
- idx--;
+ idx--;
- if (sysctl_lowmem_reserve_ratio[idx] < 1)
- sysctl_lowmem_reserve_ratio[idx] = 1;
+ if (sysctl_lowmem_reserve_ratio[idx] < 1)
+ sysctl_lowmem_reserve_ratio[idx] = 1;
- lower_zone = pgdat->node_zones + idx;
- lower_zone->lowmem_reserve[j] = present_pages /
- sysctl_lowmem_reserve_ratio[idx];
- present_pages += lower_zone->present_pages;
+ lower_zone = region->region_zones + idx;
+ lower_zone->lowmem_reserve[j] = present_pages /
+ sysctl_lowmem_reserve_ratio[idx];
+ present_pages += lower_zone->present_pages;
+ }
}
}
}
@@ -6159,13 +6170,16 @@ void dump_page(struct page *page)
/* reset zone->present_pages */
void reset_zone_present_pages(void)
{
+ struct mem_region *region;
struct zone *z;
int i, nid;
for_each_node_state(nid, N_HIGH_MEMORY) {
for (i = 0; i < MAX_NR_ZONES; i++) {
- z = NODE_DATA(nid)->node_zones + i;
- z->present_pages = 0;
+ for_each_mem_region_in_node(region, nid) {
+ z = region->region_zones + i;
+ z->present_pages = 0;
+ }
}
}
}
@@ -6177,15 +6191,19 @@ void fixup_zone_present_pages(int nid, unsigned long start_pfn,
struct zone *z;
unsigned long zone_start_pfn, zone_end_pfn;
int i;
+ struct mem_region *region;
for (i = 0; i < MAX_NR_ZONES; i++) {
- z = NODE_DATA(nid)->node_zones + i;
- zone_start_pfn = z->zone_start_pfn;
- zone_end_pfn = zone_start_pfn + z->spanned_pages;
-
- /* if the two regions intersect */
- if (!(zone_start_pfn >= end_pfn || zone_end_pfn <= start_pfn))
- z->present_pages += min(end_pfn, zone_end_pfn) -
- max(start_pfn, zone_start_pfn);
+ for_each_mem_region_in_node(region, nid) {
+ z = region->region_zones + i;
+ zone_start_pfn = z->zone_start_pfn;
+ zone_end_pfn = zone_start_pfn + z->spanned_pages;
+
+ /* if the two regions intersect */
+ if (!(zone_start_pfn >= end_pfn ||
+ zone_end_pfn <= start_pfn))
+ z->present_pages += min(end_pfn, zone_end_pfn) -
+ max(start_pfn, zone_start_pfn);
+ }
}
}
--
* [RFC PATCH 05/10] mm: Create zonelists
2012-11-06 19:39 [RFC PATCH 00/10][Hierarchy] mm: Linux VM Infrastructure to support Memory Power Management Srivatsa S. Bhat
` (3 preceding siblings ...)
2012-11-06 19:40 ` [RFC PATCH 04/10] mm: Refer to zones from " Srivatsa S. Bhat
@ 2012-11-06 19:40 ` Srivatsa S. Bhat
2012-11-06 19:41 ` [RFC PATCH 06/10] mm: Verify zonelists Srivatsa S. Bhat
` (4 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: Srivatsa S. Bhat @ 2012-11-06 19:40 UTC (permalink / raw)
To: akpm, mgorman, mjg59, paulmck, dave, maxime.coquelin,
loic.pallardy, arjan, kmpark, kamezawa.hiroyu, lenb, rjw
Cc: gargankita, amit.kachhap, svaidy, thomas.abraham,
santosh.shilimkar, srivatsa.bhat, linux-pm, linux-mm,
linux-kernel
From: Ankita Garg <gargankita@gmail.com>
The default node-ordered zonelist contains all the zones of one node, followed by
all the zones of the next node, and so on. With memory regions, the primary aim
is to group memory allocations to a given area of memory together. The modified
zonelists thus contain all the zones of one region, followed by all the zones of
the next region, and so on. This ensures that all the memory in one region is
allocated before moving on to the next region, unless targeted memory allocations
are performed.
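
The resulting fallback order for one node can be pictured with a tiny standalone
sketch that just prints the region-major ordering (names and counts are made up;
the kernel code builds zoneref arrays instead of printing):

#include <stdio.h>

#define NR_REGIONS 2
#define NR_ZONES   3

static const char *zone_names[NR_ZONES] = { "DMA", "Normal", "HighMem" };

int main(void)
{
	int region, zone;

	/*
	 * Region-ordered zonelist for one node: all zones of region 0
	 * (highest zone first) come before any zone of region 1.
	 */
	for (region = 0; region < NR_REGIONS; region++)
		for (zone = NR_ZONES - 1; zone >= 0; zone--)
			printf("region %d, zone %s\n", region, zone_names[zone]);

	return 0;
}

Allocations therefore tend to exhaust region 0 before falling back to region 1,
which is what lets a whole region stay free and be power-managed.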
Signed-off-by: Ankita Garg <gargankita@gmail.com>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---
mm/page_alloc.c | 69 +++++++++++++++++++++++++++++++++----------------------
1 file changed, 42 insertions(+), 27 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a8e86b5..9c1d680 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3040,21 +3040,25 @@ static void zoneref_set_zone(struct zone *zone, struct zoneref *zoneref)
static int build_zonelists_node(pg_data_t *pgdat, struct zonelist *zonelist,
int nr_zones, enum zone_type zone_type)
{
+ enum zone_type z_type = zone_type;
+ struct mem_region *region;
struct zone *zone;
BUG_ON(zone_type >= MAX_NR_ZONES);
zone_type++;
- do {
- zone_type--;
- zone = pgdat->node_zones + zone_type;
- if (populated_zone(zone)) {
- zoneref_set_zone(zone,
- &zonelist->_zonerefs[nr_zones++]);
- check_highest_zone(zone_type);
- }
-
- } while (zone_type);
+ for_each_mem_region_in_node(region, pgdat->node_id) {
+ do {
+ zone_type--;
+ zone = region->region_zones + zone_type;
+ if (populated_zone(zone)) {
+ zoneref_set_zone(zone,
+ &zonelist->_zonerefs[nr_zones++]);
+ check_highest_zone(zone_type);
+ }
+ } while (zone_type);
+ zone_type = z_type + 1;
+ }
return nr_zones;
}
@@ -3275,17 +3279,20 @@ static void build_zonelists_in_zone_order(pg_data_t *pgdat, int nr_nodes)
int zone_type; /* needs to be signed */
struct zone *z;
struct zonelist *zonelist;
+ struct mem_region *region;
zonelist = &pgdat->node_zonelists[0];
pos = 0;
for (zone_type = MAX_NR_ZONES - 1; zone_type >= 0; zone_type--) {
for (j = 0; j < nr_nodes; j++) {
node = node_order[j];
- z = &NODE_DATA(node)->node_zones[zone_type];
- if (populated_zone(z)) {
- zoneref_set_zone(z,
- &zonelist->_zonerefs[pos++]);
- check_highest_zone(zone_type);
+ for_each_mem_region_in_node(region, node) {
+ z = &region->region_zones[zone_type];
+ if (populated_zone(z)) {
+ zoneref_set_zone(z,
+ &zonelist->_zonerefs[pos++]);
+ check_highest_zone(zone_type);
+ }
}
}
}
@@ -3299,6 +3306,8 @@ static int default_zonelist_order(void)
unsigned long low_kmem_size,total_size;
struct zone *z;
int average_size;
+ struct mem_region *region;
+
/*
* ZONE_DMA and ZONE_DMA32 can be very small area in the system.
* If they are really small and used heavily, the system can fall
@@ -3310,12 +3319,15 @@ static int default_zonelist_order(void)
total_size = 0;
for_each_online_node(nid) {
for (zone_type = 0; zone_type < MAX_NR_ZONES; zone_type++) {
- z = &NODE_DATA(nid)->node_zones[zone_type];
- if (populated_zone(z)) {
- if (zone_type < ZONE_NORMAL)
- low_kmem_size += z->present_pages;
- total_size += z->present_pages;
- } else if (zone_type == ZONE_NORMAL) {
+ for_each_mem_region_in_node(region, nid) {
+ z = &region->region_zones[zone_type];
+ if (populated_zone(z)) {
+ if (zone_type < ZONE_NORMAL)
+ low_kmem_size +=
+ z->present_pages;
+
+ total_size += z->present_pages;
+ } else if (zone_type == ZONE_NORMAL) {
/*
* If any node has only lowmem, then node order
* is preferred to allow kernel allocations
@@ -3323,7 +3335,8 @@ static int default_zonelist_order(void)
* on other nodes when there is an abundance of
* lowmem available to allocate from.
*/
- return ZONELIST_ORDER_NODE;
+ return ZONELIST_ORDER_NODE;
+ }
}
}
}
@@ -3341,11 +3354,13 @@ static int default_zonelist_order(void)
low_kmem_size = 0;
total_size = 0;
for (zone_type = 0; zone_type < MAX_NR_ZONES; zone_type++) {
- z = &NODE_DATA(nid)->node_zones[zone_type];
- if (populated_zone(z)) {
- if (zone_type < ZONE_NORMAL)
- low_kmem_size += z->present_pages;
- total_size += z->present_pages;
+ for_each_mem_region_in_node(region, nid) {
+ z = &region->region_zones[zone_type];
+ if (populated_zone(z)) {
+ if (zone_type < ZONE_NORMAL)
+ low_kmem_size += z->present_pages;
+ total_size += z->present_pages;
+ }
}
}
if (low_kmem_size &&
--
* [RFC PATCH 06/10] mm: Verify zonelists
2012-11-06 19:39 [RFC PATCH 00/10][Hierarchy] mm: Linux VM Infrastructure to support Memory Power Management Srivatsa S. Bhat
` (4 preceding siblings ...)
2012-11-06 19:40 ` [RFC PATCH 05/10] mm: Create zonelists Srivatsa S. Bhat
@ 2012-11-06 19:41 ` Srivatsa S. Bhat
2012-11-06 19:41 ` [RFC PATCH 07/10] mm: Modify vmstat Srivatsa S. Bhat
` (3 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: Srivatsa S. Bhat @ 2012-11-06 19:41 UTC (permalink / raw)
To: akpm, mgorman, mjg59, paulmck, dave, maxime.coquelin,
loic.pallardy, arjan, kmpark, kamezawa.hiroyu, lenb, rjw
Cc: gargankita, amit.kachhap, svaidy, thomas.abraham,
santosh.shilimkar, srivatsa.bhat, linux-pm, linux-mm,
linux-kernel
From: Ankita Garg <gargankita@gmail.com>
Verify that the zonelists were created appropriately.
Signed-off-by: Ankita Garg <gargankita@gmail.com>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---
mm/mm_init.c | 57 ++++++++++++++++++++++++++++++---------------------------
1 file changed, 30 insertions(+), 27 deletions(-)
diff --git a/mm/mm_init.c b/mm/mm_init.c
index 1ffd97a..5c19842 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -21,6 +21,7 @@ int mminit_loglevel;
/* The zonelists are simply reported, validation is manual. */
void mminit_verify_zonelist(void)
{
+ struct mem_region *region;
int nid;
if (mminit_loglevel < MMINIT_VERIFY)
@@ -28,37 +29,39 @@ void mminit_verify_zonelist(void)
for_each_online_node(nid) {
pg_data_t *pgdat = NODE_DATA(nid);
- struct zone *zone;
- struct zoneref *z;
- struct zonelist *zonelist;
- int i, listid, zoneid;
-
- BUG_ON(MAX_ZONELISTS > 2);
- for (i = 0; i < MAX_ZONELISTS * MAX_NR_ZONES; i++) {
-
- /* Identify the zone and nodelist */
- zoneid = i % MAX_NR_ZONES;
- listid = i / MAX_NR_ZONES;
- zonelist = &pgdat->node_zonelists[listid];
- zone = &pgdat->node_zones[zoneid];
- if (!populated_zone(zone))
- continue;
-
- /* Print information about the zonelist */
- printk(KERN_DEBUG "mminit::zonelist %s %d:%s = ",
- listid > 0 ? "thisnode" : "general", nid,
- zone->name);
-
- /* Iterate the zonelist */
- for_each_zone_zonelist(zone, z, zonelist, zoneid) {
+ for_each_mem_region_in_node(region, nid) {
+ struct zone *zone;
+ struct zoneref *z;
+ struct zonelist *zonelist;
+ int i, listid, zoneid;
+
+ BUG_ON(MAX_ZONELISTS > 2);
+ for (i = 0; i < MAX_ZONELISTS * MAX_NR_ZONES; i++) {
+
+ /* Identify the zone and nodelist */
+ zoneid = i % MAX_NR_ZONES;
+ listid = i / MAX_NR_ZONES;
+ zonelist = &pgdat->node_zonelists[listid];
+ zone = &region->region_zones[zoneid];
+ if (!populated_zone(zone))
+ continue;
+
+ /* Print information about the zonelist */
+ printk(KERN_DEBUG "mminit::zonelist %s %d:%s = ",
+ listid > 0 ? "thisnode" : "general", nid,
+ zone->name);
+
+ /* Iterate the zonelist */
+ for_each_zone_zonelist(zone, z, zonelist, zoneid) {
#ifdef CONFIG_NUMA
- printk(KERN_CONT "%d:%s ",
- zone->node, zone->name);
+ printk(KERN_CONT "%d:%s ",
+ zone->node, zone->name);
#else
- printk(KERN_CONT "0:%s ", zone->name);
+ printk(KERN_CONT "0:%s ", zone->name);
#endif /* CONFIG_NUMA */
+ }
+ printk(KERN_CONT "\n");
}
- printk(KERN_CONT "\n");
}
}
}
--
* [RFC PATCH 07/10] mm: Modify vmstat
2012-11-06 19:39 [RFC PATCH 00/10][Hierarchy] mm: Linux VM Infrastructure to support Memory Power Management Srivatsa S. Bhat
` (5 preceding siblings ...)
2012-11-06 19:41 ` [RFC PATCH 06/10] mm: Verify zonelists Srivatsa S. Bhat
@ 2012-11-06 19:41 ` Srivatsa S. Bhat
2012-11-06 19:41 ` [RFC PATCH 08/10] mm: Modify vmscan Srivatsa S. Bhat
` (2 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: Srivatsa S. Bhat @ 2012-11-06 19:41 UTC (permalink / raw)
To: akpm, mgorman, mjg59, paulmck, dave, maxime.coquelin,
loic.pallardy, arjan, kmpark, kamezawa.hiroyu, lenb, rjw
Cc: gargankita, amit.kachhap, svaidy, thomas.abraham,
santosh.shilimkar, srivatsa.bhat, linux-pm, linux-mm,
linux-kernel
From: Ankita Garg <gargankita@gmail.com>
Change the way vmstats are collected. Since the zones now live inside regions,
scan through all the regions of a node to obtain the zone-specific statistics.
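
Conceptually, a node-wide statistic becomes a sum over all (region, zone) pairs
of that node; a standalone sketch with plain arrays instead of the kernel's
per-cpu counters (the numbers are made up):

#include <stdio.h>

#define NR_REGIONS 2
#define NR_ZONES   3

/* nr_free_pages per zone, per region, for one node */
static unsigned long nr_free[NR_REGIONS][NR_ZONES] = {
	{ 3975, 120000, 0 },
	{    0, 110500, 0 },
};

static unsigned long node_free_pages(void)
{
	unsigned long total = 0;
	int region, zone;

	/* Accumulate the per-zone counter across every region of the node */
	for (region = 0; region < NR_REGIONS; region++)
		for (zone = 0; zone < NR_ZONES; zone++)
			total += nr_free[region][zone];

	return total;
}

int main(void)
{
	printf("node free pages: %lu\n", node_free_pages());
	return 0;
}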
Signed-off-by: Ankita Garg <gargankita@gmail.com>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---
include/linux/vmstat.h | 21 ++++++++++++++-------
mm/vmstat.c | 40 ++++++++++++++++++++++++----------------
2 files changed, 38 insertions(+), 23 deletions(-)
diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h
index 92a86b2..a782f05 100644
--- a/include/linux/vmstat.h
+++ b/include/linux/vmstat.h
@@ -151,20 +151,27 @@ extern unsigned long zone_reclaimable_pages(struct zone *zone);
static inline unsigned long node_page_state(int node,
enum zone_stat_item item)
{
- struct zone *zones = NODE_DATA(node)->node_zones;
+ unsigned long page_state = 0;
+ struct mem_region *region;
+
+ for_each_mem_region_in_node(region, node) {
+ struct zone *zones = region->region_zones;
+
+ page_state =
- return
#ifdef CONFIG_ZONE_DMA
- zone_page_state(&zones[ZONE_DMA], item) +
+ zone_page_state(&zones[ZONE_DMA], item) +
#endif
#ifdef CONFIG_ZONE_DMA32
- zone_page_state(&zones[ZONE_DMA32], item) +
+ zone_page_state(&zones[ZONE_DMA32], item) +
#endif
#ifdef CONFIG_HIGHMEM
- zone_page_state(&zones[ZONE_HIGHMEM], item) +
+ zone_page_state(&zones[ZONE_HIGHMEM], item) +
#endif
- zone_page_state(&zones[ZONE_NORMAL], item) +
- zone_page_state(&zones[ZONE_MOVABLE], item);
+ zone_page_state(&zones[ZONE_NORMAL], item) +
+ zone_page_state(&zones[ZONE_MOVABLE], item);
+ }
+ return page_state;
}
extern void zone_statistics(struct zone *, struct zone *, gfp_t gfp);
diff --git a/mm/vmstat.c b/mm/vmstat.c
index c737057..86a92a6 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -188,20 +188,24 @@ void refresh_zone_stat_thresholds(void)
void set_pgdat_percpu_threshold(pg_data_t *pgdat,
int (*calculate_pressure)(struct zone *))
{
+ struct mem_region *region;
struct zone *zone;
int cpu;
int threshold;
int i;
for (i = 0; i < pgdat->nr_zones; i++) {
- zone = &pgdat->node_zones[i];
- if (!zone->percpu_drift_mark)
- continue;
+ for_each_mem_region_in_node(region, pgdat->node_id) {
+ struct zone *zone = region->region_zones + i;
- threshold = (*calculate_pressure)(zone);
- for_each_possible_cpu(cpu)
- per_cpu_ptr(zone->pageset, cpu)->stat_threshold
- = threshold;
+ if (!zone->percpu_drift_mark)
+ continue;
+
+ threshold = (*calculate_pressure)(zone);
+ for_each_possible_cpu(cpu)
+ per_cpu_ptr(zone->pageset, cpu)->stat_threshold
+ = threshold;
+ }
}
}
@@ -657,19 +661,23 @@ static void frag_stop(struct seq_file *m, void *arg)
/* Walk all the zones in a node and print using a callback */
static void walk_zones_in_node(struct seq_file *m, pg_data_t *pgdat,
- void (*print)(struct seq_file *m, pg_data_t *, struct zone *))
+ void (*print)(struct seq_file *m, pg_data_t *,
+ struct mem_region *, struct zone *))
{
- struct zone *zone;
- struct zone *node_zones = pgdat->node_zones;
+ int i;
unsigned long flags;
+ struct mem_region *region;
- for (zone = node_zones; zone - node_zones < MAX_NR_ZONES; ++zone) {
- if (!populated_zone(zone))
- continue;
+ for (i = 0; i < MAX_NR_ZONES; ++i) {
+ for_each_mem_region_in_node(region, pgdat->node_id) {
+ struct zone *zone = region->region_zones + i;
+ if (!populated_zone(zone))
+ continue;
- spin_lock_irqsave(&zone->lock, flags);
- print(m, pgdat, zone);
- spin_unlock_irqrestore(&zone->lock, flags);
+ spin_lock_irqsave(&zone->lock, flags);
+ print(m, pgdat, region, zone);
+ spin_unlock_irqrestore(&zone->lock, flags);
+ }
}
}
#endif
--
* [RFC PATCH 08/10] mm: Modify vmscan
2012-11-06 19:39 [RFC PATCH 00/10][Hierarchy] mm: Linux VM Infrastructure to support Memory Power Management Srivatsa S. Bhat
` (6 preceding siblings ...)
2012-11-06 19:41 ` [RFC PATCH 07/10] mm: Modify vmstat Srivatsa S. Bhat
@ 2012-11-06 19:41 ` Srivatsa S. Bhat
2012-11-06 19:41 ` [RFC PATCH 09/10] mm: Reflect memory region changes in zoneinfo Srivatsa S. Bhat
2012-11-06 19:42 ` [RFC PATCH 10/10] mm: Create memory regions at boot-up Srivatsa S. Bhat
9 siblings, 0 replies; 11+ messages in thread
From: Srivatsa S. Bhat @ 2012-11-06 19:41 UTC (permalink / raw)
To: akpm, mgorman, mjg59, paulmck, dave, maxime.coquelin,
loic.pallardy, arjan, kmpark, kamezawa.hiroyu, lenb, rjw
Cc: gargankita, amit.kachhap, svaidy, thomas.abraham,
santosh.shilimkar, srivatsa.bhat, linux-pm, linux-mm,
linux-kernel
From: Ankita Garg <gargankita@gmail.com>
Modify vmscan to take into account the changed node-zone hierarchy.
Signed-off-by: Ankita Garg <gargankita@gmail.com>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---
mm/vmscan.c | 364 +++++++++++++++++++++++++++++++----------------------------
1 file changed, 193 insertions(+), 171 deletions(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 2624edc..4d8f303 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2209,11 +2209,14 @@ static bool pfmemalloc_watermark_ok(pg_data_t *pgdat)
unsigned long free_pages = 0;
int i;
bool wmark_ok;
+ struct mem_region *region;
for (i = 0; i <= ZONE_NORMAL; i++) {
- zone = &pgdat->node_zones[i];
- pfmemalloc_reserve += min_wmark_pages(zone);
- free_pages += zone_page_state(zone, NR_FREE_PAGES);
+ for_each_mem_region_in_node(region, pgdat->node_id) {
+ zone = &region->region_zones[i];
+ pfmemalloc_reserve += min_wmark_pages(zone);
+ free_pages += zone_page_state(zone, NR_FREE_PAGES);
+ }
}
wmark_ok = free_pages > pfmemalloc_reserve / 2;
@@ -2442,10 +2445,16 @@ static bool pgdat_balanced(pg_data_t *pgdat, unsigned long balanced_pages,
int classzone_idx)
{
unsigned long present_pages = 0;
+ struct mem_region *region;
int i;
- for (i = 0; i <= classzone_idx; i++)
- present_pages += pgdat->node_zones[i].present_pages;
+ for (i = 0; i <= classzone_idx; i++) {
+ for_each_mem_region_in_node(region, pgdat->node_id) {
+ struct zone *zone = region->region_zones + i;
+
+ present_pages += zone->present_pages;
+ }
+ }
/* A special case here: if zone has no page, we think it's balanced */
return balanced_pages >= (present_pages >> 2);
@@ -2463,6 +2472,7 @@ static bool prepare_kswapd_sleep(pg_data_t *pgdat, int order, long remaining,
int i;
unsigned long balanced = 0;
bool all_zones_ok = true;
+ struct mem_region *region;
/* If a direct reclaimer woke kswapd within HZ/10, it's premature */
if (remaining)
@@ -2484,27 +2494,29 @@ static bool prepare_kswapd_sleep(pg_data_t *pgdat, int order, long remaining,
/* Check the watermark levels */
for (i = 0; i <= classzone_idx; i++) {
- struct zone *zone = pgdat->node_zones + i;
+ for_each_mem_region_in_node(region, pgdat->node_id) {
+ struct zone *zone = region->region_zones + i;
- if (!populated_zone(zone))
- continue;
+ if (!populated_zone(zone))
+ continue;
- /*
- * balance_pgdat() skips over all_unreclaimable after
- * DEF_PRIORITY. Effectively, it considers them balanced so
- * they must be considered balanced here as well if kswapd
- * is to sleep
- */
- if (zone->all_unreclaimable) {
- balanced += zone->present_pages;
- continue;
- }
+ /*
+ * balance_pgdat() skips over all_unreclaimable after
+ * DEF_PRIORITY. Effectively, it considers them balanced so
+ * they must be considered balanced here as well if kswapd
+ * is to sleep
+ */
+ if (zone->all_unreclaimable) {
+ balanced += zone->present_pages;
+ continue;
+ }
- if (!zone_watermark_ok_safe(zone, order, high_wmark_pages(zone),
- i, 0))
- all_zones_ok = false;
- else
- balanced += zone->present_pages;
+ if (!zone_watermark_ok_safe(zone, order,
+ high_wmark_pages(zone), i, 0))
+ all_zones_ok = false;
+ else
+ balanced += zone->present_pages;
+ }
}
/*
@@ -2565,6 +2577,7 @@ static unsigned long balance_pgdat(pg_data_t *pgdat, int order,
struct shrink_control shrink = {
.gfp_mask = sc.gfp_mask,
};
+ struct mem_region *region;
loop_again:
total_scanned = 0;
sc.priority = DEF_PRIORITY;
@@ -2583,49 +2596,55 @@ loop_again:
* Scan in the highmem->dma direction for the highest
* zone which needs scanning
*/
- for (i = pgdat->nr_zones - 1; i >= 0; i--) {
- struct zone *zone = pgdat->node_zones + i;
+ for (i = pgdat->nr_node_zone_types - 1; i >= 0; i--) {
+ for_each_mem_region_in_node(region, pgdat->node_id) {
+ struct zone *zone = region->region_zones + i;
- if (!populated_zone(zone))
- continue;
+ if (!populated_zone(zone))
+ continue;
- if (zone->all_unreclaimable &&
- sc.priority != DEF_PRIORITY)
- continue;
+ if (zone->all_unreclaimable &&
+ sc.priority != DEF_PRIORITY)
+ continue;
- /*
- * Do some background aging of the anon list, to give
- * pages a chance to be referenced before reclaiming.
- */
- age_active_anon(zone, &sc);
+ /*
+ * Do some background aging of the anon list, to give
+ * pages a chance to be referenced before reclaiming.
+ */
+ age_active_anon(zone, &sc);
- /*
- * If the number of buffer_heads in the machine
- * exceeds the maximum allowed level and this node
- * has a highmem zone, force kswapd to reclaim from
- * it to relieve lowmem pressure.
- */
- if (buffer_heads_over_limit && is_highmem_idx(i)) {
- end_zone = i;
- break;
- }
+ /*
+ * If the number of buffer_heads in the machine
+ * exceeds the maximum allowed level and this node
+ * has a highmem zone, force kswapd to reclaim from
+ * it to relieve lowmem pressure.
+ */
+ if (buffer_heads_over_limit && is_highmem_idx(i)) {
+ end_zone = i;
+ goto out_loop;
+ }
- if (!zone_watermark_ok_safe(zone, order,
- high_wmark_pages(zone), 0, 0)) {
- end_zone = i;
- break;
- } else {
- /* If balanced, clear the congested flag */
- zone_clear_flag(zone, ZONE_CONGESTED);
+ if (!zone_watermark_ok_safe(zone, order,
+ high_wmark_pages(zone), 0, 0)) {
+ end_zone = i;
+ goto out_loop;
+ } else {
+ /* If balanced, clear the congested flag */
+ zone_clear_flag(zone, ZONE_CONGESTED);
+ }
}
}
+
+ out_loop:
if (i < 0)
goto out;
for (i = 0; i <= end_zone; i++) {
- struct zone *zone = pgdat->node_zones + i;
+ for_each_mem_region_in_node(region, pgdat->node_id) {
+ struct zone *zone = region->region_zones + i;
- lru_pages += zone_reclaimable_pages(zone);
+ lru_pages += zone_reclaimable_pages(zone);
+ }
}
/*
@@ -2638,108 +2657,109 @@ loop_again:
* cause too much scanning of the lower zones.
*/
for (i = 0; i <= end_zone; i++) {
- struct zone *zone = pgdat->node_zones + i;
- int nr_slab, testorder;
- unsigned long balance_gap;
-
- if (!populated_zone(zone))
- continue;
+ for_each_mem_region_in_node(region, pgdat->node_id) {
+ struct zone *zone = region->region_zones + i;
+ int nr_slab, testorder;
+ unsigned long balance_gap;
- if (zone->all_unreclaimable &&
- sc.priority != DEF_PRIORITY)
- continue;
-
- sc.nr_scanned = 0;
-
- nr_soft_scanned = 0;
- /*
- * Call soft limit reclaim before calling shrink_zone.
- */
- nr_soft_reclaimed = mem_cgroup_soft_limit_reclaim(zone,
- order, sc.gfp_mask,
- &nr_soft_scanned);
- sc.nr_reclaimed += nr_soft_reclaimed;
- total_scanned += nr_soft_scanned;
-
- /*
- * We put equal pressure on every zone, unless
- * one zone has way too many pages free
- * already. The "too many pages" is defined
- * as the high wmark plus a "gap" where the
- * gap is either the low watermark or 1%
- * of the zone, whichever is smaller.
- */
- balance_gap = min(low_wmark_pages(zone),
- (zone->present_pages +
- KSWAPD_ZONE_BALANCE_GAP_RATIO-1) /
- KSWAPD_ZONE_BALANCE_GAP_RATIO);
- /*
- * Kswapd reclaims only single pages with compaction
- * enabled. Trying too hard to reclaim until contiguous
- * free pages have become available can hurt performance
- * by evicting too much useful data from memory.
- * Do not reclaim more than needed for compaction.
- */
- testorder = order;
- if (COMPACTION_BUILD && order &&
- compaction_suitable(zone, order) !=
- COMPACT_SKIPPED)
- testorder = 0;
-
- if ((buffer_heads_over_limit && is_highmem_idx(i)) ||
- !zone_watermark_ok_safe(zone, testorder,
- high_wmark_pages(zone) + balance_gap,
- end_zone, 0)) {
- shrink_zone(zone, &sc);
-
- reclaim_state->reclaimed_slab = 0;
- nr_slab = shrink_slab(&shrink, sc.nr_scanned, lru_pages);
- sc.nr_reclaimed += reclaim_state->reclaimed_slab;
- total_scanned += sc.nr_scanned;
+ if (!populated_zone(zone))
+ continue;
- if (nr_slab == 0 && !zone_reclaimable(zone))
- zone->all_unreclaimable = 1;
- }
+ if (zone->all_unreclaimable &&
+ sc.priority != DEF_PRIORITY)
+ continue;
- /*
- * If we've done a decent amount of scanning and
- * the reclaim ratio is low, start doing writepage
- * even in laptop mode
- */
- if (total_scanned > SWAP_CLUSTER_MAX * 2 &&
- total_scanned > sc.nr_reclaimed + sc.nr_reclaimed / 2)
- sc.may_writepage = 1;
+ sc.nr_scanned = 0;
- if (zone->all_unreclaimable) {
- if (end_zone && end_zone == i)
- end_zone--;
- continue;
- }
+ nr_soft_scanned = 0;
+ /*
+ * Call soft limit reclaim before calling shrink_zone.
+ */
+ nr_soft_reclaimed = mem_cgroup_soft_limit_reclaim(zone,
+ order, sc.gfp_mask,
+ &nr_soft_scanned);
+ sc.nr_reclaimed += nr_soft_reclaimed;
+ total_scanned += nr_soft_scanned;
- if (!zone_watermark_ok_safe(zone, testorder,
- high_wmark_pages(zone), end_zone, 0)) {
- all_zones_ok = 0;
/*
- * We are still under min water mark. This
- * means that we have a GFP_ATOMIC allocation
- * failure risk. Hurry up!
+ * We put equal pressure on every zone, unless
+ * one zone has way too many pages free
+ * already. The "too many pages" is defined
+ * as the high wmark plus a "gap" where the
+ * gap is either the low watermark or 1%
+ * of the zone, whichever is smaller.
*/
- if (!zone_watermark_ok_safe(zone, order,
- min_wmark_pages(zone), end_zone, 0))
- has_under_min_watermark_zone = 1;
- } else {
+ balance_gap = min(low_wmark_pages(zone),
+ (zone->present_pages +
+ KSWAPD_ZONE_BALANCE_GAP_RATIO-1) /
+ KSWAPD_ZONE_BALANCE_GAP_RATIO);
/*
- * If a zone reaches its high watermark,
- * consider it to be no longer congested. It's
- * possible there are dirty pages backed by
- * congested BDIs but as pressure is relieved,
- * speculatively avoid congestion waits
+ * Kswapd reclaims only single pages with compaction
+ * enabled. Trying too hard to reclaim until contiguous
+ * free pages have become available can hurt performance
+ * by evicting too much useful data from memory.
+ * Do not reclaim more than needed for compaction.
*/
- zone_clear_flag(zone, ZONE_CONGESTED);
- if (i <= *classzone_idx)
- balanced += zone->present_pages;
- }
+ testorder = order;
+ if (COMPACTION_BUILD && order &&
+ compaction_suitable(zone, order) !=
+ COMPACT_SKIPPED)
+ testorder = 0;
+
+ if ((buffer_heads_over_limit && is_highmem_idx(i)) ||
+ !zone_watermark_ok_safe(zone, testorder,
+ high_wmark_pages(zone) + balance_gap,
+ end_zone, 0)) {
+ shrink_zone(zone, &sc);
+
+ reclaim_state->reclaimed_slab = 0;
+ nr_slab = shrink_slab(&shrink, sc.nr_scanned, lru_pages);
+ sc.nr_reclaimed += reclaim_state->reclaimed_slab;
+ total_scanned += sc.nr_scanned;
+
+ if (nr_slab == 0 && !zone_reclaimable(zone))
+ zone->all_unreclaimable = 1;
+ }
+ /*
+ * If we've done a decent amount of scanning and
+ * the reclaim ratio is low, start doing writepage
+ * even in laptop mode
+ */
+ if (total_scanned > SWAP_CLUSTER_MAX * 2 &&
+ total_scanned > sc.nr_reclaimed + sc.nr_reclaimed / 2)
+ sc.may_writepage = 1;
+
+ if (zone->all_unreclaimable) {
+ if (end_zone && end_zone == i)
+ end_zone--;
+ continue;
+ }
+
+ if (!zone_watermark_ok_safe(zone, testorder,
+ high_wmark_pages(zone), end_zone, 0)) {
+ all_zones_ok = 0;
+ /*
+ * We are still under min water mark. This
+ * means that we have a GFP_ATOMIC allocation
+ * failure risk. Hurry up!
+ */
+ if (!zone_watermark_ok_safe(zone, order,
+ min_wmark_pages(zone), end_zone, 0))
+ has_under_min_watermark_zone = 1;
+ } else {
+ /*
+ * If a zone reaches its high watermark,
+ * consider it to be no longer congested. It's
+ * possible there are dirty pages backed by
+ * congested BDIs but as pressure is relieved,
+ * speculatively avoid congestion waits
+ */
+ zone_clear_flag(zone, ZONE_CONGESTED);
+ if (i <= *classzone_idx)
+ balanced += zone->present_pages;
+ }
+ }
}
/*
@@ -2817,34 +2837,36 @@ out:
int zones_need_compaction = 1;
for (i = 0; i <= end_zone; i++) {
- struct zone *zone = pgdat->node_zones + i;
+ for_each_mem_region_in_node(region, pgdat->node_id) {
+ struct zone *zone = region->region_zones + i;
- if (!populated_zone(zone))
- continue;
+ if (!populated_zone(zone))
+ continue;
- if (zone->all_unreclaimable &&
- sc.priority != DEF_PRIORITY)
- continue;
+ if (zone->all_unreclaimable &&
+ sc.priority != DEF_PRIORITY)
+ continue;
- /* Would compaction fail due to lack of free memory? */
- if (COMPACTION_BUILD &&
- compaction_suitable(zone, order) == COMPACT_SKIPPED)
- goto loop_again;
+ /* Would compaction fail due to lack of free memory? */
+ if (COMPACTION_BUILD &&
+ compaction_suitable(zone, order) == COMPACT_SKIPPED)
+ goto loop_again;
- /* Confirm the zone is balanced for order-0 */
- if (!zone_watermark_ok(zone, 0,
- high_wmark_pages(zone), 0, 0)) {
- order = sc.order = 0;
- goto loop_again;
- }
+ /* Confirm the zone is balanced for order-0 */
+ if (!zone_watermark_ok(zone, 0,
+ high_wmark_pages(zone), 0, 0)) {
+ order = sc.order = 0;
+ goto loop_again;
+ }
- /* Check if the memory needs to be defragmented. */
- if (zone_watermark_ok(zone, order,
- low_wmark_pages(zone), *classzone_idx, 0))
- zones_need_compaction = 0;
+ /* Check if the memory needs to be defragmented. */
+ if (zone_watermark_ok(zone, order,
+ low_wmark_pages(zone), *classzone_idx, 0))
+ zones_need_compaction = 0;
- /* If balanced, clear the congested flag */
- zone_clear_flag(zone, ZONE_CONGESTED);
+ /* If balanced, clear the congested flag */
+ zone_clear_flag(zone, ZONE_CONGESTED);
+ }
}
if (zones_need_compaction)
@@ -2966,7 +2988,7 @@ static int kswapd(void *p)
order = new_order = 0;
balanced_order = 0;
- classzone_idx = new_classzone_idx = pgdat->nr_zones - 1;
+ classzone_idx = new_classzone_idx = pgdat->nr_node_zone_types - 1;
balanced_classzone_idx = classzone_idx;
for ( ; ; ) {
int ret;
@@ -2981,7 +3003,7 @@ static int kswapd(void *p)
new_order = pgdat->kswapd_max_order;
new_classzone_idx = pgdat->classzone_idx;
pgdat->kswapd_max_order = 0;
- pgdat->classzone_idx = pgdat->nr_zones - 1;
+ pgdat->classzone_idx = pgdat->nr_node_zone_types - 1;
}
if (order < new_order || classzone_idx > new_classzone_idx) {
@@ -2999,7 +3021,7 @@ static int kswapd(void *p)
new_order = order;
new_classzone_idx = classzone_idx;
pgdat->kswapd_max_order = 0;
- pgdat->classzone_idx = pgdat->nr_zones - 1;
+ pgdat->classzone_idx = pgdat->nr_node_zone_types - 1;
}
ret = try_to_freeze();
--
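For readers following the kswapd hunk above: the balance_gap it computes is the
smaller of the zone's low watermark and roughly 1% of the zone's pages. A minimal
standalone sketch of that arithmetic (an illustration only, not part of the patch;
KSWAPD_ZONE_BALANCE_GAP_RATIO is assumed to be 100, i.e. the "1% of the zone"
mentioned in the comment):

/*
 * Illustration only: kswapd keeps reclaiming until a zone is above its
 * high watermark plus a gap, where the gap is the smaller of the low
 * watermark and ~1% of the zone's pages (ratio assumed to be 100).
 */
#define KSWAPD_ZONE_BALANCE_GAP_RATIO	100

static unsigned long balance_gap_pages(unsigned long low_wmark,
				       unsigned long present_pages)
{
	/* Round up so even tiny zones get a non-zero gap. */
	unsigned long one_percent = (present_pages +
				     KSWAPD_ZONE_BALANCE_GAP_RATIO - 1) /
				    KSWAPD_ZONE_BALANCE_GAP_RATIO;

	return low_wmark < one_percent ? low_wmark : one_percent;
}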
* [RFC PATCH 09/10] mm: Reflect memory region changes in zoneinfo
2012-11-06 19:39 [RFC PATCH 00/10][Hierarchy] mm: Linux VM Infrastructure to support Memory Power Management Srivatsa S. Bhat
` (7 preceding siblings ...)
2012-11-06 19:41 ` [RFC PATCH 08/10] mm: Modify vmscan Srivatsa S. Bhat
@ 2012-11-06 19:41 ` Srivatsa S. Bhat
2012-11-06 19:42 ` [RFC PATCH 10/10] mm: Create memory regions at boot-up Srivatsa S. Bhat
9 siblings, 0 replies; 11+ messages in thread
From: Srivatsa S. Bhat @ 2012-11-06 19:41 UTC (permalink / raw)
To: akpm, mgorman, mjg59, paulmck, dave, maxime.coquelin,
loic.pallardy, arjan, kmpark, kamezawa.hiroyu, lenb, rjw
Cc: gargankita, amit.kachhap, svaidy, thomas.abraham,
santosh.shilimkar, srivatsa.bhat, linux-pm, linux-mm,
linux-kernel
From: Ankita Garg <gargankita@gmail.com>
This patch modifies the output of /proc/zoneinfo to take the memory regions
into account. Below is the output on a KVM guest booted with 4 regions, each
of size 512MB. (A sketch of the per-region zone walk behind these headers
follows the sample output.)
cat /proc/zoneinfo:
Node 0, Region 0, zone DMA
pages free 3975
min 11
low 13
high 16
scanned 0
spanned 4080
present 3977
nr_free_pages 3975
nr_inactive_anon 0
nr_active_anon 0
nr_inactive_file 0
nr_active_file 0
nr_unevictable 0
nr_mlock 0
nr_anon_pages 0
nr_mapped 0
nr_file_pages 0
nr_dirty 0
nr_writeback 0
nr_slab_reclaimable 0
nr_slab_unreclaimable 2
nr_page_table_pages 0
nr_kernel_stack 0
nr_unstable 0
nr_bounce 0
nr_vmscan_write 0
nr_vmscan_immediate_reclaim 0
nr_writeback_temp 0
nr_isolated_anon 0
nr_isolated_file 0
nr_shmem 0
nr_dirtied 0
nr_written 0
nr_anon_transparent_hugepages 0
nr_free_cma 0
protection: (0, 471, 471, 471)
pagesets
cpu: 0
count: 0
high: 0
batch: 1
vm stats threshold: 6
cpu: 1
count: 0
high: 0
batch: 1
vm stats threshold: 6
cpu: 2
count: 0
high: 0
batch: 1
vm stats threshold: 6
cpu: 3
count: 0
high: 0
batch: 1
vm stats threshold: 6
all_unreclaimable: 0
start_pfn: 16
inactive_ratio: 1
Node 0, Region 0, zone DMA32
pages free 107720
min 338
low 422
high 507
scanned 0
spanned 126992
present 120642
.....
Node 0, Region 1, zone DMA32
pages free 131072
min 367
low 458
high 550
scanned 0
spanned 131072
present 131072
.....
Node 0, Region 2, zone DMA32
pages free 131072
min 367
low 458
high 550
scanned 0
spanned 131072
present 131072
.....
Node 0, Region 3, zone DMA32
pages free 121880
min 341
low 426
high 511
scanned 0
spanned 131054
present 121928
.....
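The per-region headers above come from walking the zones region by region
instead of straight off the pgdat's zone array. A minimal sketch of that walk
pattern, modelled on the for_each_mem_region_in_node() usage elsewhere in this
series (walk_node_zones_by_region and print_fn are illustrative names for this
sketch, not the helpers in the patch):

/*
 * Sketch of the per-region walk behind the /proc output; assumes the
 * mem_region definitions introduced earlier in this series.
 */
static void walk_node_zones_by_region(struct seq_file *m, pg_data_t *pgdat,
				      void (*print_fn)(struct seq_file *,
						       pg_data_t *,
						       struct mem_region *,
						       struct zone *))
{
	struct mem_region *region;
	int i;

	for (i = 0; i < pgdat->nr_node_zone_types; i++) {
		for_each_mem_region_in_node(region, pgdat->node_id) {
			struct zone *zone = region->region_zones + i;

			if (!populated_zone(zone))
				continue;

			print_fn(m, pgdat, region, zone);
		}
	}
}

Each printer in the diff below (frag_show_print, pagetypeinfo_showfree_print,
zoneinfo_show_print, ...) then receives the region alongside the zone, which is
what lets it emit the "Region %d" column.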
Signed-off-by: Ankita Garg <gargankita@gmail.com>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---
mm/vmstat.c | 31 ++++++++++++++++++-------------
1 file changed, 18 insertions(+), 13 deletions(-)
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 86a92a6..b3be9ba 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -179,9 +179,12 @@ void refresh_zone_stat_thresholds(void)
*/
tolerate_drift = low_wmark_pages(zone) - min_wmark_pages(zone);
max_drift = num_online_cpus() * threshold;
- if (max_drift > tolerate_drift)
+ if (max_drift > tolerate_drift) {
zone->percpu_drift_mark = high_wmark_pages(zone) +
max_drift;
+ printk("zone %s drift mark %lu \n", zone->name,
+ zone->percpu_drift_mark);
+ }
}
}
@@ -189,12 +192,11 @@ void set_pgdat_percpu_threshold(pg_data_t *pgdat,
int (*calculate_pressure)(struct zone *))
{
struct mem_region *region;
- struct zone *zone;
int cpu;
int threshold;
int i;
- for (i = 0; i < pgdat->nr_zones; i++) {
+ for (i = 0; i < pgdat->nr_node_zone_types; i++) {
for_each_mem_region_in_node(region, pgdat->node_id) {
struct zone *zone = region->region_zones + i;
@@ -818,11 +820,12 @@ const char * const vmstat_text[] = {
#ifdef CONFIG_PROC_FS
static void frag_show_print(struct seq_file *m, pg_data_t *pgdat,
- struct zone *zone)
+ struct mem_region *region, struct zone *zone)
{
int order;
- seq_printf(m, "Node %d, zone %8s ", pgdat->node_id, zone->name);
+ seq_printf(m, "Node %d, REG %d, zone %8s ", pgdat->node_id,
+ region->region, zone->name);
for (order = 0; order < MAX_ORDER; ++order)
seq_printf(m, "%6lu ", zone->free_area[order].nr_free);
seq_putc(m, '\n');
@@ -838,14 +841,15 @@ static int frag_show(struct seq_file *m, void *arg)
return 0;
}
-static void pagetypeinfo_showfree_print(struct seq_file *m,
- pg_data_t *pgdat, struct zone *zone)
+static void pagetypeinfo_showfree_print(struct seq_file *m, pg_data_t *pgdat,
+ struct mem_region *region, struct zone *zone)
{
int order, mtype;
for (mtype = 0; mtype < MIGRATE_TYPES; mtype++) {
- seq_printf(m, "Node %4d, zone %8s, type %12s ",
+ seq_printf(m, "Node %4d, Region %d, zone %8s, type %12s ",
pgdat->node_id,
+ region->region,
zone->name,
migratetype_names[mtype]);
for (order = 0; order < MAX_ORDER; ++order) {
@@ -880,8 +884,8 @@ static int pagetypeinfo_showfree(struct seq_file *m, void *arg)
return 0;
}
-static void pagetypeinfo_showblockcount_print(struct seq_file *m,
- pg_data_t *pgdat, struct zone *zone)
+static void pagetypeinfo_showblockcount_print(struct seq_file *m, pg_data_t *pgdat,
+ struct mem_region *region, struct zone *zone)
{
int mtype;
unsigned long pfn;
@@ -908,7 +912,7 @@ static void pagetypeinfo_showblockcount_print(struct seq_file *m,
}
/* Print counts */
- seq_printf(m, "Node %d, zone %8s ", pgdat->node_id, zone->name);
+ seq_printf(m, "Node %d, Region %d, zone %8s ", pgdat->node_id, region->region, zone->name);
for (mtype = 0; mtype < MIGRATE_TYPES; mtype++)
seq_printf(m, "%12lu ", count[mtype]);
seq_putc(m, '\n');
@@ -989,10 +993,11 @@ static const struct file_operations pagetypeinfo_file_ops = {
};
static void zoneinfo_show_print(struct seq_file *m, pg_data_t *pgdat,
- struct zone *zone)
+ struct mem_region *region, struct zone *zone)
{
int i;
- seq_printf(m, "Node %d, zone %8s", pgdat->node_id, zone->name);
+ seq_printf(m, "Node %d, Region %d, zone %8s", pgdat->node_id,
+ region->region, zone->name);
seq_printf(m,
"\n pages free %lu"
"\n min %lu"
--
* [RFC PATCH 10/10] mm: Create memory regions at boot-up
2012-11-06 19:39 [RFC PATCH 00/10][Hierarchy] mm: Linux VM Infrastructure to support Memory Power Management Srivatsa S. Bhat
` (8 preceding siblings ...)
2012-11-06 19:41 ` [RFC PATCH 09/10] mm: Reflect memory region changes in zoneinfo Srivatsa S. Bhat
@ 2012-11-06 19:42 ` Srivatsa S. Bhat
9 siblings, 0 replies; 11+ messages in thread
From: Srivatsa S. Bhat @ 2012-11-06 19:42 UTC (permalink / raw)
To: akpm, mgorman, mjg59, paulmck, dave, maxime.coquelin,
loic.pallardy, arjan, kmpark, kamezawa.hiroyu, lenb, rjw
Cc: gargankita, amit.kachhap, svaidy, thomas.abraham,
santosh.shilimkar, srivatsa.bhat, linux-pm, linux-mm,
linux-kernel
From: Ankita Garg <gargankita@gmail.com>
Memory regions are created at boot time, from the information obtained from
the firmware. But since the firmware doesn't yet export information about
memory units that can be independently power-managed, we hard-code the memory
region size to 512MB for the purpose of demonstration.
In the future, we expect ACPI 5.0-compliant firmware to expose the required
information in the form of the MPST (Memory Power State Table).
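As a rough illustration of the carve-up this implies (a sketch only, not the
patch code; the macro name is made up here and is written fully parenthesised
so the shift/divide precedence stays explicit), each node's span is cut into
fixed 512MB chunks, with the last region taking whatever remains:

#define REGION_SIZE_PFNS	((512UL << 20) >> PAGE_SHIFT)	/* 512MB in pages */

static int nr_regions_for_node(unsigned long node_spanned_pages)
{
	/* Round up: a partial trailing chunk still forms its own region. */
	return (node_spanned_pages + REGION_SIZE_PFNS - 1) / REGION_SIZE_PFNS;
}

For example, a 2GB node yields four 512MB regions, matching the KVM guest used
for the zoneinfo output in the previous patch.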
Signed-off-by: Ankita Garg <gargankita@gmail.com>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---
mm/page_alloc.c | 28 ++++++++++++++++++++++++++++
1 file changed, 28 insertions(+)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 9c1d680..13d1b2f 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4491,6 +4491,33 @@ void __init set_pageblock_order(void)
#endif /* CONFIG_HUGETLB_PAGE_SIZE_VARIABLE */
+#define REGIONS_SIZE (512 << 20) >> PAGE_SHIFT
+
+static void init_node_memory_regions(struct pglist_data *pgdat)
+{
+ int cnt = 0;
+ unsigned long i;
+ unsigned long start_pfn = pgdat->node_start_pfn;
+ unsigned long spanned_pages = pgdat->node_spanned_pages;
+ unsigned long total = 0;
+
+ for (i = start_pfn; i < start_pfn + spanned_pages; i += REGIONS_SIZE) {
+ struct mem_region *region = &pgdat->node_regions[cnt];
+
+ region->start_pfn = i;
+ if ((spanned_pages - total) < REGIONS_SIZE)
+ region->spanned_pages = spanned_pages - total;
+ else
+ region->spanned_pages = REGIONS_SIZE;
+
+ region->node = pgdat->node_id;
+ region->region = cnt;
+ pgdat->nr_node_regions++;
+ total += region->spanned_pages;
+ cnt++;
+ }
+}
+
/*
* Set up the zone data structures:
* - mark all pages reserved
@@ -4653,6 +4680,7 @@ void __paginginit free_area_init_node(int nid, unsigned long *zones_size,
(unsigned long)pgdat->node_mem_map);
#endif
+ init_node_memory_regions(pgdat);
free_area_init_core(pgdat, zones_size, zholes_size);
}
--