* [RFC PATCH 01/10] mm: Introduce the memory regions data structure
2012-11-06 19:39 [RFC PATCH 00/10][Hierarchy] mm: Linux VM Infrastructure to support Memory Power Management Srivatsa S. Bhat
@ 2012-11-06 19:39 ` Srivatsa S. Bhat
2012-11-06 19:40 ` [RFC PATCH 02/10] mm: Helper routines Srivatsa S. Bhat
` (8 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: Srivatsa S. Bhat @ 2012-11-06 19:39 UTC (permalink / raw)
To: akpm, mgorman, mjg59, paulmck, dave, maxime.coquelin,
loic.pallardy, arjan, kmpark, kamezawa.hiroyu, lenb, rjw
Cc: gargankita, amit.kachhap, svaidy, thomas.abraham,
santosh.shilimkar, srivatsa.bhat, linux-pm, linux-mm,
linux-kernel
From: Ankita Garg <gargankita@gmail.com>
The memory region data structure is created under a NUMA node. Each NUMA node can
have multiple memory regions, depending on the platform's configuration for
power management. Each memory region in turn contains zones, which are the
entities from which the buddy allocator allocates memory.
             -------------
             | pg_data_t |
             -------------
              /         \
             v           v
    ----------------   ----------------
    | mem_region_t |   | mem_region_t |
    ----------------   ----------------
           |                  |
           v                  v
 -----------------------------   -------------
 | zone0 | zone1 | zone3 | ..|   | zone0 | ....
 -----------------------------   -------------
Each memory region contains a zone array for the zones belonging to that region,
in addition to other fields such as the node id, the index of the region within
the node, the start pfn of the pages in that region and the number of pages
spanned by the region. The zone array inside the regions is statically allocated
at this point.
ToDo:
Since the number of regions actually present on the system might be much smaller
than the maximum allowed, dynamic bootmem allocation could be used instead to
save memory.
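
To illustrate the resulting node -> region -> zone indexing, here is a minimal
standalone C sketch. It only models the hierarchy; the *_model structure names,
the reduced array sizes and the page counts are made up for the example, and the
real structures are the ones introduced in the diff below.

#include <stdio.h>

#define MAX_NR_ZONES_MODEL   4    /* simplified; the real value is config-dependent */
#define MAX_NR_REGIONS_MODEL 8    /* kept small for the example (the patch uses 256) */

struct zone_model {
	unsigned long start_pfn;
	unsigned long spanned_pages;
};

struct mem_region_model {
	struct zone_model region_zones[MAX_NR_ZONES_MODEL];
	int node;			/* node this region belongs to */
	int region;			/* index of this region within the node */
	unsigned long start_pfn;
	unsigned long spanned_pages;
};

struct pglist_data_model {
	struct mem_region_model node_regions[MAX_NR_REGIONS_MODEL];
	int nr_node_regions;
};

int main(void)
{
	/* One node with two 512MB regions (131072 pfns each with 4K pages) */
	struct pglist_data_model node0 = {
		.nr_node_regions = 2,
		.node_regions = {
			{ .node = 0, .region = 0, .start_pfn = 0,
			  .spanned_pages = 131072 },
			{ .node = 0, .region = 1, .start_pfn = 131072,
			  .spanned_pages = 131072 },
		},
	};

	/* A zone is now reached as node -> region -> zone, not node -> zone */
	struct zone_model *z = &node0.node_regions[1].region_zones[0];

	printf("zone at %p: node %d, region %d, region starts at pfn %lu\n",
	       (void *)z, node0.node_regions[1].node,
	       node0.node_regions[1].region, node0.node_regions[1].start_pfn);
	return 0;
}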
Signed-off-by: Ankita Garg <gargankita@gmail.com>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---
include/linux/mmzone.h | 24 +++++++++++++++++++++---
1 file changed, 21 insertions(+), 3 deletions(-)
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 50aaca8..3f9b106 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -86,6 +86,7 @@ struct free_area {
};
struct pglist_data;
+struct mem_region;
/*
* zone->lock and zone->lru_lock are two of the hottest locks in the kernel.
@@ -465,6 +466,8 @@ struct zone {
* Discontig memory support fields.
*/
struct pglist_data *zone_pgdat;
+ struct mem_region *zone_mem_region;
+
/* zone_start_pfn == zone_start_paddr >> PAGE_SHIFT */
unsigned long zone_start_pfn;
@@ -533,6 +536,8 @@ static inline int zone_is_oom_locked(const struct zone *zone)
return test_bit(ZONE_OOM_LOCKED, &zone->flags);
}
+#define MAX_NR_REGIONS 256
+
/*
* The "priority" of VM scanning is how much of the queues we will scan in one
* go. A value of 12 for DEF_PRIORITY implies that we will scan 1/4096th of the
@@ -541,7 +546,7 @@ static inline int zone_is_oom_locked(const struct zone *zone)
#define DEF_PRIORITY 12
/* Maximum number of zones on a zonelist */
-#define MAX_ZONES_PER_ZONELIST (MAX_NUMNODES * MAX_NR_ZONES)
+#define MAX_ZONES_PER_ZONELIST (MAX_NUMNODES * MAX_NR_REGIONS * MAX_NR_ZONES)
#ifdef CONFIG_NUMA
@@ -671,6 +676,18 @@ struct node_active_region {
extern struct page *mem_map;
#endif
+struct mem_region {
+ struct zone region_zones[MAX_NR_ZONES];
+ int nr_region_zones;
+
+ int node;
+ int region;
+
+ unsigned long start_pfn;
+ unsigned long spanned_pages;
+};
+
+
/*
* The pg_data_t structure is used in machines with CONFIG_DISCONTIGMEM
* (mostly NUMA machines?) to denote a higher-level memory zone than the
@@ -684,9 +701,10 @@ extern struct page *mem_map;
*/
struct bootmem_data;
typedef struct pglist_data {
- struct zone node_zones[MAX_NR_ZONES];
+ struct mem_region node_regions[MAX_NR_REGIONS];
+ int nr_node_regions;
struct zonelist node_zonelists[MAX_ZONELISTS];
- int nr_zones;
+ int nr_node_zone_types;
#ifdef CONFIG_FLAT_NODE_MEM_MAP /* means !SPARSEMEM */
struct page *node_mem_map;
#ifdef CONFIG_MEMCG
--
* [RFC PATCH 02/10] mm: Helper routines
2012-11-06 19:39 [RFC PATCH 00/10][Hierarchy] mm: Linux VM Infrastructure to support Memory Power Management Srivatsa S. Bhat
2012-11-06 19:39 ` [RFC PATCH 01/10] mm: Introduce the memory regions data structure Srivatsa S. Bhat
@ 2012-11-06 19:40 ` Srivatsa S. Bhat
2012-11-06 19:40 ` [RFC PATCH 03/10] mm: Init zones inside memory regions Srivatsa S. Bhat
` (7 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: Srivatsa S. Bhat @ 2012-11-06 19:40 UTC (permalink / raw)
To: akpm, mgorman, mjg59, paulmck, dave, maxime.coquelin,
loic.pallardy, arjan, kmpark, kamezawa.hiroyu, lenb, rjw
Cc: gargankita, amit.kachhap, svaidy, thomas.abraham,
santosh.shilimkar, srivatsa.bhat, linux-pm, linux-mm,
linux-kernel
From: Ankita Garg <gargankita@gmail.com>
With the introduction of regions, helper routines are needed to walk through
all the regions and zones inside a node. This patch adds these helper
routines.
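
For reference, the core of the new page_zone() below is a linear search for the
region whose pfn range contains the page, followed by the usual zone index
lookup. A simplified standalone sketch of that search (structure and function
names here are made up; the real code is in the diff):

#include <stdio.h>

struct region_model {
	unsigned long start_pfn;
	unsigned long spanned_pages;
};

/*
 * Return the index of the region whose [start_pfn, start_pfn + spanned_pages)
 * range contains pfn, or -1 if no region of the node covers it.
 */
static int pfn_to_region(const struct region_model *regions, int nr_regions,
			 unsigned long pfn)
{
	int i;

	for (i = 0; i < nr_regions; i++) {
		unsigned long end_pfn = regions[i].start_pfn +
					regions[i].spanned_pages;

		if (pfn >= regions[i].start_pfn && pfn < end_pfn)
			return i;
	}
	return -1;
}

int main(void)
{
	/* Two regions of 131072 pfns each (512MB with 4K pages) */
	struct region_model regions[] = {
		{ .start_pfn = 0,      .spanned_pages = 131072 },
		{ .start_pfn = 131072, .spanned_pages = 131072 },
	};

	printf("pfn 200000 falls in region %d\n",
	       pfn_to_region(regions, 2, 200000));
	return 0;
}

Once the region is found, page_zone() simply returns
&region->region_zones[page_zonenum(page)].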
Signed-off-by: Ankita Garg <gargankita@gmail.com>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---
include/linux/mm.h | 7 ++-----
include/linux/mmzone.h | 22 +++++++++++++++++++---
mm/mmzone.c | 48 ++++++++++++++++++++++++++++++++++++++++++++----
3 files changed, 65 insertions(+), 12 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index fa06804..70f1009 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -693,11 +693,6 @@ static inline int page_to_nid(const struct page *page)
}
#endif
-static inline struct zone *page_zone(const struct page *page)
-{
- return &NODE_DATA(page_to_nid(page))->node_zones[page_zonenum(page)];
-}
-
#if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
static inline void set_page_section(struct page *page, unsigned long section)
{
@@ -711,6 +706,8 @@ static inline unsigned long page_to_section(const struct page *page)
}
#endif
+struct zone *page_zone(struct page *page);
+
static inline void set_page_zone(struct page *page, enum zone_type zone)
{
page->flags &= ~(ZONES_MASK << ZONES_PGSHIFT);
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 3f9b106..6f5d533 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -800,7 +800,7 @@ unsigned long __init node_memmap_size_bytes(int, unsigned long, unsigned long);
/*
* zone_idx() returns 0 for the ZONE_DMA zone, 1 for the ZONE_NORMAL zone, etc.
*/
-#define zone_idx(zone) ((zone) - (zone)->zone_pgdat->node_zones)
+#define zone_idx(zone) ((zone) - (zone)->zone_mem_region->region_zones)
static inline int populated_zone(struct zone *zone)
{
@@ -907,7 +907,9 @@ extern struct pglist_data contig_page_data;
extern struct pglist_data *first_online_pgdat(void);
extern struct pglist_data *next_online_pgdat(struct pglist_data *pgdat);
+extern struct zone *first_zone(void);
extern struct zone *next_zone(struct zone *zone);
+extern struct mem_region *next_mem_region(struct mem_region *region);
/**
* for_each_online_pgdat - helper macro to iterate over all online nodes
@@ -917,6 +919,20 @@ extern struct zone *next_zone(struct zone *zone);
for (pgdat = first_online_pgdat(); \
pgdat; \
pgdat = next_online_pgdat(pgdat))
+
+
+/**
+ * for_each_mem_region_in_node - helper macro to iterate over all the memory
+ * regions in a node.
+ * @region - pointer to a struct mem_region variable
+ * @nid - node id of the node
+ */
+#define for_each_mem_region_in_node(region, nid) \
+ for (region = (NODE_DATA(nid))->node_regions; \
+ region; \
+ region = next_mem_region(region))
+
+
/**
* for_each_zone - helper macro to iterate over all memory zones
* @zone - pointer to struct zone variable
@@ -925,12 +941,12 @@ extern struct zone *next_zone(struct zone *zone);
* fills it in.
*/
#define for_each_zone(zone) \
- for (zone = (first_online_pgdat())->node_zones; \
+ for (zone = (first_zone()); \
zone; \
zone = next_zone(zone))
#define for_each_populated_zone(zone) \
- for (zone = (first_online_pgdat())->node_zones; \
+ for (zone = (first_zone()); \
zone; \
zone = next_zone(zone)) \
if (!populated_zone(zone)) \
diff --git a/mm/mmzone.c b/mm/mmzone.c
index 3cef80f..d32d10a 100644
--- a/mm/mmzone.c
+++ b/mm/mmzone.c
@@ -23,22 +23,62 @@ struct pglist_data *next_online_pgdat(struct pglist_data *pgdat)
return NODE_DATA(nid);
}
+struct mem_region *next_mem_region(struct mem_region *region)
+{
+ int next_region = region->region + 1;
+ pg_data_t *pgdat = NODE_DATA(region->node);
+
+ if (next_region == pgdat->nr_node_regions)
+ return NULL;
+ return &(pgdat->node_regions[next_region]);
+}
+
+struct zone *first_zone(void)
+{
+ return (first_online_pgdat())->node_regions[0].region_zones;
+}
+
+struct zone *page_zone(struct page *page)
+{
+ pg_data_t *pgdat = NODE_DATA(page_to_nid(page));
+ unsigned long pfn = page_to_pfn(page);
+ struct mem_region *region;
+
+ for_each_mem_region_in_node(region, pgdat->node_id) {
+ unsigned long end_pfn = region->start_pfn +
+ region->spanned_pages;
+
+ if ((pfn >= region->start_pfn) && (pfn < end_pfn))
+ return &region->region_zones[page_zonenum(page)];
+ }
+
+ return NULL;
+}
+
+
/*
* next_zone - helper magic for for_each_zone()
*/
struct zone *next_zone(struct zone *zone)
{
pg_data_t *pgdat = zone->zone_pgdat;
+ struct mem_region *region = zone->zone_mem_region;
+
+ if (zone < region->region_zones + MAX_NR_ZONES - 1)
+ return ++zone;
- if (zone < pgdat->node_zones + MAX_NR_ZONES - 1)
- zone++;
- else {
+ region = next_mem_region(region);
+
+ if (region) {
+ zone = region->region_zones;
+ } else {
pgdat = next_online_pgdat(pgdat);
if (pgdat)
- zone = pgdat->node_zones;
+ zone = pgdat->node_regions[0].region_zones;
else
zone = NULL;
}
+
return zone;
}
--
* [RFC PATCH 03/10] mm: Init zones inside memory regions
2012-11-06 19:39 [RFC PATCH 00/10][Hierarchy] mm: Linux VM Infrastructure to support Memory Power Management Srivatsa S. Bhat
2012-11-06 19:39 ` [RFC PATCH 01/10] mm: Introduce the memory regions data structure Srivatsa S. Bhat
2012-11-06 19:40 ` [RFC PATCH 02/10] mm: Helper routines Srivatsa S. Bhat
@ 2012-11-06 19:40 ` Srivatsa S. Bhat
2012-11-06 19:40 ` [RFC PATCH 04/10] mm: Refer to zones from " Srivatsa S. Bhat
` (6 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: Srivatsa S. Bhat @ 2012-11-06 19:40 UTC (permalink / raw)
To: akpm, mgorman, mjg59, paulmck, dave, maxime.coquelin,
loic.pallardy, arjan, kmpark, kamezawa.hiroyu, lenb, rjw
Cc: gargankita, amit.kachhap, svaidy, thomas.abraham,
santosh.shilimkar, srivatsa.bhat, linux-pm, linux-mm,
linux-kernel
From: Ankita Garg <gargankita@gmail.com>
This patch initializes the zones inside memory regions. Each memory region is
scanned for the pfns present in it, and the intersection of that range with a
zone's range is set up as the amount of memory present in that zone within the
region. Most of the other setup steps remain unmodified.
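
The per-region sizing reduces to clamping the zone's pfn range to the region's
pfn range; a minimal standalone sketch of that intersection (the helper name is
made up, the logic mirrors zone_spanned_pages_in_node_region() in the diff):

#include <stdio.h>

/*
 * Number of pages the zone [zone_start, zone_end) contributes to the
 * region [region_start, region_end); 0 if the two ranges do not intersect.
 */
static unsigned long pages_in_intersection(unsigned long zone_start,
					   unsigned long zone_end,
					   unsigned long region_start,
					   unsigned long region_end)
{
	unsigned long start = zone_start > region_start ? zone_start : region_start;
	unsigned long end   = zone_end   < region_end   ? zone_end   : region_end;

	return end > start ? end - start : 0;
}

int main(void)
{
	/* A zone spanning pfns [4096, 262144) vs. a region covering [131072, 262144) */
	printf("%lu pages of the zone fall inside the region\n",
	       pages_in_intersection(4096, 262144, 131072, 262144));
	return 0;
}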
Signed-off-by: Ankita Garg <gargankita@gmail.com>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---
include/linux/mm.h | 2 +
mm/page_alloc.c | 175 ++++++++++++++++++++++++++++++++++------------------
2 files changed, 118 insertions(+), 59 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 70f1009..f57eef0 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1320,6 +1320,8 @@ extern unsigned long absent_pages_in_range(unsigned long start_pfn,
unsigned long end_pfn);
extern void get_pfn_range_for_nid(unsigned int nid,
unsigned long *start_pfn, unsigned long *end_pfn);
+extern void get_pfn_range_for_region(int nid, int region,
+ unsigned long *start_pfn, unsigned long *end_pfn);
extern unsigned long find_min_pfn_with_active_regions(void);
extern void free_bootmem_with_active_regions(int nid,
unsigned long max_low_pfn);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index bb90971..c807272 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4321,6 +4321,7 @@ static unsigned long __meminit zone_absent_pages_in_node(int nid,
}
#else /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
+
static inline unsigned long __meminit zone_spanned_pages_in_node(int nid,
unsigned long zone_type,
unsigned long *zones_size)
@@ -4340,6 +4341,48 @@ static inline unsigned long __meminit zone_absent_pages_in_node(int nid,
#endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
+void __meminit get_pfn_range_for_region(int nid, int region,
+ unsigned long *start_pfn, unsigned long *end_pfn)
+{
+ struct mem_region *mem_region;
+
+ mem_region = &NODE_DATA(nid)->node_regions[region];
+ *start_pfn = mem_region->start_pfn;
+ *end_pfn = *start_pfn + mem_region->spanned_pages;
+}
+
+static inline unsigned long __meminit zone_spanned_pages_in_node_region(int nid,
+ int region,
+ unsigned long zone_start_pfn,
+ unsigned long zone_type,
+ unsigned long *zones_size)
+{
+ unsigned long start_pfn, end_pfn;
+ unsigned long zone_end_pfn, spanned_pages;
+
+ get_pfn_range_for_region(nid, region, &start_pfn, &end_pfn);
+
+ spanned_pages = zone_spanned_pages_in_node(nid, zone_type, zones_size);
+
+ zone_end_pfn = zone_start_pfn + spanned_pages;
+
+ zone_end_pfn = min(zone_end_pfn, end_pfn);
+ zone_start_pfn = max(start_pfn, zone_start_pfn);
+
+ /* Detect if region and zone don't intersect */
+ if (zone_end_pfn < zone_start_pfn)
+ return 0;
+
+ return zone_end_pfn - zone_start_pfn;
+}
+
+static inline unsigned long __meminit zone_absent_pages_in_node_region(int nid,
+ unsigned long zone_start_pfn,
+ unsigned long zone_end_pfn)
+{
+ return __absent_pages_in_range(nid, zone_start_pfn, zone_end_pfn);
+}
+
static void __meminit calculate_node_totalpages(struct pglist_data *pgdat,
unsigned long *zones_size, unsigned long *zholes_size)
{
@@ -4446,6 +4489,7 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat,
enum zone_type j;
int nid = pgdat->node_id;
unsigned long zone_start_pfn = pgdat->node_start_pfn;
+ struct mem_region *region;
int ret;
pgdat_resize_init(pgdat);
@@ -4454,68 +4498,77 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat,
pgdat_page_cgroup_init(pgdat);
for (j = 0; j < MAX_NR_ZONES; j++) {
- struct zone *zone = pgdat->node_zones + j;
- unsigned long size, realsize, memmap_pages;
+ for_each_mem_region_in_node(region, pgdat->node_id) {
+ struct zone *zone = region->region_zones + j;
+ unsigned long size, realsize = 0, memmap_pages;
- size = zone_spanned_pages_in_node(nid, j, zones_size);
- realsize = size - zone_absent_pages_in_node(nid, j,
- zholes_size);
+ size = zone_spanned_pages_in_node_region(nid,
+ region->region,
+ zone_start_pfn,
+ j, zones_size);
- /*
- * Adjust realsize so that it accounts for how much memory
- * is used by this zone for memmap. This affects the watermark
- * and per-cpu initialisations
- */
- memmap_pages =
- PAGE_ALIGN(size * sizeof(struct page)) >> PAGE_SHIFT;
- if (realsize >= memmap_pages) {
- realsize -= memmap_pages;
- if (memmap_pages)
- printk(KERN_DEBUG
- " %s zone: %lu pages used for memmap\n",
- zone_names[j], memmap_pages);
- } else
- printk(KERN_WARNING
- " %s zone: %lu pages exceeds realsize %lu\n",
- zone_names[j], memmap_pages, realsize);
-
- /* Account for reserved pages */
- if (j == 0 && realsize > dma_reserve) {
- realsize -= dma_reserve;
- printk(KERN_DEBUG " %s zone: %lu pages reserved\n",
- zone_names[0], dma_reserve);
- }
+ realsize = size -
+ zone_absent_pages_in_node_region(nid,
+ zone_start_pfn,
+ zone_start_pfn + size);
- if (!is_highmem_idx(j))
- nr_kernel_pages += realsize;
- nr_all_pages += realsize;
+ /*
+ * Adjust realsize so that it accounts for how much memory
+ * is used by this zone for memmap. This affects the watermark
+ * and per-cpu initialisations
+ */
+ memmap_pages =
+ PAGE_ALIGN(size * sizeof(struct page)) >> PAGE_SHIFT;
+ if (realsize >= memmap_pages) {
+ realsize -= memmap_pages;
+ if (memmap_pages)
+ printk(KERN_DEBUG
+ " %s zone: %lu pages used for memmap\n",
+ zone_names[j], memmap_pages);
+ } else
+ printk(KERN_WARNING
+ " %s zone: %lu pages exceeds realsize %lu\n",
+ zone_names[j], memmap_pages, realsize);
+
+ /* Account for reserved pages */
+ if (j == 0 && realsize > dma_reserve) {
+ realsize -= dma_reserve;
+ printk(KERN_DEBUG " %s zone: %lu pages reserved\n",
+ zone_names[0], dma_reserve);
+ }
- zone->spanned_pages = size;
- zone->present_pages = realsize;
+ if (!is_highmem_idx(j))
+ nr_kernel_pages += realsize;
+ nr_all_pages += realsize;
+
+ zone->spanned_pages = size;
+ zone->present_pages = realsize;
#ifdef CONFIG_NUMA
- zone->node = nid;
- zone->min_unmapped_pages = (realsize*sysctl_min_unmapped_ratio)
- / 100;
- zone->min_slab_pages = (realsize * sysctl_min_slab_ratio) / 100;
+ zone->node = nid;
+ zone->min_unmapped_pages = (realsize*sysctl_min_unmapped_ratio)
+ / 100;
+ zone->min_slab_pages = (realsize * sysctl_min_slab_ratio) / 100;
#endif
- zone->name = zone_names[j];
- spin_lock_init(&zone->lock);
- spin_lock_init(&zone->lru_lock);
- zone_seqlock_init(zone);
- zone->zone_pgdat = pgdat;
-
- zone_pcp_init(zone);
- lruvec_init(&zone->lruvec, zone);
- if (!size)
- continue;
+ zone->name = zone_names[j];
+ spin_lock_init(&zone->lock);
+ spin_lock_init(&zone->lru_lock);
+ zone_seqlock_init(zone);
+ zone->zone_pgdat = pgdat;
+ zone->zone_mem_region = region;
+
+ zone_pcp_init(zone);
+ lruvec_init(&zone->lruvec, zone);
+ if (!size)
+ continue;
- set_pageblock_order();
- setup_usemap(pgdat, zone, size);
- ret = init_currently_empty_zone(zone, zone_start_pfn,
- size, MEMMAP_EARLY);
- BUG_ON(ret);
- memmap_init(size, nid, j, zone_start_pfn);
- zone_start_pfn += size;
+ set_pageblock_order();
+ setup_usemap(pgdat, zone, size);
+ ret = init_currently_empty_zone(zone, zone_start_pfn,
+ size, MEMMAP_EARLY);
+ BUG_ON(ret);
+ memmap_init(size, nid, j, zone_start_pfn);
+ zone_start_pfn += size;
+ }
}
}
@@ -4854,12 +4907,16 @@ static void __init check_for_regular_memory(pg_data_t *pgdat)
{
#ifdef CONFIG_HIGHMEM
enum zone_type zone_type;
+ struct mem_region *region;
for (zone_type = 0; zone_type <= ZONE_NORMAL; zone_type++) {
- struct zone *zone = &pgdat->node_zones[zone_type];
- if (zone->present_pages) {
- node_set_state(zone_to_nid(zone), N_NORMAL_MEMORY);
- break;
+ for_each_mem_region_in_node(region, pgdat->node_id) {
+ struct zone *zone = &region->region_zones[zone_type];
+ if (zone->present_pages) {
+ node_set_state(zone_to_nid(zone),
+ N_NORMAL_MEMORY);
+ return;
+ }
}
}
#endif
--
* [RFC PATCH 04/10] mm: Refer to zones from memory regions
2012-11-06 19:39 [RFC PATCH 00/10][Hierarchy] mm: Linux VM Infrastructure to support Memory Power Management Srivatsa S. Bhat
` (2 preceding siblings ...)
2012-11-06 19:40 ` [RFC PATCH 03/10] mm: Init zones inside memory regions Srivatsa S. Bhat
@ 2012-11-06 19:40 ` Srivatsa S. Bhat
2012-11-06 19:40 ` [RFC PATCH 05/10] mm: Create zonelists Srivatsa S. Bhat
` (5 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: Srivatsa S. Bhat @ 2012-11-06 19:40 UTC (permalink / raw)
To: akpm, mgorman, mjg59, paulmck, dave, maxime.coquelin,
loic.pallardy, arjan, kmpark, kamezawa.hiroyu, lenb, rjw
Cc: gargankita, amit.kachhap, svaidy, thomas.abraham,
santosh.shilimkar, srivatsa.bhat, linux-pm, linux-mm,
linux-kernel
From: Ankita Garg <gargankita@gmail.com>
With the introduction of memory regions, the node_zones array inside the node
structure is removed. Hence, this patch modifies the VM code to refer to zones
from within memory regions instead of directly from nodes.
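
One consequence worth calling out: a zone's index is now its offset within its
region's zone array rather than within the node's, as in the updated zone_idx(),
is_highmem() and friends. A standalone sketch of that pointer arithmetic
(simplified, made-up types, not kernel code):

#include <stdio.h>

#define MAX_NR_ZONES_MODEL 4	/* simplified */

struct zone_model { unsigned long present_pages; };

struct region_model {
	struct zone_model region_zones[MAX_NR_ZONES_MODEL];
};

/* Mirrors the reworked zone_idx(): offset of the zone within its region's array */
static int zone_idx_in_region(const struct region_model *region,
			      const struct zone_model *zone)
{
	return (int)(zone - region->region_zones);
}

int main(void)
{
	struct region_model region;
	struct zone_model *z = &region.region_zones[2];

	printf("zone index within its region: %d\n",
	       zone_idx_in_region(&region, z));
	return 0;
}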
Signed-off-by: Ankita Garg <gargankita@gmail.com>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---
include/linux/mm.h | 2 -
include/linux/mmzone.h | 9 ++-
mm/page_alloc.c | 128 +++++++++++++++++++++++++++---------------------
3 files changed, 79 insertions(+), 60 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index f57eef0..27fc2d3 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1345,7 +1345,7 @@ extern int __meminit __early_pfn_to_nid(unsigned long pfn);
#endif
extern void set_dma_reserve(unsigned long new_dma_reserve);
-extern void memmap_init_zone(unsigned long, int, unsigned long,
+extern void memmap_init_zone(unsigned long, int, int, unsigned long,
unsigned long, enum memmap_context);
extern void setup_per_zone_wmarks(void);
extern int __meminit init_per_zone_wmark_min(void);
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 6f5d533..4abc7d5 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -842,7 +842,8 @@ static inline int is_normal_idx(enum zone_type idx)
static inline int is_highmem(struct zone *zone)
{
#ifdef CONFIG_HIGHMEM
- int zone_off = (char *)zone - (char *)zone->zone_pgdat->node_zones;
+ int zone_off = (char *)zone -
+ (char *)zone->zone_mem_region->region_zones;
return zone_off == ZONE_HIGHMEM * sizeof(*zone) ||
(zone_off == ZONE_MOVABLE * sizeof(*zone) &&
zone_movable_is_highmem());
@@ -853,13 +854,13 @@ static inline int is_highmem(struct zone *zone)
static inline int is_normal(struct zone *zone)
{
- return zone == zone->zone_pgdat->node_zones + ZONE_NORMAL;
+ return zone == zone->zone_mem_region->region_zones + ZONE_NORMAL;
}
static inline int is_dma32(struct zone *zone)
{
#ifdef CONFIG_ZONE_DMA32
- return zone == zone->zone_pgdat->node_zones + ZONE_DMA32;
+ return zone == zone->zone_mem_region->region_zones + ZONE_DMA32;
#else
return 0;
#endif
@@ -868,7 +869,7 @@ static inline int is_dma32(struct zone *zone)
static inline int is_dma(struct zone *zone)
{
#ifdef CONFIG_ZONE_DMA
- return zone == zone->zone_pgdat->node_zones + ZONE_DMA;
+ return zone == zone->zone_mem_region->region_zones + ZONE_DMA;
#else
return 0;
#endif
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c807272..a8e86b5 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3797,8 +3797,8 @@ static void setup_zone_migrate_reserve(struct zone *zone)
* up by free_all_bootmem() once the early boot process is
* done. Non-atomic initialization, single-pass.
*/
-void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
- unsigned long start_pfn, enum memmap_context context)
+void __meminit memmap_init_zone(unsigned long size, int nid, int region,
+ unsigned long zone, unsigned long start_pfn, enum memmap_context context)
{
struct page *page;
unsigned long end_pfn = start_pfn + size;
@@ -3808,7 +3808,7 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
if (highest_memmap_pfn < end_pfn - 1)
highest_memmap_pfn = end_pfn - 1;
- z = &NODE_DATA(nid)->node_zones[zone];
+ z = &NODE_DATA(nid)->node_regions[region].region_zones[zone];
for (pfn = start_pfn; pfn < end_pfn; pfn++) {
/*
* There can be holes in boot-time mem_map[]s
@@ -3865,8 +3865,8 @@ static void __meminit zone_init_free_lists(struct zone *zone)
}
#ifndef __HAVE_ARCH_MEMMAP_INIT
-#define memmap_init(size, nid, zone, start_pfn) \
- memmap_init_zone((size), (nid), (zone), (start_pfn), MEMMAP_EARLY)
+#define memmap_init(size, nid, region, zone, start_pfn) \
+ memmap_init_zone((size), (nid), (region), (zone), (start_pfn), MEMMAP_EARLY)
#endif
static int __meminit zone_batchsize(struct zone *zone)
@@ -4045,11 +4045,13 @@ int __meminit init_currently_empty_zone(struct zone *zone,
enum memmap_context context)
{
struct pglist_data *pgdat = zone->zone_pgdat;
+ struct mem_region *region = zone->zone_mem_region;
int ret;
ret = zone_wait_table_init(zone, size);
if (ret)
return ret;
- pgdat->nr_zones = zone_idx(zone) + 1;
+ pgdat->nr_node_zone_types = zone_idx(zone) + 1;
+ region->nr_region_zones = zone_idx(zone) + 1;
zone->zone_start_pfn = zone_start_pfn;
@@ -4058,7 +4060,6 @@ int __meminit init_currently_empty_zone(struct zone *zone,
pgdat->node_id,
(unsigned long)zone_idx(zone),
zone_start_pfn, (zone_start_pfn + size));
-
zone_init_free_lists(zone);
return 0;
@@ -4566,7 +4567,7 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat,
ret = init_currently_empty_zone(zone, zone_start_pfn,
size, MEMMAP_EARLY);
BUG_ON(ret);
- memmap_init(size, nid, j, zone_start_pfn);
+ memmap_init(size, nid, region->region, j, zone_start_pfn);
zone_start_pfn += size;
}
}
@@ -4613,13 +4614,17 @@ static void __init_refok alloc_node_mem_map(struct pglist_data *pgdat)
#endif /* CONFIG_FLAT_NODE_MEM_MAP */
}
+/*
+ * Todo: This routine needs more modifications, but not required for the
+ * minimalistic config options, to start with
+ */
void __paginginit free_area_init_node(int nid, unsigned long *zones_size,
unsigned long node_start_pfn, unsigned long *zholes_size)
{
pg_data_t *pgdat = NODE_DATA(nid);
/* pg_data_t should be reset to zero when it's allocated */
- WARN_ON(pgdat->nr_zones || pgdat->classzone_idx);
+ WARN_ON(pgdat->nr_node_zone_types || pgdat->classzone_idx);
pgdat->node_id = nid;
pgdat->node_start_pfn = node_start_pfn;
@@ -5109,35 +5114,38 @@ static void calculate_totalreserve_pages(void)
{
struct pglist_data *pgdat;
unsigned long reserve_pages = 0;
+ struct mem_region *region;
enum zone_type i, j;
for_each_online_pgdat(pgdat) {
for (i = 0; i < MAX_NR_ZONES; i++) {
- struct zone *zone = pgdat->node_zones + i;
- unsigned long max = 0;
-
- /* Find valid and maximum lowmem_reserve in the zone */
- for (j = i; j < MAX_NR_ZONES; j++) {
- if (zone->lowmem_reserve[j] > max)
- max = zone->lowmem_reserve[j];
- }
+ for_each_mem_region_in_node(region, pgdat->node_id) {
+ struct zone *zone = region->region_zones + i;
+ unsigned long max = 0;
+
+ /* Find valid and maximum lowmem_reserve in the zone */
+ for (j = i; j < MAX_NR_ZONES; j++) {
+ if (zone->lowmem_reserve[j] > max)
+ max = zone->lowmem_reserve[j];
+ }
- /* we treat the high watermark as reserved pages. */
- max += high_wmark_pages(zone);
+ /* we treat the high watermark as reserved pages. */
+ max += high_wmark_pages(zone);
- if (max > zone->present_pages)
- max = zone->present_pages;
- reserve_pages += max;
- /*
- * Lowmem reserves are not available to
- * GFP_HIGHUSER page cache allocations and
- * kswapd tries to balance zones to their high
- * watermark. As a result, neither should be
- * regarded as dirtyable memory, to prevent a
- * situation where reclaim has to clean pages
- * in order to balance the zones.
- */
- zone->dirty_balance_reserve = max;
+ if (max > zone->present_pages)
+ max = zone->present_pages;
+ reserve_pages += max;
+ /*
+ * Lowmem reserves are not available to
+ * GFP_HIGHUSER page cache allocations and
+ * kswapd tries to balance zones to their high
+ * watermark. As a result, neither should be
+ * regarded as dirtyable memory, to prevent a
+ * situation where reclaim has to clean pages
+ * in order to balance the zones.
+ */
+ zone->dirty_balance_reserve = max;
+ }
}
}
dirty_balance_reserve = reserve_pages;
@@ -5154,27 +5162,30 @@ static void setup_per_zone_lowmem_reserve(void)
{
struct pglist_data *pgdat;
enum zone_type j, idx;
+ struct mem_region *region;
for_each_online_pgdat(pgdat) {
for (j = 0; j < MAX_NR_ZONES; j++) {
- struct zone *zone = pgdat->node_zones + j;
- unsigned long present_pages = zone->present_pages;
+ for_each_mem_region_in_node(region, pgdat->node_id) {
+ struct zone *zone = region->region_zones + j;
+ unsigned long present_pages = zone->present_pages;
- zone->lowmem_reserve[j] = 0;
+ zone->lowmem_reserve[j] = 0;
- idx = j;
- while (idx) {
- struct zone *lower_zone;
+ idx = j;
+ while (idx) {
+ struct zone *lower_zone;
- idx--;
+ idx--;
- if (sysctl_lowmem_reserve_ratio[idx] < 1)
- sysctl_lowmem_reserve_ratio[idx] = 1;
+ if (sysctl_lowmem_reserve_ratio[idx] < 1)
+ sysctl_lowmem_reserve_ratio[idx] = 1;
- lower_zone = pgdat->node_zones + idx;
- lower_zone->lowmem_reserve[j] = present_pages /
- sysctl_lowmem_reserve_ratio[idx];
- present_pages += lower_zone->present_pages;
+ lower_zone = region->region_zones + idx;
+ lower_zone->lowmem_reserve[j] = present_pages /
+ sysctl_lowmem_reserve_ratio[idx];
+ present_pages += lower_zone->present_pages;
+ }
}
}
}
@@ -6159,13 +6170,16 @@ void dump_page(struct page *page)
/* reset zone->present_pages */
void reset_zone_present_pages(void)
{
+ struct mem_region *region;
struct zone *z;
int i, nid;
for_each_node_state(nid, N_HIGH_MEMORY) {
for (i = 0; i < MAX_NR_ZONES; i++) {
- z = NODE_DATA(nid)->node_zones + i;
- z->present_pages = 0;
+ for_each_mem_region_in_node(region, nid) {
+ z = region->region_zones + i;
+ z->present_pages = 0;
+ }
}
}
}
@@ -6177,15 +6191,19 @@ void fixup_zone_present_pages(int nid, unsigned long start_pfn,
struct zone *z;
unsigned long zone_start_pfn, zone_end_pfn;
int i;
+ struct mem_region *region;
for (i = 0; i < MAX_NR_ZONES; i++) {
- z = NODE_DATA(nid)->node_zones + i;
- zone_start_pfn = z->zone_start_pfn;
- zone_end_pfn = zone_start_pfn + z->spanned_pages;
-
- /* if the two regions intersect */
- if (!(zone_start_pfn >= end_pfn || zone_end_pfn <= start_pfn))
- z->present_pages += min(end_pfn, zone_end_pfn) -
- max(start_pfn, zone_start_pfn);
+ for_each_mem_region_in_node(region, nid) {
+ z = region->region_zones + i;
+ zone_start_pfn = z->zone_start_pfn;
+ zone_end_pfn = zone_start_pfn + z->spanned_pages;
+
+ /* if the two regions intersect */
+ if (!(zone_start_pfn >= end_pfn ||
+ zone_end_pfn <= start_pfn))
+ z->present_pages += min(end_pfn, zone_end_pfn) -
+ max(start_pfn, zone_start_pfn);
+ }
}
}
--
* [RFC PATCH 05/10] mm: Create zonelists
2012-11-06 19:39 [RFC PATCH 00/10][Hierarchy] mm: Linux VM Infrastructure to support Memory Power Management Srivatsa S. Bhat
` (3 preceding siblings ...)
2012-11-06 19:40 ` [RFC PATCH 04/10] mm: Refer to zones from " Srivatsa S. Bhat
@ 2012-11-06 19:40 ` Srivatsa S. Bhat
2012-11-06 19:41 ` [RFC PATCH 06/10] mm: Verify zonelists Srivatsa S. Bhat
` (4 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: Srivatsa S. Bhat @ 2012-11-06 19:40 UTC (permalink / raw)
To: akpm, mgorman, mjg59, paulmck, dave, maxime.coquelin,
loic.pallardy, arjan, kmpark, kamezawa.hiroyu, lenb, rjw
Cc: gargankita, amit.kachhap, svaidy, thomas.abraham,
santosh.shilimkar, srivatsa.bhat, linux-pm, linux-mm,
linux-kernel
From: Ankita Garg <gargankita@gmail.com>
The default node-ordered zonelist contains all the zones of one node, followed by
all the zones of the next node, and so on. With memory regions, the primary aim
is to group memory allocations to a given area of memory together. The modified
zonelists thus contain all the zones of one region, followed by all the zones of
the next region, and so on. This ensures that all the memory in one region is
allocated before moving on to the next region, unless targeted memory allocations
are performed.
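
The resulting fallback order for one node can be pictured with a tiny standalone
sketch that just prints the region-major ordering (names and counts are made up;
the kernel code builds zoneref arrays instead of printing):

#include <stdio.h>

#define NR_REGIONS 2
#define NR_ZONES   3

static const char *zone_names[NR_ZONES] = { "DMA", "Normal", "HighMem" };

int main(void)
{
	int region, zone;

	/*
	 * Region-ordered zonelist for one node: all zones of region 0
	 * (highest zone first) come before any zone of region 1.
	 */
	for (region = 0; region < NR_REGIONS; region++)
		for (zone = NR_ZONES - 1; zone >= 0; zone--)
			printf("region %d, zone %s\n", region, zone_names[zone]);

	return 0;
}

Allocations therefore tend to exhaust region 0 before falling back to region 1,
which is what lets a whole region stay free and be power-managed.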
Signed-off-by: Ankita Garg <gargankita@gmail.com>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---
mm/page_alloc.c | 69 +++++++++++++++++++++++++++++++++----------------------
1 file changed, 42 insertions(+), 27 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a8e86b5..9c1d680 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3040,21 +3040,25 @@ static void zoneref_set_zone(struct zone *zone, struct zoneref *zoneref)
static int build_zonelists_node(pg_data_t *pgdat, struct zonelist *zonelist,
int nr_zones, enum zone_type zone_type)
{
+ enum zone_type z_type = zone_type;
+ struct mem_region *region;
struct zone *zone;
BUG_ON(zone_type >= MAX_NR_ZONES);
zone_type++;
- do {
- zone_type--;
- zone = pgdat->node_zones + zone_type;
- if (populated_zone(zone)) {
- zoneref_set_zone(zone,
- &zonelist->_zonerefs[nr_zones++]);
- check_highest_zone(zone_type);
- }
-
- } while (zone_type);
+ for_each_mem_region_in_node(region, pgdat->node_id) {
+ do {
+ zone_type--;
+ zone = region->region_zones + zone_type;
+ if (populated_zone(zone)) {
+ zoneref_set_zone(zone,
+ &zonelist->_zonerefs[nr_zones++]);
+ check_highest_zone(zone_type);
+ }
+ } while (zone_type);
+ zone_type = z_type + 1;
+ }
return nr_zones;
}
@@ -3275,17 +3279,20 @@ static void build_zonelists_in_zone_order(pg_data_t *pgdat, int nr_nodes)
int zone_type; /* needs to be signed */
struct zone *z;
struct zonelist *zonelist;
+ struct mem_region *region;
zonelist = &pgdat->node_zonelists[0];
pos = 0;
for (zone_type = MAX_NR_ZONES - 1; zone_type >= 0; zone_type--) {
for (j = 0; j < nr_nodes; j++) {
node = node_order[j];
- z = &NODE_DATA(node)->node_zones[zone_type];
- if (populated_zone(z)) {
- zoneref_set_zone(z,
- &zonelist->_zonerefs[pos++]);
- check_highest_zone(zone_type);
+ for_each_mem_region_in_node(region, node) {
+ z = &region->region_zones[zone_type];
+ if (populated_zone(z)) {
+ zoneref_set_zone(z,
+ &zonelist->_zonerefs[pos++]);
+ check_highest_zone(zone_type);
+ }
}
}
}
@@ -3299,6 +3306,8 @@ static int default_zonelist_order(void)
unsigned long low_kmem_size,total_size;
struct zone *z;
int average_size;
+ struct mem_region *region;
+
/*
* ZONE_DMA and ZONE_DMA32 can be very small area in the system.
* If they are really small and used heavily, the system can fall
@@ -3310,12 +3319,15 @@ static int default_zonelist_order(void)
total_size = 0;
for_each_online_node(nid) {
for (zone_type = 0; zone_type < MAX_NR_ZONES; zone_type++) {
- z = &NODE_DATA(nid)->node_zones[zone_type];
- if (populated_zone(z)) {
- if (zone_type < ZONE_NORMAL)
- low_kmem_size += z->present_pages;
- total_size += z->present_pages;
- } else if (zone_type == ZONE_NORMAL) {
+ for_each_mem_region_in_node(region, nid) {
+ z = &region->region_zones[zone_type];
+ if (populated_zone(z)) {
+ if (zone_type < ZONE_NORMAL)
+ low_kmem_size +=
+ z->present_pages;
+
+ total_size += z->present_pages;
+ } else if (zone_type == ZONE_NORMAL) {
/*
* If any node has only lowmem, then node order
* is preferred to allow kernel allocations
@@ -3323,7 +3335,8 @@ static int default_zonelist_order(void)
* on other nodes when there is an abundance of
* lowmem available to allocate from.
*/
- return ZONELIST_ORDER_NODE;
+ return ZONELIST_ORDER_NODE;
+ }
}
}
}
@@ -3341,11 +3354,13 @@ static int default_zonelist_order(void)
low_kmem_size = 0;
total_size = 0;
for (zone_type = 0; zone_type < MAX_NR_ZONES; zone_type++) {
- z = &NODE_DATA(nid)->node_zones[zone_type];
- if (populated_zone(z)) {
- if (zone_type < ZONE_NORMAL)
- low_kmem_size += z->present_pages;
- total_size += z->present_pages;
+ for_each_mem_region_in_node(region, nid) {
+ z = &region->region_zones[zone_type];
+ if (populated_zone(z)) {
+ if (zone_type < ZONE_NORMAL)
+ low_kmem_size += z->present_pages;
+ total_size += z->present_pages;
+ }
}
}
if (low_kmem_size &&
--
* [RFC PATCH 06/10] mm: Verify zonelists
2012-11-06 19:39 [RFC PATCH 00/10][Hierarchy] mm: Linux VM Infrastructure to support Memory Power Management Srivatsa S. Bhat
` (4 preceding siblings ...)
2012-11-06 19:40 ` [RFC PATCH 05/10] mm: Create zonelists Srivatsa S. Bhat
@ 2012-11-06 19:41 ` Srivatsa S. Bhat
2012-11-06 19:41 ` [RFC PATCH 07/10] mm: Modify vmstat Srivatsa S. Bhat
` (3 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: Srivatsa S. Bhat @ 2012-11-06 19:41 UTC (permalink / raw)
To: akpm, mgorman, mjg59, paulmck, dave, maxime.coquelin,
loic.pallardy, arjan, kmpark, kamezawa.hiroyu, lenb, rjw
Cc: gargankita, amit.kachhap, svaidy, thomas.abraham,
santosh.shilimkar, srivatsa.bhat, linux-pm, linux-mm,
linux-kernel
From: Ankita Garg <gargankita@gmail.com>
Verify that the zonelists were created appropriately.
Signed-off-by: Ankita Garg <gargankita@gmail.com>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---
mm/mm_init.c | 57 ++++++++++++++++++++++++++++++---------------------------
1 file changed, 30 insertions(+), 27 deletions(-)
diff --git a/mm/mm_init.c b/mm/mm_init.c
index 1ffd97a..5c19842 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -21,6 +21,7 @@ int mminit_loglevel;
/* The zonelists are simply reported, validation is manual. */
void mminit_verify_zonelist(void)
{
+ struct mem_region *region;
int nid;
if (mminit_loglevel < MMINIT_VERIFY)
@@ -28,37 +29,39 @@ void mminit_verify_zonelist(void)
for_each_online_node(nid) {
pg_data_t *pgdat = NODE_DATA(nid);
- struct zone *zone;
- struct zoneref *z;
- struct zonelist *zonelist;
- int i, listid, zoneid;
-
- BUG_ON(MAX_ZONELISTS > 2);
- for (i = 0; i < MAX_ZONELISTS * MAX_NR_ZONES; i++) {
-
- /* Identify the zone and nodelist */
- zoneid = i % MAX_NR_ZONES;
- listid = i / MAX_NR_ZONES;
- zonelist = &pgdat->node_zonelists[listid];
- zone = &pgdat->node_zones[zoneid];
- if (!populated_zone(zone))
- continue;
-
- /* Print information about the zonelist */
- printk(KERN_DEBUG "mminit::zonelist %s %d:%s = ",
- listid > 0 ? "thisnode" : "general", nid,
- zone->name);
-
- /* Iterate the zonelist */
- for_each_zone_zonelist(zone, z, zonelist, zoneid) {
+ for_each_mem_region_in_node(region, nid) {
+ struct zone *zone;
+ struct zoneref *z;
+ struct zonelist *zonelist;
+ int i, listid, zoneid;
+
+ BUG_ON(MAX_ZONELISTS > 2);
+ for (i = 0; i < MAX_ZONELISTS * MAX_NR_ZONES; i++) {
+
+ /* Identify the zone and nodelist */
+ zoneid = i % MAX_NR_ZONES;
+ listid = i / MAX_NR_ZONES;
+ zonelist = &pgdat->node_zonelists[listid];
+ zone = &region->region_zones[zoneid];
+ if (!populated_zone(zone))
+ continue;
+
+ /* Print information about the zonelist */
+ printk(KERN_DEBUG "mminit::zonelist %s %d:%s = ",
+ listid > 0 ? "thisnode" : "general", nid,
+ zone->name);
+
+ /* Iterate the zonelist */
+ for_each_zone_zonelist(zone, z, zonelist, zoneid) {
#ifdef CONFIG_NUMA
- printk(KERN_CONT "%d:%s ",
- zone->node, zone->name);
+ printk(KERN_CONT "%d:%s ",
+ zone->node, zone->name);
#else
- printk(KERN_CONT "0:%s ", zone->name);
+ printk(KERN_CONT "0:%s ", zone->name);
#endif /* CONFIG_NUMA */
+ }
+ printk(KERN_CONT "\n");
}
- printk(KERN_CONT "\n");
}
}
}
--
* [RFC PATCH 07/10] mm: Modify vmstat
2012-11-06 19:39 [RFC PATCH 00/10][Hierarchy] mm: Linux VM Infrastructure to support Memory Power Management Srivatsa S. Bhat
` (5 preceding siblings ...)
2012-11-06 19:41 ` [RFC PATCH 06/10] mm: Verify zonelists Srivatsa S. Bhat
@ 2012-11-06 19:41 ` Srivatsa S. Bhat
2012-11-06 19:41 ` [RFC PATCH 08/10] mm: Modify vmscan Srivatsa S. Bhat
` (2 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: Srivatsa S. Bhat @ 2012-11-06 19:41 UTC (permalink / raw)
To: akpm, mgorman, mjg59, paulmck, dave, maxime.coquelin,
loic.pallardy, arjan, kmpark, kamezawa.hiroyu, lenb, rjw
Cc: gargankita, amit.kachhap, svaidy, thomas.abraham,
santosh.shilimkar, srivatsa.bhat, linux-pm, linux-mm,
linux-kernel
From: Ankita Garg <gargankita@gmail.com>
Change the way vmstats are collected. Since the zones now live inside regions,
scan through all the regions of a node to obtain the zone-specific statistics.
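
Conceptually, a node-wide statistic becomes a sum over all (region, zone) pairs
of that node; a standalone sketch with plain arrays instead of the kernel's
per-cpu counters (the numbers are made up):

#include <stdio.h>

#define NR_REGIONS 2
#define NR_ZONES   3

/* nr_free_pages per zone, per region, for one node */
static unsigned long nr_free[NR_REGIONS][NR_ZONES] = {
	{ 3975, 120000, 0 },
	{    0, 110500, 0 },
};

static unsigned long node_free_pages(void)
{
	unsigned long total = 0;
	int region, zone;

	/* Accumulate the per-zone counter across every region of the node */
	for (region = 0; region < NR_REGIONS; region++)
		for (zone = 0; zone < NR_ZONES; zone++)
			total += nr_free[region][zone];

	return total;
}

int main(void)
{
	printf("node free pages: %lu\n", node_free_pages());
	return 0;
}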
Signed-off-by: Ankita Garg <gargankita@gmail.com>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---
include/linux/vmstat.h | 21 ++++++++++++++-------
mm/vmstat.c | 40 ++++++++++++++++++++++++----------------
2 files changed, 38 insertions(+), 23 deletions(-)
diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h
index 92a86b2..a782f05 100644
--- a/include/linux/vmstat.h
+++ b/include/linux/vmstat.h
@@ -151,20 +151,27 @@ extern unsigned long zone_reclaimable_pages(struct zone *zone);
static inline unsigned long node_page_state(int node,
enum zone_stat_item item)
{
- struct zone *zones = NODE_DATA(node)->node_zones;
+ unsigned long page_state = 0;
+ struct mem_region *region;
+
+ for_each_mem_region_in_node(region, node) {
+ struct zone *zones = region->region_zones;
+
+ page_state =
- return
#ifdef CONFIG_ZONE_DMA
- zone_page_state(&zones[ZONE_DMA], item) +
+ zone_page_state(&zones[ZONE_DMA], item) +
#endif
#ifdef CONFIG_ZONE_DMA32
- zone_page_state(&zones[ZONE_DMA32], item) +
+ zone_page_state(&zones[ZONE_DMA32], item) +
#endif
#ifdef CONFIG_HIGHMEM
- zone_page_state(&zones[ZONE_HIGHMEM], item) +
+ zone_page_state(&zones[ZONE_HIGHMEM], item) +
#endif
- zone_page_state(&zones[ZONE_NORMAL], item) +
- zone_page_state(&zones[ZONE_MOVABLE], item);
+ zone_page_state(&zones[ZONE_NORMAL], item) +
+ zone_page_state(&zones[ZONE_MOVABLE], item);
+ }
+ return page_state;
}
extern void zone_statistics(struct zone *, struct zone *, gfp_t gfp);
diff --git a/mm/vmstat.c b/mm/vmstat.c
index c737057..86a92a6 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -188,20 +188,24 @@ void refresh_zone_stat_thresholds(void)
void set_pgdat_percpu_threshold(pg_data_t *pgdat,
int (*calculate_pressure)(struct zone *))
{
+ struct mem_region *region;
struct zone *zone;
int cpu;
int threshold;
int i;
for (i = 0; i < pgdat->nr_zones; i++) {
- zone = &pgdat->node_zones[i];
- if (!zone->percpu_drift_mark)
- continue;
+ for_each_mem_region_in_node(region, pgdat->node_id) {
+ struct zone *zone = region->region_zones + i;
- threshold = (*calculate_pressure)(zone);
- for_each_possible_cpu(cpu)
- per_cpu_ptr(zone->pageset, cpu)->stat_threshold
- = threshold;
+ if (!zone->percpu_drift_mark)
+ continue;
+
+ threshold = (*calculate_pressure)(zone);
+ for_each_possible_cpu(cpu)
+ per_cpu_ptr(zone->pageset, cpu)->stat_threshold
+ = threshold;
+ }
}
}
@@ -657,19 +661,23 @@ static void frag_stop(struct seq_file *m, void *arg)
/* Walk all the zones in a node and print using a callback */
static void walk_zones_in_node(struct seq_file *m, pg_data_t *pgdat,
- void (*print)(struct seq_file *m, pg_data_t *, struct zone *))
+ void (*print)(struct seq_file *m, pg_data_t *,
+ struct mem_region *, struct zone *))
{
- struct zone *zone;
- struct zone *node_zones = pgdat->node_zones;
+ int i;
unsigned long flags;
+ struct mem_region *region;
- for (zone = node_zones; zone - node_zones < MAX_NR_ZONES; ++zone) {
- if (!populated_zone(zone))
- continue;
+ for (i = 0; i < MAX_NR_ZONES; ++i) {
+ for_each_mem_region_in_node(region, pgdat->node_id) {
+ struct zone *zone = region->region_zones + i;
+ if (!populated_zone(zone))
+ continue;
- spin_lock_irqsave(&zone->lock, flags);
- print(m, pgdat, zone);
- spin_unlock_irqrestore(&zone->lock, flags);
+ spin_lock_irqsave(&zone->lock, flags);
+ print(m, pgdat, region, zone);
+ spin_unlock_irqrestore(&zone->lock, flags);
+ }
}
}
#endif
--
* [RFC PATCH 08/10] mm: Modify vmscan
2012-11-06 19:39 [RFC PATCH 00/10][Hierarchy] mm: Linux VM Infrastructure to support Memory Power Management Srivatsa S. Bhat
` (6 preceding siblings ...)
2012-11-06 19:41 ` [RFC PATCH 07/10] mm: Modify vmstat Srivatsa S. Bhat
@ 2012-11-06 19:41 ` Srivatsa S. Bhat
2012-11-06 19:41 ` [RFC PATCH 09/10] mm: Reflect memory region changes in zoneinfo Srivatsa S. Bhat
2012-11-06 19:42 ` [RFC PATCH 10/10] mm: Create memory regions at boot-up Srivatsa S. Bhat
9 siblings, 0 replies; 11+ messages in thread
From: Srivatsa S. Bhat @ 2012-11-06 19:41 UTC (permalink / raw)
To: akpm, mgorman, mjg59, paulmck, dave, maxime.coquelin,
loic.pallardy, arjan, kmpark, kamezawa.hiroyu, lenb, rjw
Cc: gargankita, amit.kachhap, svaidy, thomas.abraham,
santosh.shilimkar, srivatsa.bhat, linux-pm, linux-mm,
linux-kernel
From: Ankita Garg <gargankita@gmail.com>
Modify vmscan to take into account the changed node-zone hierarchy.
Signed-off-by: Ankita Garg <gargankita@gmail.com>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---
mm/vmscan.c | 364 +++++++++++++++++++++++++++++++----------------------------
1 file changed, 193 insertions(+), 171 deletions(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 2624edc..4d8f303 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2209,11 +2209,14 @@ static bool pfmemalloc_watermark_ok(pg_data_t *pgdat)
unsigned long free_pages = 0;
int i;
bool wmark_ok;
+ struct mem_region *region;
for (i = 0; i <= ZONE_NORMAL; i++) {
- zone = &pgdat->node_zones[i];
- pfmemalloc_reserve += min_wmark_pages(zone);
- free_pages += zone_page_state(zone, NR_FREE_PAGES);
+ for_each_mem_region_in_node(region, pgdat->node_id) {
+ zone = &region->region_zones[i];
+ pfmemalloc_reserve += min_wmark_pages(zone);
+ free_pages += zone_page_state(zone, NR_FREE_PAGES);
+ }
}
wmark_ok = free_pages > pfmemalloc_reserve / 2;
@@ -2442,10 +2445,16 @@ static bool pgdat_balanced(pg_data_t *pgdat, unsigned long balanced_pages,
int classzone_idx)
{
unsigned long present_pages = 0;
+ struct mem_region *region;
int i;
- for (i = 0; i <= classzone_idx; i++)
- present_pages += pgdat->node_zones[i].present_pages;
+ for (i = 0; i <= classzone_idx; i++) {
+ for_each_mem_region_in_node(region, pgdat->node_id) {
+ struct zone *zone = region->region_zones + i;
+
+ present_pages += zone->present_pages;
+ }
+ }
/* A special case here: if zone has no page, we think it's balanced */
return balanced_pages >= (present_pages >> 2);
@@ -2463,6 +2472,7 @@ static bool prepare_kswapd_sleep(pg_data_t *pgdat, int order, long remaining,
int i;
unsigned long balanced = 0;
bool all_zones_ok = true;
+ struct mem_region *region;
/* If a direct reclaimer woke kswapd within HZ/10, it's premature */
if (remaining)
@@ -2484,27 +2494,29 @@ static bool prepare_kswapd_sleep(pg_data_t *pgdat, int order, long remaining,
/* Check the watermark levels */
for (i = 0; i <= classzone_idx; i++) {
- struct zone *zone = pgdat->node_zones + i;
+ for_each_mem_region_in_node(region, pgdat->node_id) {
+ struct zone *zone = region->region_zones + i;
- if (!populated_zone(zone))
- continue;
+ if (!populated_zone(zone))
+ continue;
- /*
- * balance_pgdat() skips over all_unreclaimable after
- * DEF_PRIORITY. Effectively, it considers them balanced so
- * they must be considered balanced here as well if kswapd
- * is to sleep
- */
- if (zone->all_unreclaimable) {
- balanced += zone->present_pages;
- continue;
- }
+ /*
+ * balance_pgdat() skips over all_unreclaimable after
+ * DEF_PRIORITY. Effectively, it considers them balanced so
+ * they must be considered balanced here as well if kswapd
+ * is to sleep
+ */
+ if (zone->all_unreclaimable) {
+ balanced += zone->present_pages;
+ continue;
+ }
- if (!zone_watermark_ok_safe(zone, order, high_wmark_pages(zone),
- i, 0))
- all_zones_ok = false;
- else
- balanced += zone->present_pages;
+ if (!zone_watermark_ok_safe(zone, order,
+ high_wmark_pages(zone), i, 0))
+ all_zones_ok = false;
+ else
+ balanced += zone->present_pages;
+ }
}
/*
@@ -2565,6 +2577,7 @@ static unsigned long balance_pgdat(pg_data_t *pgdat, int order,
struct shrink_control shrink = {
.gfp_mask = sc.gfp_mask,
};
+ struct mem_region *region;
loop_again:
total_scanned = 0;
sc.priority = DEF_PRIORITY;
@@ -2583,49 +2596,55 @@ loop_again:
* Scan in the highmem->dma direction for the highest
* zone which needs scanning
*/
- for (i = pgdat->nr_zones - 1; i >= 0; i--) {
- struct zone *zone = pgdat->node_zones + i;
+ for (i = pgdat->nr_node_zone_types - 1; i >= 0; i--) {
+ for_each_mem_region_in_node(region, pgdat->node_id) {
+ struct zone *zone = region->region_zones + i;
- if (!populated_zone(zone))
- continue;
+ if (!populated_zone(zone))
+ continue;
- if (zone->all_unreclaimable &&
- sc.priority != DEF_PRIORITY)
- continue;
+ if (zone->all_unreclaimable &&
+ sc.priority != DEF_PRIORITY)
+ continue;
- /*
- * Do some background aging of the anon list, to give
- * pages a chance to be referenced before reclaiming.
- */
- age_active_anon(zone, &sc);
+ /*
+ * Do some background aging of the anon list, to give
+ * pages a chance to be referenced before reclaiming.
+ */
+ age_active_anon(zone, &sc);
- /*
- * If the number of buffer_heads in the machine
- * exceeds the maximum allowed level and this node
- * has a highmem zone, force kswapd to reclaim from
- * it to relieve lowmem pressure.
- */
- if (buffer_heads_over_limit && is_highmem_idx(i)) {
- end_zone = i;
- break;
- }
+ /*
+ * If the number of buffer_heads in the machine
+ * exceeds the maximum allowed level and this node
+ * has a highmem zone, force kswapd to reclaim from
+ * it to relieve lowmem pressure.
+ */
+ if (buffer_heads_over_limit && is_highmem_idx(i)) {
+ end_zone = i;
+ goto out_loop;
+ }
- if (!zone_watermark_ok_safe(zone, order,
- high_wmark_pages(zone), 0, 0)) {
- end_zone = i;
- break;
- } else {
- /* If balanced, clear the congested flag */
- zone_clear_flag(zone, ZONE_CONGESTED);
+ if (!zone_watermark_ok_safe(zone, order,
+ high_wmark_pages(zone), 0, 0)) {
+ end_zone = i;
+ goto out_loop;
+ } else {
+ /* If balanced, clear the congested flag */
+ zone_clear_flag(zone, ZONE_CONGESTED);
+ }
}
}
+
+ out_loop:
if (i < 0)
goto out;
for (i = 0; i <= end_zone; i++) {
- struct zone *zone = pgdat->node_zones + i;
+ for_each_mem_region_in_node(region, pgdat->node_id) {
+ struct zone *zone = region->region_zones + i;
- lru_pages += zone_reclaimable_pages(zone);
+ lru_pages += zone_reclaimable_pages(zone);
+ }
}
/*
@@ -2638,108 +2657,109 @@ loop_again:
* cause too much scanning of the lower zones.
*/
for (i = 0; i <= end_zone; i++) {
- struct zone *zone = pgdat->node_zones + i;
- int nr_slab, testorder;
- unsigned long balance_gap;
-
- if (!populated_zone(zone))
- continue;
+ for_each_mem_region_in_node(region, pgdat->node_id) {
+ struct zone *zone = region->region_zones + i;
+ int nr_slab, testorder;
+ unsigned long balance_gap;
- if (zone->all_unreclaimable &&
- sc.priority != DEF_PRIORITY)
- continue;
-
- sc.nr_scanned = 0;
-
- nr_soft_scanned = 0;
- /*
- * Call soft limit reclaim before calling shrink_zone.
- */
- nr_soft_reclaimed = mem_cgroup_soft_limit_reclaim(zone,
- order, sc.gfp_mask,
- &nr_soft_scanned);
- sc.nr_reclaimed += nr_soft_reclaimed;
- total_scanned += nr_soft_scanned;
-
- /*
- * We put equal pressure on every zone, unless
- * one zone has way too many pages free
- * already. The "too many pages" is defined
- * as the high wmark plus a "gap" where the
- * gap is either the low watermark or 1%
- * of the zone, whichever is smaller.
- */
- balance_gap = min(low_wmark_pages(zone),
- (zone->present_pages +
- KSWAPD_ZONE_BALANCE_GAP_RATIO-1) /
- KSWAPD_ZONE_BALANCE_GAP_RATIO);
- /*
- * Kswapd reclaims only single pages with compaction
- * enabled. Trying too hard to reclaim until contiguous
- * free pages have become available can hurt performance
- * by evicting too much useful data from memory.
- * Do not reclaim more than needed for compaction.
- */
- testorder = order;
- if (COMPACTION_BUILD && order &&
- compaction_suitable(zone, order) !=
- COMPACT_SKIPPED)
- testorder = 0;
-
- if ((buffer_heads_over_limit && is_highmem_idx(i)) ||
- !zone_watermark_ok_safe(zone, testorder,
- high_wmark_pages(zone) + balance_gap,
- end_zone, 0)) {
- shrink_zone(zone, &sc);
-
- reclaim_state->reclaimed_slab = 0;
- nr_slab = shrink_slab(&shrink, sc.nr_scanned, lru_pages);
- sc.nr_reclaimed += reclaim_state->reclaimed_slab;
- total_scanned += sc.nr_scanned;
+ if (!populated_zone(zone))
+ continue;
- if (nr_slab == 0 && !zone_reclaimable(zone))
- zone->all_unreclaimable = 1;
- }
+ if (zone->all_unreclaimable &&
+ sc.priority != DEF_PRIORITY)
+ continue;
- /*
- * If we've done a decent amount of scanning and
- * the reclaim ratio is low, start doing writepage
- * even in laptop mode
- */
- if (total_scanned > SWAP_CLUSTER_MAX * 2 &&
- total_scanned > sc.nr_reclaimed + sc.nr_reclaimed / 2)
- sc.may_writepage = 1;
+ sc.nr_scanned = 0;
- if (zone->all_unreclaimable) {
- if (end_zone && end_zone == i)
- end_zone--;
- continue;
- }
+ nr_soft_scanned = 0;
+ /*
+ * Call soft limit reclaim before calling shrink_zone.
+ */
+ nr_soft_reclaimed = mem_cgroup_soft_limit_reclaim(zone,
+ order, sc.gfp_mask,
+ &nr_soft_scanned);
+ sc.nr_reclaimed += nr_soft_reclaimed;
+ total_scanned += nr_soft_scanned;
- if (!zone_watermark_ok_safe(zone, testorder,
- high_wmark_pages(zone), end_zone, 0)) {
- all_zones_ok = 0;
/*
- * We are still under min water mark. This
- * means that we have a GFP_ATOMIC allocation
- * failure risk. Hurry up!
+ * We put equal pressure on every zone, unless
+ * one zone has way too many pages free
+ * already. The "too many pages" is defined
+ * as the high wmark plus a "gap" where the
+ * gap is either the low watermark or 1%
+ * of the zone, whichever is smaller.
*/
- if (!zone_watermark_ok_safe(zone, order,
- min_wmark_pages(zone), end_zone, 0))
- has_under_min_watermark_zone = 1;
- } else {
+ balance_gap = min(low_wmark_pages(zone),
+ (zone->present_pages +
+ KSWAPD_ZONE_BALANCE_GAP_RATIO-1) /
+ KSWAPD_ZONE_BALANCE_GAP_RATIO);
/*
- * If a zone reaches its high watermark,
- * consider it to be no longer congested. It's
- * possible there are dirty pages backed by
- * congested BDIs but as pressure is relieved,
- * speculatively avoid congestion waits
+ * Kswapd reclaims only single pages with compaction
+ * enabled. Trying too hard to reclaim until contiguous
+ * free pages have become available can hurt performance
+ * by evicting too much useful data from memory.
+ * Do not reclaim more than needed for compaction.
*/
- zone_clear_flag(zone, ZONE_CONGESTED);
- if (i <= *classzone_idx)
- balanced += zone->present_pages;
- }
+ testorder = order;
+ if (COMPACTION_BUILD && order &&
+ compaction_suitable(zone, order) !=
+ COMPACT_SKIPPED)
+ testorder = 0;
+
+ if ((buffer_heads_over_limit && is_highmem_idx(i)) ||
+ !zone_watermark_ok_safe(zone, testorder,
+ high_wmark_pages(zone) + balance_gap,
+ end_zone, 0)) {
+ shrink_zone(zone, &sc);
+
+ reclaim_state->reclaimed_slab = 0;
+ nr_slab = shrink_slab(&shrink, sc.nr_scanned, lru_pages);
+ sc.nr_reclaimed += reclaim_state->reclaimed_slab;
+ total_scanned += sc.nr_scanned;
+
+ if (nr_slab == 0 && !zone_reclaimable(zone))
+ zone->all_unreclaimable = 1;
+ }
+ /*
+ * If we've done a decent amount of scanning and
+ * the reclaim ratio is low, start doing writepage
+ * even in laptop mode
+ */
+ if (total_scanned > SWAP_CLUSTER_MAX * 2 &&
+ total_scanned > sc.nr_reclaimed + sc.nr_reclaimed / 2)
+ sc.may_writepage = 1;
+
+ if (zone->all_unreclaimable) {
+ if (end_zone && end_zone == i)
+ end_zone--;
+ continue;
+ }
+
+ if (!zone_watermark_ok_safe(zone, testorder,
+ high_wmark_pages(zone), end_zone, 0)) {
+ all_zones_ok = 0;
+ /*
+ * We are still under min water mark. This
+ * means that we have a GFP_ATOMIC allocation
+ * failure risk. Hurry up!
+ */
+ if (!zone_watermark_ok_safe(zone, order,
+ min_wmark_pages(zone), end_zone, 0))
+ has_under_min_watermark_zone = 1;
+ } else {
+ /*
+ * If a zone reaches its high watermark,
+ * consider it to be no longer congested. It's
+ * possible there are dirty pages backed by
+ * congested BDIs but as pressure is relieved,
+ * speculatively avoid congestion waits
+ */
+ zone_clear_flag(zone, ZONE_CONGESTED);
+ if (i <= *classzone_idx)
+ balanced += zone->present_pages;
+ }
+ }
}
/*
@@ -2817,34 +2837,36 @@ out:
int zones_need_compaction = 1;
for (i = 0; i <= end_zone; i++) {
- struct zone *zone = pgdat->node_zones + i;
+ for_each_mem_region_in_node(region, pgdat->node_id) {
+ struct zone *zone = region->region_zones + i;
- if (!populated_zone(zone))
- continue;
+ if (!populated_zone(zone))
+ continue;
- if (zone->all_unreclaimable &&
- sc.priority != DEF_PRIORITY)
- continue;
+ if (zone->all_unreclaimable &&
+ sc.priority != DEF_PRIORITY)
+ continue;
- /* Would compaction fail due to lack of free memory? */
- if (COMPACTION_BUILD &&
- compaction_suitable(zone, order) == COMPACT_SKIPPED)
- goto loop_again;
+ /* Would compaction fail due to lack of free memory? */
+ if (COMPACTION_BUILD &&
+ compaction_suitable(zone, order) == COMPACT_SKIPPED)
+ goto loop_again;
- /* Confirm the zone is balanced for order-0 */
- if (!zone_watermark_ok(zone, 0,
- high_wmark_pages(zone), 0, 0)) {
- order = sc.order = 0;
- goto loop_again;
- }
+ /* Confirm the zone is balanced for order-0 */
+ if (!zone_watermark_ok(zone, 0,
+ high_wmark_pages(zone), 0, 0)) {
+ order = sc.order = 0;
+ goto loop_again;
+ }
- /* Check if the memory needs to be defragmented. */
- if (zone_watermark_ok(zone, order,
- low_wmark_pages(zone), *classzone_idx, 0))
- zones_need_compaction = 0;
+ /* Check if the memory needs to be defragmented. */
+ if (zone_watermark_ok(zone, order,
+ low_wmark_pages(zone), *classzone_idx, 0))
+ zones_need_compaction = 0;
- /* If balanced, clear the congested flag */
- zone_clear_flag(zone, ZONE_CONGESTED);
+ /* If balanced, clear the congested flag */
+ zone_clear_flag(zone, ZONE_CONGESTED);
+ }
}
if (zones_need_compaction)
@@ -2966,7 +2988,7 @@ static int kswapd(void *p)
order = new_order = 0;
balanced_order = 0;
- classzone_idx = new_classzone_idx = pgdat->nr_zones - 1;
+ classzone_idx = new_classzone_idx = pgdat->nr_node_zone_types - 1;
balanced_classzone_idx = classzone_idx;
for ( ; ; ) {
int ret;
@@ -2981,7 +3003,7 @@ static int kswapd(void *p)
new_order = pgdat->kswapd_max_order;
new_classzone_idx = pgdat->classzone_idx;
pgdat->kswapd_max_order = 0;
- pgdat->classzone_idx = pgdat->nr_zones - 1;
+ pgdat->classzone_idx = pgdat->nr_node_zone_types - 1;
}
if (order < new_order || classzone_idx > new_classzone_idx) {
@@ -2999,7 +3021,7 @@ static int kswapd(void *p)
new_order = order;
new_classzone_idx = classzone_idx;
pgdat->kswapd_max_order = 0;
- pgdat->classzone_idx = pgdat->nr_zones - 1;
+ pgdat->classzone_idx = pgdat->nr_node_zone_types - 1;
}
ret = try_to_freeze();
--
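For readers following the kswapd hunk above: the balance_gap it computes is the
smaller of the zone's low watermark and roughly 1% of the zone's pages. A minimal
standalone sketch of that arithmetic (an illustration only, not part of the patch;
KSWAPD_ZONE_BALANCE_GAP_RATIO is assumed to be 100, i.e. the "1% of the zone"
mentioned in the comment):

/*
 * Illustration only: kswapd keeps reclaiming until a zone is above its
 * high watermark plus a gap, where the gap is the smaller of the low
 * watermark and ~1% of the zone's pages (ratio assumed to be 100).
 */
#define KSWAPD_ZONE_BALANCE_GAP_RATIO	100

static unsigned long balance_gap_pages(unsigned long low_wmark,
				       unsigned long present_pages)
{
	/* Round up so even tiny zones get a non-zero gap. */
	unsigned long one_percent = (present_pages +
				     KSWAPD_ZONE_BALANCE_GAP_RATIO - 1) /
				    KSWAPD_ZONE_BALANCE_GAP_RATIO;

	return low_wmark < one_percent ? low_wmark : one_percent;
}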
* [RFC PATCH 09/10] mm: Reflect memory region changes in zoneinfo
2012-11-06 19:39 [RFC PATCH 00/10][Hierarchy] mm: Linux VM Infrastructure to support Memory Power Management Srivatsa S. Bhat
` (7 preceding siblings ...)
2012-11-06 19:41 ` [RFC PATCH 08/10] mm: Modify vmscan Srivatsa S. Bhat
@ 2012-11-06 19:41 ` Srivatsa S. Bhat
2012-11-06 19:42 ` [RFC PATCH 10/10] mm: Create memory regions at boot-up Srivatsa S. Bhat
9 siblings, 0 replies; 11+ messages in thread
From: Srivatsa S. Bhat @ 2012-11-06 19:41 UTC (permalink / raw)
To: akpm, mgorman, mjg59, paulmck, dave, maxime.coquelin,
loic.pallardy, arjan, kmpark, kamezawa.hiroyu, lenb, rjw
Cc: gargankita, amit.kachhap, svaidy, thomas.abraham,
santosh.shilimkar, srivatsa.bhat, linux-pm, linux-mm,
linux-kernel
From: Ankita Garg <gargankita@gmail.com>
This patch modifies the output of /proc/zoneinfo to take the memory regions
into account. Below is the output on a KVM guest booted with 4 regions, each
of size 512MB. (A sketch of the per-region zone walk behind these headers
follows the sample output.)
cat /proc/zoneinfo:
Node 0, Region 0, zone DMA
pages free 3975
min 11
low 13
high 16
scanned 0
spanned 4080
present 3977
nr_free_pages 3975
nr_inactive_anon 0
nr_active_anon 0
nr_inactive_file 0
nr_active_file 0
nr_unevictable 0
nr_mlock 0
nr_anon_pages 0
nr_mapped 0
nr_file_pages 0
nr_dirty 0
nr_writeback 0
nr_slab_reclaimable 0
nr_slab_unreclaimable 2
nr_page_table_pages 0
nr_kernel_stack 0
nr_unstable 0
nr_bounce 0
nr_vmscan_write 0
nr_vmscan_immediate_reclaim 0
nr_writeback_temp 0
nr_isolated_anon 0
nr_isolated_file 0
nr_shmem 0
nr_dirtied 0
nr_written 0
nr_anon_transparent_hugepages 0
nr_free_cma 0
protection: (0, 471, 471, 471)
pagesets
cpu: 0
count: 0
high: 0
batch: 1
vm stats threshold: 6
cpu: 1
count: 0
high: 0
batch: 1
vm stats threshold: 6
cpu: 2
count: 0
high: 0
batch: 1
vm stats threshold: 6
cpu: 3
count: 0
high: 0
batch: 1
vm stats threshold: 6
all_unreclaimable: 0
start_pfn: 16
inactive_ratio: 1
Node 0, Region 0, zone DMA32
pages free 107720
min 338
low 422
high 507
scanned 0
spanned 126992
present 120642
.....
Node 0, Region 1, zone DMA32
pages free 131072
min 367
low 458
high 550
scanned 0
spanned 131072
present 131072
.....
Node 0, Region 2, zone DMA32
pages free 131072
min 367
low 458
high 550
scanned 0
spanned 131072
present 131072
.....
Node 0, Region 3, zone DMA32
pages free 121880
min 341
low 426
high 511
scanned 0
spanned 131054
present 121928
.....
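The per-region headers above come from walking the zones region by region
instead of straight off the pgdat's zone array. A minimal sketch of that walk
pattern, modelled on the for_each_mem_region_in_node() usage elsewhere in this
series (walk_node_zones_by_region and print_fn are illustrative names for this
sketch, not the helpers in the patch):

/*
 * Sketch of the per-region walk behind the /proc output; assumes the
 * mem_region definitions introduced earlier in this series.
 */
static void walk_node_zones_by_region(struct seq_file *m, pg_data_t *pgdat,
				      void (*print_fn)(struct seq_file *,
						       pg_data_t *,
						       struct mem_region *,
						       struct zone *))
{
	struct mem_region *region;
	int i;

	for (i = 0; i < pgdat->nr_node_zone_types; i++) {
		for_each_mem_region_in_node(region, pgdat->node_id) {
			struct zone *zone = region->region_zones + i;

			if (!populated_zone(zone))
				continue;

			print_fn(m, pgdat, region, zone);
		}
	}
}

Each printer in the diff below (frag_show_print, pagetypeinfo_showfree_print,
zoneinfo_show_print, ...) then receives the region alongside the zone, which is
what lets it emit the "Region %d" column.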
Signed-off-by: Ankita Garg <gargankita@gmail.com>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---
mm/vmstat.c | 31 ++++++++++++++++++-------------
1 file changed, 18 insertions(+), 13 deletions(-)
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 86a92a6..b3be9ba 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -179,9 +179,12 @@ void refresh_zone_stat_thresholds(void)
*/
tolerate_drift = low_wmark_pages(zone) - min_wmark_pages(zone);
max_drift = num_online_cpus() * threshold;
- if (max_drift > tolerate_drift)
+ if (max_drift > tolerate_drift) {
zone->percpu_drift_mark = high_wmark_pages(zone) +
max_drift;
+ printk("zone %s drift mark %lu \n", zone->name,
+ zone->percpu_drift_mark);
+ }
}
}
@@ -189,12 +192,11 @@ void set_pgdat_percpu_threshold(pg_data_t *pgdat,
int (*calculate_pressure)(struct zone *))
{
struct mem_region *region;
- struct zone *zone;
int cpu;
int threshold;
int i;
- for (i = 0; i < pgdat->nr_zones; i++) {
+ for (i = 0; i < pgdat->nr_node_zone_types; i++) {
for_each_mem_region_in_node(region, pgdat->node_id) {
struct zone *zone = region->region_zones + i;
@@ -818,11 +820,12 @@ const char * const vmstat_text[] = {
#ifdef CONFIG_PROC_FS
static void frag_show_print(struct seq_file *m, pg_data_t *pgdat,
- struct zone *zone)
+ struct mem_region *region, struct zone *zone)
{
int order;
- seq_printf(m, "Node %d, zone %8s ", pgdat->node_id, zone->name);
+ seq_printf(m, "Node %d, REG %d, zone %8s ", pgdat->node_id,
+ region->region, zone->name);
for (order = 0; order < MAX_ORDER; ++order)
seq_printf(m, "%6lu ", zone->free_area[order].nr_free);
seq_putc(m, '\n');
@@ -838,14 +841,15 @@ static int frag_show(struct seq_file *m, void *arg)
return 0;
}
-static void pagetypeinfo_showfree_print(struct seq_file *m,
- pg_data_t *pgdat, struct zone *zone)
+static void pagetypeinfo_showfree_print(struct seq_file *m, pg_data_t *pgdat,
+ struct mem_region *region, struct zone *zone)
{
int order, mtype;
for (mtype = 0; mtype < MIGRATE_TYPES; mtype++) {
- seq_printf(m, "Node %4d, zone %8s, type %12s ",
+ seq_printf(m, "Node %4d, Region %d, zone %8s, type %12s ",
pgdat->node_id,
+ region->region,
zone->name,
migratetype_names[mtype]);
for (order = 0; order < MAX_ORDER; ++order) {
@@ -880,8 +884,8 @@ static int pagetypeinfo_showfree(struct seq_file *m, void *arg)
return 0;
}
-static void pagetypeinfo_showblockcount_print(struct seq_file *m,
- pg_data_t *pgdat, struct zone *zone)
+static void pagetypeinfo_showblockcount_print(struct seq_file *m, pg_data_t *pgdat,
+ struct mem_region *region, struct zone *zone)
{
int mtype;
unsigned long pfn;
@@ -908,7 +912,7 @@ static void pagetypeinfo_showblockcount_print(struct seq_file *m,
}
/* Print counts */
- seq_printf(m, "Node %d, zone %8s ", pgdat->node_id, zone->name);
+ seq_printf(m, "Node %d, Region %d, zone %8s ", pgdat->node_id, region->region, zone->name);
for (mtype = 0; mtype < MIGRATE_TYPES; mtype++)
seq_printf(m, "%12lu ", count[mtype]);
seq_putc(m, '\n');
@@ -989,10 +993,11 @@ static const struct file_operations pagetypeinfo_file_ops = {
};
static void zoneinfo_show_print(struct seq_file *m, pg_data_t *pgdat,
- struct zone *zone)
+ struct mem_region *region, struct zone *zone)
{
int i;
- seq_printf(m, "Node %d, zone %8s", pgdat->node_id, zone->name);
+ seq_printf(m, "Node %d, Region %d, zone %8s", pgdat->node_id,
+ region->region, zone->name);
seq_printf(m,
"\n pages free %lu"
"\n min %lu"
--
* [RFC PATCH 10/10] mm: Create memory regions at boot-up
2012-11-06 19:39 [RFC PATCH 00/10][Hierarchy] mm: Linux VM Infrastructure to support Memory Power Management Srivatsa S. Bhat
` (8 preceding siblings ...)
2012-11-06 19:41 ` [RFC PATCH 09/10] mm: Reflect memory region changes in zoneinfo Srivatsa S. Bhat
@ 2012-11-06 19:42 ` Srivatsa S. Bhat
9 siblings, 0 replies; 11+ messages in thread
From: Srivatsa S. Bhat @ 2012-11-06 19:42 UTC (permalink / raw)
To: akpm, mgorman, mjg59, paulmck, dave, maxime.coquelin,
loic.pallardy, arjan, kmpark, kamezawa.hiroyu, lenb, rjw
Cc: gargankita, amit.kachhap, svaidy, thomas.abraham,
santosh.shilimkar, srivatsa.bhat, linux-pm, linux-mm,
linux-kernel
From: Ankita Garg <gargankita@gmail.com>
Memory regions are created at boot time, from the information obtained from
the firmware. But since the firmware doesn't yet export information about
memory units that can be independently power-managed, we hard-code the memory
region size to 512MB for the purpose of demonstration.
In the future, we expect ACPI 5.0-compliant firmware to expose the required
information in the form of the MPST (Memory Power State Table).
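As a rough illustration of the carve-up this implies (a sketch only, not the
patch code; the macro name is made up here and is written fully parenthesised
so the shift/divide precedence stays explicit), each node's span is cut into
fixed 512MB chunks, with the last region taking whatever remains:

#define REGION_SIZE_PFNS	((512UL << 20) >> PAGE_SHIFT)	/* 512MB in pages */

static int nr_regions_for_node(unsigned long node_spanned_pages)
{
	/* Round up: a partial trailing chunk still forms its own region. */
	return (node_spanned_pages + REGION_SIZE_PFNS - 1) / REGION_SIZE_PFNS;
}

For example, a 2GB node yields four 512MB regions, matching the KVM guest used
for the zoneinfo output in the previous patch.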
Signed-off-by: Ankita Garg <gargankita@gmail.com>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---
mm/page_alloc.c | 28 ++++++++++++++++++++++++++++
1 file changed, 28 insertions(+)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 9c1d680..13d1b2f 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4491,6 +4491,33 @@ void __init set_pageblock_order(void)
#endif /* CONFIG_HUGETLB_PAGE_SIZE_VARIABLE */
+#define REGIONS_SIZE (512 << 20) >> PAGE_SHIFT
+
+static void init_node_memory_regions(struct pglist_data *pgdat)
+{
+ int cnt = 0;
+ unsigned long i;
+ unsigned long start_pfn = pgdat->node_start_pfn;
+ unsigned long spanned_pages = pgdat->node_spanned_pages;
+ unsigned long total = 0;
+
+ for (i = start_pfn; i < start_pfn + spanned_pages; i += REGIONS_SIZE) {
+ struct mem_region *region = &pgdat->node_regions[cnt];
+
+ region->start_pfn = i;
+ if ((spanned_pages - total) < REGIONS_SIZE)
+ region->spanned_pages = spanned_pages - total;
+ else
+ region->spanned_pages = REGIONS_SIZE;
+
+ region->node = pgdat->node_id;
+ region->region = cnt;
+ pgdat->nr_node_regions++;
+ total += region->spanned_pages;
+ cnt++;
+ }
+}
+
/*
* Set up the zone data structures:
* - mark all pages reserved
@@ -4653,6 +4680,7 @@ void __paginginit free_area_init_node(int nid, unsigned long *zones_size,
(unsigned long)pgdat->node_mem_map);
#endif
+ init_node_memory_regions(pgdat);
free_area_init_core(pgdat, zones_size, zholes_size);
}
--