* [RFC PATCH 06/23 V2] mempolicy: use N_MEMORY instead of N_HIGH_MEMORY
[not found] ` <1343875991-7533-1-git-send-email-laijs-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
@ 2012-08-02 2:52 ` Lai Jiangshan
0 siblings, 0 replies; 37+ messages in thread
From: Lai Jiangshan @ 2012-08-02 2:52 UTC (permalink / raw)
To: Mel Gorman
Cc: Christoph Lameter, Jiri Kosina, Dan Magenheimer,
linux-kernel-u79uwXL29TY76Z2rM5mHXA, Michal Hocko, Paul Gortmaker,
Konstantin Khlebnikov, H. Peter Anvin, Sam Ravnborg, Gavin Shan,
Rik van Riel, cgroups-u79uwXL29TY76Z2rM5mHXA,
x86-DgEjT+Ai2ygdnm+yROfE0A, Hugh Dickins, Ingo Molnar, Mel Gorman,
KOSAKI Motohiro, David Rientjes, Petr Holasek,
linux-mm-Bw31MaZKKs3YtjvyW6yDsg, Wanlong Gao, Djalal Harouni,
Rusty Russell, Wen Congyang, Peter Zijlstra
N_HIGH_MEMORY stands for the nodes that have normal or high memory.
N_MEMORY stands for the nodes that have any memory.
The code here needs to handle the nodes which have memory; we should
use N_MEMORY instead.
Signed-off-by: Lai Jiangshan <laijs-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
---
mm/mempolicy.c | 12 ++++++------
1 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 1d771e4..ad0381d 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -212,9 +212,9 @@ static int mpol_set_nodemask(struct mempolicy *pol,
/* if mode is MPOL_DEFAULT, pol is NULL. This is right. */
if (pol == NULL)
return 0;
- /* Check N_HIGH_MEMORY */
+ /* Check N_MEMORY */
nodes_and(nsc->mask1,
- cpuset_current_mems_allowed, node_states[N_HIGH_MEMORY]);
+ cpuset_current_mems_allowed, node_states[N_MEMORY]);
VM_BUG_ON(!nodes);
if (pol->mode == MPOL_PREFERRED && nodes_empty(*nodes))
@@ -1363,7 +1363,7 @@ SYSCALL_DEFINE4(migrate_pages, pid_t, pid, unsigned long, maxnode,
goto out_put;
}
- if (!nodes_subset(*new, node_states[N_HIGH_MEMORY])) {
+ if (!nodes_subset(*new, node_states[N_MEMORY])) {
err = -EINVAL;
goto out_put;
}
@@ -2314,7 +2314,7 @@ void __init numa_policy_init(void)
* fall back to the largest node if they're all smaller.
*/
nodes_clear(interleave_nodes);
- for_each_node_state(nid, N_HIGH_MEMORY) {
+ for_each_node_state(nid, N_MEMORY) {
unsigned long total_pages = node_present_pages(nid);
/* Preserve the largest node */
@@ -2395,7 +2395,7 @@ int mpol_parse_str(char *str, struct mempolicy **mpol, int no_context)
*nodelist++ = '\0';
if (nodelist_parse(nodelist, nodes))
goto out;
- if (!nodes_subset(nodes, node_states[N_HIGH_MEMORY]))
+ if (!nodes_subset(nodes, node_states[N_MEMORY]))
goto out;
} else
nodes_clear(nodes);
@@ -2429,7 +2429,7 @@ int mpol_parse_str(char *str, struct mempolicy **mpol, int no_context)
* Default to online nodes with memory if no nodelist
*/
if (!nodelist)
- nodes = node_states[N_HIGH_MEMORY];
+ nodes = node_states[N_MEMORY];
break;
case MPOL_LOCAL:
/*
--
1.7.1
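For illustration only, a minimal sketch (not part of the posted patch) of how the two masks differ once a movable-only node can exist; without CONFIG_MOVABLE_NODE, N_MEMORY is an alias of N_HIGH_MEMORY and both loops visit the same nodes:
#include <linux/nodemask.h>
#include <linux/printk.h>

static void walk_nodes_with_memory(void)
{
	int nid;

	/* Nodes with normal or high memory; misses movable-only nodes. */
	for_each_node_state(nid, N_HIGH_MEMORY)
		pr_info("node %d has normal/high memory\n", nid);

	/* Nodes with any memory at all, including movable-only nodes. */
	for_each_node_state(nid, N_MEMORY)
		pr_info("node %d has some memory\n", nid);
}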
* [RFC PATCH 04/23 V2] oom: use N_MEMORY instead of N_HIGH_MEMORY
[not found] <1343887288-8866-1-git-send-email-laijs@cn.fujitsu.com>
@ 2012-08-02 6:01 ` Lai Jiangshan
2012-08-02 6:01 ` [RFC PATCH 05/23 V2] mm,migrate: " Lai Jiangshan
` (15 subsequent siblings)
16 siblings, 0 replies; 37+ messages in thread
From: Lai Jiangshan @ 2012-08-02 6:01 UTC (permalink / raw)
To: Mel Gorman, linux-kernel
Cc: Lai Jiangshan, Andrew Morton, David Rientjes, KAMEZAWA Hiroyuki,
Michal Hocko, KOSAKI Motohiro, linux-mm
N_HIGH_MEMORY stands for the nodes that have normal or high memory.
N_MEMORY stands for the nodes that have any memory.
The code here needs to handle the nodes which have memory; we should
use N_MEMORY instead.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
mm/oom_kill.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index ac300c9..1e58f12 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -257,7 +257,7 @@ static enum oom_constraint constrained_alloc(struct zonelist *zonelist,
* the page allocator means a mempolicy is in effect. Cpuset policy
* is enforced in get_page_from_freelist().
*/
- if (nodemask && !nodes_subset(node_states[N_HIGH_MEMORY], *nodemask)) {
+ if (nodemask && !nodes_subset(node_states[N_MEMORY], *nodemask)) {
*totalpages = total_swap_pages;
for_each_node_mask(nid, *nodemask)
*totalpages += node_spanned_pages(nid);
--
1.7.1
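For reference, the direction of the subset test matters; a minimal sketch (with an assumed helper name, not in the tree) of the rule the hunk encodes: an allocation is mempolicy-constrained exactly when the supplied nodemask does not cover every node that has memory:
#include <linux/nodemask.h>

/* Hypothetical helper, illustrating the changed condition. */
static bool is_mempolicy_constrained(nodemask_t *nodemask)
{
	return nodemask && !nodes_subset(node_states[N_MEMORY], *nodemask);
}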
* [RFC PATCH 05/23 V2] mm,migrate: use N_MEMORY instead of N_HIGH_MEMORY
[not found] <1343887288-8866-1-git-send-email-laijs@cn.fujitsu.com>
2012-08-02 6:01 ` [RFC PATCH 04/23 V2] oom: use N_MEMORY instead of N_HIGH_MEMORY Lai Jiangshan
@ 2012-08-02 6:01 ` Lai Jiangshan
2012-08-02 16:09 ` Christoph Lameter
2012-08-02 6:01 ` [RFC PATCH 06/23 V2] mempolicy: " Lai Jiangshan
` (14 subsequent siblings)
16 siblings, 1 reply; 37+ messages in thread
From: Lai Jiangshan @ 2012-08-02 6:01 UTC (permalink / raw)
To: Mel Gorman, linux-kernel
Cc: Lai Jiangshan, Andrew Morton, Hugh Dickins, Mel Gorman,
Wang Sheng-Hui, Christoph Lameter, linux-mm
N_HIGH_MEMORY stands for the nodes that have normal or high memory.
N_MEMORY stands for the nodes that have any memory.
The code here needs to handle the nodes which have memory; we should
use N_MEMORY instead.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
mm/migrate.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/mm/migrate.c b/mm/migrate.c
index be26d5c..dbe4f86 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1226,7 +1226,7 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
if (node < 0 || node >= MAX_NUMNODES)
goto out_pm;
- if (!node_state(node, N_HIGH_MEMORY))
+ if (!node_state(node, N_MEMORY))
goto out_pm;
err = -EACCES;
--
1.7.1
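A hedged sketch (helper name assumed, error code illustrative) of the validation the changed line performs on each user-supplied target node of move_pages(2):
#include <linux/errno.h>
#include <linux/nodemask.h>

/* Hypothetical helper, not in the kernel tree. */
static int check_target_node(int node)
{
	if (node < 0 || node >= MAX_NUMNODES)
		return -EINVAL;
	if (!node_state(node, N_MEMORY))	/* memoryless node */
		return -EINVAL;
	return 0;
}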
* [RFC PATCH 06/23 V2] mempolicy: use N_MEMORY instead of N_HIGH_MEMORY
[not found] <1343887288-8866-1-git-send-email-laijs@cn.fujitsu.com>
2012-08-02 6:01 ` [RFC PATCH 04/23 V2] oom: use N_MEMORY instead of N_HIGH_MEMORY Lai Jiangshan
2012-08-02 6:01 ` [RFC PATCH 05/23 V2] mm,migrate: " Lai Jiangshan
@ 2012-08-02 6:01 ` Lai Jiangshan
2012-08-02 6:01 ` [RFC PATCH 07/23 V2] memcontrol: " Lai Jiangshan
` (13 subsequent siblings)
16 siblings, 0 replies; 37+ messages in thread
From: Lai Jiangshan @ 2012-08-02 6:01 UTC (permalink / raw)
To: Mel Gorman, linux-kernel
Cc: Lai Jiangshan, Andrew Morton, Mel Gorman, David Rientjes,
Rik van Riel, KOSAKI Motohiro, linux-mm
N_HIGH_MEMORY stands for the nodes that have normal or high memory.
N_MEMORY stands for the nodes that have any memory.
The code here needs to handle the nodes which have memory; we should
use N_MEMORY instead.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
mm/mempolicy.c | 12 ++++++------
1 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 1d771e4..ad0381d 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -212,9 +212,9 @@ static int mpol_set_nodemask(struct mempolicy *pol,
/* if mode is MPOL_DEFAULT, pol is NULL. This is right. */
if (pol == NULL)
return 0;
- /* Check N_HIGH_MEMORY */
+ /* Check N_MEMORY */
nodes_and(nsc->mask1,
- cpuset_current_mems_allowed, node_states[N_HIGH_MEMORY]);
+ cpuset_current_mems_allowed, node_states[N_MEMORY]);
VM_BUG_ON(!nodes);
if (pol->mode == MPOL_PREFERRED && nodes_empty(*nodes))
@@ -1363,7 +1363,7 @@ SYSCALL_DEFINE4(migrate_pages, pid_t, pid, unsigned long, maxnode,
goto out_put;
}
- if (!nodes_subset(*new, node_states[N_HIGH_MEMORY])) {
+ if (!nodes_subset(*new, node_states[N_MEMORY])) {
err = -EINVAL;
goto out_put;
}
@@ -2314,7 +2314,7 @@ void __init numa_policy_init(void)
* fall back to the largest node if they're all smaller.
*/
nodes_clear(interleave_nodes);
- for_each_node_state(nid, N_HIGH_MEMORY) {
+ for_each_node_state(nid, N_MEMORY) {
unsigned long total_pages = node_present_pages(nid);
/* Preserve the largest node */
@@ -2395,7 +2395,7 @@ int mpol_parse_str(char *str, struct mempolicy **mpol, int no_context)
*nodelist++ = '\0';
if (nodelist_parse(nodelist, nodes))
goto out;
- if (!nodes_subset(nodes, node_states[N_HIGH_MEMORY]))
+ if (!nodes_subset(nodes, node_states[N_MEMORY]))
goto out;
} else
nodes_clear(nodes);
@@ -2429,7 +2429,7 @@ int mpol_parse_str(char *str, struct mempolicy **mpol, int no_context)
* Default to online nodes with memory if no nodelist
*/
if (!nodelist)
- nodes = node_states[N_HIGH_MEMORY];
+ nodes = node_states[N_MEMORY];
break;
case MPOL_LOCAL:
/*
--
1.7.1
* [RFC PATCH 07/23 V2] memcontrol: use N_MEMORY instead of N_HIGH_MEMORY
[not found] <1343887288-8866-1-git-send-email-laijs@cn.fujitsu.com>
` (2 preceding siblings ...)
2012-08-02 6:01 ` [RFC PATCH 06/23 V2] mempolicy: " Lai Jiangshan
@ 2012-08-02 6:01 ` Lai Jiangshan
2012-08-02 6:01 ` [RFC PATCH 08/23 V2] hugetlb: " Lai Jiangshan
` (12 subsequent siblings)
16 siblings, 0 replies; 37+ messages in thread
From: Lai Jiangshan @ 2012-08-02 6:01 UTC (permalink / raw)
To: Mel Gorman, linux-kernel
Cc: Lai Jiangshan, Johannes Weiner, Michal Hocko, Balbir Singh,
KAMEZAWA Hiroyuki, Tejun Heo, Li Zefan, cgroups, linux-mm,
containers
N_HIGH_MEMORY stands for the nodes that have normal or high memory.
N_MEMORY stands for the nodes that have any memory.
The code here needs to handle the nodes which have memory; we should
use N_MEMORY instead.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
mm/memcontrol.c | 18 +++++++++---------
mm/page_cgroup.c | 2 +-
2 files changed, 10 insertions(+), 10 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index f72b5e5..4402c2e 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -797,7 +797,7 @@ static unsigned long mem_cgroup_nr_lru_pages(struct mem_cgroup *memcg,
int nid;
u64 total = 0;
- for_each_node_state(nid, N_HIGH_MEMORY)
+ for_each_node_state(nid, N_MEMORY)
total += mem_cgroup_node_nr_lru_pages(memcg, nid, lru_mask);
return total;
}
@@ -1549,9 +1549,9 @@ static void mem_cgroup_may_update_nodemask(struct mem_cgroup *memcg)
return;
/* make a nodemask where this memcg uses memory from */
- memcg->scan_nodes = node_states[N_HIGH_MEMORY];
+ memcg->scan_nodes = node_states[N_MEMORY];
- for_each_node_mask(nid, node_states[N_HIGH_MEMORY]) {
+ for_each_node_mask(nid, node_states[N_MEMORY]) {
if (!test_mem_cgroup_node_reclaimable(memcg, nid, false))
node_clear(nid, memcg->scan_nodes);
@@ -1622,7 +1622,7 @@ static bool mem_cgroup_reclaimable(struct mem_cgroup *memcg, bool noswap)
/*
* Check rest of nodes.
*/
- for_each_node_state(nid, N_HIGH_MEMORY) {
+ for_each_node_state(nid, N_MEMORY) {
if (node_isset(nid, memcg->scan_nodes))
continue;
if (test_mem_cgroup_node_reclaimable(memcg, nid, noswap))
@@ -3700,7 +3700,7 @@ move_account:
drain_all_stock_sync(memcg);
ret = 0;
mem_cgroup_start_move(memcg);
- for_each_node_state(node, N_HIGH_MEMORY) {
+ for_each_node_state(node, N_MEMORY) {
for (zid = 0; !ret && zid < MAX_NR_ZONES; zid++) {
enum lru_list lru;
for_each_lru(lru) {
@@ -4025,7 +4025,7 @@ static int mem_control_numa_stat_show(struct cgroup *cont, struct cftype *cft,
total_nr = mem_cgroup_nr_lru_pages(memcg, LRU_ALL);
seq_printf(m, "total=%lu", total_nr);
- for_each_node_state(nid, N_HIGH_MEMORY) {
+ for_each_node_state(nid, N_MEMORY) {
node_nr = mem_cgroup_node_nr_lru_pages(memcg, nid, LRU_ALL);
seq_printf(m, " N%d=%lu", nid, node_nr);
}
@@ -4033,7 +4033,7 @@ static int mem_control_numa_stat_show(struct cgroup *cont, struct cftype *cft,
file_nr = mem_cgroup_nr_lru_pages(memcg, LRU_ALL_FILE);
seq_printf(m, "file=%lu", file_nr);
- for_each_node_state(nid, N_HIGH_MEMORY) {
+ for_each_node_state(nid, N_MEMORY) {
node_nr = mem_cgroup_node_nr_lru_pages(memcg, nid,
LRU_ALL_FILE);
seq_printf(m, " N%d=%lu", nid, node_nr);
@@ -4042,7 +4042,7 @@ static int mem_control_numa_stat_show(struct cgroup *cont, struct cftype *cft,
anon_nr = mem_cgroup_nr_lru_pages(memcg, LRU_ALL_ANON);
seq_printf(m, "anon=%lu", anon_nr);
- for_each_node_state(nid, N_HIGH_MEMORY) {
+ for_each_node_state(nid, N_MEMORY) {
node_nr = mem_cgroup_node_nr_lru_pages(memcg, nid,
LRU_ALL_ANON);
seq_printf(m, " N%d=%lu", nid, node_nr);
@@ -4051,7 +4051,7 @@ static int mem_control_numa_stat_show(struct cgroup *cont, struct cftype *cft,
unevictable_nr = mem_cgroup_nr_lru_pages(memcg, BIT(LRU_UNEVICTABLE));
seq_printf(m, "unevictable=%lu", unevictable_nr);
- for_each_node_state(nid, N_HIGH_MEMORY) {
+ for_each_node_state(nid, N_MEMORY) {
node_nr = mem_cgroup_node_nr_lru_pages(memcg, nid,
BIT(LRU_UNEVICTABLE));
seq_printf(m, " N%d=%lu", nid, node_nr);
diff --git a/mm/page_cgroup.c b/mm/page_cgroup.c
index eb750f8..e775239 100644
--- a/mm/page_cgroup.c
+++ b/mm/page_cgroup.c
@@ -271,7 +271,7 @@ void __init page_cgroup_init(void)
if (mem_cgroup_disabled())
return;
- for_each_node_state(nid, N_HIGH_MEMORY) {
+ for_each_node_state(nid, N_MEMORY) {
unsigned long start_pfn, end_pfn;
start_pfn = node_start_pfn(nid);
--
1.7.1
* [RFC PATCH 08/23 V2] hugetlb: use N_MEMORY instead of N_HIGH_MEMORY
[not found] <1343887288-8866-1-git-send-email-laijs@cn.fujitsu.com>
` (3 preceding siblings ...)
2012-08-02 6:01 ` [RFC PATCH 07/23 V2] memcontrol: " Lai Jiangshan
@ 2012-08-02 6:01 ` Lai Jiangshan
2012-08-04 14:02 ` Hillf Danton
2012-08-02 6:01 ` [RFC PATCH 09/23 V2] vmstat: " Lai Jiangshan
` (11 subsequent siblings)
16 siblings, 1 reply; 37+ messages in thread
From: Lai Jiangshan @ 2012-08-02 6:01 UTC (permalink / raw)
To: Mel Gorman, linux-kernel
Cc: Lai Jiangshan, Greg Kroah-Hartman, Andrew Morton, Hillf Danton,
Michal Hocko, KAMEZAWA Hiroyuki, linux-mm
N_HIGH_MEMORY stands for the nodes that have normal or high memory.
N_MEMORY stands for the nodes that have any memory.
The code here needs to handle the nodes which have memory; we should
use N_MEMORY instead.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
drivers/base/node.c | 2 +-
mm/hugetlb.c | 24 ++++++++++++------------
2 files changed, 13 insertions(+), 13 deletions(-)
diff --git a/drivers/base/node.c b/drivers/base/node.c
index af1a177..31f4805 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -227,7 +227,7 @@ static node_registration_func_t __hugetlb_unregister_node;
static inline bool hugetlb_register_node(struct node *node)
{
if (__hugetlb_register_node &&
- node_state(node->dev.id, N_HIGH_MEMORY)) {
+ node_state(node->dev.id, N_MEMORY)) {
__hugetlb_register_node(node);
return true;
}
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index e198831..661db47 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1046,7 +1046,7 @@ static void return_unused_surplus_pages(struct hstate *h,
* on-line nodes with memory and will handle the hstate accounting.
*/
while (nr_pages--) {
- if (!free_pool_huge_page(h, &node_states[N_HIGH_MEMORY], 1))
+ if (!free_pool_huge_page(h, &node_states[N_MEMORY], 1))
break;
}
}
@@ -1150,14 +1150,14 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma,
int __weak alloc_bootmem_huge_page(struct hstate *h)
{
struct huge_bootmem_page *m;
- int nr_nodes = nodes_weight(node_states[N_HIGH_MEMORY]);
+ int nr_nodes = nodes_weight(node_states[N_MEMORY]);
while (nr_nodes) {
void *addr;
addr = __alloc_bootmem_node_nopanic(
NODE_DATA(hstate_next_node_to_alloc(h,
- &node_states[N_HIGH_MEMORY])),
+ &node_states[N_MEMORY])),
huge_page_size(h), huge_page_size(h), 0);
if (addr) {
@@ -1229,7 +1229,7 @@ static void __init hugetlb_hstate_alloc_pages(struct hstate *h)
if (!alloc_bootmem_huge_page(h))
break;
} else if (!alloc_fresh_huge_page(h,
- &node_states[N_HIGH_MEMORY]))
+ &node_states[N_MEMORY]))
break;
}
h->max_huge_pages = i;
@@ -1497,7 +1497,7 @@ static ssize_t nr_hugepages_store_common(bool obey_mempolicy,
if (!(obey_mempolicy &&
init_nodemask_of_mempolicy(nodes_allowed))) {
NODEMASK_FREE(nodes_allowed);
- nodes_allowed = &node_states[N_HIGH_MEMORY];
+ nodes_allowed = &node_states[N_MEMORY];
}
} else if (nodes_allowed) {
/*
@@ -1507,11 +1507,11 @@ static ssize_t nr_hugepages_store_common(bool obey_mempolicy,
count += h->nr_huge_pages - h->nr_huge_pages_node[nid];
init_nodemask_of_node(nodes_allowed, nid);
} else
- nodes_allowed = &node_states[N_HIGH_MEMORY];
+ nodes_allowed = &node_states[N_MEMORY];
h->max_huge_pages = set_max_huge_pages(h, count, nodes_allowed);
- if (nodes_allowed != &node_states[N_HIGH_MEMORY])
+ if (nodes_allowed != &node_states[N_MEMORY])
NODEMASK_FREE(nodes_allowed);
return len;
@@ -1812,7 +1812,7 @@ static void hugetlb_register_all_nodes(void)
{
int nid;
- for_each_node_state(nid, N_HIGH_MEMORY) {
+ for_each_node_state(nid, N_MEMORY) {
struct node *node = &node_devices[nid];
if (node->dev.id == nid)
hugetlb_register_node(node);
@@ -1906,8 +1906,8 @@ void __init hugetlb_add_hstate(unsigned order)
h->free_huge_pages = 0;
for (i = 0; i < MAX_NUMNODES; ++i)
INIT_LIST_HEAD(&h->hugepage_freelists[i]);
- h->next_nid_to_alloc = first_node(node_states[N_HIGH_MEMORY]);
- h->next_nid_to_free = first_node(node_states[N_HIGH_MEMORY]);
+ h->next_nid_to_alloc = first_node(node_states[N_MEMORY]);
+ h->next_nid_to_free = first_node(node_states[N_MEMORY]);
snprintf(h->name, HSTATE_NAME_LEN, "hugepages-%lukB",
huge_page_size(h)/1024);
@@ -1995,11 +1995,11 @@ static int hugetlb_sysctl_handler_common(bool obey_mempolicy,
if (!(obey_mempolicy &&
init_nodemask_of_mempolicy(nodes_allowed))) {
NODEMASK_FREE(nodes_allowed);
- nodes_allowed = &node_states[N_HIGH_MEMORY];
+ nodes_allowed = &node_states[N_MEMORY];
}
h->max_huge_pages = set_max_huge_pages(h, tmp, nodes_allowed);
- if (nodes_allowed != &node_states[N_HIGH_MEMORY])
+ if (nodes_allowed != &node_states[N_MEMORY])
NODEMASK_FREE(nodes_allowed);
}
out:
--
1.7.1
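The hunks above all follow one pattern; a condensed sketch of it (simplified, assuming CONFIG_NUMA): a privately allocated nodemask is used when a mempolicy applies, the global node_states[N_MEMORY] is borrowed otherwise, and only the former is freed:
#include <linux/mempolicy.h>
#include <linux/nodemask.h>
#include <linux/slab.h>

static void with_allowed_nodes(bool obey_mempolicy)
{
	NODEMASK_ALLOC(nodemask_t, nodes_allowed, GFP_KERNEL | __GFP_NORETRY);

	if (!(obey_mempolicy && init_nodemask_of_mempolicy(nodes_allowed))) {
		NODEMASK_FREE(nodes_allowed);
		nodes_allowed = &node_states[N_MEMORY];
	}

	/* ... walk or modify *nodes_allowed here ... */

	if (nodes_allowed != &node_states[N_MEMORY])
		NODEMASK_FREE(nodes_allowed);
}
Borrowing the global mask avoids an allocation in the common no-mempolicy case, at the cost of the final pointer comparison before freeing.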
* [RFC PATCH 09/23 V2] vmstat: use N_MEMORY instead of N_HIGH_MEMORY
[not found] <1343887288-8866-1-git-send-email-laijs@cn.fujitsu.com>
` (4 preceding siblings ...)
2012-08-02 6:01 ` [RFC PATCH 08/23 V2] hugetlb: " Lai Jiangshan
@ 2012-08-02 6:01 ` Lai Jiangshan
2012-08-02 16:09 ` Christoph Lameter
2012-08-02 6:01 ` [RFC PATCH 12/23 V2] vmscan: " Lai Jiangshan
` (10 subsequent siblings)
16 siblings, 1 reply; 37+ messages in thread
From: Lai Jiangshan @ 2012-08-02 6:01 UTC (permalink / raw)
To: Mel Gorman, linux-kernel
Cc: Lai Jiangshan, Andrew Morton, Christoph Lameter,
KAMEZAWA Hiroyuki, David Rientjes, linux-mm
N_HIGH_MEMORY stands for the nodes that have normal or high memory.
N_MEMORY stands for the nodes that have any memory.
The code here needs to handle the nodes which have memory; we should
use N_MEMORY instead.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
mm/vmstat.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 1bbbbd9..aa3da12 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -917,7 +917,7 @@ static int pagetypeinfo_show(struct seq_file *m, void *arg)
pg_data_t *pgdat = (pg_data_t *)arg;
/* check memoryless node */
- if (!node_state(pgdat->node_id, N_HIGH_MEMORY))
+ if (!node_state(pgdat->node_id, N_MEMORY))
return 0;
seq_printf(m, "Page block order: %d\n", pageblock_order);
@@ -1279,7 +1279,7 @@ static int unusable_show(struct seq_file *m, void *arg)
pg_data_t *pgdat = (pg_data_t *)arg;
/* check memoryless node */
- if (!node_state(pgdat->node_id, N_HIGH_MEMORY))
+ if (!node_state(pgdat->node_id, N_MEMORY))
return 0;
walk_zones_in_node(m, pgdat, unusable_show_print);
--
1.7.1
* [RFC PATCH 12/23 V2] vmscan: use N_MEMORY instead of N_HIGH_MEMORY
[not found] <1343887288-8866-1-git-send-email-laijs@cn.fujitsu.com>
` (5 preceding siblings ...)
2012-08-02 6:01 ` [RFC PATCH 09/23 V2] vmstat: " Lai Jiangshan
@ 2012-08-02 6:01 ` Lai Jiangshan
2012-08-02 6:01 ` [RFC PATCH 13/23 V2] page_alloc: use N_MEMORY instead of N_HIGH_MEMORY and change the node_states initialization Lai Jiangshan
` (9 subsequent siblings)
16 siblings, 0 replies; 37+ messages in thread
From: Lai Jiangshan @ 2012-08-02 6:01 UTC (permalink / raw)
To: Mel Gorman, linux-kernel
Cc: Lai Jiangshan, Andrew Morton, KAMEZAWA Hiroyuki, Hugh Dickins,
Minchan Kim, linux-mm
N_HIGH_MEMORY stands for the nodes that have normal or high memory.
N_MEMORY stands for the nodes that have any memory.
The code here needs to handle the nodes which have memory; we should
use N_MEMORY instead.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
mm/vmscan.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 66e4310..1888026 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2921,7 +2921,7 @@ static int __devinit cpu_callback(struct notifier_block *nfb,
int nid;
if (action == CPU_ONLINE || action == CPU_ONLINE_FROZEN) {
- for_each_node_state(nid, N_HIGH_MEMORY) {
+ for_each_node_state(nid, N_MEMORY) {
pg_data_t *pgdat = NODE_DATA(nid);
const struct cpumask *mask;
@@ -2976,7 +2976,7 @@ static int __init kswapd_init(void)
int nid;
swap_setup();
- for_each_node_state(nid, N_HIGH_MEMORY)
+ for_each_node_state(nid, N_MEMORY)
kswapd_run(nid);
hotcpu_notifier(cpu_callback, 0);
return 0;
--
1.7.1
* [RFC PATCH 13/23 V2] page_alloc: use N_MEMORY instead of N_HIGH_MEMORY and change the node_states initialization
[not found] <1343887288-8866-1-git-send-email-laijs@cn.fujitsu.com>
` (6 preceding siblings ...)
2012-08-02 6:01 ` [RFC PATCH 12/23 V2] vmscan: " Lai Jiangshan
@ 2012-08-02 6:01 ` Lai Jiangshan
2012-08-02 6:01 ` [RFC PATCH 14/23 V2] slub, hotplug: ignore unrelated node's hot-adding and hot-removing Lai Jiangshan
` (8 subsequent siblings)
16 siblings, 0 replies; 37+ messages in thread
From: Lai Jiangshan @ 2012-08-02 6:01 UTC (permalink / raw)
To: Mel Gorman, linux-kernel
Cc: Lai Jiangshan, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
Tejun Heo, Pekka Enberg, Yinghai Lu, David Rientjes,
Andrew Morton, Michal Hocko, KAMEZAWA Hiroyuki, Minchan Kim,
linux-mm
N_HIGH_MEMORY stands for the nodes that have normal or high memory.
N_MEMORY stands for the nodes that have any memory.
The code here needs to handle the nodes which have memory; we should
use N_MEMORY instead.
Since we have introduced N_MEMORY, we also update the initialization of node_states.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
arch/x86/mm/init_64.c | 4 +++-
mm/page_alloc.c | 40 ++++++++++++++++++++++------------------
2 files changed, 25 insertions(+), 19 deletions(-)
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 2b6b4a3..005f00c 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -625,7 +625,9 @@ void __init paging_init(void)
* numa support is not compiled in, and later node_set_state
* will not set it back.
*/
- node_clear_state(0, N_NORMAL_MEMORY);
+ node_clear_state(0, N_MEMORY);
+ if (N_MEMORY != N_NORMAL_MEMORY)
+ node_clear_state(0, N_NORMAL_MEMORY);
zone_sizes_init();
}
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 4a4f921..0571f2a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1646,7 +1646,7 @@ bool zone_watermark_ok_safe(struct zone *z, int order, unsigned long mark,
*
* If the zonelist cache is present in the passed in zonelist, then
* returns a pointer to the allowed node mask (either the current
- * tasks mems_allowed, or node_states[N_HIGH_MEMORY].)
+ * tasks mems_allowed, or node_states[N_MEMORY].)
*
* If the zonelist cache is not available for this zonelist, does
* nothing and returns NULL.
@@ -1675,7 +1675,7 @@ static nodemask_t *zlc_setup(struct zonelist *zonelist, int alloc_flags)
allowednodes = !in_interrupt() && (alloc_flags & ALLOC_CPUSET) ?
&cpuset_current_mems_allowed :
- &node_states[N_HIGH_MEMORY];
+ &node_states[N_MEMORY];
return allowednodes;
}
@@ -3070,7 +3070,7 @@ static int find_next_best_node(int node, nodemask_t *used_node_mask)
return node;
}
- for_each_node_state(n, N_HIGH_MEMORY) {
+ for_each_node_state(n, N_MEMORY) {
/* Don't want a node to appear more than once */
if (node_isset(n, *used_node_mask))
@@ -3212,7 +3212,7 @@ static int default_zonelist_order(void)
* local memory, NODE_ORDER may be suitable.
*/
average_size = total_size /
- (nodes_weight(node_states[N_HIGH_MEMORY]) + 1);
+ (nodes_weight(node_states[N_MEMORY]) + 1);
for_each_online_node(nid) {
low_kmem_size = 0;
total_size = 0;
@@ -4587,7 +4587,7 @@ unsigned long __init find_min_pfn_with_active_regions(void)
/*
* early_calculate_totalpages()
* Sum pages in active regions for movable zone.
- * Populate N_HIGH_MEMORY for calculating usable_nodes.
+ * Populate N_MEMORY for calculating usable_nodes.
*/
static unsigned long __init early_calculate_totalpages(void)
{
@@ -4600,7 +4600,7 @@ static unsigned long __init early_calculate_totalpages(void)
totalpages += pages;
if (pages)
- node_set_state(nid, N_HIGH_MEMORY);
+ node_set_state(nid, N_MEMORY);
}
return totalpages;
}
@@ -4617,9 +4617,9 @@ static void __init find_zone_movable_pfns_for_nodes(void)
unsigned long usable_startpfn;
unsigned long kernelcore_node, kernelcore_remaining;
/* save the state before borrow the nodemask */
- nodemask_t saved_node_state = node_states[N_HIGH_MEMORY];
+ nodemask_t saved_node_state = node_states[N_MEMORY];
unsigned long totalpages = early_calculate_totalpages();
- int usable_nodes = nodes_weight(node_states[N_HIGH_MEMORY]);
+ int usable_nodes = nodes_weight(node_states[N_MEMORY]);
/*
* If movablecore was specified, calculate what size of
@@ -4654,7 +4654,7 @@ static void __init find_zone_movable_pfns_for_nodes(void)
restart:
/* Spread kernelcore memory as evenly as possible throughout nodes */
kernelcore_node = required_kernelcore / usable_nodes;
- for_each_node_state(nid, N_HIGH_MEMORY) {
+ for_each_node_state(nid, N_MEMORY) {
unsigned long start_pfn, end_pfn;
/*
@@ -4746,23 +4746,27 @@ restart:
out:
/* restore the node_state */
- node_states[N_HIGH_MEMORY] = saved_node_state;
+ node_states[N_MEMORY] = saved_node_state;
}
-/* Any regular memory on that node ? */
-static void check_for_regular_memory(pg_data_t *pgdat)
+/* Any regular or high memory on that node ? */
+static void check_for_memory(pg_data_t *pgdat, int nid)
{
-#ifdef CONFIG_HIGHMEM
enum zone_type zone_type;
- for (zone_type = 0; zone_type <= ZONE_NORMAL; zone_type++) {
+ if (N_MEMORY == N_NORMAL_MEMORY)
+ return;
+
+ for (zone_type = 0; zone_type <= ZONE_MOVABLE - 1; zone_type++) {
struct zone *zone = &pgdat->node_zones[zone_type];
if (zone->present_pages) {
- node_set_state(zone_to_nid(zone), N_NORMAL_MEMORY);
+ node_set_state(nid, N_HIGH_MEMORY);
+ if (N_NORMAL_MEMORY != N_HIGH_MEMORY &&
+ zone_type <= ZONE_NORMAL)
+ node_set_state(nid, N_NORMAL_MEMORY);
break;
}
}
-#endif
}
/**
@@ -4845,8 +4849,8 @@ void __init free_area_init_nodes(unsigned long *max_zone_pfn)
/* Any memory on that node */
if (pgdat->node_present_pages)
- node_set_state(nid, N_HIGH_MEMORY);
- check_for_regular_memory(pgdat);
+ node_set_state(nid, N_MEMORY);
+ check_for_memory(pgdat, nid);
}
}
--
1.7.1
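After this change the node state masks form a superset chain; a small sanity-check sketch (illustrative only, helper name assumed) of the intended invariants:
#include <linux/bug.h>
#include <linux/nodemask.h>

static void check_node_state_invariants(void)
{
	/* Each state below is a subset of the one that follows it. */
	WARN_ON(!nodes_subset(node_states[N_NORMAL_MEMORY],
			      node_states[N_HIGH_MEMORY]));
	WARN_ON(!nodes_subset(node_states[N_HIGH_MEMORY],
			      node_states[N_MEMORY]));
	WARN_ON(!nodes_subset(node_states[N_MEMORY],
			      node_states[N_ONLINE]));
}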
* [RFC PATCH 14/23 V2] slub, hotplug: ignore unrelated node's hot-adding and hot-removing
[not found] <1343887288-8866-1-git-send-email-laijs@cn.fujitsu.com>
` (7 preceding siblings ...)
2012-08-02 6:01 ` [RFC PATCH 13/23 V2] page_alloc: use N_MEMORY instead of N_HIGH_MEMORY and change the node_states initialization Lai Jiangshan
@ 2012-08-02 6:01 ` Lai Jiangshan
2012-08-02 6:01 ` [RFC PATCH 15/23 V2] memory_hotplug: fix missing nodemask management Lai Jiangshan
` (7 subsequent siblings)
16 siblings, 0 replies; 37+ messages in thread
From: Lai Jiangshan @ 2012-08-02 6:01 UTC (permalink / raw)
To: Mel Gorman, linux-kernel
Cc: Lai Jiangshan, Christoph Lameter, Pekka Enberg, Matt Mackall,
linux-mm
SLUB only focuses on the nodes which have normal memory, so ignore
hot-adding and hot-removing on the other nodes.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
mm/slub.c | 6 ++++++
1 files changed, 6 insertions(+), 0 deletions(-)
diff --git a/mm/slub.c b/mm/slub.c
index 8c691fa..4c5bdc0 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3577,6 +3577,9 @@ static void slab_mem_offline_callback(void *arg)
if (offline_node < 0)
return;
+ if (page_zonenum(pfn_to_page(marg->start_pfn)) > ZONE_NORMAL)
+ return;
+
down_read(&slub_lock);
list_for_each_entry(s, &slab_caches, list) {
n = get_node(s, offline_node);
@@ -3611,6 +3614,9 @@ static int slab_mem_going_online_callback(void *arg)
if (nid < 0)
return 0;
+ if (page_zonenum(pfn_to_page(marg->start_pfn)) > ZONE_NORMAL)
+ return 0;
+
/*
* We are bringing a node online. No memory is available yet. We must
* allocate a kmem_cache_node structure in order to bring the node
--
1.7.1
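A sketch of the filter being added (assuming marg is the struct memory_notify argument, as in the hunks): slab never allocates from zones above ZONE_NORMAL, so hot-plug events for highmem/movable-only ranges can be ignored:
#include <linux/memory.h>
#include <linux/mm.h>

/* Hypothetical helper condensing the two hunks above. */
static bool slab_cares_about_range(struct memory_notify *marg)
{
	return page_zonenum(pfn_to_page(marg->start_pfn)) <= ZONE_NORMAL;
}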
* [RFC PATCH 15/23 V2] memory_hotplug: fix missing nodemask management
[not found] <1343887288-8866-1-git-send-email-laijs@cn.fujitsu.com>
` (8 preceding siblings ...)
2012-08-02 6:01 ` [RFC PATCH 14/23 V2] slub, hotplug: ignore unrelated node's hot-adding and hot-removing Lai Jiangshan
@ 2012-08-02 6:01 ` Lai Jiangshan
2012-08-02 6:01 ` [RFC PATCH 16/23 V2] numa: add CONFIG_MOVABLE_NODE for movable-dedicated node Lai Jiangshan
` (6 subsequent siblings)
16 siblings, 0 replies; 37+ messages in thread
From: Lai Jiangshan @ 2012-08-02 6:01 UTC (permalink / raw)
To: Mel Gorman, linux-kernel
Cc: Lai Jiangshan, Rob Landley, Andrew Morton, Paul Gortmaker,
Bjorn Helgaas, David Rientjes, Wen Congyang, linux-doc, linux-mm
Currently memory_hotplug only manages node_states[N_HIGH_MEMORY];
it forgets to manage node_states[N_NORMAL_MEMORY]. Fix it.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
Documentation/memory-hotplug.txt | 2 +-
mm/memory_hotplug.c | 23 +++++++++++++++++++++--
2 files changed, 22 insertions(+), 3 deletions(-)
diff --git a/Documentation/memory-hotplug.txt b/Documentation/memory-hotplug.txt
index 6d0c251..89f21b2 100644
--- a/Documentation/memory-hotplug.txt
+++ b/Documentation/memory-hotplug.txt
@@ -382,7 +382,7 @@ struct memory_notify {
start_pfn is start_pfn of online/offline memory.
nr_pages is # of pages of online/offline memory.
-status_change_nid is set node id when N_HIGH_MEMORY of nodemask is (will be)
+status_change_nid is set node id when N_MEMORY of nodemask is (will be)
set/clear. It means a new(memoryless) node gets new memory by online and a
node loses all memory. If this is -1, then nodemask status is not changed.
If status_changed_nid >= 0, callback should create/discard structures for the
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 427bb29..c44b39e 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -522,8 +522,18 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages)
init_per_zone_wmark_min();
if (onlined_pages) {
+ enum zone_type zoneid = zone_idx(zone);
+
kswapd_run(zone_to_nid(zone));
- node_set_state(zone_to_nid(zone), N_HIGH_MEMORY);
+
+ node_set_state(nid, N_MEMORY);
+ if (zoneid <= ZONE_NORMAL && N_NORMAL_MEMORY != N_MEMORY)
+ node_set_state(nid, N_NORMAL_MEMORY);
+#ifdef CONFIG_HIGHMEM
+ if (zoneid <= ZONE_HIGHMEM && N_HIGH_MEMORY != N_MEMORY)
+ node_set_state(nid, N_HIGH_MEMORY);
+#endif
+
}
vm_total_pages = nr_free_pagecache_pages();
@@ -966,7 +976,16 @@ repeat:
init_per_zone_wmark_min();
if (!node_present_pages(node)) {
- node_clear_state(node, N_HIGH_MEMORY);
+ enum zone_type zoneid = zone_idx(zone);
+
+ node_clear_state(node, N_MEMORY);
+ if (zoneid <= ZONE_NORMAL && N_NORMAL_MEMORY != N_MEMORY)
+ node_clear_state(node, N_NORMAL_MEMORY);
+#ifdef CONFIG_HIGHMEM
+ if (zoneid <= ZONE_HIGHMEM && N_HIGH_MEMORY != N_MEMORY)
+ node_clear_state(node, N_HIGH_MEMORY);
+#endif
+
kswapd_stop(node);
}
--
1.7.1
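A hedged sketch of a hotplug notifier consuming status_change_nid as documented in the hunk above; a node id of -1 means the node states did not change:
#include <linux/memory.h>
#include <linux/notifier.h>
#include <linux/printk.h>

static int example_mem_notify(struct notifier_block *nb,
			      unsigned long action, void *arg)
{
	struct memory_notify *mn = arg;

	if (mn->status_change_nid < 0)
		return NOTIFY_OK;	/* node states unchanged */

	if (action == MEM_ONLINE)
		pr_info("node %d gained its first memory\n",
			mn->status_change_nid);
	else if (action == MEM_OFFLINE)
		pr_info("node %d lost its last memory\n",
			mn->status_change_nid);

	return NOTIFY_OK;
}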
* [RFC PATCH 16/23 V2] numa: add CONFIG_MOVABLE_NODE for movable-dedicated node
[not found] <1343887288-8866-1-git-send-email-laijs@cn.fujitsu.com>
` (9 preceding siblings ...)
2012-08-02 6:01 ` [RFC PATCH 15/23 V2] memory_hotplug: fix missing nodemask management Lai Jiangshan
@ 2012-08-02 6:01 ` Lai Jiangshan
2012-08-02 6:01 ` [RFC PATCH 17/23 V2] page_alloc.c: don't subtract unrelated memmap from zone's present pages Lai Jiangshan
` (5 subsequent siblings)
16 siblings, 0 replies; 37+ messages in thread
From: Lai Jiangshan @ 2012-08-02 6:01 UTC (permalink / raw)
To: Mel Gorman, linux-kernel
Cc: Lai Jiangshan, Greg Kroah-Hartman, Andrew Morton, Jan Beulich,
Seth Jennings, Dan Magenheimer, Michal Hocko, KAMEZAWA Hiroyuki,
Minchan Kim, linux-mm
Everything is prepared, so we can actually introduce N_MEMORY.
Add CONFIG_MOVABLE_NODE so that we can use it for movable-dedicated nodes.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
drivers/base/node.c | 6 ++++++
include/linux/nodemask.h | 4 ++++
mm/Kconfig | 8 ++++++++
mm/page_alloc.c | 3 +++
4 files changed, 21 insertions(+), 0 deletions(-)
diff --git a/drivers/base/node.c b/drivers/base/node.c
index 31f4805..4bf5629 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -621,6 +621,9 @@ static struct node_attr node_state_attr[] = {
#ifdef CONFIG_HIGHMEM
_NODE_ATTR(has_high_memory, N_HIGH_MEMORY),
#endif
+#ifdef CONFIG_MOVABLE_NODE
+ _NODE_ATTR(has_memory, N_MEMORY),
+#endif
};
static struct attribute *node_state_attrs[] = {
@@ -631,6 +634,9 @@ static struct attribute *node_state_attrs[] = {
#ifdef CONFIG_HIGHMEM
&node_state_attr[4].attr.attr,
#endif
+#ifdef CONFIG_MOVABLE_NODE
+ &node_state_attr[4].attr.attr,
+#endif
NULL
};
diff --git a/include/linux/nodemask.h b/include/linux/nodemask.h
index c6ebdc9..4e2cbfa 100644
--- a/include/linux/nodemask.h
+++ b/include/linux/nodemask.h
@@ -380,7 +380,11 @@ enum node_states {
#else
N_HIGH_MEMORY = N_NORMAL_MEMORY,
#endif
+#ifdef CONFIG_MOVABLE_NODE
+ N_MEMORY, /* The node has memory(regular, high, movable) */
+#else
N_MEMORY = N_HIGH_MEMORY,
+#endif
N_CPU, /* The node has one or more cpus */
NR_NODE_STATES
};
diff --git a/mm/Kconfig b/mm/Kconfig
index 82fed4e..4371c65 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -140,6 +140,14 @@ config ARCH_DISCARD_MEMBLOCK
config NO_BOOTMEM
boolean
+config MOVABLE_NODE
+ boolean "Enable to assign a node has only movable memory"
+ depends on HAVE_MEMBLOCK
+ depends on NO_BOOTMEM
+ depends on X86_64
+ depends on NUMA
+ default y
+
# eventually, we can have this option just 'select SPARSEMEM'
config MEMORY_HOTPLUG
bool "Allow for memory hot-add"
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0571f2a..737faf7 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -91,6 +91,9 @@ nodemask_t node_states[NR_NODE_STATES] __read_mostly = {
#ifdef CONFIG_HIGHMEM
[N_HIGH_MEMORY] = { { [0] = 1UL } },
#endif
+#ifdef CONFIG_MOVABLE_NODE
+ [N_MEMORY] = { { [0] = 1UL } },
+#endif
[N_CPU] = { { [0] = 1UL } },
#endif /* NUMA */
};
--
1.7.1
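In short, what the new config option buys, as a minimal sketch (illustrative, helper name assumed): with CONFIG_MOVABLE_NODE a node can be accounted as having memory without ever being a candidate for kernel (normal/high) allocations; without it, N_MEMORY collapses back to N_HIGH_MEMORY and nothing changes:
#include <linux/nodemask.h>

/* Hypothetical helper for a node that carries only ZONE_MOVABLE. */
static void mark_movable_only_node(int nid)
{
	node_set_state(nid, N_MEMORY);
	/* Deliberately NOT set: N_NORMAL_MEMORY / N_HIGH_MEMORY. */
}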
* [RFC PATCH 17/23 V2] page_alloc.c: don't subtract unrelated memmap from zone's present pages
[not found] <1343887288-8866-1-git-send-email-laijs@cn.fujitsu.com>
` (10 preceding siblings ...)
2012-08-02 6:01 ` [RFC PATCH 16/23 V2] numa: add CONFIG_MOVABLE_NODE for movable-dedicated node Lai Jiangshan
@ 2012-08-02 6:01 ` Lai Jiangshan
2012-08-02 6:01 ` [RFC PATCH 18/23 V2] page_alloc: add kernelcore_max_addr Lai Jiangshan
` (4 subsequent siblings)
16 siblings, 0 replies; 37+ messages in thread
From: Lai Jiangshan @ 2012-08-02 6:01 UTC (permalink / raw)
To: Mel Gorman, linux-kernel
Cc: Lai Jiangshan, Andrew Morton, Michal Hocko, KAMEZAWA Hiroyuki,
Minchan Kim, linux-mm
A)======
Currently, the memory-page-map (the struct page array) is not defined in struct zone.
It is defined in several ways:
FLATMEM: a global memmap, which can be allocated from any zone <= ZONE_NORMAL.
CONFIG_DISCONTIGMEM: a node-specific memmap, which can be allocated from any
zone <= ZONE_NORMAL within that node.
CONFIG_SPARSEMEM: a memory-section-specific memmap, which can be allocated from
any zone; with CONFIG_SPARSEMEM_VMEMMAP it is not even physically contiguous.
So the memmap has nothing directly related to the zone, and its memory can be
allocated outside the zone; it is therefore wrong to subtract the memmap's size
from the zone's present pages.
B)======
When the system has large holes, the subtracted present-pages size can become
very small or even negative, which makes memory management work badly in the
zone or makes the zone unusable, even though the real present-pages size is
actually large.
C)======
The subtracted present-pages size is also a problem for memory hot-removal:
zone->present_pages may wrap around and become a huge (unsigned long) value.
D)======
The memmap is a large, long-lived chunk of unreclaimable memory, so it is good
to subtract it for proper watermarks. But a proper new approach is needed to do
so, and that approach should also handle other long-lived unreclaimable memory.
The current approach of blindly subtracting the present-pages size is wrong;
remove it.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
mm/page_alloc.c | 20 +-------------------
1 files changed, 1 insertions(+), 19 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 737faf7..03ad63d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4360,30 +4360,12 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat,
for (j = 0; j < MAX_NR_ZONES; j++) {
struct zone *zone = pgdat->node_zones + j;
- unsigned long size, realsize, memmap_pages;
+ unsigned long size, realsize;
size = zone_spanned_pages_in_node(nid, j, zones_size);
realsize = size - zone_absent_pages_in_node(nid, j,
zholes_size);
- /*
- * Adjust realsize so that it accounts for how much memory
- * is used by this zone for memmap. This affects the watermark
- * and per-cpu initialisations
- */
- memmap_pages =
- PAGE_ALIGN(size * sizeof(struct page)) >> PAGE_SHIFT;
- if (realsize >= memmap_pages) {
- realsize -= memmap_pages;
- if (memmap_pages)
- printk(KERN_DEBUG
- " %s zone: %lu pages used for memmap\n",
- zone_names[j], memmap_pages);
- } else
- printk(KERN_WARNING
- " %s zone: %lu pages exceeds realsize %lu\n",
- zone_names[j], memmap_pages, realsize);
-
/* Account for reserved pages */
if (j == 0 && realsize > dma_reserve) {
realsize -= dma_reserve;
--
1.7.1
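For scale, the removed adjustment computed the following (values assumed for illustration): with 4 KiB pages and a 64-byte struct page, a zone spanning 1,048,576 pages (4 GiB) carries a 64 MiB memmap, i.e. 16,384 pages; large enough to matter for watermarks, yet not necessarily resident in that zone:
#include <linux/mm.h>

/* The removed calculation, kept here for reference. */
static unsigned long zone_memmap_pages(unsigned long spanned_pages)
{
	return PAGE_ALIGN(spanned_pages * sizeof(struct page)) >> PAGE_SHIFT;
}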
* [RFC PATCH 18/23 V2] page_alloc: add kernelcore_max_addr
[not found] <1343887288-8866-1-git-send-email-laijs@cn.fujitsu.com>
` (11 preceding siblings ...)
2012-08-02 6:01 ` [RFC PATCH 17/23 V2] page_alloc.c: don't subtract unrelated memmap from zone's present pages Lai Jiangshan
@ 2012-08-02 6:01 ` Lai Jiangshan
2012-08-02 6:01 ` [RFC PATCH 21/23 V2] memblock: limit memory address from memblock Lai Jiangshan
` (3 subsequent siblings)
16 siblings, 0 replies; 37+ messages in thread
From: Lai Jiangshan @ 2012-08-02 6:01 UTC (permalink / raw)
To: Mel Gorman, linux-kernel
Cc: Lai Jiangshan, Rob Landley, Andrew Morton, Michal Hocko,
KAMEZAWA Hiroyuki, Minchan Kim, linux-doc, linux-mm
The current ZONE_MOVABLE (kernelcore=) boot-option policy does not meet
our requirements. We need something like a kernelcore_max_addr=XX boot option
to limit the upper address of the kernel core.
Memory at higher addresses will then be migratable (movable), and therefore
easier to offline (always ready to be offlined when the system does not
require so much memory).
This makes things easy when we dynamically hot-add/remove memory, makes
better use of memory, and helps THP.
kernelcore_max_addr=, kernelcore= and movablecore= can all be safely specified
at the same time (or any two of them).
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
Documentation/kernel-parameters.txt | 9 +++++++++
mm/page_alloc.c | 29 ++++++++++++++++++++++++++++-
2 files changed, 37 insertions(+), 1 deletions(-)
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 12783fa..48dff61 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1216,6 +1216,15 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
use the HighMem zone if it exists, and the Normal
zone if it does not.
+ kernelcore_max_addr=nn[KMG] [KNL,X86,IA-64,PPC] This parameter
+ has the same effect as the kernelcore parameter, except
+ that it specifies the upper physical address of the
+ memory range usable by the kernel for non-movable
+ allocations. If both kernelcore and
+ kernelcore_max_addr are specified, this request takes
+ priority over kernelcore's.
+ See the kernelcore parameter.
+
kgdbdbgp= [KGDB,HW] kgdb over EHCI usb debug port.
Format: <Controller#>[,poll interval]
The controller # is the number of the ehci usb debug
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 03ad63d..65ac5c9 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -204,6 +204,7 @@ static unsigned long __meminitdata dma_reserve;
#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
static unsigned long __meminitdata arch_zone_lowest_possible_pfn[MAX_NR_ZONES];
static unsigned long __meminitdata arch_zone_highest_possible_pfn[MAX_NR_ZONES];
+static unsigned long __initdata required_kernelcore_max_pfn;
static unsigned long __initdata required_kernelcore;
static unsigned long __initdata required_movablecore;
static unsigned long __meminitdata zone_movable_pfn[MAX_NUMNODES];
@@ -4600,6 +4601,7 @@ static void __init find_zone_movable_pfns_for_nodes(void)
{
int i, nid;
unsigned long usable_startpfn;
+ unsigned long kernelcore_max_pfn;
unsigned long kernelcore_node, kernelcore_remaining;
/* save the state before borrow the nodemask */
nodemask_t saved_node_state = node_states[N_MEMORY];
@@ -4628,6 +4630,9 @@ static void __init find_zone_movable_pfns_for_nodes(void)
required_kernelcore = max(required_kernelcore, corepages);
}
+ if (required_kernelcore_max_pfn && !required_kernelcore)
+ required_kernelcore = totalpages;
+
/* If kernelcore was not specified, there is no ZONE_MOVABLE */
if (!required_kernelcore)
goto out;
@@ -4636,6 +4641,12 @@ static void __init find_zone_movable_pfns_for_nodes(void)
find_usable_zone_for_movable();
usable_startpfn = arch_zone_lowest_possible_pfn[movable_zone];
+ if (required_kernelcore_max_pfn)
+ kernelcore_max_pfn = required_kernelcore_max_pfn;
+ else
+ kernelcore_max_pfn = ULONG_MAX >> PAGE_SHIFT;
+ kernelcore_max_pfn = max(kernelcore_max_pfn, usable_startpfn);
+
restart:
/* Spread kernelcore memory as evenly as possible throughout nodes */
kernelcore_node = required_kernelcore / usable_nodes;
@@ -4662,8 +4673,12 @@ restart:
unsigned long size_pages;
start_pfn = max(start_pfn, zone_movable_pfn[nid]);
- if (start_pfn >= end_pfn)
+ end_pfn = min(kernelcore_max_pfn, end_pfn);
+ if (start_pfn >= end_pfn) {
+ if (!zone_movable_pfn[nid])
+ zone_movable_pfn[nid] = start_pfn;
continue;
+ }
/* Account for what is only usable for kernelcore */
if (start_pfn < usable_startpfn) {
@@ -4854,6 +4869,18 @@ static int __init cmdline_parse_core(char *p, unsigned long *core)
return 0;
}
+#ifdef CONFIG_MOVABLE_NODE
+/*
+ * kernelcore_max_addr=addr sets the upper physical address of the memory range
+ * for use for allocations that cannot be reclaimed or migrated.
+ */
+static int __init cmdline_parse_kernelcore_max_addr(char *p)
+{
+ return cmdline_parse_core(p, &required_kernelcore_max_pfn);
+}
+early_param("kernelcore_max_addr", cmdline_parse_kernelcore_max_addr);
+#endif
+
/*
* kernelcore=size sets the amount of memory for use for allocations that
* cannot be reclaimed or migrated.
--
1.7.1
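A sketch (assumed to mirror the existing cmdline_parse_core() that the hunk reuses) of how an "nn[KMG]" boot value becomes a page frame number: memparse() consumes the K/M/G suffix, then the byte count is shifted down to a pfn:
#include <linux/errno.h>
#include <linux/kernel.h>
#include <linux/mm.h>

/* Hypothetical stand-in for cmdline_parse_core(). */
static int parse_core_option(char *p, unsigned long *core_pfn)
{
	if (!p)
		return -EINVAL;

	*core_pfn = (unsigned long)(memparse(p, &p) >> PAGE_SHIFT);
	return 0;
}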
* [RFC PATCH 21/23 V2] memblock: limit memory address from memblock
[not found] <1343887288-8866-1-git-send-email-laijs@cn.fujitsu.com>
` (12 preceding siblings ...)
2012-08-02 6:01 ` [RFC PATCH 18/23 V2] page_alloc: add kernelcore_max_addr Lai Jiangshan
@ 2012-08-02 6:01 ` Lai Jiangshan
2012-08-02 6:01 ` [RFC PATCH 22/23 V2] memblock: compare current_limit with end variable at memblock_find_in_range_node() Lai Jiangshan
` (2 subsequent siblings)
16 siblings, 0 replies; 37+ messages in thread
From: Lai Jiangshan @ 2012-08-02 6:01 UTC (permalink / raw)
To: Mel Gorman, linux-kernel
Cc: Yasuaki Ishimatsu, Lai Jiangshan, Tejun Heo, Andrew Morton,
Yinghai Lu, Sam Ravnborg, Ingo Molnar, Gavin Shan, Michal Hocko,
KAMEZAWA Hiroyuki, Minchan Kim, linux-mm
From: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Setting kernelcore_max_pfn means that all memory above the address given by
the boot parameter is allocated as ZONE_MOVABLE. So memory which is
allocated by memblock should also be limited by the parameter.
The patch limits the memory addresses handed out by memblock.
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
include/linux/memblock.h | 1 +
mm/memblock.c | 5 ++++-
mm/page_alloc.c | 6 +++++-
3 files changed, 10 insertions(+), 2 deletions(-)
diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index 19dc455..f2977ae 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -42,6 +42,7 @@ struct memblock {
extern struct memblock memblock;
extern int memblock_debug;
+extern phys_addr_t memblock_limit;
#define memblock_dbg(fmt, ...) \
if (memblock_debug) printk(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__)
diff --git a/mm/memblock.c b/mm/memblock.c
index 5cc6731..663b805 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -931,7 +931,10 @@ int __init_memblock memblock_is_region_reserved(phys_addr_t base, phys_addr_t si
void __init_memblock memblock_set_current_limit(phys_addr_t limit)
{
- memblock.current_limit = limit;
+ if (!memblock_limit || (memblock_limit > limit))
+ memblock.current_limit = limit;
+ else
+ memblock.current_limit = memblock_limit;
}
static void __init_memblock memblock_dump(struct memblock_type *type, char *name)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 65ac5c9..c4d3aa0 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -209,6 +209,8 @@ static unsigned long __initdata required_kernelcore;
static unsigned long __initdata required_movablecore;
static unsigned long __meminitdata zone_movable_pfn[MAX_NUMNODES];
+phys_addr_t memblock_limit;
+
/* movable_zone is the "real" zone pages in ZONE_MOVABLE are taken from */
int movable_zone;
EXPORT_SYMBOL(movable_zone);
@@ -4876,7 +4878,9 @@ static int __init cmdline_parse_core(char *p, unsigned long *core)
*/
static int __init cmdline_parse_kernelcore_max_addr(char *p)
{
- return cmdline_parse_core(p, &required_kernelcore_max_pfn);
+ cmdline_parse_core(p, &required_kernelcore_max_pfn);
+ memblock_limit = required_kernelcore_max_pfn << PAGE_SHIFT;
+ return 0;
}
early_param("kernelcore_max_addr", cmdline_parse_kernelcore_max_addr);
#endif
--
1.7.1
* [RFC PATCH 22/23 V2] memblock: compare current_limit with end variable at memblock_find_in_range_node()
[not found] <1343887288-8866-1-git-send-email-laijs@cn.fujitsu.com>
` (13 preceding siblings ...)
2012-08-02 6:01 ` [RFC PATCH 21/23 V2] memblock: limit memory address from memblock Lai Jiangshan
@ 2012-08-02 6:01 ` Lai Jiangshan
2012-08-02 6:01 ` [RFC PATCH 23/23 V2] mm, memory-hotplug: add online_movable Lai Jiangshan
[not found] ` <1344244999-5081-1-git-send-email-laijs@cn.fujitsu.com>
16 siblings, 0 replies; 37+ messages in thread
From: Lai Jiangshan @ 2012-08-02 6:01 UTC (permalink / raw)
To: Mel Gorman, linux-kernel
Cc: Yasuaki Ishimatsu, Lai Jiangshan, Tejun Heo, Andrew Morton,
Ingo Molnar, Gavin Shan, Yinghai Lu, linux-mm
From: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
memblock_find_in_range_node() does not compare memblock.current_limit
with the end variable. Thus even if memblock.current_limit is smaller than
the end variable, the function may allocate a memory address that is bigger
than memblock.current_limit.
The patch adds the check to memblock_find_in_range_node().
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
mm/memblock.c | 5 +++--
1 files changed, 3 insertions(+), 2 deletions(-)
diff --git a/mm/memblock.c b/mm/memblock.c
index 663b805..ce7fcb6 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -99,11 +99,12 @@ phys_addr_t __init_memblock memblock_find_in_range_node(phys_addr_t start,
phys_addr_t align, int nid)
{
phys_addr_t this_start, this_end, cand;
+ phys_addr_t current_limit = memblock.current_limit;
u64 i;
/* pump up @end */
- if (end == MEMBLOCK_ALLOC_ACCESSIBLE)
- end = memblock.current_limit;
+ if ((end == MEMBLOCK_ALLOC_ACCESSIBLE) || (end > current_limit))
+ end = current_limit;
/* avoid allocating the first page */
start = max_t(phys_addr_t, start, PAGE_SIZE);
--
1.7.1
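The effect of the fix in miniature (illustrative helper, not in the tree): a request whose end exceeds the accessible limit is silently clamped, so the allocator can never return memory above memblock.current_limit:
#include <linux/memblock.h>

static phys_addr_t clamp_alloc_end(phys_addr_t end)
{
	phys_addr_t limit = memblock.current_limit;

	if (end == MEMBLOCK_ALLOC_ACCESSIBLE || end > limit)
		return limit;
	return end;
}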
* [RFC PATCH 23/23 V2] mm, memory-hotplug: add online_movable
[not found] <1343887288-8866-1-git-send-email-laijs@cn.fujitsu.com>
` (14 preceding siblings ...)
2012-08-02 6:01 ` [RFC PATCH 22/23 V2] memblock: compare current_limit with end variable at memblock_find_in_range_node() Lai Jiangshan
@ 2012-08-02 6:01 ` Lai Jiangshan
[not found] ` <1344244999-5081-1-git-send-email-laijs@cn.fujitsu.com>
16 siblings, 0 replies; 37+ messages in thread
From: Lai Jiangshan @ 2012-08-02 6:01 UTC (permalink / raw)
To: Mel Gorman, linux-kernel
Cc: Lai Jiangshan, Rob Landley, Greg Kroah-Hartman, Paul Gortmaker,
Andrew Morton, Bjorn Helgaas, David Rientjes, Wen Congyang,
linux-doc, linux-mm
When a memory block/memory section is onlined by "online_movable", the kernel
will not keep direct references to pages of that memory block,
thus we can remove that memory at any time when needed.
This makes things easy when we dynamically hot-add/remove memory, makes
better use of memory, and helps THP.
Current constraint: only a memory block which is adjacent to ZONE_MOVABLE
can be onlined from ZONE_NORMAL to ZONE_MOVABLE.
For the opposite onlining behavior, we also introduce "online_kernel" to change
a memory block of ZONE_MOVABLE to ZONE_NORMAL when onlining.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
Documentation/memory-hotplug.txt | 14 ++++-
drivers/base/memory.c | 19 ++++--
include/linux/memory_hotplug.h | 13 ++++-
mm/memory_hotplug.c | 114 +++++++++++++++++++++++++++++++++++--
4 files changed, 144 insertions(+), 16 deletions(-)
diff --git a/Documentation/memory-hotplug.txt b/Documentation/memory-hotplug.txt
index 89f21b2..7b1269c 100644
--- a/Documentation/memory-hotplug.txt
+++ b/Documentation/memory-hotplug.txt
@@ -161,7 +161,8 @@ a recent addition and not present on older kernels.
in the memory block.
'state' : read-write
at read: contains online/offline state of memory.
- at write: user can specify "online", "offline" command
+ at write: user can specify "online_kernel",
+ "online_movable", "online", "offline" command
which will be performed on all sections in the block.
'phys_device' : read-only: designed to show the name of physical memory
device. This is not well implemented now.
@@ -255,6 +256,17 @@ For onlining, you have to write "online" to the section's state file as:
% echo online > /sys/devices/system/memory/memoryXXX/state
+This onlining will not change the ZONE type of the target memory section.
+If the memory section is in ZONE_NORMAL, you can change it to ZONE_MOVABLE:
+
+% echo online_movable > /sys/devices/system/memory/memoryXXX/state
+(NOTE: current limit: this memory section must be adjacent to ZONE_MOVABLE)
+
+And if the memory section is in ZONE_MOVABLE, you can change it to ZONE_NORMAL:
+
+% echo online_kernel > /sys/devices/system/memory/memoryXXX/state
+(NOTE: current limit: this memory section must be adjacent to ZONE_NORMAL)
+
After this, section memoryXXX's state will be 'online' and the amount of
available memory will be increased.
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 7dda4f7..1ad2f48 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -246,7 +246,7 @@ static bool pages_correctly_reserved(unsigned long start_pfn,
* OK to have direct references to sparsemem variables in here.
*/
static int
-memory_block_action(unsigned long phys_index, unsigned long action)
+memory_block_action(unsigned long phys_index, unsigned long action, int online_type)
{
unsigned long start_pfn, start_paddr;
unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
@@ -262,7 +262,7 @@ memory_block_action(unsigned long phys_index, unsigned long action)
if (!pages_correctly_reserved(start_pfn, nr_pages))
return -EBUSY;
- ret = online_pages(start_pfn, nr_pages);
+ ret = online_pages(start_pfn, nr_pages, online_type);
break;
case MEM_OFFLINE:
start_paddr = page_to_pfn(first_page) << PAGE_SHIFT;
@@ -279,7 +279,8 @@ memory_block_action(unsigned long phys_index, unsigned long action)
}
static int memory_block_change_state(struct memory_block *mem,
- unsigned long to_state, unsigned long from_state_req)
+ unsigned long to_state, unsigned long from_state_req,
+ int online_type)
{
int ret = 0;
@@ -293,7 +294,7 @@ static int memory_block_change_state(struct memory_block *mem,
if (to_state == MEM_OFFLINE)
mem->state = MEM_GOING_OFFLINE;
- ret = memory_block_action(mem->start_section_nr, to_state);
+ ret = memory_block_action(mem->start_section_nr, to_state, online_type);
if (ret) {
mem->state = from_state_req;
@@ -325,10 +326,14 @@ store_mem_state(struct device *dev,
mem = container_of(dev, struct memory_block, dev);
- if (!strncmp(buf, "online", min((int)count, 6)))
- ret = memory_block_change_state(mem, MEM_ONLINE, MEM_OFFLINE);
+ if (!strncmp(buf, "online_kernel", min((int)count, 13)))
+ ret = memory_block_change_state(mem, MEM_ONLINE, MEM_OFFLINE, ONLINE_KERNEL);
+ else if (!strncmp(buf, "online_movable", min((int)count, 14)))
+ ret = memory_block_change_state(mem, MEM_ONLINE, MEM_OFFLINE, ONLINE_MOVABLE);
+ else if (!strncmp(buf, "online", min((int)count, 6)))
+ ret = memory_block_change_state(mem, MEM_ONLINE, MEM_OFFLINE, ONLINE_KEEP);
else if(!strncmp(buf, "offline", min((int)count, 7)))
- ret = memory_block_change_state(mem, MEM_OFFLINE, MEM_ONLINE);
+ ret = memory_block_change_state(mem, MEM_OFFLINE, MEM_ONLINE, -1);
if (ret)
return ret;
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 910550f..047cd1d 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -25,6 +25,13 @@ enum {
MEMORY_HOTPLUG_MAX_BOOTMEM_TYPE = NODE_INFO,
};
+/* Types for controlling the zone type of onlined memory */
+enum {
+ ONLINE_KEEP,
+ ONLINE_KERNEL,
+ ONLINE_MOVABLE,
+};
+
/*
* pgdat resizing functions
*/
@@ -45,6 +52,10 @@ void pgdat_resize_init(struct pglist_data *pgdat)
}
/*
* Zone resizing functions
+ *
+ * Note: any attempt to resize a zone should have pgdat_resize_lock()
+ * and zone_span_writelock() both held. This ensures the size of a zone
+ * can't be changed while pgdat_resize_lock() is held.
*/
static inline unsigned zone_span_seqbegin(struct zone *zone)
{
@@ -70,7 +81,7 @@ extern int zone_grow_free_lists(struct zone *zone, unsigned long new_nr_pages);
extern int zone_grow_waitqueues(struct zone *zone, unsigned long nr_pages);
extern int add_one_highpage(struct page *page, int pfn, int bad_ppro);
/* VM interface that may be used by firmware interface */
-extern int online_pages(unsigned long, unsigned long);
+extern int online_pages(unsigned long, unsigned long, int);
extern void __offline_isolated_pages(unsigned long, unsigned long);
typedef void (*online_page_callback_t)(struct page *page);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index c44b39e..b5ee3db 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -210,6 +210,89 @@ static void grow_zone_span(struct zone *zone, unsigned long start_pfn,
zone_span_writeunlock(zone);
}
+static void resize_zone(struct zone *zone, unsigned long start_pfn,
+ unsigned long end_pfn)
+{
+
+ zone_span_writelock(zone);
+
+ zone->zone_start_pfn = start_pfn;
+ zone->spanned_pages = end_pfn - start_pfn;
+
+ zone_span_writeunlock(zone);
+}
+
+static void fix_zone_id(struct zone *zone, unsigned long start_pfn,
+ unsigned long end_pfn)
+{
+ enum zone_type zid = zone_idx(zone);
+ int nid = zone->zone_pgdat->node_id;
+ unsigned long pfn;
+
+ for (pfn = start_pfn; pfn < end_pfn; pfn++)
+ set_page_links(pfn_to_page(pfn), zid, nid, pfn);
+}
+
+static int move_pfn_range_left(struct zone *z1, struct zone *z2,
+ unsigned long start_pfn, unsigned long end_pfn)
+{
+ unsigned long flags;
+
+ pgdat_resize_lock(z1->zone_pgdat, &flags);
+
+ /* can't move pfns which are higher than @z2 */
+ if (end_pfn > z2->zone_start_pfn + z2->spanned_pages)
+ goto out_fail;
+ /* the moved-out part must be at the leftmost of @z2 */
+ if (start_pfn > z2->zone_start_pfn)
+ goto out_fail;
+ /* the pfn range must overlap @z2 */
+ if (end_pfn <= z2->zone_start_pfn)
+ goto out_fail;
+
+ resize_zone(z1, z1->zone_start_pfn, end_pfn);
+ resize_zone(z2, end_pfn, z2->zone_start_pfn + z2->spanned_pages);
+
+ pgdat_resize_unlock(z1->zone_pgdat, &flags);
+
+ fix_zone_id(z1, start_pfn, end_pfn);
+
+ return 0;
+out_fail:
+ pgdat_resize_unlock(z1->zone_pgdat, &flags);
+ return -1;
+}
+
+static int move_pfn_range_right(struct zone *z1, struct zone *z2,
+ unsigned long start_pfn, unsigned long end_pfn)
+{
+ unsigned long flags;
+
+ pgdat_resize_lock(z1->zone_pgdat, &flags);
+
+ /* can't move pfns which are lower than @z1 */
+ if (z1->zone_start_pfn > start_pfn)
+ goto out_fail;
+ /* the moved-out part must be at the rightmost of @z1 */
+ if (z1->zone_start_pfn + z1->spanned_pages > end_pfn)
+ goto out_fail;
+ /* the pfn range must overlap @z1 */
+ if (start_pfn >= z1->zone_start_pfn + z1->spanned_pages)
+ goto out_fail;
+
+ resize_zone(z1, z1->zone_start_pfn, start_pfn);
+ resize_zone(z2, start_pfn, z2->zone_start_pfn + z2->spanned_pages);
+
+ pgdat_resize_unlock(z1->zone_pgdat, &flags);
+
+ fix_zone_id(z2, start_pfn, end_pfn);
+
+ return 0;
+out_fail:
+ pgdat_resize_unlock(z1->zone_pgdat, &flags);
+ return -1;
+}
+
static void grow_pgdat_span(struct pglist_data *pgdat, unsigned long start_pfn,
unsigned long end_pfn)
{
@@ -457,7 +540,7 @@ static int online_pages_range(unsigned long start_pfn, unsigned long nr_pages,
}
-int __ref online_pages(unsigned long pfn, unsigned long nr_pages)
+int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_type)
{
unsigned long onlined_pages = 0;
struct zone *zone;
@@ -467,6 +550,29 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages)
struct memory_notify arg;
lock_memory_hotplug();
+ /*
+ * This doesn't need a lock to do pfn_to_page().
+ * The section can't be removed here because of the
+ * memory_block->state_mutex.
+ */
+ zone = page_zone(pfn_to_page(pfn));
+
+ if (online_type == ONLINE_KERNEL && zone_idx(zone) == ZONE_MOVABLE) {
+ if (move_pfn_range_left(zone - 1, zone, pfn, pfn + nr_pages)) {
+ unlock_memory_hotplug();
+ return -1;
+ }
+ }
+ if (online_type == ONLINE_MOVABLE && zone_idx(zone) == ZONE_MOVABLE - 1) {
+ if (move_pfn_range_right(zone, zone + 1, pfn, pfn + nr_pages)) {
+ unlock_memory_hotplug();
+ return -1;
+ }
+ }
+
+ /* The previous code may have changed the zone of the pfn range */
+ zone = page_zone(pfn_to_page(pfn));
+
arg.start_pfn = pfn;
arg.nr_pages = nr_pages;
arg.status_change_nid = -1;
@@ -483,12 +589,6 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages)
return ret;
}
/*
- * This doesn't need a lock to do pfn_to_page().
- * The section can't be removed here because of the
- * memory_block->state_mutex.
- */
- zone = page_zone(pfn_to_page(pfn));
- /*
* If this zone is not populated, then it is not in zonelist.
* This means the page allocator ignores this zone.
* So, zonelist must be updated after online.
--
1.7.1
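For illustration, a minimal userspace sketch driving the new interface (a
sketch only: the block name "memory32" is hypothetical, and the patch itself
only adds the kernel side):
	/* request movable onlining of one memory block (illustrative) */
	#include <stdio.h>
	int main(void)
	{
		FILE *f = fopen("/sys/devices/system/memory/memory32/state", "w");
		if (!f)
			return 1;
		/* parsed by store_mem_state() above; "online_kernel" works the same */
		fputs("online_movable", f);
		return fclose(f) ? 1 : 0;
	}
If the kernel rejects the request (e.g. the adjacency limits noted in the
documentation above are not met), the failure surfaces as a write error
caught by fclose().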
* Re: [RFC PATCH 09/23 V2] vmstat: use N_MEMORY instead N_HIGH_MEMORY
2012-08-02 6:01 ` [RFC PATCH 09/23 V2] vmstat: " Lai Jiangshan
@ 2012-08-02 16:09 ` Christoph Lameter
0 siblings, 0 replies; 37+ messages in thread
From: Christoph Lameter @ 2012-08-02 16:09 UTC (permalink / raw)
To: Lai Jiangshan
Cc: Mel Gorman, linux-kernel, Andrew Morton, KAMEZAWA Hiroyuki,
David Rientjes, linux-mm
On Thu, 2 Aug 2012, Lai Jiangshan wrote:
> The code here needs to handle the nodes which have memory, so we should
> use N_MEMORY instead.
Acked-by: Christoph Lameter <cl@linux.com>
* Re: [RFC PATCH 05/23 V2] mm,migrate: use N_MEMORY instead N_HIGH_MEMORY
2012-08-02 6:01 ` [RFC PATCH 05/23 V2] mm,migrate: " Lai Jiangshan
@ 2012-08-02 16:09 ` Christoph Lameter
0 siblings, 0 replies; 37+ messages in thread
From: Christoph Lameter @ 2012-08-02 16:09 UTC (permalink / raw)
To: Lai Jiangshan
Cc: Mel Gorman, linux-kernel, Andrew Morton, Hugh Dickins, Mel Gorman,
Wang Sheng-Hui, linux-mm
On Thu, 2 Aug 2012, Lai Jiangshan wrote:
> The code here needs to handle the nodes which have memory, so we should
> use N_MEMORY instead.
Acked-by: Christoph Lameter <cl@linux.com>
* Re: [RFC PATCH 08/23 V2] hugetlb: use N_MEMORY instead N_HIGH_MEMORY
2012-08-02 6:01 ` [RFC PATCH 08/23 V2] hugetlb: " Lai Jiangshan
@ 2012-08-04 14:02 ` Hillf Danton
0 siblings, 0 replies; 37+ messages in thread
From: Hillf Danton @ 2012-08-04 14:02 UTC (permalink / raw)
To: Lai Jiangshan
Cc: Mel Gorman, linux-kernel, Greg Kroah-Hartman, Andrew Morton,
Michal Hocko, KAMEZAWA Hiroyuki, linux-mm
On Thu, Aug 2, 2012 at 2:01 PM, Lai Jiangshan <laijs@cn.fujitsu.com> wrote:
> N_HIGH_MEMORY stands for the nodes that have normal or high memory.
> N_MEMORY stands for the nodes that have any memory.
>
> The code here needs to handle the nodes which have memory, so we should
> use N_MEMORY instead.
>
> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
> ---
> drivers/base/node.c | 2 +-
> mm/hugetlb.c | 24 ++++++++++++------------
> 2 files changed, 13 insertions(+), 13 deletions(-)
>
Better if the patch is split into separate hugetlb and node parts.
Acked-by: Hillf Danton <dhillf@gmail.com>
> diff --git a/drivers/base/node.c b/drivers/base/node.c
> index af1a177..31f4805 100644
> --- a/drivers/base/node.c
> +++ b/drivers/base/node.c
> @@ -227,7 +227,7 @@ static node_registration_func_t __hugetlb_unregister_node;
> static inline bool hugetlb_register_node(struct node *node)
> {
> if (__hugetlb_register_node &&
> - node_state(node->dev.id, N_HIGH_MEMORY)) {
> + node_state(node->dev.id, N_MEMORY)) {
> __hugetlb_register_node(node);
> return true;
> }
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index e198831..661db47 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1046,7 +1046,7 @@ static void return_unused_surplus_pages(struct hstate *h,
> * on-line nodes with memory and will handle the hstate accounting.
> */
> while (nr_pages--) {
> - if (!free_pool_huge_page(h, &node_states[N_HIGH_MEMORY], 1))
> + if (!free_pool_huge_page(h, &node_states[N_MEMORY], 1))
> break;
> }
> }
> @@ -1150,14 +1150,14 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma,
> int __weak alloc_bootmem_huge_page(struct hstate *h)
> {
> struct huge_bootmem_page *m;
> - int nr_nodes = nodes_weight(node_states[N_HIGH_MEMORY]);
> + int nr_nodes = nodes_weight(node_states[N_MEMORY]);
>
> while (nr_nodes) {
> void *addr;
>
> addr = __alloc_bootmem_node_nopanic(
> NODE_DATA(hstate_next_node_to_alloc(h,
> - &node_states[N_HIGH_MEMORY])),
> + &node_states[N_MEMORY])),
> huge_page_size(h), huge_page_size(h), 0);
>
> if (addr) {
> @@ -1229,7 +1229,7 @@ static void __init hugetlb_hstate_alloc_pages(struct hstate *h)
> if (!alloc_bootmem_huge_page(h))
> break;
> } else if (!alloc_fresh_huge_page(h,
> - &node_states[N_HIGH_MEMORY]))
> + &node_states[N_MEMORY]))
> break;
> }
> h->max_huge_pages = i;
> @@ -1497,7 +1497,7 @@ static ssize_t nr_hugepages_store_common(bool obey_mempolicy,
> if (!(obey_mempolicy &&
> init_nodemask_of_mempolicy(nodes_allowed))) {
> NODEMASK_FREE(nodes_allowed);
> - nodes_allowed = &node_states[N_HIGH_MEMORY];
> + nodes_allowed = &node_states[N_MEMORY];
> }
> } else if (nodes_allowed) {
> /*
> @@ -1507,11 +1507,11 @@ static ssize_t nr_hugepages_store_common(bool obey_mempolicy,
> count += h->nr_huge_pages - h->nr_huge_pages_node[nid];
> init_nodemask_of_node(nodes_allowed, nid);
> } else
> - nodes_allowed = &node_states[N_HIGH_MEMORY];
> + nodes_allowed = &node_states[N_MEMORY];
>
> h->max_huge_pages = set_max_huge_pages(h, count, nodes_allowed);
>
> - if (nodes_allowed != &node_states[N_HIGH_MEMORY])
> + if (nodes_allowed != &node_states[N_MEMORY])
> NODEMASK_FREE(nodes_allowed);
>
> return len;
> @@ -1812,7 +1812,7 @@ static void hugetlb_register_all_nodes(void)
> {
> int nid;
>
> - for_each_node_state(nid, N_HIGH_MEMORY) {
> + for_each_node_state(nid, N_MEMORY) {
> struct node *node = &node_devices[nid];
> if (node->dev.id == nid)
> hugetlb_register_node(node);
> @@ -1906,8 +1906,8 @@ void __init hugetlb_add_hstate(unsigned order)
> h->free_huge_pages = 0;
> for (i = 0; i < MAX_NUMNODES; ++i)
> INIT_LIST_HEAD(&h->hugepage_freelists[i]);
> - h->next_nid_to_alloc = first_node(node_states[N_HIGH_MEMORY]);
> - h->next_nid_to_free = first_node(node_states[N_HIGH_MEMORY]);
> + h->next_nid_to_alloc = first_node(node_states[N_MEMORY]);
> + h->next_nid_to_free = first_node(node_states[N_MEMORY]);
> snprintf(h->name, HSTATE_NAME_LEN, "hugepages-%lukB",
> huge_page_size(h)/1024);
>
> @@ -1995,11 +1995,11 @@ static int hugetlb_sysctl_handler_common(bool obey_mempolicy,
> if (!(obey_mempolicy &&
> init_nodemask_of_mempolicy(nodes_allowed))) {
> NODEMASK_FREE(nodes_allowed);
> - nodes_allowed = &node_states[N_HIGH_MEMORY];
> + nodes_allowed = &node_states[N_MEMORY];
> }
> h->max_huge_pages = set_max_huge_pages(h, tmp, nodes_allowed);
>
> - if (nodes_allowed != &node_states[N_HIGH_MEMORY])
> + if (nodes_allowed != &node_states[N_MEMORY])
> NODEMASK_FREE(nodes_allowed);
> }
> out:
> --
> 1.7.1
>
* [RFC V3 PATCH 01/25] page_alloc.c: don't subtract unrelated memmap from zone's present pages
[not found] ` <1344244999-5081-1-git-send-email-laijs@cn.fujitsu.com>
@ 2012-08-06 9:22 ` Lai Jiangshan
2012-08-06 9:22 ` [RFC V3 PATCH 02/25] memory_hotplug: fix missing nodemask management Lai Jiangshan
` (15 subsequent siblings)
16 siblings, 0 replies; 37+ messages in thread
From: Lai Jiangshan @ 2012-08-06 9:22 UTC (permalink / raw)
To: Mel Gorman, linux-kernel
Cc: Lai Jiangshan, Andrew Morton, Michal Hocko, KAMEZAWA Hiroyuki,
Minchan Kim, linux-mm
A)======
Currently, the memory-page-map (the struct page array) is not defined in struct zone.
It is defined in several ways:
FLATMEM: global memmap, can be allocated from any zone <= ZONE_NORMAL
CONFIG_DISCONTIGMEM: node-specific memmap, can be allocated from any
zone <= ZONE_NORMAL within that node.
CONFIG_SPARSEMEM: memory-section-specific memmap, can be allocated from any zone;
with CONFIG_SPARSEMEM_VMEMMAP it is not even physically contiguous.
So the memmap is not directly related to the zone, and its memory can be
allocated outside the zone, so it is wrong to subtract the memmap's size from the
zone's present pages.
B)======
When the system has large holes, the subtracted present-pages size becomes
very small or negative, which makes memory management behave badly in that zone or
makes the zone unusable even though the real present-pages size is actually large.
C)======
The subtraction is also a problem for memory hot-removal:
zone->present_pages may underflow and wrap to a huge value (it is an unsigned long).
D)======
The memory-page-map is large, long-living, unreclaimable memory, so it is good to
subtract it for proper watermarks.
A new, proper approach is needed to do that, and the new approach should also
handle other long-living unreclaimable memory.
The current approach of blindly subtracting the present-pages size is wrong; remove it.
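As a rough worked example of B) (made-up numbers, assuming 4 KiB pages and a
64-byte struct page, both typical for x86_64):
	spanned  = 16 GiB / 4 KiB          = 4194304 pages
	memmap   = 4194304 * 64 B / 4 KiB  =   65536 pages  (sized from the *spanned* range)
	present  = 256 MiB / 4 KiB         =   65536 pages  (the rest of the span is a hole)
	realsize = present - memmap        =       0 pages
so a zone that really has 256 MiB of usable memory is accounted as empty, and a
slightly larger hole would drive the result negative.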
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
mm/page_alloc.c | 20 +-------------------
1 files changed, 1 insertions(+), 19 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 4a4f921..9312702 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4357,30 +4357,12 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat,
for (j = 0; j < MAX_NR_ZONES; j++) {
struct zone *zone = pgdat->node_zones + j;
- unsigned long size, realsize, memmap_pages;
+ unsigned long size, realsize;
size = zone_spanned_pages_in_node(nid, j, zones_size);
realsize = size - zone_absent_pages_in_node(nid, j,
zholes_size);
- /*
- * Adjust realsize so that it accounts for how much memory
- * is used by this zone for memmap. This affects the watermark
- * and per-cpu initialisations
- */
- memmap_pages =
- PAGE_ALIGN(size * sizeof(struct page)) >> PAGE_SHIFT;
- if (realsize >= memmap_pages) {
- realsize -= memmap_pages;
- if (memmap_pages)
- printk(KERN_DEBUG
- " %s zone: %lu pages used for memmap\n",
- zone_names[j], memmap_pages);
- } else
- printk(KERN_WARNING
- " %s zone: %lu pages exceeds realsize %lu\n",
- zone_names[j], memmap_pages, realsize);
-
/* Account for reserved pages */
if (j == 0 && realsize > dma_reserve) {
realsize -= dma_reserve;
--
1.7.4.4
* [RFC V3 PATCH 02/25] memory_hotplug: fix missing nodemask management
[not found] ` <1344244999-5081-1-git-send-email-laijs@cn.fujitsu.com>
2012-08-06 9:22 ` [RFC V3 PATCH 01/25] page_alloc.c: don't subtract unrelated memmap from zone's present pages Lai Jiangshan
@ 2012-08-06 9:22 ` Lai Jiangshan
2012-08-06 9:22 ` [RFC V3 PATCH 03/25] slub, hotplug: ignore unrelated node's hot-adding and hot-removing Lai Jiangshan
` (14 subsequent siblings)
16 siblings, 0 replies; 37+ messages in thread
From: Lai Jiangshan @ 2012-08-06 9:22 UTC (permalink / raw)
To: Mel Gorman, linux-kernel
Cc: Lai Jiangshan, Rob Landley, Kay Sievers, Greg Kroah-Hartman,
Andrew Morton, Paul Gortmaker, Bjorn Helgaas, David Rientjes,
linux-doc, linux-mm
Currently memory_hotplug only manages node_states[N_HIGH_MEMORY];
it forgets to manage node_states[N_NORMAL_MEMORY]. Fix it.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
Documentation/memory-hotplug.txt | 5 ++-
include/linux/memory.h | 1 +
mm/memory_hotplug.c | 94 +++++++++++++++++++++++++++++++------
3 files changed, 83 insertions(+), 17 deletions(-)
diff --git a/Documentation/memory-hotplug.txt b/Documentation/memory-hotplug.txt
index 6d0c251..6e6cbc7 100644
--- a/Documentation/memory-hotplug.txt
+++ b/Documentation/memory-hotplug.txt
@@ -377,15 +377,18 @@ The third argument is passed by pointer of struct memory_notify.
struct memory_notify {
unsigned long start_pfn;
unsigned long nr_pages;
+ int status_change_nid_normal;
int status_change_nid;
}
start_pfn is start_pfn of online/offline memory.
nr_pages is # of pages of online/offline memory.
+status_change_nid_normal is set to the node id when N_NORMAL_MEMORY of the nodemask
+is (or will be) set/cleared; if this is -1, then the nodemask status is not changed.
status_change_nid is set node id when N_HIGH_MEMORY of nodemask is (will be)
set/clear. It means a new(memoryless) node gets new memory by online and a
node loses all memory. If this is -1, then nodemask status is not changed.
-If status_changed_nid >= 0, callback should create/discard structures for the
+If status_change_nid* >= 0, the callback should create/discard structures for the
node if necessary.
--------------
diff --git a/include/linux/memory.h b/include/linux/memory.h
index 1ac7f6e..6b9202b 100644
--- a/include/linux/memory.h
+++ b/include/linux/memory.h
@@ -53,6 +53,7 @@ int arch_get_memory_phys_device(unsigned long start_pfn);
struct memory_notify {
unsigned long start_pfn;
unsigned long nr_pages;
+ int status_change_nid_normal;
int status_change_nid;
};
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 427bb29..3438c4a 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -456,6 +456,34 @@ static int online_pages_range(unsigned long start_pfn, unsigned long nr_pages,
return 0;
}
+static void check_nodemasks_changes_online(unsigned long nr_pages,
+ struct zone *zone, struct memory_notify *arg)
+{
+ int nid = zone_to_nid(zone);
+ enum zone_type zone_last = ZONE_NORMAL;
+
+ if (N_HIGH_MEMORY == N_NORMAL_MEMORY)
+ zone_last = ZONE_MOVABLE;
+
+ if (zone_idx(zone) <= zone_last && !node_state(nid, N_NORMAL_MEMORY))
+ arg->status_change_nid_normal = nid;
+ else
+ arg->status_change_nid_normal = -1;
+
+ if (!node_state(nid, N_HIGH_MEMORY))
+ arg->status_change_nid = nid;
+ else
+ arg->status_change_nid = -1;
+}
+
+static void set_nodemasks(int node, struct memory_notify *arg)
+{
+ if (arg->status_change_nid_normal >= 0)
+ node_set_state(node, N_NORMAL_MEMORY);
+
+ node_set_state(node, N_HIGH_MEMORY);
+}
+
int __ref online_pages(unsigned long pfn, unsigned long nr_pages)
{
@@ -467,13 +495,18 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages)
struct memory_notify arg;
lock_memory_hotplug();
+ /*
+ * This doesn't need a lock to do pfn_to_page().
+ * The section can't be removed here because of the
+ * memory_block->state_mutex.
+ */
+ zone = page_zone(pfn_to_page(pfn));
+
arg.start_pfn = pfn;
arg.nr_pages = nr_pages;
- arg.status_change_nid = -1;
+ check_nodemasks_changes_online(nr_pages, zone, &arg);
nid = page_to_nid(pfn_to_page(pfn));
- if (node_present_pages(nid) == 0)
- arg.status_change_nid = nid;
ret = memory_notify(MEM_GOING_ONLINE, &arg);
ret = notifier_to_errno(ret);
@@ -483,12 +516,6 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages)
return ret;
}
/*
- * This doesn't need a lock to do pfn_to_page().
- * The section can't be removed here because of the
- * memory_block->state_mutex.
- */
- zone = page_zone(pfn_to_page(pfn));
- /*
* If this zone is not populated, then it is not in zonelist.
* This means the page allocator ignores this zone.
* So, zonelist must be updated after online.
@@ -523,7 +550,7 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages)
if (onlined_pages) {
kswapd_run(zone_to_nid(zone));
- node_set_state(zone_to_nid(zone), N_HIGH_MEMORY);
+ set_nodemasks(zone_to_nid(zone), &arg);
}
vm_total_pages = nr_free_pagecache_pages();
@@ -865,6 +892,44 @@ check_pages_isolated(unsigned long start_pfn, unsigned long end_pfn)
return offlined;
}
+static void check_nodemasks_changes_offline(unsigned long nr_pages,
+ struct zone *zone, struct memory_notify *arg)
+{
+ struct pglist_data *pgdat = zone->zone_pgdat;
+ unsigned long present_pages = 0;
+ enum zone_type zt, zone_last = ZONE_NORMAL;
+
+ if (N_HIGH_MEMORY == N_NORMAL_MEMORY)
+ zone_last = ZONE_MOVABLE;
+
+ for (zt = 0; zt <= zone_last; zt++)
+ present_pages += pgdat->node_zones[zt].present_pages;
+ if (zone_idx(zone) <= zone_last && nr_pages >= present_pages)
+ arg->status_change_nid_normal = zone_to_nid(zone);
+ else
+ arg->status_change_nid_normal = -1;
+
+ zone_last = ZONE_MOVABLE;
+ for (; zt <= zone_last; zt++)
+ present_pages += pgdat->node_zones[zt].present_pages;
+ if (nr_pages >= present_pages)
+ arg->status_change_nid = zone_to_nid(zone);
+ else
+ arg->status_change_nid = -1;
+}
+
+static void clear_nodemasks(int node, struct memory_notify *arg)
+{
+ if (arg->status_change_nid_normal >= 0)
+ node_clear_state(node, N_NORMAL_MEMORY);
+
+ if (N_HIGH_MEMORY == N_NORMAL_MEMORY)
+ return;
+
+ if (arg->status_change_nid >= 0)
+ node_clear_state(node, N_HIGH_MEMORY);
+}
+
static int __ref offline_pages(unsigned long start_pfn,
unsigned long end_pfn, unsigned long timeout)
{
@@ -898,9 +963,7 @@ static int __ref offline_pages(unsigned long start_pfn,
arg.start_pfn = start_pfn;
arg.nr_pages = nr_pages;
- arg.status_change_nid = -1;
- if (nr_pages >= node_present_pages(node))
- arg.status_change_nid = node;
+ check_nodemasks_changes_offline(nr_pages, zone, &arg);
ret = memory_notify(MEM_GOING_OFFLINE, &arg);
ret = notifier_to_errno(ret);
@@ -965,10 +1028,9 @@ repeat:
init_per_zone_wmark_min();
- if (!node_present_pages(node)) {
- node_clear_state(node, N_HIGH_MEMORY);
+ clear_nodemasks(node, &arg);
+ if (arg.status_change_nid >= 0)
kswapd_stop(node);
- }
vm_total_pages = nr_free_pagecache_pages();
writeback_set_ratelimit();
--
1.7.4.4
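For context, a sketch of how a memory-hotplug notifier consumes the new field
(purely illustrative, assuming <linux/memory.h>; in-tree users such as SLUB's
memory callback follow this shape):
	static int example_mem_callback(struct notifier_block *self,
					unsigned long action, void *arg)
	{
		struct memory_notify *marg = arg;
		switch (action) {
		case MEM_GOING_ONLINE:
			/* >= 0: the node is about to gain its first normal memory */
			if (marg->status_change_nid_normal >= 0)
				pr_info("node %d gains normal memory\n",
					marg->status_change_nid_normal);
			break;
		case MEM_OFFLINE:
			/* >= 0: the node has just lost its last memory */
			if (marg->status_change_nid >= 0)
				pr_info("node %d lost all memory\n",
					marg->status_change_nid);
			break;
		}
		return NOTIFY_OK;
	}
registered at init time with hotplug_memory_notifier(example_mem_callback, 0).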
* [RFC V3 PATCH 03/25] slub, hotplug: ignore unrelated node's hot-adding and hot-removing
[not found] ` <1344244999-5081-1-git-send-email-laijs@cn.fujitsu.com>
2012-08-06 9:22 ` [RFC V3 PATCH 01/25] page_alloc.c: don't subtract unrelated memmap from zone's present pages Lai Jiangshan
2012-08-06 9:22 ` [RFC V3 PATCH 02/25] memory_hotplug: fix missing nodemask management Lai Jiangshan
@ 2012-08-06 9:22 ` Lai Jiangshan
2012-08-06 9:23 ` [RFC V3 PATCH 08/25] memcontrol: use N_MEMORY instead N_HIGH_MEMORY Lai Jiangshan
` (13 subsequent siblings)
16 siblings, 0 replies; 37+ messages in thread
From: Lai Jiangshan @ 2012-08-06 9:22 UTC (permalink / raw)
To: Mel Gorman, linux-kernel
Cc: Lai Jiangshan, Christoph Lameter, Pekka Enberg, Matt Mackall,
linux-mm
SLUB only focuses on the nodes which have normal memory, so ignore the other
nodes' hot-adding and hot-removing.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
mm/slub.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/mm/slub.c b/mm/slub.c
index 8c691fa..f8b137a 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3568,7 +3568,7 @@ static void slab_mem_offline_callback(void *arg)
struct memory_notify *marg = arg;
int offline_node;
- offline_node = marg->status_change_nid;
+ offline_node = marg->status_change_nid_normal;
/*
* If the node still has available memory. we need kmem_cache_node
@@ -3601,7 +3601,7 @@ static int slab_mem_going_online_callback(void *arg)
struct kmem_cache_node *n;
struct kmem_cache *s;
struct memory_notify *marg = arg;
- int nid = marg->status_change_nid;
+ int nid = marg->status_change_nid_normal;
int ret = 0;
/*
--
1.7.4.4
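(Background, not part of the patch: slab pages are allocated with GFP_KERNEL,
so they can only come from zones up to ZONE_NORMAL. A node that merely gains
or loses highmem/movable memory therefore never needs SLUB's per-node
structures created or destroyed, which is exactly what switching to
status_change_nid_normal expresses.)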
* [RFC V3 PATCH 08/25] memcontrol: use N_MEMORY instead N_HIGH_MEMORY
[not found] ` <1344244999-5081-1-git-send-email-laijs@cn.fujitsu.com>
` (2 preceding siblings ...)
2012-08-06 9:22 ` [RFC V3 PATCH 03/25] slub, hotplug: ignore unrelated node's hot-adding and hot-removing Lai Jiangshan
@ 2012-08-06 9:23 ` Lai Jiangshan
2012-08-06 9:23 ` [RFC V3 PATCH 09/25] oom: " Lai Jiangshan
` (12 subsequent siblings)
16 siblings, 0 replies; 37+ messages in thread
From: Lai Jiangshan @ 2012-08-06 9:23 UTC (permalink / raw)
To: Mel Gorman, linux-kernel
Cc: Lai Jiangshan, Johannes Weiner, Michal Hocko, Balbir Singh,
KAMEZAWA Hiroyuki, Tejun Heo, Li Zefan, cgroups, linux-mm,
containers
N_HIGH_MEMORY stands for the nodes that have normal or high memory.
N_MEMORY stands for the nodes that have any memory.
The code here needs to handle the nodes which have memory, so we should
use N_MEMORY instead.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
mm/memcontrol.c | 18 +++++++++---------
mm/page_cgroup.c | 2 +-
2 files changed, 10 insertions(+), 10 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index f72b5e5..4402c2e 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -797,7 +797,7 @@ static unsigned long mem_cgroup_nr_lru_pages(struct mem_cgroup *memcg,
int nid;
u64 total = 0;
- for_each_node_state(nid, N_HIGH_MEMORY)
+ for_each_node_state(nid, N_MEMORY)
total += mem_cgroup_node_nr_lru_pages(memcg, nid, lru_mask);
return total;
}
@@ -1549,9 +1549,9 @@ static void mem_cgroup_may_update_nodemask(struct mem_cgroup *memcg)
return;
/* make a nodemask where this memcg uses memory from */
- memcg->scan_nodes = node_states[N_HIGH_MEMORY];
+ memcg->scan_nodes = node_states[N_MEMORY];
- for_each_node_mask(nid, node_states[N_HIGH_MEMORY]) {
+ for_each_node_mask(nid, node_states[N_MEMORY]) {
if (!test_mem_cgroup_node_reclaimable(memcg, nid, false))
node_clear(nid, memcg->scan_nodes);
@@ -1622,7 +1622,7 @@ static bool mem_cgroup_reclaimable(struct mem_cgroup *memcg, bool noswap)
/*
* Check rest of nodes.
*/
- for_each_node_state(nid, N_HIGH_MEMORY) {
+ for_each_node_state(nid, N_MEMORY) {
if (node_isset(nid, memcg->scan_nodes))
continue;
if (test_mem_cgroup_node_reclaimable(memcg, nid, noswap))
@@ -3700,7 +3700,7 @@ move_account:
drain_all_stock_sync(memcg);
ret = 0;
mem_cgroup_start_move(memcg);
- for_each_node_state(node, N_HIGH_MEMORY) {
+ for_each_node_state(node, N_MEMORY) {
for (zid = 0; !ret && zid < MAX_NR_ZONES; zid++) {
enum lru_list lru;
for_each_lru(lru) {
@@ -4025,7 +4025,7 @@ static int mem_control_numa_stat_show(struct cgroup *cont, struct cftype *cft,
total_nr = mem_cgroup_nr_lru_pages(memcg, LRU_ALL);
seq_printf(m, "total=%lu", total_nr);
- for_each_node_state(nid, N_HIGH_MEMORY) {
+ for_each_node_state(nid, N_MEMORY) {
node_nr = mem_cgroup_node_nr_lru_pages(memcg, nid, LRU_ALL);
seq_printf(m, " N%d=%lu", nid, node_nr);
}
@@ -4033,7 +4033,7 @@ static int mem_control_numa_stat_show(struct cgroup *cont, struct cftype *cft,
file_nr = mem_cgroup_nr_lru_pages(memcg, LRU_ALL_FILE);
seq_printf(m, "file=%lu", file_nr);
- for_each_node_state(nid, N_HIGH_MEMORY) {
+ for_each_node_state(nid, N_MEMORY) {
node_nr = mem_cgroup_node_nr_lru_pages(memcg, nid,
LRU_ALL_FILE);
seq_printf(m, " N%d=%lu", nid, node_nr);
@@ -4042,7 +4042,7 @@ static int mem_control_numa_stat_show(struct cgroup *cont, struct cftype *cft,
anon_nr = mem_cgroup_nr_lru_pages(memcg, LRU_ALL_ANON);
seq_printf(m, "anon=%lu", anon_nr);
- for_each_node_state(nid, N_HIGH_MEMORY) {
+ for_each_node_state(nid, N_MEMORY) {
node_nr = mem_cgroup_node_nr_lru_pages(memcg, nid,
LRU_ALL_ANON);
seq_printf(m, " N%d=%lu", nid, node_nr);
@@ -4051,7 +4051,7 @@ static int mem_control_numa_stat_show(struct cgroup *cont, struct cftype *cft,
unevictable_nr = mem_cgroup_nr_lru_pages(memcg, BIT(LRU_UNEVICTABLE));
seq_printf(m, "unevictable=%lu", unevictable_nr);
- for_each_node_state(nid, N_HIGH_MEMORY) {
+ for_each_node_state(nid, N_MEMORY) {
node_nr = mem_cgroup_node_nr_lru_pages(memcg, nid,
BIT(LRU_UNEVICTABLE));
seq_printf(m, " N%d=%lu", nid, node_nr);
diff --git a/mm/page_cgroup.c b/mm/page_cgroup.c
index eb750f8..e775239 100644
--- a/mm/page_cgroup.c
+++ b/mm/page_cgroup.c
@@ -271,7 +271,7 @@ void __init page_cgroup_init(void)
if (mem_cgroup_disabled())
return;
- for_each_node_state(nid, N_HIGH_MEMORY) {
+ for_each_node_state(nid, N_MEMORY) {
unsigned long start_pfn, end_pfn;
start_pfn = node_start_pfn(nid);
--
1.7.4.4
* [RFC V3 PATCH 09/25] oom: use N_MEMORY instead N_HIGH_MEMORY
[not found] ` <1344244999-5081-1-git-send-email-laijs@cn.fujitsu.com>
` (3 preceding siblings ...)
2012-08-06 9:23 ` [RFC V3 PATCH 08/25] memcontrol: use N_MEMORY instead N_HIGH_MEMORY Lai Jiangshan
@ 2012-08-06 9:23 ` Lai Jiangshan
2012-08-06 9:23 ` [RFC V3 PATCH 10/25] mm,migrate: " Lai Jiangshan
` (11 subsequent siblings)
16 siblings, 0 replies; 37+ messages in thread
From: Lai Jiangshan @ 2012-08-06 9:23 UTC (permalink / raw)
To: Mel Gorman, linux-kernel
Cc: Lai Jiangshan, Andrew Morton, David Rientjes, KAMEZAWA Hiroyuki,
Michal Hocko, KOSAKI Motohiro, linux-mm
N_HIGH_MEMORY stands for the nodes that have normal or high memory.
N_MEMORY stands for the nodes that have any memory.
The code here needs to handle the nodes which have memory, so we should
use N_MEMORY instead.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Hillf Danton <dhillf@gmail.com>
---
mm/oom_kill.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index ac300c9..1e58f12 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -257,7 +257,7 @@ static enum oom_constraint constrained_alloc(struct zonelist *zonelist,
* the page allocator means a mempolicy is in effect. Cpuset policy
* is enforced in get_page_from_freelist().
*/
- if (nodemask && !nodes_subset(node_states[N_HIGH_MEMORY], *nodemask)) {
+ if (nodemask && !nodes_subset(node_states[N_MEMORY], *nodemask)) {
*totalpages = total_swap_pages;
for_each_node_mask(nid, *nodemask)
*totalpages += node_spanned_pages(nid);
--
1.7.4.4
* [RFC V3 PATCH 10/25] mm,migrate: use N_MEMORY instead N_HIGH_MEMORY
[not found] ` <1344244999-5081-1-git-send-email-laijs@cn.fujitsu.com>
` (4 preceding siblings ...)
2012-08-06 9:23 ` [RFC V3 PATCH 09/25] oom: " Lai Jiangshan
@ 2012-08-06 9:23 ` Lai Jiangshan
2012-08-06 9:23 ` [RFC V3 PATCH 11/25] mempolicy: " Lai Jiangshan
` (10 subsequent siblings)
16 siblings, 0 replies; 37+ messages in thread
From: Lai Jiangshan @ 2012-08-06 9:23 UTC (permalink / raw)
To: Mel Gorman, linux-kernel
Cc: Lai Jiangshan, Andrew Morton, Hugh Dickins, Mel Gorman,
Christoph Lameter, Wang Sheng-Hui, linux-mm
N_HIGH_MEMORY stands for the nodes that have normal or high memory.
N_MEMORY stands for the nodes that have any memory.
The code here needs to handle the nodes which have memory, so we should
use N_MEMORY instead.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Christoph Lameter <cl@linux.com>
---
mm/migrate.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/mm/migrate.c b/mm/migrate.c
index be26d5c..dbe4f86 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1226,7 +1226,7 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
if (node < 0 || node >= MAX_NUMNODES)
goto out_pm;
- if (!node_state(node, N_HIGH_MEMORY))
+ if (!node_state(node, N_MEMORY))
goto out_pm;
err = -EACCES;
--
1.7.4.4
* [RFC V3 PATCH 11/25] mempolicy: use N_MEMORY instead N_HIGH_MEMORY
[not found] ` <1344244999-5081-1-git-send-email-laijs@cn.fujitsu.com>
` (5 preceding siblings ...)
2012-08-06 9:23 ` [RFC V3 PATCH 10/25] mm,migrate: " Lai Jiangshan
@ 2012-08-06 9:23 ` Lai Jiangshan
2012-08-06 9:23 ` [RFC V3 PATCH 12/25] hugetlb: " Lai Jiangshan
` (9 subsequent siblings)
16 siblings, 0 replies; 37+ messages in thread
From: Lai Jiangshan @ 2012-08-06 9:23 UTC (permalink / raw)
To: Mel Gorman, linux-kernel
Cc: Lai Jiangshan, Andrew Morton, Mel Gorman, David Rientjes,
Rik van Riel, KOSAKI Motohiro, linux-mm
N_HIGH_MEMORY stands for the nodes that have normal or high memory.
N_MEMORY stands for the nodes that have any memory.
The code here needs to handle the nodes which have memory, so we should
use N_MEMORY instead.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
mm/mempolicy.c | 12 ++++++------
1 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 1d771e4..ad0381d 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -212,9 +212,9 @@ static int mpol_set_nodemask(struct mempolicy *pol,
/* if mode is MPOL_DEFAULT, pol is NULL. This is right. */
if (pol == NULL)
return 0;
- /* Check N_HIGH_MEMORY */
+ /* Check N_MEMORY */
nodes_and(nsc->mask1,
- cpuset_current_mems_allowed, node_states[N_HIGH_MEMORY]);
+ cpuset_current_mems_allowed, node_states[N_MEMORY]);
VM_BUG_ON(!nodes);
if (pol->mode == MPOL_PREFERRED && nodes_empty(*nodes))
@@ -1363,7 +1363,7 @@ SYSCALL_DEFINE4(migrate_pages, pid_t, pid, unsigned long, maxnode,
goto out_put;
}
- if (!nodes_subset(*new, node_states[N_HIGH_MEMORY])) {
+ if (!nodes_subset(*new, node_states[N_MEMORY])) {
err = -EINVAL;
goto out_put;
}
@@ -2314,7 +2314,7 @@ void __init numa_policy_init(void)
* fall back to the largest node if they're all smaller.
*/
nodes_clear(interleave_nodes);
- for_each_node_state(nid, N_HIGH_MEMORY) {
+ for_each_node_state(nid, N_MEMORY) {
unsigned long total_pages = node_present_pages(nid);
/* Preserve the largest node */
@@ -2395,7 +2395,7 @@ int mpol_parse_str(char *str, struct mempolicy **mpol, int no_context)
*nodelist++ = '\0';
if (nodelist_parse(nodelist, nodes))
goto out;
- if (!nodes_subset(nodes, node_states[N_HIGH_MEMORY]))
+ if (!nodes_subset(nodes, node_states[N_MEMORY]))
goto out;
} else
nodes_clear(nodes);
@@ -2429,7 +2429,7 @@ int mpol_parse_str(char *str, struct mempolicy **mpol, int no_context)
* Default to online nodes with memory if no nodelist
*/
if (!nodelist)
- nodes = node_states[N_HIGH_MEMORY];
+ nodes = node_states[N_MEMORY];
break;
case MPOL_LOCAL:
/*
--
1.7.4.4
* [RFC V3 PATCH 12/25] hugetlb: use N_MEMORY instead N_HIGH_MEMORY
[not found] ` <1344244999-5081-1-git-send-email-laijs@cn.fujitsu.com>
` (6 preceding siblings ...)
2012-08-06 9:23 ` [RFC V3 PATCH 11/25] mempolicy: " Lai Jiangshan
@ 2012-08-06 9:23 ` Lai Jiangshan
2012-08-06 9:23 ` [RFC V3 PATCH 13/25] vmstat: " Lai Jiangshan
` (8 subsequent siblings)
16 siblings, 0 replies; 37+ messages in thread
From: Lai Jiangshan @ 2012-08-06 9:23 UTC (permalink / raw)
To: Mel Gorman, linux-kernel
Cc: Lai Jiangshan, Greg Kroah-Hartman, Andrew Morton, Hillf Danton,
Michal Hocko, KAMEZAWA Hiroyuki, linux-mm
N_HIGH_MEMORY stands for the nodes that have normal or high memory.
N_MEMORY stands for the nodes that have any memory.
The code here needs to handle the nodes which have memory, so we should
use N_MEMORY instead.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Hillf Danton <dhillf@gmail.com>
---
drivers/base/node.c | 2 +-
mm/hugetlb.c | 24 ++++++++++++------------
2 files changed, 13 insertions(+), 13 deletions(-)
diff --git a/drivers/base/node.c b/drivers/base/node.c
index 5d7731e..4c3aa7c 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -227,7 +227,7 @@ static node_registration_func_t __hugetlb_unregister_node;
static inline bool hugetlb_register_node(struct node *node)
{
if (__hugetlb_register_node &&
- node_state(node->dev.id, N_HIGH_MEMORY)) {
+ node_state(node->dev.id, N_MEMORY)) {
__hugetlb_register_node(node);
return true;
}
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index e198831..661db47 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1046,7 +1046,7 @@ static void return_unused_surplus_pages(struct hstate *h,
* on-line nodes with memory and will handle the hstate accounting.
*/
while (nr_pages--) {
- if (!free_pool_huge_page(h, &node_states[N_HIGH_MEMORY], 1))
+ if (!free_pool_huge_page(h, &node_states[N_MEMORY], 1))
break;
}
}
@@ -1150,14 +1150,14 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma,
int __weak alloc_bootmem_huge_page(struct hstate *h)
{
struct huge_bootmem_page *m;
- int nr_nodes = nodes_weight(node_states[N_HIGH_MEMORY]);
+ int nr_nodes = nodes_weight(node_states[N_MEMORY]);
while (nr_nodes) {
void *addr;
addr = __alloc_bootmem_node_nopanic(
NODE_DATA(hstate_next_node_to_alloc(h,
- &node_states[N_HIGH_MEMORY])),
+ &node_states[N_MEMORY])),
huge_page_size(h), huge_page_size(h), 0);
if (addr) {
@@ -1229,7 +1229,7 @@ static void __init hugetlb_hstate_alloc_pages(struct hstate *h)
if (!alloc_bootmem_huge_page(h))
break;
} else if (!alloc_fresh_huge_page(h,
- &node_states[N_HIGH_MEMORY]))
+ &node_states[N_MEMORY]))
break;
}
h->max_huge_pages = i;
@@ -1497,7 +1497,7 @@ static ssize_t nr_hugepages_store_common(bool obey_mempolicy,
if (!(obey_mempolicy &&
init_nodemask_of_mempolicy(nodes_allowed))) {
NODEMASK_FREE(nodes_allowed);
- nodes_allowed = &node_states[N_HIGH_MEMORY];
+ nodes_allowed = &node_states[N_MEMORY];
}
} else if (nodes_allowed) {
/*
@@ -1507,11 +1507,11 @@ static ssize_t nr_hugepages_store_common(bool obey_mempolicy,
count += h->nr_huge_pages - h->nr_huge_pages_node[nid];
init_nodemask_of_node(nodes_allowed, nid);
} else
- nodes_allowed = &node_states[N_HIGH_MEMORY];
+ nodes_allowed = &node_states[N_MEMORY];
h->max_huge_pages = set_max_huge_pages(h, count, nodes_allowed);
- if (nodes_allowed != &node_states[N_HIGH_MEMORY])
+ if (nodes_allowed != &node_states[N_MEMORY])
NODEMASK_FREE(nodes_allowed);
return len;
@@ -1812,7 +1812,7 @@ static void hugetlb_register_all_nodes(void)
{
int nid;
- for_each_node_state(nid, N_HIGH_MEMORY) {
+ for_each_node_state(nid, N_MEMORY) {
struct node *node = &node_devices[nid];
if (node->dev.id == nid)
hugetlb_register_node(node);
@@ -1906,8 +1906,8 @@ void __init hugetlb_add_hstate(unsigned order)
h->free_huge_pages = 0;
for (i = 0; i < MAX_NUMNODES; ++i)
INIT_LIST_HEAD(&h->hugepage_freelists[i]);
- h->next_nid_to_alloc = first_node(node_states[N_HIGH_MEMORY]);
- h->next_nid_to_free = first_node(node_states[N_HIGH_MEMORY]);
+ h->next_nid_to_alloc = first_node(node_states[N_MEMORY]);
+ h->next_nid_to_free = first_node(node_states[N_MEMORY]);
snprintf(h->name, HSTATE_NAME_LEN, "hugepages-%lukB",
huge_page_size(h)/1024);
@@ -1995,11 +1995,11 @@ static int hugetlb_sysctl_handler_common(bool obey_mempolicy,
if (!(obey_mempolicy &&
init_nodemask_of_mempolicy(nodes_allowed))) {
NODEMASK_FREE(nodes_allowed);
- nodes_allowed = &node_states[N_HIGH_MEMORY];
+ nodes_allowed = &node_states[N_MEMORY];
}
h->max_huge_pages = set_max_huge_pages(h, tmp, nodes_allowed);
- if (nodes_allowed != &node_states[N_HIGH_MEMORY])
+ if (nodes_allowed != &node_states[N_MEMORY])
NODEMASK_FREE(nodes_allowed);
}
out:
--
1.7.4.4
* [RFC V3 PATCH 13/25] vmstat: use N_MEMORY instead N_HIGH_MEMORY
[not found] ` <1344244999-5081-1-git-send-email-laijs@cn.fujitsu.com>
` (7 preceding siblings ...)
2012-08-06 9:23 ` [RFC V3 PATCH 12/25] hugetlb: " Lai Jiangshan
@ 2012-08-06 9:23 ` Lai Jiangshan
2012-08-06 9:23 ` [RFC V3 PATCH 16/25] vmscan: " Lai Jiangshan
` (7 subsequent siblings)
16 siblings, 0 replies; 37+ messages in thread
From: Lai Jiangshan @ 2012-08-06 9:23 UTC (permalink / raw)
To: Mel Gorman, linux-kernel
Cc: Lai Jiangshan, Andrew Morton, Christoph Lameter,
KAMEZAWA Hiroyuki, David Rientjes, linux-mm
N_HIGH_MEMORY stands for the nodes that have normal or high memory.
N_MEMORY stands for the nodes that have any memory.
The code here needs to handle the nodes which have memory, so we should
use N_MEMORY instead.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Christoph Lameter <cl@linux.com>
---
mm/vmstat.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 1bbbbd9..aa3da12 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -917,7 +917,7 @@ static int pagetypeinfo_show(struct seq_file *m, void *arg)
pg_data_t *pgdat = (pg_data_t *)arg;
/* check memoryless node */
- if (!node_state(pgdat->node_id, N_HIGH_MEMORY))
+ if (!node_state(pgdat->node_id, N_MEMORY))
return 0;
seq_printf(m, "Page block order: %d\n", pageblock_order);
@@ -1279,7 +1279,7 @@ static int unusable_show(struct seq_file *m, void *arg)
pg_data_t *pgdat = (pg_data_t *)arg;
/* check memoryless node */
- if (!node_state(pgdat->node_id, N_HIGH_MEMORY))
+ if (!node_state(pgdat->node_id, N_MEMORY))
return 0;
walk_zones_in_node(m, pgdat, unusable_show_print);
--
1.7.4.4
* [RFC V3 PATCH 16/25] vmscan: use N_MEMORY instead N_HIGH_MEMORY
[not found] ` <1344244999-5081-1-git-send-email-laijs@cn.fujitsu.com>
` (8 preceding siblings ...)
2012-08-06 9:23 ` [RFC V3 PATCH 13/25] vmstat: " Lai Jiangshan
@ 2012-08-06 9:23 ` Lai Jiangshan
2012-08-06 9:23 ` [RFC V3 PATCH 17/25] page_alloc: use N_MEMORY instead N_HIGH_MEMORY change the node_states initialization Lai Jiangshan
` (6 subsequent siblings)
16 siblings, 0 replies; 37+ messages in thread
From: Lai Jiangshan @ 2012-08-06 9:23 UTC (permalink / raw)
To: Mel Gorman, linux-kernel
Cc: Lai Jiangshan, Andrew Morton, KAMEZAWA Hiroyuki, Hugh Dickins,
Minchan Kim, linux-mm
N_HIGH_MEMORY stands for the nodes that have normal or high memory.
N_MEMORY stands for the nodes that have any memory.
The code here needs to handle the nodes which have memory, so we should
use N_MEMORY instead.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Hillf Danton <dhillf@gmail.com>
---
mm/vmscan.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 66e4310..1888026 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2921,7 +2921,7 @@ static int __devinit cpu_callback(struct notifier_block *nfb,
int nid;
if (action == CPU_ONLINE || action == CPU_ONLINE_FROZEN) {
- for_each_node_state(nid, N_HIGH_MEMORY) {
+ for_each_node_state(nid, N_MEMORY) {
pg_data_t *pgdat = NODE_DATA(nid);
const struct cpumask *mask;
@@ -2976,7 +2976,7 @@ static int __init kswapd_init(void)
int nid;
swap_setup();
- for_each_node_state(nid, N_HIGH_MEMORY)
+ for_each_node_state(nid, N_MEMORY)
kswapd_run(nid);
hotcpu_notifier(cpu_callback, 0);
return 0;
--
1.7.4.4
* [RFC V3 PATCH 17/25] page_alloc: use N_MEMORY instead N_HIGH_MEMORY change the node_states initialization
[not found] ` <1344244999-5081-1-git-send-email-laijs@cn.fujitsu.com>
` (9 preceding siblings ...)
2012-08-06 9:23 ` [RFC V3 PATCH 16/25] vmscan: " Lai Jiangshan
@ 2012-08-06 9:23 ` Lai Jiangshan
2012-08-06 9:23 ` [RFC V3 PATCH 18/25] hotplug: update nodemasks management Lai Jiangshan
` (5 subsequent siblings)
16 siblings, 0 replies; 37+ messages in thread
From: Lai Jiangshan @ 2012-08-06 9:23 UTC (permalink / raw)
To: Mel Gorman, linux-kernel
Cc: Lai Jiangshan, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
Tejun Heo, Pekka Enberg, Yinghai Lu, David Rientjes,
Andrew Morton, Michal Hocko, KAMEZAWA Hiroyuki, Minchan Kim,
linux-mm
N_HIGH_MEMORY stands for the nodes that have normal or high memory.
N_MEMORY stands for the nodes that have any memory.
The code here needs to handle the nodes which have memory, so we should
use N_MEMORY instead.
Since we have introduced N_MEMORY, update the initialization of node_states accordingly.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
arch/x86/mm/init_64.c | 4 +++-
mm/page_alloc.c | 40 ++++++++++++++++++++++------------------
2 files changed, 25 insertions(+), 19 deletions(-)
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 2b6b4a3..005f00c 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -625,7 +625,9 @@ void __init paging_init(void)
* numa support is not compiled in, and later node_set_state
* will not set it back.
*/
- node_clear_state(0, N_NORMAL_MEMORY);
+ node_clear_state(0, N_MEMORY);
+ if (N_MEMORY != N_NORMAL_MEMORY)
+ node_clear_state(0, N_NORMAL_MEMORY);
zone_sizes_init();
}
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 9312702..edffc35 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1646,7 +1646,7 @@ bool zone_watermark_ok_safe(struct zone *z, int order, unsigned long mark,
*
* If the zonelist cache is present in the passed in zonelist, then
* returns a pointer to the allowed node mask (either the current
- * tasks mems_allowed, or node_states[N_HIGH_MEMORY].)
+ * tasks mems_allowed, or node_states[N_MEMORY].)
*
* If the zonelist cache is not available for this zonelist, does
* nothing and returns NULL.
@@ -1675,7 +1675,7 @@ static nodemask_t *zlc_setup(struct zonelist *zonelist, int alloc_flags)
allowednodes = !in_interrupt() && (alloc_flags & ALLOC_CPUSET) ?
&cpuset_current_mems_allowed :
- &node_states[N_HIGH_MEMORY];
+ &node_states[N_MEMORY];
return allowednodes;
}
@@ -3070,7 +3070,7 @@ static int find_next_best_node(int node, nodemask_t *used_node_mask)
return node;
}
- for_each_node_state(n, N_HIGH_MEMORY) {
+ for_each_node_state(n, N_MEMORY) {
/* Don't want a node to appear more than once */
if (node_isset(n, *used_node_mask))
@@ -3212,7 +3212,7 @@ static int default_zonelist_order(void)
* local memory, NODE_ORDER may be suitable.
*/
average_size = total_size /
- (nodes_weight(node_states[N_HIGH_MEMORY]) + 1);
+ (nodes_weight(node_states[N_MEMORY]) + 1);
for_each_online_node(nid) {
low_kmem_size = 0;
total_size = 0;
@@ -4569,7 +4569,7 @@ unsigned long __init find_min_pfn_with_active_regions(void)
/*
* early_calculate_totalpages()
* Sum pages in active regions for movable zone.
- * Populate N_HIGH_MEMORY for calculating usable_nodes.
+ * Populate N_MEMORY for calculating usable_nodes.
*/
static unsigned long __init early_calculate_totalpages(void)
{
@@ -4582,7 +4582,7 @@ static unsigned long __init early_calculate_totalpages(void)
totalpages += pages;
if (pages)
- node_set_state(nid, N_HIGH_MEMORY);
+ node_set_state(nid, N_MEMORY);
}
return totalpages;
}
@@ -4599,9 +4599,9 @@ static void __init find_zone_movable_pfns_for_nodes(void)
unsigned long usable_startpfn;
unsigned long kernelcore_node, kernelcore_remaining;
/* save the state before borrow the nodemask */
- nodemask_t saved_node_state = node_states[N_HIGH_MEMORY];
+ nodemask_t saved_node_state = node_states[N_MEMORY];
unsigned long totalpages = early_calculate_totalpages();
- int usable_nodes = nodes_weight(node_states[N_HIGH_MEMORY]);
+ int usable_nodes = nodes_weight(node_states[N_MEMORY]);
/*
* If movablecore was specified, calculate what size of
@@ -4636,7 +4636,7 @@ static void __init find_zone_movable_pfns_for_nodes(void)
restart:
/* Spread kernelcore memory as evenly as possible throughout nodes */
kernelcore_node = required_kernelcore / usable_nodes;
- for_each_node_state(nid, N_HIGH_MEMORY) {
+ for_each_node_state(nid, N_MEMORY) {
unsigned long start_pfn, end_pfn;
/*
@@ -4728,23 +4728,27 @@ restart:
out:
/* restore the node_state */
- node_states[N_HIGH_MEMORY] = saved_node_state;
+ node_states[N_MEMORY] = saved_node_state;
}
-/* Any regular memory on that node ? */
-static void check_for_regular_memory(pg_data_t *pgdat)
+/* Any regular or high memory on that node ? */
+static void check_for_memory(pg_data_t *pgdat, int nid)
{
-#ifdef CONFIG_HIGHMEM
enum zone_type zone_type;
- for (zone_type = 0; zone_type <= ZONE_NORMAL; zone_type++) {
+ if (N_MEMORY == N_NORMAL_MEMORY)
+ return;
+
+ for (zone_type = 0; zone_type <= ZONE_MOVABLE - 1; zone_type++) {
struct zone *zone = &pgdat->node_zones[zone_type];
if (zone->present_pages) {
- node_set_state(zone_to_nid(zone), N_NORMAL_MEMORY);
+ node_set_state(nid, N_HIGH_MEMORY);
+ if (N_NORMAL_MEMORY != N_HIGH_MEMORY &&
+ zone_type <= ZONE_NORMAL)
+ node_set_state(nid, N_NORMAL_MEMORY);
break;
}
}
-#endif
}
/**
@@ -4827,8 +4831,8 @@ void __init free_area_init_nodes(unsigned long *max_zone_pfn)
/* Any memory on that node */
if (pgdat->node_present_pages)
- node_set_state(nid, N_HIGH_MEMORY);
- check_for_regular_memory(pgdat);
+ node_set_state(nid, N_MEMORY);
+ check_for_memory(pgdat, nid);
}
}
--
1.7.4.4
* [RFC V3 PATCH 18/25] hotplug: update nodemasks management
[not found] ` <1344244999-5081-1-git-send-email-laijs@cn.fujitsu.com>
` (10 preceding siblings ...)
2012-08-06 9:23 ` [RFC V3 PATCH 17/25] page_alloc: use N_MEMORY instead N_HIGH_MEMORY change the node_states initialization Lai Jiangshan
@ 2012-08-06 9:23 ` Lai Jiangshan
2012-08-06 9:23 ` [RFC V3 PATCH 19/25] numa: add CONFIG_MOVABLE_NODE for movable-dedicated node Lai Jiangshan
` (4 subsequent siblings)
16 siblings, 0 replies; 37+ messages in thread
From: Lai Jiangshan @ 2012-08-06 9:23 UTC (permalink / raw)
To: Mel Gorman, linux-kernel
Cc: Lai Jiangshan, Rob Landley, Kay Sievers, Greg Kroah-Hartman,
Andrew Morton, Paul Gortmaker, Bjorn Helgaas, David Rientjes,
linux-doc, linux-mm
Update the nodemask management for N_MEMORY.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
Documentation/memory-hotplug.txt | 5 +++-
include/linux/memory.h | 1 +
mm/memory_hotplug.c | 49 +++++++++++++++++++++++++++++++++----
3 files changed, 48 insertions(+), 7 deletions(-)
diff --git a/Documentation/memory-hotplug.txt b/Documentation/memory-hotplug.txt
index 6e6cbc7..70bc1c7 100644
--- a/Documentation/memory-hotplug.txt
+++ b/Documentation/memory-hotplug.txt
@@ -378,6 +378,7 @@ struct memory_notify {
unsigned long start_pfn;
unsigned long nr_pages;
int status_change_nid_normal;
+ int status_change_nid_high;
int status_change_nid;
}
@@ -385,7 +386,9 @@ start_pfn is start_pfn of online/offline memory.
nr_pages is # of pages of online/offline memory.
status_change_nid_normal is set to the node id when N_NORMAL_MEMORY of the nodemask
is (or will be) set/cleared; if this is -1, then the nodemask status is not changed.
-status_change_nid is set node id when N_HIGH_MEMORY of nodemask is (will be)
+status_change_nid_high is set to the node id when N_HIGH_MEMORY of the nodemask
+is (or will be) set/cleared; if this is -1, then the nodemask status is not changed.
+status_change_nid is set node id when N_MEMORY of nodemask is (will be)
set/clear. It means a new(memoryless) node gets new memory by online and a
node loses all memory. If this is -1, then nodemask status is not changed.
If status_change_nid* >= 0, the callback should create/discard structures for the
diff --git a/include/linux/memory.h b/include/linux/memory.h
index 6b9202b..8089e49 100644
--- a/include/linux/memory.h
+++ b/include/linux/memory.h
@@ -54,6 +54,7 @@ struct memory_notify {
unsigned long start_pfn;
unsigned long nr_pages;
int status_change_nid_normal;
+ int status_change_nid_high;
int status_change_nid;
};
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 3438c4a..c2c96a4 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -462,7 +462,7 @@ static void check_nodemasks_changes_online(unsigned long nr_pages,
int nid = zone_to_nid(zone);
enum zone_type zone_last = ZONE_NORMAL;
- if (N_HIGH_MEMORY == N_NORMAL_MEMORY)
+ if (N_MEMORY == N_NORMAL_MEMORY)
zone_last = ZONE_MOVABLE;
if (zone_idx(zone) <= zone_last && !node_state(nid, N_NORMAL_MEMORY))
@@ -470,7 +470,20 @@ static void check_nodemasks_changes_online(unsigned long nr_pages,
else
arg->status_change_nid_normal = -1;
- if (!node_state(nid, N_HIGH_MEMORY))
+#ifdef CONFIG_HIGHMEM
+ zone_last = ZONE_HIGHMEM;
+ if (N_MEMORY == N_HIGH_MEMORY)
+ zone_last = ZONE_MOVABLE;
+
+ if (zone_idx(zone) <= zone_last && !node_state(nid, N_HIGH_MEMORY))
+ arg->status_change_nid_high = nid;
+ else
+ arg->status_change_nid_high = -1;
+#else
+ arg->status_change_nid_high = arg->status_change_nid_normal;
+#endif
+
+ if (!node_state(nid, N_MEMORY))
arg->status_change_nid = nid;
else
arg->status_change_nid = -1;
@@ -481,7 +494,10 @@ static void set_nodemasks(int node, struct memory_notify *arg)
if (arg->status_change_nid_normal >= 0)
node_set_state(node, N_NORMAL_MEMORY);
- node_set_state(node, N_HIGH_MEMORY);
+ if (arg->status_change_nid_high >= 0)
+ node_set_state(node, N_HIGH_MEMORY);
+
+ node_set_state(node, N_MEMORY);
}
@@ -899,7 +915,7 @@ static void check_nodemasks_changes_offline(unsigned long nr_pages,
unsigned long present_pages = 0;
enum zone_type zt, zone_last = ZONE_NORMAL;
- if (N_HIGH_MEMORY == N_NORMAL_MEMORY)
+ if (N_MEMORY == N_NORMAL_MEMORY)
zone_last = ZONE_MOVABLE;
for (zt = 0; zt <= zone_last; zt++)
@@ -909,6 +925,21 @@ static void check_nodemasks_changes_offline(unsigned long nr_pages,
else
arg->status_change_nid_normal = -1;
+#ifdef CONFIG_HIGHMEM
+ zone_last = ZONE_HIGHMEM;
+ if (N_MEMORY == N_HIGH_MEMORY)
+ zone_last = ZONE_MOVABLE;
+
+ for (; zt <= zone_last; zt++)
+ present_pages += pgdat->node_zones[zt].present_pages;
+ if (zone_idx(zone) <= zone_last && nr_pages >= present_pages)
+ arg->status_change_nid_high = zone_to_nid(zone);
+ else
+ arg->status_change_nid_high = -1;
+#else
+ arg->status_change_nid_high = arg->status_change_nid_normal;
+#endif
+
zone_last = ZONE_MOVABLE;
for (; zt <= zone_last; zt++)
present_pages += pgdat->node_zones[zt].present_pages;
@@ -923,11 +954,17 @@ static void clear_nodemasks(int node, struct memory_notify *arg)
if (arg->status_change_nid_normal >= 0)
node_clear_state(node, N_NORMAL_MEMORY);
- if (N_HIGH_MEMORY == N_NORMAL_MEMORY)
+ if (N_MEMORY == N_NORMAL_MEMORY)
return;
- if (arg->status_change_nid >= 0)
+ if (arg->status_change_nid_high >= 0)
node_clear_state(node, N_HIGH_MEMORY);
+
+ if (N_MEMORY == N_HIGH_MEMORY)
+ return;
+
+ if (arg->status_change_nid >= 0)
+ node_clear_state(node, N_MEMORY);
}
static int __ref offline_pages(unsigned long start_pfn,
--
1.7.4.4
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [RFC V3 PATCH 19/25] numa: add CONFIG_MOVABLE_NODE for movable-dedicated node
[not found] ` <1344244999-5081-1-git-send-email-laijs@cn.fujitsu.com>
` (11 preceding siblings ...)
2012-08-06 9:23 ` [RFC V3 PATCH 18/25] hotplug: update nodemasks management Lai Jiangshan
@ 2012-08-06 9:23 ` Lai Jiangshan
2012-08-06 9:23 ` [RFC V3 PATCH 20/25] page_alloc: add kernelcore_max_addr Lai Jiangshan
` (3 subsequent siblings)
16 siblings, 0 replies; 37+ messages in thread
From: Lai Jiangshan @ 2012-08-06 9:23 UTC (permalink / raw)
To: Mel Gorman, linux-kernel
Cc: Lai Jiangshan, Greg Kroah-Hartman, Christoph Lameter,
Hillf Danton, Andrew Morton, Jan Beulich, Seth Jennings,
Dan Magenheimer, Michal Hocko, KAMEZAWA Hiroyuki, Minchan Kim,
linux-mm
All the preparation is done, so we can now actually introduce N_MEMORY.
Add CONFIG_MOVABLE_NODE so that N_MEMORY can be used for a movable-dedicated node.
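As a reading aid, here is a minimal sketch (assuming the series is fully applied; not code from this patch) of how generic code is then expected to walk every node that has memory of any kind:

	#include <linux/kernel.h>
	#include <linux/nodemask.h>

	/*
	 * Sketch: iterate over all nodes with memory of any kind (regular,
	 * high, movable). With CONFIG_MOVABLE_NODE=n, N_MEMORY aliases
	 * N_HIGH_MEMORY, so this degenerates to the old behavior.
	 */
	static void example_dump_memory_nodes(void)
	{
		int nid;

		for_each_node_state(nid, N_MEMORY)
			pr_info("node %d has memory\n", nid);
	}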
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
drivers/base/node.c | 6 ++++++
include/linux/nodemask.h | 4 ++++
mm/Kconfig | 8 ++++++++
mm/page_alloc.c | 3 +++
4 files changed, 21 insertions(+), 0 deletions(-)
diff --git a/drivers/base/node.c b/drivers/base/node.c
index 4c3aa7c..653b5e2 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -620,6 +620,9 @@ static struct node_attr node_state_attr[] = {
#ifdef CONFIG_HIGHMEM
[N_HIGH_MEMORY] = _NODE_ATTR(has_high_memory, N_HIGH_MEMORY),
#endif
+#ifdef CONFIG_MOVABLE_NODE
+ [N_MEMORY] = _NODE_ATTR(has_memory, N_MEMORY),
+#endif
[N_CPU] = _NODE_ATTR(has_cpu, N_CPU),
};
@@ -630,6 +633,9 @@ static struct attribute *node_state_attrs[] = {
#ifdef CONFIG_HIGHMEM
&node_state_attr[N_HIGH_MEMORY].attr.attr,
#endif
+#ifdef CONFIG_MOVABLE_NODE
+ &node_state_attr[N_MEMORY].attr.attr,
+#endif
&node_state_attr[N_CPU].attr.attr,
NULL
};
diff --git a/include/linux/nodemask.h b/include/linux/nodemask.h
index c6ebdc9..4e2cbfa 100644
--- a/include/linux/nodemask.h
+++ b/include/linux/nodemask.h
@@ -380,7 +380,11 @@ enum node_states {
#else
N_HIGH_MEMORY = N_NORMAL_MEMORY,
#endif
+#ifdef CONFIG_MOVABLE_NODE
+ N_MEMORY, /* The node has memory(regular, high, movable) */
+#else
N_MEMORY = N_HIGH_MEMORY,
+#endif
N_CPU, /* The node has one or more cpus */
NR_NODE_STATES
};
diff --git a/mm/Kconfig b/mm/Kconfig
index 82fed4e..4371c65 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -140,6 +140,14 @@ config ARCH_DISCARD_MEMBLOCK
config NO_BOOTMEM
boolean
+config MOVABLE_NODE
+ boolean "Enable to assign a node has only movable memory"
+ depends on HAVE_MEMBLOCK
+ depends on NO_BOOTMEM
+ depends on X86_64
+ depends on NUMA
+ default y
+
# eventually, we can have this option just 'select SPARSEMEM'
config MEMORY_HOTPLUG
bool "Allow for memory hot-add"
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index edffc35..03ad63d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -91,6 +91,9 @@ nodemask_t node_states[NR_NODE_STATES] __read_mostly = {
#ifdef CONFIG_HIGHMEM
[N_HIGH_MEMORY] = { { [0] = 1UL } },
#endif
+#ifdef CONFIG_MOVABLE_NODE
+ [N_MEMORY] = { { [0] = 1UL } },
+#endif
[N_CPU] = { { [0] = 1UL } },
#endif /* NUMA */
};
--
1.7.4.4
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [RFC V3 PATCH 20/25] page_alloc: add kernelcore_max_addr
[not found] ` <1344244999-5081-1-git-send-email-laijs@cn.fujitsu.com>
` (12 preceding siblings ...)
2012-08-06 9:23 ` [RFC V3 PATCH 19/25] numa: add CONFIG_MOVABLE_NODE for movable-dedicated node Lai Jiangshan
@ 2012-08-06 9:23 ` Lai Jiangshan
2012-08-06 9:23 ` [RFC V3 PATCH 23/25] memblock: limit memory address from memblock Lai Jiangshan
` (2 subsequent siblings)
16 siblings, 0 replies; 37+ messages in thread
From: Lai Jiangshan @ 2012-08-06 9:23 UTC (permalink / raw)
To: Mel Gorman, linux-kernel
Cc: Lai Jiangshan, Rob Landley, Andrew Morton, Michal Hocko,
KAMEZAWA Hiroyuki, Minchan Kim, linux-doc, linux-mm
The current policy for setting up ZONE_MOVABLE via the kernelcore= boot option
doesn't meet our requirement. We need something like a kernelcore_max_addr=XX
boot option to set an upper address limit for kernelcore.
The memory above that address will be migratable (movable), so it is
easier to offline (it is always ready to be offlined when the system doesn't
require that much memory).
This makes things easy when we dynamically hot-add/remove memory, improves
memory utilization, and helps THP.
kernelcore_max_addr=, kernelcore= and movablecore= can all be safely specified
at the same time (or any two of them).
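As a usage illustration (hypothetical address, not taken from the patch), booting with:

	kernelcore_max_addr=4G

keeps non-movable kernel allocations below the 4G physical address; all memory above 4G then lands in ZONE_MOVABLE and remains easy to offline.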
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
Documentation/kernel-parameters.txt | 9 +++++++++
mm/page_alloc.c | 29 ++++++++++++++++++++++++++++-
2 files changed, 37 insertions(+), 1 deletions(-)
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 12783fa..48dff61 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1216,6 +1216,15 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
use the HighMem zone if it exists, and the Normal
zone if it does not.
+ kernelcore_max_addr=nn[KMG] [KNL,X86,IA-64,PPC] This parameter
+ has the same effect as the kernelcore parameter, except it
+ specifies the upper physical address of the memory range
+ usable by the kernel for non-movable allocations.
+ If both kernelcore and kernelcore_max_addr are
+ specified, kernelcore_max_addr takes priority over
+ kernelcore.
+ See the kernelcore parameter.
+
kgdbdbgp= [KGDB,HW] kgdb over EHCI usb debug port.
Format: <Controller#>[,poll interval]
The controller # is the number of the ehci usb debug
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 03ad63d..65ac5c9 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -204,6 +204,7 @@ static unsigned long __meminitdata dma_reserve;
#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
static unsigned long __meminitdata arch_zone_lowest_possible_pfn[MAX_NR_ZONES];
static unsigned long __meminitdata arch_zone_highest_possible_pfn[MAX_NR_ZONES];
+static unsigned long __initdata required_kernelcore_max_pfn;
static unsigned long __initdata required_kernelcore;
static unsigned long __initdata required_movablecore;
static unsigned long __meminitdata zone_movable_pfn[MAX_NUMNODES];
@@ -4600,6 +4601,7 @@ static void __init find_zone_movable_pfns_for_nodes(void)
{
int i, nid;
unsigned long usable_startpfn;
+ unsigned long kernelcore_max_pfn;
unsigned long kernelcore_node, kernelcore_remaining;
/* save the state before borrow the nodemask */
nodemask_t saved_node_state = node_states[N_MEMORY];
@@ -4628,6 +4630,9 @@ static void __init find_zone_movable_pfns_for_nodes(void)
required_kernelcore = max(required_kernelcore, corepages);
}
+ if (required_kernelcore_max_pfn && !required_kernelcore)
+ required_kernelcore = totalpages;
+
/* If kernelcore was not specified, there is no ZONE_MOVABLE */
if (!required_kernelcore)
goto out;
@@ -4636,6 +4641,12 @@ static void __init find_zone_movable_pfns_for_nodes(void)
find_usable_zone_for_movable();
usable_startpfn = arch_zone_lowest_possible_pfn[movable_zone];
+ if (required_kernelcore_max_pfn)
+ kernelcore_max_pfn = required_kernelcore_max_pfn;
+ else
+ kernelcore_max_pfn = ULONG_MAX >> PAGE_SHIFT;
+ kernelcore_max_pfn = max(kernelcore_max_pfn, usable_startpfn);
+
restart:
/* Spread kernelcore memory as evenly as possible throughout nodes */
kernelcore_node = required_kernelcore / usable_nodes;
@@ -4662,8 +4673,12 @@ restart:
unsigned long size_pages;
start_pfn = max(start_pfn, zone_movable_pfn[nid]);
- if (start_pfn >= end_pfn)
+ end_pfn = min(kernelcore_max_pfn, end_pfn);
+ if (start_pfn >= end_pfn) {
+ if (!zone_movable_pfn[nid])
+ zone_movable_pfn[nid] = start_pfn;
continue;
+ }
/* Account for what is only usable for kernelcore */
if (start_pfn < usable_startpfn) {
@@ -4854,6 +4869,18 @@ static int __init cmdline_parse_core(char *p, unsigned long *core)
return 0;
}
+#ifdef CONFIG_MOVABLE_NODE
+/*
+ * kernelcore_max_addr=addr sets the upper physical address of the memory
+ * range usable for allocations that cannot be reclaimed or migrated.
+ */
+static int __init cmdline_parse_kernelcore_max_addr(char *p)
+{
+ return cmdline_parse_core(p, &required_kernelcore_max_pfn);
+}
+early_param("kernelcore_max_addr", cmdline_parse_kernelcore_max_addr);
+#endif
+
/*
* kernelcore=size sets the amount of memory for use for allocations that
* cannot be reclaimed or migrated.
--
1.7.4.4
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [RFC V3 PATCH 23/25] memblock: limit memory address from memblock
[not found] ` <1344244999-5081-1-git-send-email-laijs@cn.fujitsu.com>
` (13 preceding siblings ...)
2012-08-06 9:23 ` [RFC V3 PATCH 20/25] page_alloc: add kernelcore_max_addr Lai Jiangshan
@ 2012-08-06 9:23 ` Lai Jiangshan
2012-08-06 9:23 ` [RFC V3 PATCH 24/25] memblock: compare current_limit with end variable at memblock_find_in_range_node() Lai Jiangshan
2012-08-06 9:23 ` [RFC V3 PATCH 25/25] mm, memory-hotplug: add online_movable and online_kernel Lai Jiangshan
16 siblings, 0 replies; 37+ messages in thread
From: Lai Jiangshan @ 2012-08-06 9:23 UTC (permalink / raw)
To: Mel Gorman, linux-kernel
Cc: Yasuaki Ishimatsu, Lai Jiangshan, Tejun Heo, Andrew Morton,
Yinghai Lu, Sam Ravnborg, Ingo Molnar, Gavin Shan, Michal Hocko,
KAMEZAWA Hiroyuki, Minchan Kim, linux-mm
From: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Setting kernelcore_max_pfn means that all memory above the address given by
the boot parameter is allocated as ZONE_MOVABLE. So memory allocated by
memblock should also be limited by the parameter.
This patch applies that limit to memblock allocations.
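As a reading aid (hypothetical values, not part of the patch), the clamping added to memblock_set_current_limit() behaves like this:

	/* Booted with kernelcore_max_addr=4G, so memblock_limit == 0x100000000. */
	memblock_set_current_limit(0x200000000ULL);	/* caller asks for 8G */
	/*
	 * memblock.current_limit is clamped to memblock_limit (4G), so early
	 * memblock allocations can never land in the movable range above 4G.
	 */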
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
include/linux/memblock.h | 1 +
mm/memblock.c | 5 ++++-
mm/page_alloc.c | 6 +++++-
3 files changed, 10 insertions(+), 2 deletions(-)
diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index 19dc455..f2977ae 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -42,6 +42,7 @@ struct memblock {
extern struct memblock memblock;
extern int memblock_debug;
+extern phys_addr_t memblock_limit;
#define memblock_dbg(fmt, ...) \
if (memblock_debug) printk(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__)
diff --git a/mm/memblock.c b/mm/memblock.c
index 5cc6731..663b805 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -931,7 +931,10 @@ int __init_memblock memblock_is_region_reserved(phys_addr_t base, phys_addr_t si
void __init_memblock memblock_set_current_limit(phys_addr_t limit)
{
- memblock.current_limit = limit;
+ if (!memblock_limit || (memblock_limit > limit))
+ memblock.current_limit = limit;
+ else
+ memblock.current_limit = memblock_limit;
}
static void __init_memblock memblock_dump(struct memblock_type *type, char *name)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 65ac5c9..c4d3aa0 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -209,6 +209,8 @@ static unsigned long __initdata required_kernelcore;
static unsigned long __initdata required_movablecore;
static unsigned long __meminitdata zone_movable_pfn[MAX_NUMNODES];
+phys_addr_t memblock_limit;
+
/* movable_zone is the "real" zone pages in ZONE_MOVABLE are taken from */
int movable_zone;
EXPORT_SYMBOL(movable_zone);
@@ -4876,7 +4878,9 @@ static int __init cmdline_parse_core(char *p, unsigned long *core)
*/
static int __init cmdline_parse_kernelcore_max_addr(char *p)
{
- return cmdline_parse_core(p, &required_kernelcore_max_pfn);
+ cmdline_parse_core(p, &required_kernelcore_max_pfn);
+ memblock_limit = required_kernelcore_max_pfn << PAGE_SHIFT;
+ return 0;
}
early_param("kernelcore_max_addr", cmdline_parse_kernelcore_max_addr);
#endif
--
1.7.4.4
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [RFC V3 PATCH 24/25] memblock: compare current_limit with end variable at memblock_find_in_range_node()
[not found] ` <1344244999-5081-1-git-send-email-laijs@cn.fujitsu.com>
` (14 preceding siblings ...)
2012-08-06 9:23 ` [RFC V3 PATCH 23/25] memblock: limit memory address from memblock Lai Jiangshan
@ 2012-08-06 9:23 ` Lai Jiangshan
2012-08-06 9:23 ` [RFC V3 PATCH 25/25] mm, memory-hotplug: add online_movable and online_kernel Lai Jiangshan
16 siblings, 0 replies; 37+ messages in thread
From: Lai Jiangshan @ 2012-08-06 9:23 UTC (permalink / raw)
To: Mel Gorman, linux-kernel
Cc: Yasuaki Ishimatsu, Lai Jiangshan, Tejun Heo, Andrew Morton,
Ingo Molnar, Gavin Shan, linux-mm
From: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
memblock_find_in_range_node() does not compare memblock.current_limit
with the end variable. Thus even if memblock.current_limit is smaller than
the end variable, the function can return a memory address that is higher
than memblock.current_limit.
The patch adds the check to memblock_find_in_range_node().
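Concretely (hypothetical numbers, not from the patch): with memblock.current_limit at 4G, a caller passing end = 8G could previously be handed an address above 4G; with this check, end is first clamped down to current_limit, so the returned address stays below 4G.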
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
mm/memblock.c | 5 +++--
1 files changed, 3 insertions(+), 2 deletions(-)
diff --git a/mm/memblock.c b/mm/memblock.c
index 663b805..ce7fcb6 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -99,11 +99,12 @@ phys_addr_t __init_memblock memblock_find_in_range_node(phys_addr_t start,
phys_addr_t align, int nid)
{
phys_addr_t this_start, this_end, cand;
+ phys_addr_t current_limit = memblock.current_limit;
u64 i;
/* pump up @end */
- if (end == MEMBLOCK_ALLOC_ACCESSIBLE)
- end = memblock.current_limit;
+ if ((end == MEMBLOCK_ALLOC_ACCESSIBLE) || (end > current_limit))
+ end = current_limit;
/* avoid allocating the first page */
start = max_t(phys_addr_t, start, PAGE_SIZE);
--
1.7.4.4
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [RFC V3 PATCH 25/25] mm, memory-hotplug: add online_movable and online_kernel
[not found] ` <1344244999-5081-1-git-send-email-laijs@cn.fujitsu.com>
` (15 preceding siblings ...)
2012-08-06 9:23 ` [RFC V3 PATCH 24/25] memblock: compare current_limit with end variable at memblock_find_in_range_node() Lai Jiangshan
@ 2012-08-06 9:23 ` Lai Jiangshan
16 siblings, 0 replies; 37+ messages in thread
From: Lai Jiangshan @ 2012-08-06 9:23 UTC (permalink / raw)
To: Mel Gorman, linux-kernel
Cc: Lai Jiangshan, Rob Landley, Greg Kroah-Hartman, Paul Gortmaker,
Andrew Morton, Bjorn Helgaas, David Rientjes, linux-doc, linux-mm
When a memoryblock/memorysection is onlined with "online_movable", the kernel
will hold no direct reference to the pages of that memoryblock, so we can
remove the memory at any time when needed.
This makes things easy when we dynamically hot-add/remove memory, improves
memory utilization, and helps THP.
Current constraint: only a memoryblock which is adjacent to ZONE_MOVABLE
can be onlined from ZONE_NORMAL to ZONE_MOVABLE.
For the opposite onlining behavior, we also introduce "online_kernel" to change
a memoryblock from ZONE_MOVABLE to ZONE_NORMAL when it is onlined.
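As a worked example (hypothetical layout, not from the patch): if ZONE_NORMAL ends at pfn 0x100000 and ZONE_MOVABLE begins there, only the memoryblock whose pages end exactly at pfn 0x100000 can be switched with "online_movable"; move_pfn_range_right() in the hunks below then shrinks ZONE_NORMAL's span and grows ZONE_MOVABLE's so both zones stay contiguous.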
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
Documentation/memory-hotplug.txt | 14 +++++-
drivers/base/memory.c | 19 +++++---
include/linux/memory_hotplug.h | 13 +++++-
mm/memory_hotplug.c | 101 +++++++++++++++++++++++++++++++++++++-
4 files changed, 137 insertions(+), 10 deletions(-)
diff --git a/Documentation/memory-hotplug.txt b/Documentation/memory-hotplug.txt
index 70bc1c7..8e5eacb 100644
--- a/Documentation/memory-hotplug.txt
+++ b/Documentation/memory-hotplug.txt
@@ -161,7 +161,8 @@ a recent addition and not present on older kernels.
in the memory block.
'state' : read-write
at read: contains online/offline state of memory.
- at write: user can specify "online", "offline" command
+ at write: user can specify "online_kernel",
+ "online_movable", "online", "offline" command
which will be performed on all sections in the block.
'phys_device' : read-only: designed to show the name of physical memory
device. This is not well implemented now.
@@ -255,6 +256,17 @@ For onlining, you have to write "online" to the section's state file as:
% echo online > /sys/devices/system/memory/memoryXXX/state
+This onlining will not change the ZONE type of the target memory section.
+If the memory section is in ZONE_NORMAL, you can change it to ZONE_MOVABLE:
+
+% echo online_movable > /sys/devices/system/memory/memoryXXX/state
+(NOTE: current limit: this memory section must be adjacent to ZONE_MOVABLE)
+
+And if the memory section is in ZONE_MOVABLE, you can change it to ZONE_NORMAL:
+
+% echo online_kernel > /sys/devices/system/memory/memoryXXX/state
+(NOTE: current limit: this memory section must be adjacent to ZONE_NORMAL)
+
After this, section memoryXXX's state will be 'online' and the amount of
available memory will be increased.
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 7dda4f7..1ad2f48 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -246,7 +246,7 @@ static bool pages_correctly_reserved(unsigned long start_pfn,
* OK to have direct references to sparsemem variables in here.
*/
static int
-memory_block_action(unsigned long phys_index, unsigned long action)
+memory_block_action(unsigned long phys_index, unsigned long action, int online_type)
{
unsigned long start_pfn, start_paddr;
unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
@@ -262,7 +262,7 @@ memory_block_action(unsigned long phys_index, unsigned long action)
if (!pages_correctly_reserved(start_pfn, nr_pages))
return -EBUSY;
- ret = online_pages(start_pfn, nr_pages);
+ ret = online_pages(start_pfn, nr_pages, online_type);
break;
case MEM_OFFLINE:
start_paddr = page_to_pfn(first_page) << PAGE_SHIFT;
@@ -279,7 +279,8 @@ memory_block_action(unsigned long phys_index, unsigned long action)
}
static int memory_block_change_state(struct memory_block *mem,
- unsigned long to_state, unsigned long from_state_req)
+ unsigned long to_state, unsigned long from_state_req,
+ int online_type)
{
int ret = 0;
@@ -293,7 +294,7 @@ static int memory_block_change_state(struct memory_block *mem,
if (to_state == MEM_OFFLINE)
mem->state = MEM_GOING_OFFLINE;
- ret = memory_block_action(mem->start_section_nr, to_state);
+ ret = memory_block_action(mem->start_section_nr, to_state, online_type);
if (ret) {
mem->state = from_state_req;
@@ -325,10 +326,14 @@ store_mem_state(struct device *dev,
mem = container_of(dev, struct memory_block, dev);
- if (!strncmp(buf, "online", min((int)count, 6)))
- ret = memory_block_change_state(mem, MEM_ONLINE, MEM_OFFLINE);
+ if (!strncmp(buf, "online_kernel", min((int)count, 13)))
+ ret = memory_block_change_state(mem, MEM_ONLINE, MEM_OFFLINE, ONLINE_KERNEL);
+ else if (!strncmp(buf, "online_movable", min((int)count, 14)))
+ ret = memory_block_change_state(mem, MEM_ONLINE, MEM_OFFLINE, ONLINE_MOVABLE);
+ else if (!strncmp(buf, "online", min((int)count, 6)))
+ ret = memory_block_change_state(mem, MEM_ONLINE, MEM_OFFLINE, ONLINE_KEEP);
else if(!strncmp(buf, "offline", min((int)count, 7)))
- ret = memory_block_change_state(mem, MEM_OFFLINE, MEM_ONLINE);
+ ret = memory_block_change_state(mem, MEM_OFFLINE, MEM_ONLINE, -1);
if (ret)
return ret;
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 910550f..047cd1d 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -25,6 +25,13 @@ enum {
MEMORY_HOTPLUG_MAX_BOOTMEM_TYPE = NODE_INFO,
};
+/* Types for control the zone type of onlined memory */
+enum {
+ ONLINE_KEEP,
+ ONLINE_KERNEL,
+ ONLINE_MOVABLE,
+};
+
/*
* pgdat resizing functions
*/
@@ -45,6 +52,10 @@ void pgdat_resize_init(struct pglist_data *pgdat)
}
/*
* Zone resizing functions
+ *
+ * Note: any attempt to resize a zone should have pgdat_resize_lock()
+ * and zone_span_writelock() both held. This ensures the size of a zone
+ * can't be changed while pgdat_resize_lock() is held.
*/
static inline unsigned zone_span_seqbegin(struct zone *zone)
{
@@ -70,7 +81,7 @@ extern int zone_grow_free_lists(struct zone *zone, unsigned long new_nr_pages);
extern int zone_grow_waitqueues(struct zone *zone, unsigned long nr_pages);
extern int add_one_highpage(struct page *page, int pfn, int bad_ppro);
/* VM interface that may be used by firmware interface */
-extern int online_pages(unsigned long, unsigned long);
+extern int online_pages(unsigned long, unsigned long, int);
extern void __offline_isolated_pages(unsigned long, unsigned long);
typedef void (*online_page_callback_t)(struct page *page);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index c2c96a4..4e1db0a 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -210,6 +210,89 @@ static void grow_zone_span(struct zone *zone, unsigned long start_pfn,
zone_span_writeunlock(zone);
}
+static void resize_zone(struct zone *zone, unsigned long start_pfn,
+ unsigned long end_pfn)
+{
+
+ zone_span_writelock(zone);
+
+ zone->zone_start_pfn = start_pfn;
+ zone->spanned_pages = end_pfn - start_pfn;
+
+ zone_span_writeunlock(zone);
+}
+
+static void fix_zone_id(struct zone *zone, unsigned long start_pfn,
+ unsigned long end_pfn)
+{
+ enum zone_type zid = zone_idx(zone);
+ int nid = zone->zone_pgdat->node_id;
+ unsigned long pfn;
+
+ for (pfn = start_pfn; pfn < end_pfn; pfn++)
+ set_page_links(pfn_to_page(pfn), zid, nid, pfn);
+}
+
+static int move_pfn_range_left(struct zone *z1, struct zone *z2,
+ unsigned long start_pfn, unsigned long end_pfn)
+{
+ unsigned long flags;
+
+ pgdat_resize_lock(z1->zone_pgdat, &flags);
+
+ /* can't move pfns which are higher than @z2 */
+ if (end_pfn > z2->zone_start_pfn + z2->spanned_pages)
+ goto out_fail;
+ /* the moved-out part must be at the leftmost of @z2 */
+ if (start_pfn > z2->zone_start_pfn)
+ goto out_fail;
+ /* the ranges must overlap */
+ if (end_pfn <= z2->zone_start_pfn)
+ goto out_fail;
+
+ resize_zone(z1, z1->zone_start_pfn, end_pfn);
+ resize_zone(z2, end_pfn, z2->zone_start_pfn + z2->spanned_pages);
+
+ pgdat_resize_unlock(z1->zone_pgdat, &flags);
+
+ fix_zone_id(z1, start_pfn, end_pfn);
+
+ return 0;
+out_fail:
+ pgdat_resize_unlock(z1->zone_pgdat, &flags);
+ return -1;
+}
+
+static int move_pfn_range_right(struct zone *z1, struct zone *z2,
+ unsigned long start_pfn, unsigned long end_pfn)
+{
+ unsigned long flags;
+
+ pgdat_resize_lock(z1->zone_pgdat, &flags);
+
+ /* can't move pfns which are lower than @z1 */
+ if (z1->zone_start_pfn > start_pfn)
+ goto out_fail;
+ /* the moved-out part must be at the rightmost of @z1 */
+ if (z1->zone_start_pfn + z1->spanned_pages > end_pfn)
+ goto out_fail;
+ /* the ranges must overlap */
+ if (start_pfn >= z1->zone_start_pfn + z1->spanned_pages)
+ goto out_fail;
+
+ resize_zone(z1, z1->zone_start_pfn, start_pfn);
+ resize_zone(z2, start_pfn, z2->zone_start_pfn + z2->spanned_pages);
+
+ pgdat_resize_unlock(z1->zone_pgdat, &flags);
+
+ fix_zone_id(z2, start_pfn, end_pfn);
+
+ return 0;
+out_fail:
+ pgdat_resize_unlock(z1->zone_pgdat, &flags);
+ return -1;
+}
+
static void grow_pgdat_span(struct pglist_data *pgdat, unsigned long start_pfn,
unsigned long end_pfn)
{
@@ -501,7 +584,7 @@ static void set_nodemasks(int node, struct memory_notify *arg)
}
-int __ref online_pages(unsigned long pfn, unsigned long nr_pages)
+int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_type)
{
unsigned long onlined_pages = 0;
struct zone *zone;
@@ -518,6 +601,22 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages)
*/
zone = page_zone(pfn_to_page(pfn));
+ if (online_type == ONLINE_KERNEL && zone_idx(zone) == ZONE_MOVABLE) {
+ if (move_pfn_range_left(zone - 1, zone, pfn, pfn + nr_pages)) {
+ unlock_memory_hotplug();
+ return -1;
+ }
+ }
+ if (online_type == ONLINE_MOVABLE && zone_idx(zone) == ZONE_MOVABLE - 1) {
+ if (move_pfn_range_right(zone, zone + 1, pfn, pfn + nr_pages)) {
+ unlock_memory_hotplug();
+ return -1;
+ }
+ }
+
+ /* The previous code may have changed the zone of the pfn range */
+ zone = page_zone(pfn_to_page(pfn));
+
arg.start_pfn = pfn;
arg.nr_pages = nr_pages;
check_nodemasks_changes_online(nr_pages, zone, &arg);
--
1.7.4.4
^ permalink raw reply related [flat|nested] 37+ messages in thread
Thread overview: 37+ messages
[not found] <1343887288-8866-1-git-send-email-laijs@cn.fujitsu.com>
2012-08-02 6:01 ` [RFC PATCH 04/23 V2] oom: use N_MEMORY instead N_HIGH_MEMORY Lai Jiangshan
2012-08-02 6:01 ` [RFC PATCH 05/23 V2] mm,migrate: " Lai Jiangshan
2012-08-02 16:09 ` Christoph Lameter
2012-08-02 6:01 ` [RFC PATCH 06/23 V2] mempolicy: " Lai Jiangshan
2012-08-02 6:01 ` [RFC PATCH 07/23 V2] memcontrol: " Lai Jiangshan
2012-08-02 6:01 ` [RFC PATCH 08/23 V2] hugetlb: " Lai Jiangshan
2012-08-04 14:02 ` Hillf Danton
2012-08-02 6:01 ` [RFC PATCH 09/23 V2] vmstat: " Lai Jiangshan
2012-08-02 16:09 ` Christoph Lameter
2012-08-02 6:01 ` [RFC PATCH 12/23 V2] vmscan: " Lai Jiangshan
2012-08-02 6:01 ` [RFC PATCH 13/23 V2] page_alloc: use N_MEMORY instead N_HIGH_MEMORY change the node_states initialization Lai Jiangshan
2012-08-02 6:01 ` [RFC PATCH 14/23 V2] slub, hotplug: ignore unrelated node's hot-adding and hot-removing Lai Jiangshan
2012-08-02 6:01 ` [RFC PATCH 15/23 V2] memory_hotplug: fix missing nodemask management Lai Jiangshan
2012-08-02 6:01 ` [RFC PATCH 16/23 V2] numa: add CONFIG_MOVABLE_NODE for movable-dedicated node Lai Jiangshan
2012-08-02 6:01 ` [RFC PATCH 17/23 V2] page_alloc.c: don't subtract unrelated memmap from zone's present pages Lai Jiangshan
2012-08-02 6:01 ` [RFC PATCH 18/23 V2] page_alloc: add kernelcore_max_addr Lai Jiangshan
2012-08-02 6:01 ` [RFC PATCH 21/23 V2] memblock: limit memory address from memblock Lai Jiangshan
2012-08-02 6:01 ` [RFC PATCH 22/23 V2] memblock: compare current_limit with end variable at memblock_find_in_range_node() Lai Jiangshan
2012-08-02 6:01 ` [RFC PATCH 23/23 V2] mm, memory-hotplug: add online_movable Lai Jiangshan
[not found] ` <1344244999-5081-1-git-send-email-laijs@cn.fujitsu.com>
2012-08-06 9:22 ` [RFC V3 PATCH 01/25] page_alloc.c: don't subtract unrelated memmap from zone's present pages Lai Jiangshan
2012-08-06 9:22 ` [RFC V3 PATCH 02/25] memory_hotplug: fix missing nodemask management Lai Jiangshan
2012-08-06 9:22 ` [RFC V3 PATCH 03/25] slub, hotplug: ignore unrelated node's hot-adding and hot-removing Lai Jiangshan
2012-08-06 9:23 ` [RFC V3 PATCH 08/25] memcontrol: use N_MEMORY instead N_HIGH_MEMORY Lai Jiangshan
2012-08-06 9:23 ` [RFC V3 PATCH 09/25] oom: " Lai Jiangshan
2012-08-06 9:23 ` [RFC V3 PATCH 10/25] mm,migrate: " Lai Jiangshan
2012-08-06 9:23 ` [RFC V3 PATCH 11/25] mempolicy: " Lai Jiangshan
2012-08-06 9:23 ` [RFC V3 PATCH 12/25] hugetlb: " Lai Jiangshan
2012-08-06 9:23 ` [RFC V3 PATCH 13/25] vmstat: " Lai Jiangshan
2012-08-06 9:23 ` [RFC V3 PATCH 16/25] vmscan: " Lai Jiangshan
2012-08-06 9:23 ` [RFC V3 PATCH 17/25] page_alloc: use N_MEMORY instead N_HIGH_MEMORY change the node_states initialization Lai Jiangshan
2012-08-06 9:23 ` [RFC V3 PATCH 18/25] hotplug: update nodemasks management Lai Jiangshan
2012-08-06 9:23 ` [RFC V3 PATCH 19/25] numa: add CONFIG_MOVABLE_NODE for movable-dedicated node Lai Jiangshan
2012-08-06 9:23 ` [RFC V3 PATCH 20/25] page_alloc: add kernelcore_max_addr Lai Jiangshan
2012-08-06 9:23 ` [RFC V3 PATCH 23/25] memblock: limit memory address from memblock Lai Jiangshan
2012-08-06 9:23 ` [RFC V3 PATCH 24/25] memblock: compare current_limit with end variable at memblock_find_in_range_node() Lai Jiangshan
2012-08-06 9:23 ` [RFC V3 PATCH 25/25] mm, memory-hotplug: add online_movable and online_kernel Lai Jiangshan
2012-08-02 2:52 [RFC PATCH 00/23 V2] memory, numa: introduce MOVABLE-dedicated node and online_movable for hotplug Lai Jiangshan
[not found] ` <1343875991-7533-1-git-send-email-laijs-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
2012-08-02 2:52 ` [RFC PATCH 06/23 V2] mempolicy: use N_MEMORY instead N_HIGH_MEMORY Lai Jiangshan