* [PATCH] mm: fix invalid node in alloc_migrate_target() @ 2016-03-25 6:56 Xishi Qiu 2016-03-25 19:22 ` Andrew Morton 2016-03-29 12:25 ` Vlastimil Babka 0 siblings, 2 replies; 14+ messages in thread From: Xishi Qiu @ 2016-03-25 6:56 UTC (permalink / raw) To: Andrew Morton, Vlastimil Babka, Joonsoo Kim, David Rientjes, Naoya Horiguchi, Laura Abbott, zhuhui, wangxq10 Cc: Xishi Qiu, Linux MM, LKML It is incorrect to use next_node to find a target node, it will return MAX_NUMNODES or invalid node. This will lead to crash in buddy system allocation. Signed-off-by: Xishi Qiu <qiuxishi@huawei.com> --- mm/page_isolation.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/mm/page_isolation.c b/mm/page_isolation.c index 92c4c36..31555b6 100644 --- a/mm/page_isolation.c +++ b/mm/page_isolation.c @@ -289,11 +289,11 @@ struct page *alloc_migrate_target(struct page *page, unsigned long private, * now as a simple work-around, we use the next node for destination. */ if (PageHuge(page)) { - nodemask_t src = nodemask_of_node(page_to_nid(page)); - nodemask_t dst; - nodes_complement(dst, src); + int node = next_online_node(page_to_nid(page)); + if (node == MAX_NUMNODES) + node = first_online_node; return alloc_huge_page_node(page_hstate(compound_head(page)), - next_node(page_to_nid(page), dst)); + node); } if (PageHighMem(page)) -- 1.8.3.1 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH] mm: fix invalid node in alloc_migrate_target() 2016-03-25 6:56 [PATCH] mm: fix invalid node in alloc_migrate_target() Xishi Qiu @ 2016-03-25 19:22 ` Andrew Morton 2016-03-26 5:31 ` Xishi Qiu ` (2 more replies) 2016-03-29 12:25 ` Vlastimil Babka 1 sibling, 3 replies; 14+ messages in thread From: Andrew Morton @ 2016-03-25 19:22 UTC (permalink / raw) To: Xishi Qiu Cc: Vlastimil Babka, Joonsoo Kim, David Rientjes, Naoya Horiguchi, Laura Abbott, zhuhui, wangxq10, Linux MM, LKML On Fri, 25 Mar 2016 14:56:04 +0800 Xishi Qiu <qiuxishi@huawei.com> wrote: > It is incorrect to use next_node to find a target node, it will > return MAX_NUMNODES or invalid node. This will lead to crash in > buddy system allocation. > > ... > > --- a/mm/page_isolation.c > +++ b/mm/page_isolation.c > @@ -289,11 +289,11 @@ struct page *alloc_migrate_target(struct page *page, unsigned long private, > * now as a simple work-around, we use the next node for destination. > */ > if (PageHuge(page)) { > - nodemask_t src = nodemask_of_node(page_to_nid(page)); > - nodemask_t dst; > - nodes_complement(dst, src); > + int node = next_online_node(page_to_nid(page)); > + if (node == MAX_NUMNODES) > + node = first_online_node; > return alloc_huge_page_node(page_hstate(compound_head(page)), > - next_node(page_to_nid(page), dst)); > + node); > } > > if (PageHighMem(page)) Indeed. Can you tell us more about this circumstances under which the kernel will crash? I need to decide which kernel version(s) need the patch, but the changelog doesn't contain the info needed to make this decision (it should). next_node() isn't a very useful interface, really. Just about every caller does this: node = next_node(node, XXX); if (node == MAX_NUMNODES) node = first_node(XXX); so how about we write a function which does that, and stop open-coding the same thing everywhere? 
And I think your fix could then use such a function: int node = that_new_function(page_to_nid(page), node_online_map); Also, mm/mempolicy.c:offset_il_node() worries me: do { nid = next_node(nid, pol->v.nodes); c++; } while (c <= target); Can't `nid' hit MAX_NUMNODES? And can someone please explain mem_cgroup_select_victim_node() to me? How can we hit the "node = numa_node_id()" path? Only if memcg->scan_nodes is empty? is that even valid? The comment seems to have not much to do with the code? mpol_rebind_nodemask() is similar. Something like this? From: Andrew Morton <akpm@linux-foundation.org> Subject: include/linux/nodemask.h: create next_node_in() helper Lots of code does node = next_node(node, XXX); if (node == MAX_NUMNODES) node = first_node(XXX); so create next_node_in() to do this and use it in various places. Cc: Xishi Qiu <qiuxishi@huawei.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Joonsoo Kim <js1304@gmail.com> Cc: David Rientjes <rientjes@google.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: "Laura Abbott" <lauraa@codeaurora.org> Cc: Hui Zhu <zhuhui@xiaomi.com> Cc: Wang Xiaoqiang <wangxq10@lzu.edu.cn> Cc: Michal Hocko <mhocko@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- include/linux/nodemask.h | 18 +++++++++++++++++- kernel/cpuset.c | 8 +------- mm/hugetlb.c | 4 +--- mm/memcontrol.c | 4 +--- mm/mempolicy.c | 8 ++------ mm/page_isolation.c | 9 +++------ mm/slab.c | 13 +++---------- 7 files changed, 28 insertions(+), 36 deletions(-) diff -puN include/linux/nodemask.h~include-linux-nodemaskh-create-next_node_in-helper include/linux/nodemask.h --- a/include/linux/nodemask.h~include-linux-nodemaskh-create-next_node_in-helper +++ a/include/linux/nodemask.h @@ -43,8 +43,10 @@ * * int first_node(mask) Number lowest set bit, or MAX_NUMNODES * int next_node(node, mask) Next node past 'node', or MAX_NUMNODES + * int next_node_in(node, mask) Next node past 'node', or wrap to first, + 
* or MAX_NUMNODES * int first_unset_node(mask) First node not set in mask, or - * MAX_NUMNODES. + * MAX_NUMNODES * * nodemask_t nodemask_of_node(node) Return nodemask with bit 'node' set * NODE_MASK_ALL Initializer - all bits set @@ -259,6 +261,20 @@ static inline int __next_node(int n, con return min_t(int,MAX_NUMNODES,find_next_bit(srcp->bits, MAX_NUMNODES, n+1)); } +/* + * Find the next present node in src, starting after node n, wrapping around to + * the first node in src if needed. Returns MAX_NUMNODES if src is empty. + */ +#define next_node_in(n, src) __next_node_in((n), &(src)) +static inline int __next_node_in(int node, const nodemask_t *srcp) +{ + int ret = __next_node(node, srcp); + + if (ret == MAX_NUMNODES) + ret = __first_node(srcp); + return ret; +} + static inline void init_nodemask_of_node(nodemask_t *mask, int node) { nodes_clear(*mask); diff -puN kernel/cpuset.c~include-linux-nodemaskh-create-next_node_in-helper kernel/cpuset.c --- a/kernel/cpuset.c~include-linux-nodemaskh-create-next_node_in-helper +++ a/kernel/cpuset.c @@ -2591,13 +2591,7 @@ int __cpuset_node_allowed(int node, gfp_ static int cpuset_spread_node(int *rotor) { - int node; - - node = next_node(*rotor, current->mems_allowed); - if (node == MAX_NUMNODES) - node = first_node(current->mems_allowed); - *rotor = node; - return node; + return *rotor = next_node_in(*rotor, current->mems_allowed); } int cpuset_mem_spread_node(void) diff -puN mm/hugetlb.c~include-linux-nodemaskh-create-next_node_in-helper mm/hugetlb.c --- a/mm/hugetlb.c~include-linux-nodemaskh-create-next_node_in-helper +++ a/mm/hugetlb.c @@ -937,9 +937,7 @@ err: */ static int next_node_allowed(int nid, nodemask_t *nodes_allowed) { - nid = next_node(nid, *nodes_allowed); - if (nid == MAX_NUMNODES) - nid = first_node(*nodes_allowed); + nid = next_node_in(nid, *nodes_allowed); VM_BUG_ON(nid >= MAX_NUMNODES); return nid; diff -puN mm/memcontrol.c~include-linux-nodemaskh-create-next_node_in-helper mm/memcontrol.c --- 
a/mm/memcontrol.c~include-linux-nodemaskh-create-next_node_in-helper +++ a/mm/memcontrol.c @@ -1388,9 +1388,7 @@ int mem_cgroup_select_victim_node(struct mem_cgroup_may_update_nodemask(memcg); node = memcg->last_scanned_node; - node = next_node(node, memcg->scan_nodes); - if (node == MAX_NUMNODES) - node = first_node(memcg->scan_nodes); + node = next_node_in(node, memcg->scan_nodes); /* * We call this when we hit limit, not when pages are added to LRU. * No LRU may hold pages because all pages are UNEVICTABLE or diff -puN mm/mempolicy.c~include-linux-nodemaskh-create-next_node_in-helper mm/mempolicy.c --- a/mm/mempolicy.c~include-linux-nodemaskh-create-next_node_in-helper +++ a/mm/mempolicy.c @@ -347,9 +347,7 @@ static void mpol_rebind_nodemask(struct BUG(); if (!node_isset(current->il_next, tmp)) { - current->il_next = next_node(current->il_next, tmp); - if (current->il_next >= MAX_NUMNODES) - current->il_next = first_node(tmp); + current->il_next = next_node_in(current->il_next, tmp); if (current->il_next >= MAX_NUMNODES) current->il_next = numa_node_id(); } @@ -1709,9 +1707,7 @@ static unsigned interleave_nodes(struct struct task_struct *me = current; nid = me->il_next; - next = next_node(nid, policy->v.nodes); - if (next >= MAX_NUMNODES) - next = first_node(policy->v.nodes); + next = next_node_in(nid, policy->v.nodes); if (next < MAX_NUMNODES) me->il_next = next; return nid; diff -puN mm/page_isolation.c~include-linux-nodemaskh-create-next_node_in-helper mm/page_isolation.c --- a/mm/page_isolation.c~include-linux-nodemaskh-create-next_node_in-helper +++ a/mm/page_isolation.c @@ -288,13 +288,10 @@ struct page *alloc_migrate_target(struct * accordance with memory policy of the user process if possible. For * now as a simple work-around, we use the next node for destination. 
*/ - if (PageHuge(page)) { - int node = next_online_node(page_to_nid(page)); - if (node == MAX_NUMNODES) - node = first_online_node; + if (PageHuge(page)) return alloc_huge_page_node(page_hstate(compound_head(page)), - node); - } + next_node_in(page_to_nid(page), + node_online_map)); if (PageHighMem(page)) gfp_mask |= __GFP_HIGHMEM; diff -puN mm/slab.c~include-linux-nodemaskh-create-next_node_in-helper mm/slab.c --- a/mm/slab.c~include-linux-nodemaskh-create-next_node_in-helper +++ a/mm/slab.c @@ -519,22 +519,15 @@ static DEFINE_PER_CPU(unsigned long, sla static void init_reap_node(int cpu) { - int node; - - node = next_node(cpu_to_mem(cpu), node_online_map); - if (node == MAX_NUMNODES) - node = first_node(node_online_map); - - per_cpu(slab_reap_node, cpu) = node; + per_cpu(slab_reap_node, cpu) = next_node_in(cpu_to_mem(cpu), + node_online_map); } static void next_reap_node(void) { int node = __this_cpu_read(slab_reap_node); - node = next_node(node, node_online_map); - if (unlikely(node >= MAX_NUMNODES)) - node = first_node(node_online_map); + node = next_node_in(node, node_online_map); __this_cpu_write(slab_reap_node, node); } _ -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] mm: fix invalid node in alloc_migrate_target() 2016-03-25 19:22 ` Andrew Morton @ 2016-03-26 5:31 ` Xishi Qiu 2016-03-29 9:52 ` Vlastimil Babka 2016-03-29 13:06 ` Vlastimil Babka 2016-03-29 15:52 ` Michal Hocko 2 siblings, 1 reply; 14+ messages in thread From: Xishi Qiu @ 2016-03-26 5:31 UTC (permalink / raw) To: Andrew Morton Cc: Vlastimil Babka, Joonsoo Kim, David Rientjes, Naoya Horiguchi, Laura Abbott, zhuhui, wangxq10, Linux MM, LKML On 2016/3/26 3:22, Andrew Morton wrote: > On Fri, 25 Mar 2016 14:56:04 +0800 Xishi Qiu <qiuxishi@huawei.com> wrote: > >> It is incorrect to use next_node to find a target node, it will >> return MAX_NUMNODES or invalid node. This will lead to crash in >> buddy system allocation. >> >> ... >> >> --- a/mm/page_isolation.c >> +++ b/mm/page_isolation.c >> @@ -289,11 +289,11 @@ struct page *alloc_migrate_target(struct page *page, unsigned long private, >> * now as a simple work-around, we use the next node for destination. >> */ >> if (PageHuge(page)) { >> - nodemask_t src = nodemask_of_node(page_to_nid(page)); >> - nodemask_t dst; >> - nodes_complement(dst, src); >> + int node = next_online_node(page_to_nid(page)); >> + if (node == MAX_NUMNODES) >> + node = first_online_node; >> return alloc_huge_page_node(page_hstate(compound_head(page)), >> - next_node(page_to_nid(page), dst)); >> + node); >> } >> >> if (PageHighMem(page)) > > Indeed. Can you tell us more about this circumstances under which the > kernel will crash? I need to decide which kernel version(s) need the > patch, but the changelog doesn't contain the info needed to make this > decision (it should). > Hi Andrew, I read the code v4.4, and find the following path maybe trigger the bug. 
alloc_migrate_target() alloc_huge_page_node() // the node may be offline or MAX_NUMNODES __alloc_buddy_huge_page_no_mpol() __alloc_buddy_huge_page() __hugetlb_alloc_buddy_huge_page() alloc_pages_node() __alloc_pages_node() VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES); VM_WARN_ON(!node_online(nid)); Thanks, Xishi Qiu -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] mm: fix invalid node in alloc_migrate_target() 2016-03-26 5:31 ` Xishi Qiu @ 2016-03-29 9:52 ` Vlastimil Babka 2016-03-29 10:06 ` Vlastimil Babka 2016-03-29 10:37 ` Xishi Qiu 0 siblings, 2 replies; 14+ messages in thread From: Vlastimil Babka @ 2016-03-29 9:52 UTC (permalink / raw) To: Xishi Qiu, Andrew Morton Cc: Joonsoo Kim, David Rientjes, Naoya Horiguchi, Laura Abbott, zhuhui, wangxq10, Linux MM, LKML, Dave Hansen On 03/26/2016 06:31 AM, Xishi Qiu wrote: > On 2016/3/26 3:22, Andrew Morton wrote: > >> On Fri, 25 Mar 2016 14:56:04 +0800 Xishi Qiu <qiuxishi@huawei.com> wrote: >> >>> It is incorrect to use next_node to find a target node, it will >>> return MAX_NUMNODES or invalid node. This will lead to crash in >>> buddy system allocation. >>> >>> ... >>> >>> --- a/mm/page_isolation.c >>> +++ b/mm/page_isolation.c >>> @@ -289,11 +289,11 @@ struct page *alloc_migrate_target(struct page *page, unsigned long private, >>> * now as a simple work-around, we use the next node for destination. >>> */ >>> if (PageHuge(page)) { >>> - nodemask_t src = nodemask_of_node(page_to_nid(page)); >>> - nodemask_t dst; >>> - nodes_complement(dst, src); >>> + int node = next_online_node(page_to_nid(page)); >>> + if (node == MAX_NUMNODES) >>> + node = first_online_node; >>> return alloc_huge_page_node(page_hstate(compound_head(page)), >>> - next_node(page_to_nid(page), dst)); >>> + node); >>> } >>> >>> if (PageHighMem(page)) >> >> Indeed. Can you tell us more about this circumstances under which the >> kernel will crash? I need to decide which kernel version(s) need the >> patch, but the changelog doesn't contain the info needed to make this >> decision (it should). >> > > Hi Andrew, > > I read the code v4.4, and find the following path maybe trigger the bug. 
> > alloc_migrate_target() > alloc_huge_page_node() // the node may be offline or MAX_NUMNODES > __alloc_buddy_huge_page_no_mpol() > __alloc_buddy_huge_page() > __hugetlb_alloc_buddy_huge_page() The code in this functions seems to come from 099730d67417d ("mm, hugetlb: use memory policy when available") by Dave Hansen (adding to CC), which was indeed merged in 4.4-rc1. However, alloc_pages_node() is only called in the block guarded by: if (!IS_ENABLED(CONFIG_NUMA) || !vma) { The rather weird "!IS_ENABLED(CONFIG_NUMA)" part comes from immediate followup commit e0ec90ee7e6f ("mm, hugetlbfs: optimize when NUMA=n") So I doubt the code path here can actually happen. But it's fragile and confusing nevertheless. > alloc_pages_node() > __alloc_pages_node() > VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES); > VM_WARN_ON(!node_online(nid)); > > Thanks, > Xishi Qiu > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] mm: fix invalid node in alloc_migrate_target() 2016-03-29 9:52 ` Vlastimil Babka @ 2016-03-29 10:06 ` Vlastimil Babka 2016-03-29 10:37 ` Xishi Qiu 1 sibling, 0 replies; 14+ messages in thread From: Vlastimil Babka @ 2016-03-29 10:06 UTC (permalink / raw) To: Xishi Qiu, Andrew Morton Cc: Joonsoo Kim, David Rientjes, Naoya Horiguchi, Laura Abbott, zhuhui, wangxq10, Linux MM, LKML, Dave Hansen On 03/29/2016 11:52 AM, Vlastimil Babka wrote: > On 03/26/2016 06:31 AM, Xishi Qiu wrote: >> On 2016/3/26 3:22, Andrew Morton wrote: >> >>> On Fri, 25 Mar 2016 14:56:04 +0800 Xishi Qiu <qiuxishi@huawei.com> wrote: >>> >>>> It is incorrect to use next_node to find a target node, it will >>>> return MAX_NUMNODES or invalid node. This will lead to crash in >>>> buddy system allocation. >>>> >>>> ... >>>> >>>> --- a/mm/page_isolation.c >>>> +++ b/mm/page_isolation.c >>>> @@ -289,11 +289,11 @@ struct page *alloc_migrate_target(struct page *page, unsigned long private, >>>> * now as a simple work-around, we use the next node for destination. >>>> */ >>>> if (PageHuge(page)) { >>>> - nodemask_t src = nodemask_of_node(page_to_nid(page)); >>>> - nodemask_t dst; >>>> - nodes_complement(dst, src); >>>> + int node = next_online_node(page_to_nid(page)); >>>> + if (node == MAX_NUMNODES) >>>> + node = first_online_node; >>>> return alloc_huge_page_node(page_hstate(compound_head(page)), >>>> - next_node(page_to_nid(page), dst)); >>>> + node); >>>> } >>>> >>>> if (PageHighMem(page)) >>> >>> Indeed. Can you tell us more about this circumstances under which the >>> kernel will crash? I need to decide which kernel version(s) need the >>> patch, but the changelog doesn't contain the info needed to make this >>> decision (it should). >>> >> >> Hi Andrew, >> >> I read the code v4.4, and find the following path maybe trigger the bug. 
>> >> alloc_migrate_target() >> alloc_huge_page_node() // the node may be offline or MAX_NUMNODES >> __alloc_buddy_huge_page_no_mpol() >> __alloc_buddy_huge_page() >> __hugetlb_alloc_buddy_huge_page() > > The code in this functions seems to come from 099730d67417d ("mm, > hugetlb: use memory policy when available") by Dave Hansen (adding to > CC), which was indeed merged in 4.4-rc1. > > However, alloc_pages_node() is only called in the block guarded by: > > if (!IS_ENABLED(CONFIG_NUMA) || !vma) { > > The rather weird "!IS_ENABLED(CONFIG_NUMA)" part comes from immediate > followup commit e0ec90ee7e6f ("mm, hugetlbfs: optimize when NUMA=n") > > So I doubt the code path here can actually happen. But it's fragile and > confusing nevertheless. Ah, so there's actually a dangerous path: alloc_huge_page_node() dequeue_huge_page_node() list_for_each_entry(page, &h->hugepage_freelists[nid], lru) hugepage_freelists is MAX_NUMNODES sized, so when nid is MAX_NUMNODES, we access past it. However, look closer at how nid is obtained in alloc_migrate_target(): nodemask_t src = nodemask_of_node(page_to_nid(page)); nodemask_t dst; nodes_complement(dst, src); nid = next_node(page_to_nid(page), dst) for nid to be MAX_NUMNODES, the original page has to be on node MAX_NUMNODES-1, otherwise the complement part means we hit the very next bit which is set. It's actually a rather obfuscated way of doing: nid = page_to_nid(page) + 1; In that case the problem is in commit c8721bbbdd36 ("mm: memory-hotplug: enable memory hotplug to handle hugepage") from 3.12 and will likely affect only people that tune down MAX_NUMNODES to match their machine. >> alloc_pages_node() >> __alloc_pages_node() >> VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES); >> VM_WARN_ON(!node_online(nid)); >> >> Thanks, >> Xishi Qiu >> > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
* Re: [PATCH] mm: fix invalid node in alloc_migrate_target() 2016-03-29 9:52 ` Vlastimil Babka 2016-03-29 10:06 ` Vlastimil Babka @ 2016-03-29 10:37 ` Xishi Qiu 2016-03-29 12:21 ` Vlastimil Babka 1 sibling, 1 reply; 14+ messages in thread From: Xishi Qiu @ 2016-03-29 10:37 UTC (permalink / raw) To: Vlastimil Babka Cc: Andrew Morton, Joonsoo Kim, David Rientjes, Naoya Horiguchi, Laura Abbott, zhuhui, wangxq10, Linux MM, LKML, Dave Hansen On 2016/3/29 17:52, Vlastimil Babka wrote: > On 03/26/2016 06:31 AM, Xishi Qiu wrote: >> On 2016/3/26 3:22, Andrew Morton wrote: >> >>> On Fri, 25 Mar 2016 14:56:04 +0800 Xishi Qiu <qiuxishi@huawei.com> wrote: >>> >>>> It is incorrect to use next_node to find a target node, it will >>>> return MAX_NUMNODES or invalid node. This will lead to crash in >>>> buddy system allocation. >>>> >>>> ... >>>> >>>> --- a/mm/page_isolation.c >>>> +++ b/mm/page_isolation.c >>>> @@ -289,11 +289,11 @@ struct page *alloc_migrate_target(struct page *page, unsigned long private, >>>> * now as a simple work-around, we use the next node for destination. >>>> */ >>>> if (PageHuge(page)) { >>>> - nodemask_t src = nodemask_of_node(page_to_nid(page)); >>>> - nodemask_t dst; >>>> - nodes_complement(dst, src); >>>> + int node = next_online_node(page_to_nid(page)); >>>> + if (node == MAX_NUMNODES) >>>> + node = first_online_node; >>>> return alloc_huge_page_node(page_hstate(compound_head(page)), >>>> - next_node(page_to_nid(page), dst)); >>>> + node); >>>> } >>>> >>>> if (PageHighMem(page)) >>> >>> Indeed. Can you tell us more about this circumstances under which the >>> kernel will crash? I need to decide which kernel version(s) need the >>> patch, but the changelog doesn't contain the info needed to make this >>> decision (it should). >>> >> >> Hi Andrew, >> >> I read the code v4.4, and find the following path maybe trigger the bug. 
>> >> alloc_migrate_target() >> alloc_huge_page_node() // the node may be offline or MAX_NUMNODES >> __alloc_buddy_huge_page_no_mpol() >> __alloc_buddy_huge_page() >> __hugetlb_alloc_buddy_huge_page() > > The code in this functions seems to come from 099730d67417d ("mm, hugetlb: use memory policy when available") by Dave Hansen (adding to CC), which was indeed merged in 4.4-rc1. > > However, alloc_pages_node() is only called in the block guarded by: > > if (!IS_ENABLED(CONFIG_NUMA) || !vma) { > > The rather weird "!IS_ENABLED(CONFIG_NUMA)" part comes from immediate followup commit e0ec90ee7e6f ("mm, hugetlbfs: optimize when NUMA=n") > > So I doubt the code path here can actually happen. But it's fragile and confusing nevertheless. > Hi Vlastimil __alloc_buddy_huge_page(h, NULL, addr, nid); // so the vma is NULL Thanks, Xishi Qiu >> alloc_pages_node() >> __alloc_pages_node() >> VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES); >> VM_WARN_ON(!node_online(nid)); >> >> Thanks, >> Xishi Qiu >> > > > . > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] mm: fix invalid node in alloc_migrate_target() 2016-03-29 10:37 ` Xishi Qiu @ 2016-03-29 12:21 ` Vlastimil Babka 0 siblings, 0 replies; 14+ messages in thread From: Vlastimil Babka @ 2016-03-29 12:21 UTC (permalink / raw) To: Xishi Qiu Cc: Andrew Morton, Joonsoo Kim, David Rientjes, Naoya Horiguchi, Laura Abbott, zhuhui, wangxq10, Linux MM, LKML, Dave Hansen On 03/29/2016 12:37 PM, Xishi Qiu wrote: > On 2016/3/29 17:52, Vlastimil Babka wrote: >> The code in this functions seems to come from 099730d67417d ("mm, hugetlb: use memory policy when available") by Dave Hansen (adding to CC), which was indeed merged in 4.4-rc1. >> >> However, alloc_pages_node() is only called in the block guarded by: >> >> if (!IS_ENABLED(CONFIG_NUMA) || !vma) { >> >> The rather weird "!IS_ENABLED(CONFIG_NUMA)" part comes from immediate followup commit e0ec90ee7e6f ("mm, hugetlbfs: optimize when NUMA=n") >> >> So I doubt the code path here can actually happen. But it's fragile and confusing nevertheless. >> > > Hi Vlastimil > > __alloc_buddy_huge_page(h, NULL, addr, nid); // so the vma is NULL Hm that's true, I got lost in the logic, thanks. But the problem with dequeue_huge_page_node() is also IMHO true, and older, so we should fix 3.12+. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] mm: fix invalid node in alloc_migrate_target() 2016-03-25 19:22 ` Andrew Morton 2016-03-26 5:31 ` Xishi Qiu @ 2016-03-29 13:06 ` Vlastimil Babka 2016-03-31 13:13 ` Vlastimil Babka 2016-03-29 15:52 ` Michal Hocko 2 siblings, 1 reply; 14+ messages in thread From: Vlastimil Babka @ 2016-03-29 13:06 UTC (permalink / raw) To: Andrew Morton, Xishi Qiu Cc: Joonsoo Kim, David Rientjes, Naoya Horiguchi, Laura Abbott, zhuhui, wangxq10, Linux MM, LKML On 03/25/2016 08:22 PM, Andrew Morton wrote: > On Fri, 25 Mar 2016 14:56:04 +0800 Xishi Qiu <qiuxishi@huawei.com> wrote: > >> It is incorrect to use next_node to find a target node, it will >> return MAX_NUMNODES or invalid node. This will lead to crash in >> buddy system allocation. >> >> ... >> >> --- a/mm/page_isolation.c >> +++ b/mm/page_isolation.c >> @@ -289,11 +289,11 @@ struct page *alloc_migrate_target(struct page *page, unsigned long private, >> * now as a simple work-around, we use the next node for destination. >> */ >> if (PageHuge(page)) { >> - nodemask_t src = nodemask_of_node(page_to_nid(page)); >> - nodemask_t dst; >> - nodes_complement(dst, src); >> + int node = next_online_node(page_to_nid(page)); >> + if (node == MAX_NUMNODES) >> + node = first_online_node; >> return alloc_huge_page_node(page_hstate(compound_head(page)), >> - next_node(page_to_nid(page), dst)); >> + node); >> } >> >> if (PageHighMem(page)) > > Indeed. Can you tell us more about this circumstances under which the > kernel will crash? I need to decide which kernel version(s) need the > patch, but the changelog doesn't contain the info needed to make this > decision (it should). > > > > next_node() isn't a very useful interface, really. Just about every > caller does this: > > > node = next_node(node, XXX); > if (node == MAX_NUMNODES) > node = first_node(XXX); > > so how about we write a function which does that, and stop open-coding > the same thing everywhere? Good idea. 
> And I think your fix could then use such a function: > > int node = that_new_function(page_to_nid(page), node_online_map); > > > > Also, mm/mempolicy.c:offset_il_node() worries me: > > do { > nid = next_node(nid, pol->v.nodes); > c++; > } while (c <= target); > > Can't `nid' hit MAX_NUMNODES? AFAICS it can. interleave_nid() uses this and the nid is then used e.g. in node_zonelist() where it's used for NODE_DATA(nid). That's quite scary. It also predates git. Why don't we see crashes or KASAN finding this? > > And can someone please explain mem_cgroup_select_victim_node() to me? > How can we hit the "node = numa_node_id()" path? Only if > memcg->scan_nodes is empty? is that even valid? The comment seems to > have not much to do with the code? I understand the comment that it's valid to be empty and the comment lists reasons why that can happen (with somewhat broken language). Note that I didn't verify these reasons: - we call this when hitting memcg limit, not when adding pages to LRU, as adding to LRU means it would contain the given LRU's node - adding to unevictable LRU means it's not added to scan_nodes (probably because scanning unevictable lru would be useless) - for other reasons (which?) it might have pages not on LRU and it's so small there are no other pages that would be on LRU > mpol_rebind_nodemask() is similar. > > > > Something like this? > > > From: Andrew Morton <akpm@linux-foundation.org> > Subject: include/linux/nodemask.h: create next_node_in() helper > > Lots of code does > > node = next_node(node, XXX); > if (node == MAX_NUMNODES) > node = first_node(XXX); > > so create next_node_in() to do this and use it in various places. > > Cc: Xishi Qiu <qiuxishi@huawei.com> > Cc: Vlastimil Babka <vbabka@suse.cz> Acked-by: Vlastimil Babka <vbabka@suse.cz> Patch doesn't address offset_il_node() which is good, because if it's indeed buggy, it's serious and needs a non-cleanup patch. 
-- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] mm: fix invalid node in alloc_migrate_target() 2016-03-29 13:06 ` Vlastimil Babka @ 2016-03-31 13:13 ` Vlastimil Babka 2016-03-31 21:01 ` Andrew Morton 0 siblings, 1 reply; 14+ messages in thread From: Vlastimil Babka @ 2016-03-31 13:13 UTC (permalink / raw) To: Andrew Morton, Xishi Qiu Cc: Joonsoo Kim, David Rientjes, Naoya Horiguchi, Laura Abbott, zhuhui, wangxq10, Linux MM, LKML On 03/29/2016 03:06 PM, Vlastimil Babka wrote: > On 03/25/2016 08:22 PM, Andrew Morton wrote: >> Also, mm/mempolicy.c:offset_il_node() worries me: >> >> do { >> nid = next_node(nid, pol->v.nodes); >> c++; >> } while (c <= target); >> >> Can't `nid' hit MAX_NUMNODES? > > AFAICS it can. interleave_nid() uses this and the nid is then used e.g. > in node_zonelist() where it's used for NODE_DATA(nid). That's quite > scary. It also predates git. Why don't we see crashes or KASAN finding this? Ah, I see. In offset_il_node(), nid is initialized to -1, and the number of do-while iterations calling next_node() is up to the number of bits set in the pol->v.nodes bitmap, so it can't reach past the last set bit and return MAX_NUMNODES. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] mm: fix invalid node in alloc_migrate_target() 2016-03-31 13:13 ` Vlastimil Babka @ 2016-03-31 21:01 ` Andrew Morton 2016-04-01 8:42 ` Vlastimil Babka 0 siblings, 1 reply; 14+ messages in thread From: Andrew Morton @ 2016-03-31 21:01 UTC (permalink / raw) To: Vlastimil Babka Cc: Xishi Qiu, Joonsoo Kim, David Rientjes, Naoya Horiguchi, Laura Abbott, zhuhui, wangxq10, Linux MM, LKML On Thu, 31 Mar 2016 15:13:41 +0200 Vlastimil Babka <vbabka@suse.cz> wrote: > On 03/29/2016 03:06 PM, Vlastimil Babka wrote: > > On 03/25/2016 08:22 PM, Andrew Morton wrote: > >> Also, mm/mempolicy.c:offset_il_node() worries me: > >> > >> do { > >> nid = next_node(nid, pol->v.nodes); > >> c++; > >> } while (c <= target); > >> > >> Can't `nid' hit MAX_NUMNODES? > > > > AFAICS it can. interleave_nid() uses this and the nid is then used e.g. > > in node_zonelist() where it's used for NODE_DATA(nid). That's quite > > scary. It also predates git. Why don't we see crashes or KASAN finding this? > > Ah, I see. In offset_il_node(), nid is initialized to -1, and the number > of do-while iterations calling next_node() is up to the number of bits > set in the pol->v.nodes bitmap, so it can't reach past the last set bit > and return MAX_NUMNODES. Gack. offset_il_node() should be dragged out, strangled, shot then burnt. static unsigned offset_il_node(struct mempolicy *pol, struct vm_area_struct *vma, unsigned long off) { unsigned nnodes = nodes_weight(pol->v.nodes); unsigned target; int c; int nid = NUMA_NO_NODE; if (!nnodes) return numa_node_id(); target = (unsigned int)off % nnodes; c = 0; do { nid = next_node(nid, pol->v.nodes); c++; } while (c <= target); return nid; } For starters it is relying upon next_node(-1, ...) behaving like first_node(). Fair enough I guess, but that isn't very clear. static inline int __next_node(int n, const nodemask_t *srcp) { return min_t(int,MAX_NUMNODES,find_next_bit(srcp->bits, MAX_NUMNODES, n+1)); } will start from node 0 when it does the n+1. 
Also it is relying upon NUMA_NO_NODE having a value of -1. That's just grubby - this code shouldn't "know" that NUMA_NO_NODE==-1. It would have been better to use plain old "-1" here. Does this look clearer and correct? /* * Do static interleaving for a VMA with known offset @n. Returns the n'th * node in pol->v.nodes (starting from n=0), wrapping around if n exceeds the * number of present nodes. */ static unsigned offset_il_node(struct mempolicy *pol, struct vm_area_struct *vma, unsigned long n) { unsigned nnodes = nodes_weight(pol->v.nodes); unsigned target; int i; int nid; if (!nnodes) return numa_node_id(); target = (unsigned int)n % nnodes; nid = first_node(pol->v.nodes); for (i = 0; i < target; i++) nid = next_node(nid, pol->v.nodes); return nid; } From: Andrew Morton <akpm@linux-foundation.org> Subject: mm/mempolicy.c:offset_il_node() document and clarify This code was pretty obscure and was relying upon obscure side-effects of next_node(-1, ...) and was relying upon NUMA_NO_NODE being equal to -1. Clean that all up and document the function's intent. Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Xishi Qiu <qiuxishi@huawei.com> Cc: Joonsoo Kim <js1304@gmail.com> Cc: David Rientjes <rientjes@google.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Laura Abbott <lauraa@codeaurora.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- mm/mempolicy.c | 20 +++++++++++--------- 1 file changed, 11 insertions(+), 9 deletions(-) diff -puN mm/mempolicy.c~mm-mempolicyc-offset_il_node-document-and-clarify mm/mempolicy.c --- a/mm/mempolicy.c~mm-mempolicyc-offset_il_node-document-and-clarify +++ a/mm/mempolicy.c @@ -1763,23 +1763,25 @@ unsigned int mempolicy_slab_node(void) } } -/* Do static interleaving for a VMA with known offset. */ +/* + * Do static interleaving for a VMA with known offset @n. Returns the n'th + * node in pol->v.nodes (starting from n=0), wrapping around if n exceeds the + * number of present nodes. 
+ */ static unsigned offset_il_node(struct mempolicy *pol, - struct vm_area_struct *vma, unsigned long off) + struct vm_area_struct *vma, unsigned long n) { unsigned nnodes = nodes_weight(pol->v.nodes); unsigned target; - int c; - int nid = NUMA_NO_NODE; + int i; + int nid; if (!nnodes) return numa_node_id(); - target = (unsigned int)off % nnodes; - c = 0; - do { + target = (unsigned int)n % nnodes; + nid = first_node(pol->v.nodes); + for (i = 0; i < target; i++) nid = next_node(nid, pol->v.nodes); - c++; - } while (c <= target); return nid; } _ ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] mm: fix invalid node in alloc_migrate_target() 2016-03-31 21:01 ` Andrew Morton @ 2016-04-01 8:42 ` Vlastimil Babka 0 siblings, 0 replies; 14+ messages in thread From: Vlastimil Babka @ 2016-04-01 8:42 UTC (permalink / raw) To: Andrew Morton Cc: Xishi Qiu, Joonsoo Kim, David Rientjes, Naoya Horiguchi, Laura Abbott, zhuhui, wangxq10, Linux MM, LKML On 03/31/2016 11:01 PM, Andrew Morton wrote: > On Thu, 31 Mar 2016 15:13:41 +0200 Vlastimil Babka <vbabka@suse.cz> wrote: > >> On 03/29/2016 03:06 PM, Vlastimil Babka wrote: >> > On 03/25/2016 08:22 PM, Andrew Morton wrote: >> >> Also, mm/mempolicy.c:offset_il_node() worries me: >> >> >> >> do { >> >> nid = next_node(nid, pol->v.nodes); >> >> c++; >> >> } while (c <= target); >> >> >> >> Can't `nid' hit MAX_NUMNODES? >> > >> > AFAICS it can. interleave_nid() uses this and the nid is then used e.g. >> > in node_zonelist() where it's used for NODE_DATA(nid). That's quite >> > scary. It also predates git. Why don't we see crashes or KASAN finding this? >> >> Ah, I see. In offset_il_node(), nid is initialized to -1, and the number >> of do-while iterations calling next_node() is up to the number of bits >> set in the pol->v.nodes bitmap, so it can't reach past the last set bit >> and return MAX_NUMNODES. > > Gack. offset_il_node() should be dragged out, strangled, shot then burnt. Ah, but you went with the much less amusing alternative of just fixing it. > static unsigned offset_il_node(struct mempolicy *pol, > struct vm_area_struct *vma, unsigned long off) > { > unsigned nnodes = nodes_weight(pol->v.nodes); > unsigned target; > int c; > int nid = NUMA_NO_NODE; > > if (!nnodes) > return numa_node_id(); > target = (unsigned int)off % nnodes; > c = 0; > do { > nid = next_node(nid, pol->v.nodes); > c++; > } while (c <= target); > return nid; > } > > For starters it is relying upon next_node(-1, ...) behaving like > first_node(). Fair enough I guess, but that isn't very clear. 
> > static inline int __next_node(int n, const nodemask_t *srcp) > { > return min_t(int,MAX_NUMNODES,find_next_bit(srcp->bits, MAX_NUMNODES, n+1)); > } > > will start from node 0 when it does the n+1. > > Also it is relying upon NUMA_NO_NODE having a value of -1. That's just > grubby - this code shouldn't "know" that NUMA_NO_NODE==-1. It would have > been better to use plain old "-1" here. Yeah looks like a blind change of all "-1" to "NUMA_NO_NODE" happened at some point. > > Does this look clearer and correct? Definitely. > /* > * Do static interleaving for a VMA with known offset @n. Returns the n'th > * node in pol->v.nodes (starting from n=0), wrapping around if n exceeds the > * number of present nodes. > */ > static unsigned offset_il_node(struct mempolicy *pol, > struct vm_area_struct *vma, unsigned long n) > { > unsigned nnodes = nodes_weight(pol->v.nodes); > unsigned target; > int i; > int nid; > > if (!nnodes) > return numa_node_id(); > target = (unsigned int)n % nnodes; > nid = first_node(pol->v.nodes); > for (i = 0; i < target; i++) > nid = next_node(nid, pol->v.nodes); > return nid; > } > > > From: Andrew Morton <akpm@linux-foundation.org> > Subject: mm/mempolicy.c:offset_il_node() document and clarify > > This code was pretty obscure and was relying upon obscure side-effects of > next_node(-1, ...) and was relying upon NUMA_NO_NODE being equal to -1. > > Clean that all up and document the function's intent. > > Cc: Vlastimil Babka <vbabka@suse.cz> > Cc: Xishi Qiu <qiuxishi@huawei.com> > Cc: Joonsoo Kim <js1304@gmail.com> > Cc: David Rientjes <rientjes@google.com> > Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> > Cc: Laura Abbott <lauraa@codeaurora.org> > Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] mm: fix invalid node in alloc_migrate_target() 2016-03-25 19:22 ` Andrew Morton 2016-03-26 5:31 ` Xishi Qiu 2016-03-29 13:06 ` Vlastimil Babka @ 2016-03-29 15:52 ` Michal Hocko 2 siblings, 0 replies; 14+ messages in thread From: Michal Hocko @ 2016-03-29 15:52 UTC (permalink / raw) To: Andrew Morton Cc: Xishi Qiu, Vlastimil Babka, Joonsoo Kim, David Rientjes, Naoya Horiguchi, Laura Abbott, zhuhui, wangxq10, Linux MM, LKML On Fri 25-03-16 12:22:37, Andrew Morton wrote: [...] > And can someone please explain mem_cgroup_select_victim_node() to me? > How can we hit the "node = numa_node_id()" path? Only if > memcg->scan_nodes is empty? Yes, this seems to be the primary motivation. mem_cgroup_may_update_nodemask might have seen all the pages on unevictable LRU last time it checked something. > is that even valid? I suspect it is really rare but it seems possible > The comment seems to have not much to do with the code? I guess the comment tries to say that the code path is triggered when we charge the page which happens _before_ it is added to the LRU list and so last_scanned_node might contain the stale data. Would something like the following be more clear? --- diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 17a847c96618..cff095318950 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -1390,10 +1390,9 @@ int mem_cgroup_select_victim_node(struct mem_cgroup *memcg) node = next_node_in(node, memcg->scan_nodes); /* - * We call this when we hit limit, not when pages are added to LRU. - * No LRU may hold pages because all pages are UNEVICTABLE or - * memcg is too small and all pages are not on LRU. In that case, - * we use curret node. + * mem_cgroup_may_update_nodemask might have seen no reclaimable pages + * last time it really checked all the LRUs due to rate limiting. + * Fall back to the current node in that case for simplicity.
*/ if (unlikely(node == MAX_NUMNODES)) node = numa_node_id(); -- Michal Hocko SUSE Labs ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH] mm: fix invalid node in alloc_migrate_target() 2016-03-25 6:56 [PATCH] mm: fix invalid node in alloc_migrate_target() Xishi Qiu 2016-03-25 19:22 ` Andrew Morton @ 2016-03-29 12:25 ` Vlastimil Babka 2016-03-30 1:13 ` Naoya Horiguchi 1 sibling, 1 reply; 14+ messages in thread From: Vlastimil Babka @ 2016-03-29 12:25 UTC (permalink / raw) To: Xishi Qiu, Andrew Morton, Joonsoo Kim, David Rientjes, Naoya Horiguchi, Laura Abbott, zhuhui, wangxq10 Cc: Linux MM, LKML On 03/25/2016 07:56 AM, Xishi Qiu wrote: > It is incorrect to use next_node to find a target node, it will > return MAX_NUMNODES or invalid node. This will lead to crash in > buddy system allocation. One possible place of crash is: alloc_huge_page_node() dequeue_huge_page_node() [accesses h->hugepage_freelists[nid] with size MAX_NUMNODES] > Signed-off-by: Xishi Qiu <qiuxishi@huawei.com> Fixes: c8721bbbdd36 ("mm: memory-hotplug: enable memory hotplug to handle hugepage") Cc: stable Acked-by: Vlastimil Babka <vbabka@suse.cz> > --- > mm/page_isolation.c | 8 ++++---- > 1 file changed, 4 insertions(+), 4 deletions(-) > > diff --git a/mm/page_isolation.c b/mm/page_isolation.c > index 92c4c36..31555b6 100644 > --- a/mm/page_isolation.c > +++ b/mm/page_isolation.c > @@ -289,11 +289,11 @@ struct page *alloc_migrate_target(struct page *page, unsigned long private, > * now as a simple work-around, we use the next node for destination. > */ > if (PageHuge(page)) { > - nodemask_t src = nodemask_of_node(page_to_nid(page)); > - nodemask_t dst; > - nodes_complement(dst, src); > + int node = next_online_node(page_to_nid(page)); > + if (node == MAX_NUMNODES) > + node = first_online_node; > return alloc_huge_page_node(page_hstate(compound_head(page)), > - next_node(page_to_nid(page), dst)); > + node); > } > > if (PageHighMem(page)) > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] mm: fix invalid node in alloc_migrate_target() 2016-03-29 12:25 ` Vlastimil Babka @ 2016-03-30 1:13 ` Naoya Horiguchi 0 siblings, 0 replies; 14+ messages in thread From: Naoya Horiguchi @ 2016-03-30 1:13 UTC (permalink / raw) To: Vlastimil Babka Cc: Xishi Qiu, Andrew Morton, Joonsoo Kim, David Rientjes, Laura Abbott, zhuhui@xiaomi.com, wangxq10@lzu.edu.cn, Linux MM, LKML On Tue, Mar 29, 2016 at 02:25:03PM +0200, Vlastimil Babka wrote: > On 03/25/2016 07:56 AM, Xishi Qiu wrote: > >It is incorrect to use next_node to find a target node, it will > >return MAX_NUMNODES or invalid node. This will lead to crash in > >buddy system allocation. > > One possible place of crash is: > alloc_huge_page_node() > dequeue_huge_page_node() > [accesses h->hugepage_freelists[nid] with size MAX_NUMNODES] > > >Signed-off-by: Xishi Qiu <qiuxishi@huawei.com> > > Fixes: c8721bbbdd36 ("mm: memory-hotplug: enable memory hotplug to handle > hugepage") > Cc: stable > Acked-by: Vlastimil Babka <vbabka@suse.cz> Thanks everyone for finding/fixing the bug! Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> > >--- > > mm/page_isolation.c | 8 ++++---- > > 1 file changed, 4 insertions(+), 4 deletions(-) > > > >diff --git a/mm/page_isolation.c b/mm/page_isolation.c > >index 92c4c36..31555b6 100644 > >--- a/mm/page_isolation.c > >+++ b/mm/page_isolation.c > >@@ -289,11 +289,11 @@ struct page *alloc_migrate_target(struct page *page, unsigned long private, > > * now as a simple work-around, we use the next node for destination.
> > */ > > if (PageHuge(page)) { > >- nodemask_t src = nodemask_of_node(page_to_nid(page)); > >- nodemask_t dst; > >- nodes_complement(dst, src); > >+ int node = next_online_node(page_to_nid(page)); > >+ if (node == MAX_NUMNODES) > >+ node = first_online_node; > > return alloc_huge_page_node(page_hstate(compound_head(page)), > >- next_node(page_to_nid(page), dst)); > >+ node); > > } > > > > if (PageHighMem(page)) > > > ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2016-04-01 8:42 UTC | newest] Thread overview: 14+ messages -- 2016-03-25 6:56 [PATCH] mm: fix invalid node in alloc_migrate_target() Xishi Qiu 2016-03-25 19:22 ` Andrew Morton 2016-03-26 5:31 ` Xishi Qiu 2016-03-29 9:52 ` Vlastimil Babka 2016-03-29 10:06 ` Vlastimil Babka 2016-03-29 10:37 ` Xishi Qiu 2016-03-29 12:21 ` Vlastimil Babka 2016-03-29 13:06 ` Vlastimil Babka 2016-03-31 13:13 ` Vlastimil Babka 2016-03-31 21:01 ` Andrew Morton 2016-04-01 8:42 ` Vlastimil Babka 2016-03-29 15:52 ` Michal Hocko 2016-03-29 12:25 ` Vlastimil Babka 2016-03-30 1:13 ` Naoya Horiguchi