* [PATCH v3 0/2] handle memoryless nodes more appropriately @ 2023-10-19 10:43 Qi Zheng 2023-10-19 10:43 ` [PATCH v3 1/2] mm: page_alloc: skip memoryless nodes entirely Qi Zheng 2023-10-19 10:43 ` [PATCH v3 2/2] mm: memory_hotplug: drop memoryless node from fallback lists Qi Zheng 0 siblings, 2 replies; 14+ messages in thread From: Qi Zheng @ 2023-10-19 10:43 UTC (permalink / raw) To: akpm, rppt, david, vbabka, mhocko Cc: willy, mgorman, mingo, aneesh.kumar, ying.huang, hannes, osalvador, linux-kernel, linux-mm, Qi Zheng Hi all, Currently, in the process of initialization or offline memory, memoryless nodes will still be built into the fallback list of itself or other nodes. This is not what we expected, so this patch series removes memoryless nodes from the fallback list entirely. This series is based on the next-20231018. Comments and suggestions are welcome. Thanks, Qi Changlog in v2 -> v3: - add a comment in [PATCH v2 2/2] (suggested by David Hildenbrand) - collect Acked-bys Changlog in v1 -> v2: - modify the commit message in [PATCH 1/2], mention that it can also fix the specific crash. (suggested by Ingo Molnar) - rebase onto the next-20231018 Qi Zheng (2): mm: page_alloc: skip memoryless nodes entirely mm: memory_hotplug: drop memoryless node from fallback lists mm/memory_hotplug.c | 6 +++++- mm/page_alloc.c | 7 +++++-- 2 files changed, 10 insertions(+), 3 deletions(-) -- 2.30.2 ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH v3 1/2] mm: page_alloc: skip memoryless nodes entirely 2023-10-19 10:43 [PATCH v3 0/2] handle memoryless nodes more appropriately Qi Zheng @ 2023-10-19 10:43 ` Qi Zheng 2023-10-20 8:31 ` Ingo Molnar 2023-10-19 10:43 ` [PATCH v3 2/2] mm: memory_hotplug: drop memoryless node from fallback lists Qi Zheng 1 sibling, 1 reply; 14+ messages in thread From: Qi Zheng @ 2023-10-19 10:43 UTC (permalink / raw) To: akpm, rppt, david, vbabka, mhocko Cc: willy, mgorman, mingo, aneesh.kumar, ying.huang, hannes, osalvador, linux-kernel, linux-mm, Qi Zheng In find_next_best_node(), We skipped the memoryless nodes when building the zonelists of other normal nodes (N_NORMAL), but did not skip the memoryless node itself when building the zonelist. This will cause it to be traversed at runtime. For example, say we have node0 and node1, node0 is memoryless node, then the fallback order of node0 and node1 as follows: [ 0.153005] Fallback order for Node 0: 0 1 [ 0.153564] Fallback order for Node 1: 1 After this patch, we skip memoryless node0 entirely, then the fallback order of node0 and node1 as follows: [ 0.155236] Fallback order for Node 0: 1 [ 0.155806] Fallback order for Node 1: 1 So it becomes completely invisible, which will reduce runtime overhead. And in this way, we will not try to allocate pages from memoryless node0, then the panic mentioned in [1] will also be fixed. Even though this problem has been solved by dropping the NODE_MIN_SIZE constrain in x86 [2], it would be better to fix it in core MM as well. [1]. https://lore.kernel.org/all/20230212110305.93670-1-zhengqi.arch@bytedance.com/ [2]. https://lore.kernel.org/all/20231017062215.171670-1-rppt@kernel.org/ Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Acked-by: David Hildenbrand <david@redhat.com> --- mm/page_alloc.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index ee392a324802..e978272699d3 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -5052,8 +5052,11 @@ int find_next_best_node(int node, nodemask_t *used_node_mask) int min_val = INT_MAX; int best_node = NUMA_NO_NODE; - /* Use the local node if we haven't already */ - if (!node_isset(node, *used_node_mask)) { + /* + * Use the local node if we haven't already. But for memoryless local + * node, we should skip it and fallback to other nodes. + */ + if (!node_isset(node, *used_node_mask) && node_state(node, N_MEMORY)) { node_set(node, *used_node_mask); return node; } -- 2.30.2 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH v3 1/2] mm: page_alloc: skip memoryless nodes entirely 2023-10-19 10:43 ` [PATCH v3 1/2] mm: page_alloc: skip memoryless nodes entirely Qi Zheng @ 2023-10-20 8:31 ` Ingo Molnar 2023-10-20 9:09 ` Qi Zheng 0 siblings, 1 reply; 14+ messages in thread From: Ingo Molnar @ 2023-10-20 8:31 UTC (permalink / raw) To: Qi Zheng Cc: akpm, rppt, david, vbabka, mhocko, willy, mgorman, aneesh.kumar, ying.huang, hannes, osalvador, linux-kernel, linux-mm * Qi Zheng <zhengqi.arch@bytedance.com> wrote: > In find_next_best_node(), We skipped the memoryless nodes s/We /we s/the memoryless nodes /memoryless nodes > when building the zonelists of other normal nodes (N_NORMAL), > but did not skip the memoryless node itself when building > the zonelist. This will cause it to be traversed at runtime. > > For example, say we have node0 and node1, node0 is memoryless > node, then the fallback order of node0 and node1 as follows: > > [ 0.153005] Fallback order for Node 0: 0 1 > [ 0.153564] Fallback order for Node 1: 1 > > After this patch, we skip memoryless node0 entirely, then > the fallback order of node0 and node1 as follows: s/fallback /fall back > > [ 0.155236] Fallback order for Node 0: 1 > [ 0.155806] Fallback order for Node 1: 1 > > So it becomes completely invisible, which will reduce runtime > overhead. > > And in this way, we will not try to allocate pages from memoryless > node0, then the panic mentioned in [1] will also be fixed. Even though > this problem has been solved by dropping the NODE_MIN_SIZE constrain > in x86 [2], it would be better to fix it in core MM as well. s/in core MM /in the core MM > [1]. https://lore.kernel.org/all/20230212110305.93670-1-zhengqi.arch@bytedance.com/ > [2]. https://lore.kernel.org/all/20231017062215.171670-1-rppt@kernel.org/ > > Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> > Acked-by: David Hildenbrand <david@redhat.com> > + /* > + * Use the local node if we haven't already. But for memoryless local > + * node, we should skip it and fallback to other nodes. s/fallback /fall back s/already. But /already, but Acked-by: Ingo Molnar <mingo@kernel.org> Thanks, Ingo ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v3 1/2] mm: page_alloc: skip memoryless nodes entirely 2023-10-20 8:31 ` Ingo Molnar @ 2023-10-20 9:09 ` Qi Zheng 0 siblings, 0 replies; 14+ messages in thread From: Qi Zheng @ 2023-10-20 9:09 UTC (permalink / raw) To: Ingo Molnar Cc: akpm, rppt, david, vbabka, mhocko, willy, mgorman, aneesh.kumar, ying.huang, hannes, osalvador, linux-kernel, linux-mm On 2023/10/20 16:31, Ingo Molnar wrote: > > * Qi Zheng <zhengqi.arch@bytedance.com> wrote: > >> In find_next_best_node(), We skipped the memoryless nodes > > s/We > /we > > s/the memoryless nodes > /memoryless nodes > >> when building the zonelists of other normal nodes (N_NORMAL), >> but did not skip the memoryless node itself when building >> the zonelist. This will cause it to be traversed at runtime. >> >> For example, say we have node0 and node1, node0 is memoryless >> node, then the fallback order of node0 and node1 as follows: >> >> [ 0.153005] Fallback order for Node 0: 0 1 >> [ 0.153564] Fallback order for Node 1: 1 >> >> After this patch, we skip memoryless node0 entirely, then >> the fallback order of node0 and node1 as follows: > > s/fallback > /fall back > >> >> [ 0.155236] Fallback order for Node 0: 1 >> [ 0.155806] Fallback order for Node 1: 1 >> >> So it becomes completely invisible, which will reduce runtime >> overhead. >> >> And in this way, we will not try to allocate pages from memoryless >> node0, then the panic mentioned in [1] will also be fixed. Even though >> this problem has been solved by dropping the NODE_MIN_SIZE constrain >> in x86 [2], it would be better to fix it in core MM as well. > > s/in core MM > /in the core MM > >> [1]. https://lore.kernel.org/all/20230212110305.93670-1-zhengqi.arch@bytedance.com/ >> [2]. https://lore.kernel.org/all/20231017062215.171670-1-rppt@kernel.org/ >> >> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> >> Acked-by: David Hildenbrand <david@redhat.com> > >> + /* >> + * Use the local node if we haven't already. But for memoryless local >> + * node, we should skip it and fallback to other nodes. > > s/fallback > /fall back > > s/already. But > /already, but Will fix the typos above. > > Acked-by: Ingo Molnar <mingo@kernel.org> Thanks. > > Thanks, > > Ingo ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH v3 2/2] mm: memory_hotplug: drop memoryless node from fallback lists 2023-10-19 10:43 [PATCH v3 0/2] handle memoryless nodes more appropriately Qi Zheng 2023-10-19 10:43 ` [PATCH v3 1/2] mm: page_alloc: skip memoryless nodes entirely Qi Zheng @ 2023-10-19 10:43 ` Qi Zheng 2023-10-20 7:05 ` Huang, Ying 2023-10-20 8:32 ` Ingo Molnar 1 sibling, 2 replies; 14+ messages in thread From: Qi Zheng @ 2023-10-19 10:43 UTC (permalink / raw) To: akpm, rppt, david, vbabka, mhocko Cc: willy, mgorman, mingo, aneesh.kumar, ying.huang, hannes, osalvador, linux-kernel, linux-mm, Qi Zheng In offline_pages(), if a node becomes memoryless, we will clear its N_MEMORY state by calling node_states_clear_node(). But we do this after rebuilding the zonelists by calling build_all_zonelists(), which will cause this memoryless node to still be in the fallback list of other nodes. This will incur some runtime overhead. To drop memoryless node from fallback lists in this case, just call node_states_clear_node() before calling build_all_zonelists(). Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Acked-by: David Hildenbrand <david@redhat.com> --- mm/memory_hotplug.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index d4a364fdaf8f..f019f7d6272c 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -2036,12 +2036,16 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages, /* reinitialise watermarks and update pcp limits */ init_per_zone_wmark_min(); + /* + * Make sure to mark the node as memory-less before rebuilding the zone + * list. Otherwise this node would still appear in the fallback lists. + */ + node_states_clear_node(node, &arg); if (!populated_zone(zone)) { zone_pcp_reset(zone); build_all_zonelists(NULL); } - node_states_clear_node(node, &arg); if (arg.status_change_nid >= 0) { kcompactd_stop(node); kswapd_stop(node); -- 2.30.2 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH v3 2/2] mm: memory_hotplug: drop memoryless node from fallback lists 2023-10-19 10:43 ` [PATCH v3 2/2] mm: memory_hotplug: drop memoryless node from fallback lists Qi Zheng @ 2023-10-20 7:05 ` Huang, Ying 2023-10-20 7:35 ` Qi Zheng 2023-10-20 8:32 ` Ingo Molnar 1 sibling, 1 reply; 14+ messages in thread From: Huang, Ying @ 2023-10-20 7:05 UTC (permalink / raw) To: Qi Zheng Cc: akpm, rppt, david, vbabka, mhocko, willy, mgorman, mingo, aneesh.kumar, hannes, osalvador, linux-kernel, linux-mm Qi Zheng <zhengqi.arch@bytedance.com> writes: > In offline_pages(), if a node becomes memoryless, we > will clear its N_MEMORY state by calling node_states_clear_node(). > But we do this after rebuilding the zonelists by calling > build_all_zonelists(), which will cause this memoryless node to > still be in the fallback list of other nodes. For fallback list, do you mean pgdat->node_zonelists[]? If so, in build_all_zonelists __build_all_zonelists build_zonelists build_zonelists_in_node_order build_zonerefs_node populated_zone() will be checked before adding zone into zonelist. So, IIUC, we will not try to allocate from the memory less node. -- Best Regards, Huang, Ying > This will incur > some runtime overhead. > > To drop memoryless node from fallback lists in this case, just > call node_states_clear_node() before calling build_all_zonelists(). > > Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> > Acked-by: David Hildenbrand <david@redhat.com> [snip] -- Best Regards, Huang, Ying ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v3 2/2] mm: memory_hotplug: drop memoryless node from fallback lists 2023-10-20 7:05 ` Huang, Ying @ 2023-10-20 7:35 ` Qi Zheng 2023-10-23 1:18 ` Huang, Ying 0 siblings, 1 reply; 14+ messages in thread From: Qi Zheng @ 2023-10-20 7:35 UTC (permalink / raw) To: Huang, Ying Cc: akpm, rppt, david, vbabka, mhocko, willy, mgorman, mingo, aneesh.kumar, hannes, osalvador, linux-kernel, linux-mm Hi Ying, On 2023/10/20 15:05, Huang, Ying wrote: > Qi Zheng <zhengqi.arch@bytedance.com> writes: > >> In offline_pages(), if a node becomes memoryless, we >> will clear its N_MEMORY state by calling node_states_clear_node(). >> But we do this after rebuilding the zonelists by calling >> build_all_zonelists(), which will cause this memoryless node to >> still be in the fallback list of other nodes. > > For fallback list, do you mean pgdat->node_zonelists[]? If so, in > > build_all_zonelists > __build_all_zonelists > build_zonelists > build_zonelists_in_node_order > build_zonerefs_node > > populated_zone() will be checked before adding zone into zonelist. > > So, IIUC, we will not try to allocate from the memory less node. Normally yes, but if it is the weird topology mentioned in [1], it's possible to allocate memory from it, it is a memoryless node, but it also has memory. In addition to the above case, I think it's reasonable to remove memory less node from node_order[] in advance. In this way it will not to be traversed in build_zonelists_in_node_order(). [1]. https://lore.kernel.org/all/20230212110305.93670-1-zhengqi.arch@bytedance.com/ Thanks, Qi > > -- > Best Regards, > Huang, Ying > > >> This will incur >> some runtime overhead. >> >> To drop memoryless node from fallback lists in this case, just >> call node_states_clear_node() before calling build_all_zonelists(). >> >> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> >> Acked-by: David Hildenbrand <david@redhat.com> > > [snip] > > -- > Best Regards, > Huang, Ying ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v3 2/2] mm: memory_hotplug: drop memoryless node from fallback lists 2023-10-20 7:35 ` Qi Zheng @ 2023-10-23 1:18 ` Huang, Ying 2023-10-23 2:53 ` Qi Zheng 0 siblings, 1 reply; 14+ messages in thread From: Huang, Ying @ 2023-10-23 1:18 UTC (permalink / raw) To: Qi Zheng Cc: akpm, rppt, david, vbabka, mhocko, willy, mgorman, mingo, aneesh.kumar, hannes, osalvador, linux-kernel, linux-mm Qi Zheng <zhengqi.arch@bytedance.com> writes: > Hi Ying, > > On 2023/10/20 15:05, Huang, Ying wrote: >> Qi Zheng <zhengqi.arch@bytedance.com> writes: >> >>> In offline_pages(), if a node becomes memoryless, we >>> will clear its N_MEMORY state by calling node_states_clear_node(). >>> But we do this after rebuilding the zonelists by calling >>> build_all_zonelists(), which will cause this memoryless node to >>> still be in the fallback list of other nodes. >> For fallback list, do you mean pgdat->node_zonelists[]? If so, in >> build_all_zonelists >> __build_all_zonelists >> build_zonelists >> build_zonelists_in_node_order >> build_zonerefs_node >> populated_zone() will be checked before adding zone into zonelist. >> So, IIUC, we will not try to allocate from the memory less node. > > Normally yes, but if it is the weird topology mentioned in [1], it's > possible to allocate memory from it, it is a memoryless node, but it > also has memory. > > In addition to the above case, I think it's reasonable to remove > memory less node from node_order[] in advance. In this way it will > not to be traversed in build_zonelists_in_node_order(). > > [1]. https://lore.kernel.org/all/20230212110305.93670-1-zhengqi.arch@bytedance.com/ Got it! Thank you for information. I think that it may be good to include this in the patch description to avoid potential confusing in the future. -- Best Regards, Huang, Ying > Thanks, > Qi > > >> -- >> Best Regards, >> Huang, Ying >> >>> This will incur >>> some runtime overhead. >>> >>> To drop memoryless node from fallback lists in this case, just >>> call node_states_clear_node() before calling build_all_zonelists(). >>> >>> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> >>> Acked-by: David Hildenbrand <david@redhat.com> >> [snip] >> -- >> Best Regards, >> Huang, Ying ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v3 2/2] mm: memory_hotplug: drop memoryless node from fallback lists 2023-10-23 1:18 ` Huang, Ying @ 2023-10-23 2:53 ` Qi Zheng 2023-10-23 3:10 ` Huang, Ying 0 siblings, 1 reply; 14+ messages in thread From: Qi Zheng @ 2023-10-23 2:53 UTC (permalink / raw) To: Huang, Ying Cc: akpm, rppt, david, vbabka, mhocko, willy, mgorman, mingo, aneesh.kumar, hannes, osalvador, linux-kernel, linux-mm Hi Ying, On 2023/10/23 09:18, Huang, Ying wrote: > Qi Zheng <zhengqi.arch@bytedance.com> writes: > >> Hi Ying, >> >> On 2023/10/20 15:05, Huang, Ying wrote: >>> Qi Zheng <zhengqi.arch@bytedance.com> writes: >>> >>>> In offline_pages(), if a node becomes memoryless, we >>>> will clear its N_MEMORY state by calling node_states_clear_node(). >>>> But we do this after rebuilding the zonelists by calling >>>> build_all_zonelists(), which will cause this memoryless node to >>>> still be in the fallback list of other nodes. >>> For fallback list, do you mean pgdat->node_zonelists[]? If so, in >>> build_all_zonelists >>> __build_all_zonelists >>> build_zonelists >>> build_zonelists_in_node_order >>> build_zonerefs_node >>> populated_zone() will be checked before adding zone into zonelist. >>> So, IIUC, we will not try to allocate from the memory less node. >> >> Normally yes, but if it is the weird topology mentioned in [1], it's >> possible to allocate memory from it, it is a memoryless node, but it >> also has memory. >> >> In addition to the above case, I think it's reasonable to remove >> memory less node from node_order[] in advance. In this way it will >> not to be traversed in build_zonelists_in_node_order(). >> >> [1]. https://lore.kernel.org/all/20230212110305.93670-1-zhengqi.arch@bytedance.com/ > > Got it! Thank you for information. I think that it may be good to > include this in the patch description to avoid potential confusing in > the future. OK, maybe the commit message can be changed to the following: ``` In offline_pages(), if a node becomes memoryless, we will clear its N_MEMORY state by calling node_states_clear_node(). But we do this after rebuilding the zonelists by calling build_all_zonelists(), which will cause this memoryless node to still be in the fallback nodes (node_order[]) of other nodes. To drop memoryless nodes from fallback nodes in this case, just call node_states_clear_node() before calling build_all_zonelists(). In this way, we will not try to allocate pages from memoryless node0, then the panic mentioned in [1] will also be fixed. Even though this problem has been solved by dropping the NODE_MIN_SIZE constrain in x86 [2], it would be better to fix it in the core MM as well. [1]. https://lore.kernel.org/all/20230212110305.93670-1-zhengqi.arch@bytedance.com/ [2]. https://lore.kernel.org/all/20231017062215.171670-1-rppt@kernel.org/ ``` Thanks, Qi > > -- > Best Regards, > Huang, Ying > >> Thanks, >> Qi >> >> >>> -- >>> Best Regards, >>> Huang, Ying >>> >>>> This will incur >>>> some runtime overhead. >>>> >>>> To drop memoryless node from fallback lists in this case, just >>>> call node_states_clear_node() before calling build_all_zonelists(). >>>> >>>> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> >>>> Acked-by: David Hildenbrand <david@redhat.com> >>> [snip] >>> -- >>> Best Regards, >>> Huang, Ying ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v3 2/2] mm: memory_hotplug: drop memoryless node from fallback lists 2023-10-23 2:53 ` Qi Zheng @ 2023-10-23 3:10 ` Huang, Ying 2023-10-23 3:17 ` Qi Zheng 0 siblings, 1 reply; 14+ messages in thread From: Huang, Ying @ 2023-10-23 3:10 UTC (permalink / raw) To: Qi Zheng Cc: akpm, rppt, david, vbabka, mhocko, willy, mgorman, mingo, aneesh.kumar, hannes, osalvador, linux-kernel, linux-mm Qi Zheng <zhengqi.arch@bytedance.com> writes: > Hi Ying, > > On 2023/10/23 09:18, Huang, Ying wrote: >> Qi Zheng <zhengqi.arch@bytedance.com> writes: >> >>> Hi Ying, >>> >>> On 2023/10/20 15:05, Huang, Ying wrote: >>>> Qi Zheng <zhengqi.arch@bytedance.com> writes: >>>> >>>>> In offline_pages(), if a node becomes memoryless, we >>>>> will clear its N_MEMORY state by calling node_states_clear_node(). >>>>> But we do this after rebuilding the zonelists by calling >>>>> build_all_zonelists(), which will cause this memoryless node to >>>>> still be in the fallback list of other nodes. >>>> For fallback list, do you mean pgdat->node_zonelists[]? If so, in >>>> build_all_zonelists >>>> __build_all_zonelists >>>> build_zonelists >>>> build_zonelists_in_node_order >>>> build_zonerefs_node >>>> populated_zone() will be checked before adding zone into zonelist. >>>> So, IIUC, we will not try to allocate from the memory less node. >>> >>> Normally yes, but if it is the weird topology mentioned in [1], it's >>> possible to allocate memory from it, it is a memoryless node, but it >>> also has memory. >>> >>> In addition to the above case, I think it's reasonable to remove >>> memory less node from node_order[] in advance. In this way it will >>> not to be traversed in build_zonelists_in_node_order(). >>> >>> [1]. https://lore.kernel.org/all/20230212110305.93670-1-zhengqi.arch@bytedance.com/ >> Got it! Thank you for information. I think that it may be good to >> include this in the patch description to avoid potential confusing in >> the future. > > OK, maybe the commit message can be changed to the following: > > ``` > In offline_pages(), if a node becomes memoryless, we > will clear its N_MEMORY state by calling node_states_clear_node(). > But we do this after rebuilding the zonelists by calling > build_all_zonelists(), which will cause this memoryless node to > still be in the fallback nodes (node_order[]) of other nodes. > > To drop memoryless nodes from fallback nodes in this case, just > call node_states_clear_node() before calling build_all_zonelists(). > > In this way, we will not try to allocate pages from memoryless > node0, then the panic mentioned in [1] will also be fixed. Even though > this problem has been solved by dropping the NODE_MIN_SIZE constrain > in x86 [2], it would be better to fix it in the core MM as well. > > [1]. https://lore.kernel.org/all/20230212110305.93670-1-zhengqi.arch@bytedance.com/ > [2]. https://lore.kernel.org/all/20231017062215.171670-1-rppt@kernel.org/ > > ``` This is helpful. Thanks! -- Best Regards, Huang, Ying > Thanks, > Qi > >> -- >> Best Regards, >> Huang, Ying >> >>> Thanks, >>> Qi >>> >>> >>>> -- >>>> Best Regards, >>>> Huang, Ying >>>> >>>>> This will incur >>>>> some runtime overhead. >>>>> >>>>> To drop memoryless node from fallback lists in this case, just >>>>> call node_states_clear_node() before calling build_all_zonelists(). >>>>> >>>>> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> >>>>> Acked-by: David Hildenbrand <david@redhat.com> >>>> [snip] >>>> -- >>>> Best Regards, >>>> Huang, Ying ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v3 2/2] mm: memory_hotplug: drop memoryless node from fallback lists 2023-10-23 3:10 ` Huang, Ying @ 2023-10-23 3:17 ` Qi Zheng 2023-10-23 18:19 ` Andrew Morton 0 siblings, 1 reply; 14+ messages in thread From: Qi Zheng @ 2023-10-23 3:17 UTC (permalink / raw) To: akpm, Huang, Ying Cc: rppt, david, vbabka, mhocko, willy, mgorman, mingo, aneesh.kumar, hannes, osalvador, linux-kernel, linux-mm On 2023/10/23 11:10, Huang, Ying wrote: > Qi Zheng <zhengqi.arch@bytedance.com> writes: > >> Hi Ying, >> >> On 2023/10/23 09:18, Huang, Ying wrote: >>> Qi Zheng <zhengqi.arch@bytedance.com> writes: >>> >>>> Hi Ying, >>>> >>>> On 2023/10/20 15:05, Huang, Ying wrote: >>>>> Qi Zheng <zhengqi.arch@bytedance.com> writes: >>>>> >>>>>> In offline_pages(), if a node becomes memoryless, we >>>>>> will clear its N_MEMORY state by calling node_states_clear_node(). >>>>>> But we do this after rebuilding the zonelists by calling >>>>>> build_all_zonelists(), which will cause this memoryless node to >>>>>> still be in the fallback list of other nodes. >>>>> For fallback list, do you mean pgdat->node_zonelists[]? If so, in >>>>> build_all_zonelists >>>>> __build_all_zonelists >>>>> build_zonelists >>>>> build_zonelists_in_node_order >>>>> build_zonerefs_node >>>>> populated_zone() will be checked before adding zone into zonelist. >>>>> So, IIUC, we will not try to allocate from the memory less node. >>>> >>>> Normally yes, but if it is the weird topology mentioned in [1], it's >>>> possible to allocate memory from it, it is a memoryless node, but it >>>> also has memory. >>>> >>>> In addition to the above case, I think it's reasonable to remove >>>> memory less node from node_order[] in advance. In this way it will >>>> not to be traversed in build_zonelists_in_node_order(). >>>> >>>> [1]. https://lore.kernel.org/all/20230212110305.93670-1-zhengqi.arch@bytedance.com/ >>> Got it! Thank you for information. I think that it may be good to >>> include this in the patch description to avoid potential confusing in >>> the future. >> >> OK, maybe the commit message can be changed to the following: >> >> ``` >> In offline_pages(), if a node becomes memoryless, we >> will clear its N_MEMORY state by calling node_states_clear_node(). >> But we do this after rebuilding the zonelists by calling >> build_all_zonelists(), which will cause this memoryless node to >> still be in the fallback nodes (node_order[]) of other nodes. >> >> To drop memoryless nodes from fallback nodes in this case, just >> call node_states_clear_node() before calling build_all_zonelists(). >> >> In this way, we will not try to allocate pages from memoryless >> node0, then the panic mentioned in [1] will also be fixed. Even though >> this problem has been solved by dropping the NODE_MIN_SIZE constrain >> in x86 [2], it would be better to fix it in the core MM as well. >> >> [1]. https://lore.kernel.org/all/20230212110305.93670-1-zhengqi.arch@bytedance.com/ >> [2]. https://lore.kernel.org/all/20231017062215.171670-1-rppt@kernel.org/ >> >> ``` Hi Andrew, can you help modify the commit message to this? :) Thanks, Qi > > This is helpful. Thanks! > > -- > Best Regards, > Huang, Ying > >> Thanks, >> Qi >> >>> -- >>> Best Regards, >>> Huang, Ying >>> >>>> Thanks, >>>> Qi >>>> >>>> >>>>> -- >>>>> Best Regards, >>>>> Huang, Ying >>>>> >>>>>> This will incur >>>>>> some runtime overhead. >>>>>> >>>>>> To drop memoryless node from fallback lists in this case, just >>>>>> call node_states_clear_node() before calling build_all_zonelists(). >>>>>> >>>>>> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> >>>>>> Acked-by: David Hildenbrand <david@redhat.com> >>>>> [snip] >>>>> -- >>>>> Best Regards, >>>>> Huang, Ying ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v3 2/2] mm: memory_hotplug: drop memoryless node from fallback lists 2023-10-23 3:17 ` Qi Zheng @ 2023-10-23 18:19 ` Andrew Morton 0 siblings, 0 replies; 14+ messages in thread From: Andrew Morton @ 2023-10-23 18:19 UTC (permalink / raw) To: Qi Zheng Cc: Huang, Ying, rppt, david, vbabka, mhocko, willy, mgorman, mingo, aneesh.kumar, hannes, osalvador, linux-kernel, linux-mm On Mon, 23 Oct 2023 11:17:03 +0800 Qi Zheng <zhengqi.arch@bytedance.com> wrote: > Hi Andrew, can you help modify the commit message to this? :) Done, thanks. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v3 2/2] mm: memory_hotplug: drop memoryless node from fallback lists 2023-10-19 10:43 ` [PATCH v3 2/2] mm: memory_hotplug: drop memoryless node from fallback lists Qi Zheng 2023-10-20 7:05 ` Huang, Ying @ 2023-10-20 8:32 ` Ingo Molnar 2023-10-20 9:10 ` Qi Zheng 1 sibling, 1 reply; 14+ messages in thread From: Ingo Molnar @ 2023-10-20 8:32 UTC (permalink / raw) To: Qi Zheng Cc: akpm, rppt, david, vbabka, mhocko, willy, mgorman, aneesh.kumar, ying.huang, hannes, osalvador, linux-kernel, linux-mm * Qi Zheng <zhengqi.arch@bytedance.com> wrote: > In offline_pages(), if a node becomes memoryless, we > will clear its N_MEMORY state by calling node_states_clear_node(). > But we do this after rebuilding the zonelists by calling > build_all_zonelists(), which will cause this memoryless node to > still be in the fallback list of other nodes. This will incur > some runtime overhead. > > To drop memoryless node from fallback lists in this case, just > call node_states_clear_node() before calling build_all_zonelists(). s/memoryless node /memoryless nodes > > Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> > Acked-by: David Hildenbrand <david@redhat.com> > --- > mm/memory_hotplug.c | 6 +++++- > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c > index d4a364fdaf8f..f019f7d6272c 100644 > --- a/mm/memory_hotplug.c > +++ b/mm/memory_hotplug.c > @@ -2036,12 +2036,16 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages, > /* reinitialise watermarks and update pcp limits */ > init_per_zone_wmark_min(); > > + /* > + * Make sure to mark the node as memory-less before rebuilding the zone > + * list. Otherwise this node would still appear in the fallback lists. > + */ > + node_states_clear_node(node, &arg); Acked-by: Ingo Molnar <mingo@kernel.org> Thanks, Ingo ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v3 2/2] mm: memory_hotplug: drop memoryless node from fallback lists 2023-10-20 8:32 ` Ingo Molnar @ 2023-10-20 9:10 ` Qi Zheng 0 siblings, 0 replies; 14+ messages in thread From: Qi Zheng @ 2023-10-20 9:10 UTC (permalink / raw) To: Ingo Molnar Cc: akpm, rppt, david, vbabka, mhocko, willy, mgorman, aneesh.kumar, ying.huang, hannes, osalvador, linux-kernel, linux-mm Hi Ingo, On 2023/10/20 16:32, Ingo Molnar wrote: > > * Qi Zheng <zhengqi.arch@bytedance.com> wrote: > >> In offline_pages(), if a node becomes memoryless, we >> will clear its N_MEMORY state by calling node_states_clear_node(). >> But we do this after rebuilding the zonelists by calling >> build_all_zonelists(), which will cause this memoryless node to >> still be in the fallback list of other nodes. This will incur >> some runtime overhead. >> >> To drop memoryless node from fallback lists in this case, just >> call node_states_clear_node() before calling build_all_zonelists(). > > s/memoryless node > /memoryless nodes Will do. > >> >> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> >> Acked-by: David Hildenbrand <david@redhat.com> >> --- >> mm/memory_hotplug.c | 6 +++++- >> 1 file changed, 5 insertions(+), 1 deletion(-) >> >> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c >> index d4a364fdaf8f..f019f7d6272c 100644 >> --- a/mm/memory_hotplug.c >> +++ b/mm/memory_hotplug.c >> @@ -2036,12 +2036,16 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages, >> /* reinitialise watermarks and update pcp limits */ >> init_per_zone_wmark_min(); >> >> + /* >> + * Make sure to mark the node as memory-less before rebuilding the zone >> + * list. Otherwise this node would still appear in the fallback lists. >> + */ >> + node_states_clear_node(node, &arg); > > Acked-by: Ingo Molnar <mingo@kernel.org> Thanks. > > Thanks, > > Ingo ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2023-10-23 18:19 UTC | newest] Thread overview: 14+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2023-10-19 10:43 [PATCH v3 0/2] handle memoryless nodes more appropriately Qi Zheng 2023-10-19 10:43 ` [PATCH v3 1/2] mm: page_alloc: skip memoryless nodes entirely Qi Zheng 2023-10-20 8:31 ` Ingo Molnar 2023-10-20 9:09 ` Qi Zheng 2023-10-19 10:43 ` [PATCH v3 2/2] mm: memory_hotplug: drop memoryless node from fallback lists Qi Zheng 2023-10-20 7:05 ` Huang, Ying 2023-10-20 7:35 ` Qi Zheng 2023-10-23 1:18 ` Huang, Ying 2023-10-23 2:53 ` Qi Zheng 2023-10-23 3:10 ` Huang, Ying 2023-10-23 3:17 ` Qi Zheng 2023-10-23 18:19 ` Andrew Morton 2023-10-20 8:32 ` Ingo Molnar 2023-10-20 9:10 ` Qi Zheng
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).