* [PATCH 1/2] mm: update the memmap stat before page is freed
@ 2024-08-06 22:14 Pasha Tatashin
2024-08-06 22:14 ` [PATCH 2/2] mm: keep nid around during hot-remove Pasha Tatashin
2024-08-07 11:24 ` [PATCH 1/2] mm: update the memmap stat before page is freed David Hildenbrand
0 siblings, 2 replies; 7+ messages in thread
From: Pasha Tatashin @ 2024-08-06 22:14 UTC (permalink / raw)
To: agordeev, akpm, alexghiti, aou, ardb, arnd, bhe, bjorn,
borntraeger, bp, catalin.marinas, chenhuacai, chenjiahao16,
christophe.leroy, dave.hansen, david, dawei.li, gerald.schaefer,
gor, hca, hpa, kent.overstreet, kernel, linux-arm-kernel,
linux-kernel, linux-mm, linuxppc-dev, linux-riscv, linux-s390,
loongarch, luto, maobibo, mark.rutland, mcgrof, mingo, mpe,
muchun.song, namcao, naveen, npiggin, osalvador, palmer,
pasha.tatashin, paul.walmsley, peterz, philmd, rdunlap, rientjes,
rppt, ryan.roberts, souravpanda, svens, tglx, tzimmermann, will,
x86
It is more logical to update the stat before the page is freed, to avoid
use after free scenarios.
Fixes: 15995a352474 ("mm: report per-page metadata information")
Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
---
mm/hugetlb_vmemmap.c | 4 ++--
mm/page_ext.c | 8 ++++----
2 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
index 829112b0a914..fa83a7b38199 100644
--- a/mm/hugetlb_vmemmap.c
+++ b/mm/hugetlb_vmemmap.c
@@ -185,11 +185,11 @@ static int vmemmap_remap_range(unsigned long start, unsigned long end,
static inline void free_vmemmap_page(struct page *page)
{
if (PageReserved(page)) {
- free_bootmem_page(page);
mod_node_page_state(page_pgdat(page), NR_MEMMAP_BOOT, -1);
+ free_bootmem_page(page);
} else {
- __free_page(page);
mod_node_page_state(page_pgdat(page), NR_MEMMAP, -1);
+ __free_page(page);
}
}
diff --git a/mm/page_ext.c b/mm/page_ext.c
index c191e490c401..962d45eee1f8 100644
--- a/mm/page_ext.c
+++ b/mm/page_ext.c
@@ -330,18 +330,18 @@ static void free_page_ext(void *addr)
if (is_vmalloc_addr(addr)) {
page = vmalloc_to_page(addr);
pgdat = page_pgdat(page);
+ mod_node_page_state(pgdat, NR_MEMMAP,
+ -1L * (DIV_ROUND_UP(table_size, PAGE_SIZE)));
vfree(addr);
} else {
page = virt_to_page(addr);
pgdat = page_pgdat(page);
+ mod_node_page_state(pgdat, NR_MEMMAP,
+ -1L * (DIV_ROUND_UP(table_size, PAGE_SIZE)));
BUG_ON(PageReserved(page));
kmemleak_free(addr);
free_pages_exact(addr, table_size);
}
-
- mod_node_page_state(pgdat, NR_MEMMAP,
- -1L * (DIV_ROUND_UP(table_size, PAGE_SIZE)));
-
}
static void __free_page_ext(unsigned long pfn)
--
2.46.0.76.ge559c4bf1a-goog
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH 2/2] mm: keep nid around during hot-remove
2024-08-06 22:14 [PATCH 1/2] mm: update the memmap stat before page is freed Pasha Tatashin
@ 2024-08-06 22:14 ` Pasha Tatashin
2024-08-07 11:32 ` David Hildenbrand
2024-08-07 11:24 ` [PATCH 1/2] mm: update the memmap stat before page is freed David Hildenbrand
1 sibling, 1 reply; 7+ messages in thread
From: Pasha Tatashin @ 2024-08-06 22:14 UTC (permalink / raw)
To: agordeev, akpm, alexghiti, aou, ardb, arnd, bhe, bjorn,
borntraeger, bp, catalin.marinas, chenhuacai, chenjiahao16,
christophe.leroy, dave.hansen, david, dawei.li, gerald.schaefer,
gor, hca, hpa, kent.overstreet, kernel, linux-arm-kernel,
linux-kernel, linux-mm, linuxppc-dev, linux-riscv, linux-s390,
loongarch, luto, maobibo, mark.rutland, mcgrof, mingo, mpe,
muchun.song, namcao, naveen, npiggin, osalvador, palmer,
pasha.tatashin, paul.walmsley, peterz, philmd, rdunlap, rientjes,
rppt, ryan.roberts, souravpanda, svens, tglx, tzimmermann, will,
x86
nid is needed during memory hot-remove in order to account the
information about the memmap overhead that is being removed.
In addition, we cannot use page_pgdat(pfn_to_page(pfn)) during
hotremove after remove_pfn_range_from_zone().
We also cannot determine nid from walking through memblocks after
remove_memory_block_devices() is called.
Therefore, pass nid down from the beginning of hotremove to where
it is used for the accounting purposes.
Reported-by: Yi Zhang <yi.zhang@redhat.com>
Closes: https://lore.kernel.org/linux-cxl/CAHj4cs9Ax1=CoJkgBGP_+sNu6-6=6v=_L-ZBZY0bVLD3wUWZQg@mail.gmail.com
Reported-by: Alison Schofield <alison.schofield@intel.com>
Closes: https://lore.kernel.org/linux-mm/Zq0tPd2h6alFz8XF@aschofie-mobl2/#t
Fixes: 15995a352474 ("mm: report per-page metadata information")
Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
---
arch/arm64/mm/mmu.c | 5 +++--
arch/loongarch/mm/init.c | 5 +++--
arch/powerpc/mm/mem.c | 5 +++--
arch/riscv/mm/init.c | 5 +++--
arch/s390/mm/init.c | 5 +++--
arch/x86/mm/init_64.c | 5 +++--
include/linux/memory_hotplug.h | 7 ++++---
mm/memory_hotplug.c | 18 +++++++++---------
mm/memremap.c | 6 ++++--
mm/sparse-vmemmap.c | 14 ++++++++------
mm/sparse.c | 20 +++++++++++---------
11 files changed, 54 insertions(+), 41 deletions(-)
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 353ea5dc32b8..cd0808d05551 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1363,12 +1363,13 @@ int arch_add_memory(int nid, u64 start, u64 size,
return ret;
}
-void arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap)
+void arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap,
+ int nid)
{
unsigned long start_pfn = start >> PAGE_SHIFT;
unsigned long nr_pages = size >> PAGE_SHIFT;
- __remove_pages(start_pfn, nr_pages, altmap);
+ __remove_pages(start_pfn, nr_pages, altmap, nid);
__remove_pgd_mapping(swapper_pg_dir, __phys_to_virt(start), size);
}
diff --git a/arch/loongarch/mm/init.c b/arch/loongarch/mm/init.c
index bf789d114c2d..64cfbfb75c15 100644
--- a/arch/loongarch/mm/init.c
+++ b/arch/loongarch/mm/init.c
@@ -106,7 +106,8 @@ int arch_add_memory(int nid, u64 start, u64 size, struct mhp_params *params)
return ret;
}
-void arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap)
+void arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap,
+ int nid)
{
unsigned long start_pfn = start >> PAGE_SHIFT;
unsigned long nr_pages = size >> PAGE_SHIFT;
@@ -115,7 +116,7 @@ void arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap)
/* With altmap the first mapped page is offset from @start */
if (altmap)
page += vmem_altmap_offset(altmap);
- __remove_pages(start_pfn, nr_pages, altmap);
+ __remove_pages(start_pfn, nr_pages, altmap, nid);
}
#ifdef CONFIG_NUMA
diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index d325217ab201..74c0213f995a 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -157,12 +157,13 @@ int __ref arch_add_memory(int nid, u64 start, u64 size,
return rc;
}
-void __ref arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap)
+void __ref arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap,
+ int nid)
{
unsigned long start_pfn = start >> PAGE_SHIFT;
unsigned long nr_pages = size >> PAGE_SHIFT;
- __remove_pages(start_pfn, nr_pages, altmap);
+ __remove_pages(start_pfn, nr_pages, altmap, nid);
arch_remove_linear_mapping(start, size);
}
#endif
diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
index 8b698d9609e7..bf1be25cc513 100644
--- a/arch/riscv/mm/init.c
+++ b/arch/riscv/mm/init.c
@@ -1796,9 +1796,10 @@ int __ref arch_add_memory(int nid, u64 start, u64 size, struct mhp_params *param
return ret;
}
-void __ref arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap)
+void __ref arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap,
+ int nid)
{
- __remove_pages(start >> PAGE_SHIFT, size >> PAGE_SHIFT, altmap);
+ __remove_pages(start >> PAGE_SHIFT, size >> PAGE_SHIFT, altmap, nid);
remove_linear_mapping(start, size);
flush_tlb_all();
}
diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
index e3d258f9e726..bf596d87543a 100644
--- a/arch/s390/mm/init.c
+++ b/arch/s390/mm/init.c
@@ -290,12 +290,13 @@ int arch_add_memory(int nid, u64 start, u64 size,
return rc;
}
-void arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap)
+void arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap,
+ int nid)
{
unsigned long start_pfn = start >> PAGE_SHIFT;
unsigned long nr_pages = size >> PAGE_SHIFT;
- __remove_pages(start_pfn, nr_pages, altmap);
+ __remove_pages(start_pfn, nr_pages, altmap, nid);
vmem_remove_mapping(start, size);
}
#endif /* CONFIG_MEMORY_HOTPLUG */
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index d8dbeac8b206..5bb82fbb7c2c 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1262,12 +1262,13 @@ kernel_physical_mapping_remove(unsigned long start, unsigned long end)
remove_pagetable(start, end, true, NULL);
}
-void __ref arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap)
+void __ref arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap,
+ int nid)
{
unsigned long start_pfn = start >> PAGE_SHIFT;
unsigned long nr_pages = size >> PAGE_SHIFT;
- __remove_pages(start_pfn, nr_pages, altmap);
+ __remove_pages(start_pfn, nr_pages, altmap, nid);
kernel_physical_mapping_remove(start, start + size);
}
#endif /* CONFIG_MEMORY_HOTPLUG */
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index ebe876930e78..47c9af202884 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -201,9 +201,10 @@ static inline bool movable_node_is_enabled(void)
return movable_node_enabled;
}
-extern void arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap);
+extern void arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap,
+ int nid);
extern void __remove_pages(unsigned long start_pfn, unsigned long nr_pages,
- struct vmem_altmap *altmap);
+ struct vmem_altmap *altmap, int nid);
/* reasonably generic interface to expand the physical pages */
extern int __add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages,
@@ -369,7 +370,7 @@ extern int sparse_add_section(int nid, unsigned long pfn,
unsigned long nr_pages, struct vmem_altmap *altmap,
struct dev_pagemap *pgmap);
extern void sparse_remove_section(unsigned long pfn, unsigned long nr_pages,
- struct vmem_altmap *altmap);
+ struct vmem_altmap *altmap, int nid);
extern struct page *sparse_decode_mem_map(unsigned long coded_mem_map,
unsigned long pnum);
extern struct zone *zone_for_pfn_range(int online_type, int nid,
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 66267c26ca1b..c66148049fa6 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -571,7 +571,7 @@ void __ref remove_pfn_range_from_zone(struct zone *zone,
* calling offline_pages().
*/
void __remove_pages(unsigned long pfn, unsigned long nr_pages,
- struct vmem_altmap *altmap)
+ struct vmem_altmap *altmap, int nid)
{
const unsigned long end_pfn = pfn + nr_pages;
unsigned long cur_nr_pages;
@@ -586,7 +586,7 @@ void __remove_pages(unsigned long pfn, unsigned long nr_pages,
/* Select all remaining pages up to the next section boundary */
cur_nr_pages = min(end_pfn - pfn,
SECTION_ALIGN_UP(pfn + 1) - pfn);
- sparse_remove_section(pfn, cur_nr_pages, altmap);
+ sparse_remove_section(pfn, cur_nr_pages, altmap, nid);
}
}
@@ -1386,7 +1386,7 @@ bool mhp_supports_memmap_on_memory(void)
}
EXPORT_SYMBOL_GPL(mhp_supports_memmap_on_memory);
-static void __ref remove_memory_blocks_and_altmaps(u64 start, u64 size)
+static void __ref remove_memory_blocks_and_altmaps(u64 start, u64 size, int nid)
{
unsigned long memblock_size = memory_block_size_bytes();
u64 cur_start;
@@ -1409,7 +1409,7 @@ static void __ref remove_memory_blocks_and_altmaps(u64 start, u64 size)
remove_memory_block_devices(cur_start, memblock_size);
- arch_remove_memory(cur_start, memblock_size, altmap);
+ arch_remove_memory(cur_start, memblock_size, altmap, nid);
/* Verify that all vmemmap pages have actually been freed. */
WARN(altmap->alloc, "Altmap not fully unmapped");
@@ -1454,7 +1454,7 @@ static int create_altmaps_and_memory_blocks(int nid, struct memory_group *group,
ret = create_memory_block_devices(cur_start, memblock_size,
params.altmap, group);
if (ret) {
- arch_remove_memory(cur_start, memblock_size, NULL);
+ arch_remove_memory(cur_start, memblock_size, NULL, nid);
kfree(params.altmap);
goto out;
}
@@ -1463,7 +1463,7 @@ static int create_altmaps_and_memory_blocks(int nid, struct memory_group *group,
return 0;
out:
if (ret && cur_start != start)
- remove_memory_blocks_and_altmaps(start, cur_start - start);
+ remove_memory_blocks_and_altmaps(start, cur_start - start, nid);
return ret;
}
@@ -1532,7 +1532,7 @@ int __ref add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
/* create memory block devices after memory was added */
ret = create_memory_block_devices(start, size, NULL, group);
if (ret) {
- arch_remove_memory(start, size, params.altmap);
+ arch_remove_memory(start, size, params.altmap, nid);
goto error;
}
}
@@ -2275,10 +2275,10 @@ static int __ref try_remove_memory(u64 start, u64 size)
* No altmaps present, do the removal directly
*/
remove_memory_block_devices(start, size);
- arch_remove_memory(start, size, NULL);
+ arch_remove_memory(start, size, NULL, nid);
} else {
/* all memblocks in the range have altmaps */
- remove_memory_blocks_and_altmaps(start, size);
+ remove_memory_blocks_and_altmaps(start, size, nid);
}
if (IS_ENABLED(CONFIG_ARCH_KEEP_MEMBLOCK))
diff --git a/mm/memremap.c b/mm/memremap.c
index 40d4547ce514..08e72959eb48 100644
--- a/mm/memremap.c
+++ b/mm/memremap.c
@@ -112,9 +112,11 @@ static void pageunmap_range(struct dev_pagemap *pgmap, int range_id)
{
struct range *range = &pgmap->ranges[range_id];
struct page *first_page;
+ int nid;
/* make sure to access a memmap that was actually initialized */
first_page = pfn_to_page(pfn_first(pgmap, range_id));
+ nid = page_to_nid(first_page);
/* pages are dead and unused, undo the arch mapping */
mem_hotplug_begin();
@@ -122,10 +124,10 @@ static void pageunmap_range(struct dev_pagemap *pgmap, int range_id)
PHYS_PFN(range_len(range)));
if (pgmap->type == MEMORY_DEVICE_PRIVATE) {
__remove_pages(PHYS_PFN(range->start),
- PHYS_PFN(range_len(range)), NULL);
+ PHYS_PFN(range_len(range)), NULL, nid);
} else {
arch_remove_memory(range->start, range_len(range),
- pgmap_altmap(pgmap));
+ pgmap_altmap(pgmap), nid);
kasan_remove_zero_shadow(__va(range->start), range_len(range));
}
mem_hotplug_done();
diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index 1dda6c53370b..0dafad626ab8 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -469,12 +469,14 @@ struct page * __meminit __populate_section_memmap(unsigned long pfn,
if (r < 0)
return NULL;
- if (system_state == SYSTEM_BOOTING) {
- mod_node_early_perpage_metadata(nid, DIV_ROUND_UP(end - start,
- PAGE_SIZE));
- } else {
- mod_node_page_state(NODE_DATA(nid), NR_MEMMAP,
- DIV_ROUND_UP(end - start, PAGE_SIZE));
+ if (nid != NUMA_NO_NODE) {
+ if (system_state == SYSTEM_BOOTING) {
+ mod_node_early_perpage_metadata(nid, DIV_ROUND_UP(end - start,
+ PAGE_SIZE));
+ } else {
+ mod_node_page_state(NODE_DATA(nid), NR_MEMMAP,
+ DIV_ROUND_UP(end - start, PAGE_SIZE));
+ }
}
return pfn_to_page(pfn);
diff --git a/mm/sparse.c b/mm/sparse.c
index e4b830091d13..fc01bc5f0f1d 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -638,13 +638,15 @@ static struct page * __meminit populate_section_memmap(unsigned long pfn,
}
static void depopulate_section_memmap(unsigned long pfn, unsigned long nr_pages,
- struct vmem_altmap *altmap)
+ struct vmem_altmap *altmap, int nid)
{
unsigned long start = (unsigned long) pfn_to_page(pfn);
unsigned long end = start + nr_pages * sizeof(struct page);
- mod_node_page_state(page_pgdat(pfn_to_page(pfn)), NR_MEMMAP,
- -1L * (DIV_ROUND_UP(end - start, PAGE_SIZE)));
+ if (nid != NUMA_NO_NODE) {
+ mod_node_page_state(NODE_DATA(nid), NR_MEMMAP,
+ -1L * (DIV_ROUND_UP(end - start, PAGE_SIZE)));
+ }
vmemmap_free(start, end, altmap);
}
static void free_map_bootmem(struct page *memmap)
@@ -713,7 +715,7 @@ static struct page * __meminit populate_section_memmap(unsigned long pfn,
}
static void depopulate_section_memmap(unsigned long pfn, unsigned long nr_pages,
- struct vmem_altmap *altmap)
+ struct vmem_altmap *altmap, int nid)
{
kvfree(pfn_to_page(pfn));
}
@@ -781,7 +783,7 @@ static int fill_subsection_map(unsigned long pfn, unsigned long nr_pages)
* For 2 and 3, the SPARSEMEM_VMEMMAP={y,n} cases are unified
*/
static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
- struct vmem_altmap *altmap)
+ struct vmem_altmap *altmap, int nid)
{
struct mem_section *ms = __pfn_to_section(pfn);
bool section_is_early = early_section(ms);
@@ -821,7 +823,7 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
* section_activate() and pfn_valid() .
*/
if (!section_is_early)
- depopulate_section_memmap(pfn, nr_pages, altmap);
+ depopulate_section_memmap(pfn, nr_pages, altmap, nid);
else if (memmap)
free_map_bootmem(memmap);
@@ -865,7 +867,7 @@ static struct page * __meminit section_activate(int nid, unsigned long pfn,
memmap = populate_section_memmap(pfn, nr_pages, nid, altmap, pgmap);
if (!memmap) {
- section_deactivate(pfn, nr_pages, altmap);
+ section_deactivate(pfn, nr_pages, altmap, nid);
return ERR_PTR(-ENOMEM);
}
@@ -928,13 +930,13 @@ int __meminit sparse_add_section(int nid, unsigned long start_pfn,
}
void sparse_remove_section(unsigned long pfn, unsigned long nr_pages,
- struct vmem_altmap *altmap)
+ struct vmem_altmap *altmap, int nid)
{
struct mem_section *ms = __pfn_to_section(pfn);
if (WARN_ON_ONCE(!valid_section(ms)))
return;
- section_deactivate(pfn, nr_pages, altmap);
+ section_deactivate(pfn, nr_pages, altmap, nid);
}
#endif /* CONFIG_MEMORY_HOTPLUG */
--
2.46.0.76.ge559c4bf1a-goog
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH 1/2] mm: update the memmap stat before page is freed
2024-08-06 22:14 [PATCH 1/2] mm: update the memmap stat before page is freed Pasha Tatashin
2024-08-06 22:14 ` [PATCH 2/2] mm: keep nid around during hot-remove Pasha Tatashin
@ 2024-08-07 11:24 ` David Hildenbrand
1 sibling, 0 replies; 7+ messages in thread
From: David Hildenbrand @ 2024-08-07 11:24 UTC (permalink / raw)
To: Pasha Tatashin, agordeev, akpm, alexghiti, aou, ardb, arnd, bhe,
bjorn, borntraeger, bp, catalin.marinas, chenhuacai, chenjiahao16,
christophe.leroy, dave.hansen, dawei.li, gerald.schaefer, gor,
hca, hpa, kent.overstreet, kernel, linux-arm-kernel, linux-kernel,
linux-mm, linuxppc-dev, linux-riscv, linux-s390, loongarch, luto,
maobibo, mark.rutland, mcgrof, mingo, mpe, muchun.song, namcao,
naveen, npiggin, osalvador, palmer, paul.walmsley, peterz, philmd,
rdunlap, rientjes, rppt, ryan.roberts, souravpanda, svens, tglx,
tzimmermann, will, x86
On 07.08.24 00:14, Pasha Tatashin wrote:
> It is more logical to update the stat before the page is freed, to avoid
> use after free scenarios.
>
> Fixes: 15995a352474 ("mm: report per-page metadata information")
> Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
> ---
Reviewed-by: David Hildenbrand <david@redhat.com>
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 2/2] mm: keep nid around during hot-remove
2024-08-06 22:14 ` [PATCH 2/2] mm: keep nid around during hot-remove Pasha Tatashin
@ 2024-08-07 11:32 ` David Hildenbrand
2024-08-07 11:50 ` David Hildenbrand
0 siblings, 1 reply; 7+ messages in thread
From: David Hildenbrand @ 2024-08-07 11:32 UTC (permalink / raw)
To: Pasha Tatashin, agordeev, akpm, alexghiti, aou, ardb, arnd, bhe,
bjorn, borntraeger, bp, catalin.marinas, chenhuacai, chenjiahao16,
christophe.leroy, dave.hansen, dawei.li, gerald.schaefer, gor,
hca, hpa, kent.overstreet, kernel, linux-arm-kernel, linux-kernel,
linux-mm, linuxppc-dev, linux-riscv, linux-s390, loongarch, luto,
maobibo, mark.rutland, mcgrof, mingo, mpe, muchun.song, namcao,
naveen, npiggin, osalvador, palmer, paul.walmsley, peterz, philmd,
rdunlap, rientjes, rppt, ryan.roberts, souravpanda, svens, tglx,
tzimmermann, will, x86
On 07.08.24 00:14, Pasha Tatashin wrote:
> nid is needed during memory hot-remove in order to account the
> information about the memmap overhead that is being removed.
>
> In addition, we cannot use page_pgdat(pfn_to_page(pfn)) during
> hotremove after remove_pfn_range_from_zone().
>
> We also cannot determine nid from walking through memblocks after
> remove_memory_block_devices() is called.
>
> Therefore, pass nid down from the beginning of hotremove to where
> it is used for the accounting purposes.
I was happy to finally remove that nid parameter for good in:
commit 65a2aa5f482ed0c1b5afb9e6b0b9e0b16bb8b616
Author: David Hildenbrand <david@redhat.com>
Date: Tue Sep 7 19:55:04 2021 -0700
mm/memory_hotplug: remove nid parameter from arch_remove_memory()
To ask the real question: Do we really need this counter per-nid at all?
Seems to over-complicate things.
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 2/2] mm: keep nid around during hot-remove
2024-08-07 11:32 ` David Hildenbrand
@ 2024-08-07 11:50 ` David Hildenbrand
2024-08-07 14:40 ` Pasha Tatashin
0 siblings, 1 reply; 7+ messages in thread
From: David Hildenbrand @ 2024-08-07 11:50 UTC (permalink / raw)
To: Pasha Tatashin, agordeev, akpm, alexghiti, aou, ardb, arnd, bhe,
bjorn, borntraeger, bp, catalin.marinas, chenhuacai, chenjiahao16,
christophe.leroy, dave.hansen, dawei.li, gerald.schaefer, gor,
hca, hpa, kent.overstreet, kernel, linux-arm-kernel, linux-kernel,
linux-mm, linuxppc-dev, linux-riscv, linux-s390, loongarch, luto,
maobibo, mark.rutland, mcgrof, mingo, mpe, muchun.song, namcao,
naveen, npiggin, osalvador, palmer, paul.walmsley, peterz, philmd,
rdunlap, rientjes, rppt, ryan.roberts, souravpanda, svens, tglx,
tzimmermann, will, x86
On 07.08.24 13:32, David Hildenbrand wrote:
> On 07.08.24 00:14, Pasha Tatashin wrote:
>> nid is needed during memory hot-remove in order to account the
>> information about the memmap overhead that is being removed.
>>
>> In addition, we cannot use page_pgdat(pfn_to_page(pfn)) during
>> hotremove after remove_pfn_range_from_zone().
>>
>> We also cannot determine nid from walking through memblocks after
>> remove_memory_block_devices() is called.
>>
>> Therefore, pass nid down from the beginning of hotremove to where
>> it is used for the accounting purposes.
>
> I was happy to finally remove that nid parameter for good in:
>
> commit 65a2aa5f482ed0c1b5afb9e6b0b9e0b16bb8b616
> Author: David Hildenbrand <david@redhat.com>
> Date: Tue Sep 7 19:55:04 2021 -0700
>
> mm/memory_hotplug: remove nid parameter from arch_remove_memory()
>
> To ask the real question: Do we really need this counter per-nid at all?
>
> Seems to over-complicate things.
Case in point: I think the handling is wrong?
Just because some memory belongs to a nid doesn't mean that the vmemmap
was allocated from that nid?
Wouldn't we want to look at the actual nid the vmemmap page belongs to
that we are removing?
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 2/2] mm: keep nid around during hot-remove
2024-08-07 11:50 ` David Hildenbrand
@ 2024-08-07 14:40 ` Pasha Tatashin
2024-08-07 15:23 ` David Hildenbrand
0 siblings, 1 reply; 7+ messages in thread
From: Pasha Tatashin @ 2024-08-07 14:40 UTC (permalink / raw)
To: David Hildenbrand
Cc: mark.rutland, muchun.song, luto, peterz, catalin.marinas,
dave.hansen, bjorn, linux-mm, souravpanda, rdunlap, hpa, kernel,
will, agordeev, namcao, linux-s390, arnd, bhe, chenhuacai,
christophe.leroy, ardb, mingo, rientjes, gerald.schaefer,
borntraeger, aou, ryan.roberts, alexghiti, gor, hca, dawei.li,
naveen, maobibo, chenjiahao16, bp, npiggin, loongarch,
paul.walmsley, tglx, linux-arm-kernel, osalvador, x86, philmd,
kent.overstreet, linux-kernel, mcgrof, linux-riscv, palmer, svens,
tzimmermann, akpm, linuxppc-dev, rppt
On Wed, Aug 7, 2024 at 7:50 AM David Hildenbrand <david@redhat.com> wrote:
>
> On 07.08.24 13:32, David Hildenbrand wrote:
> > On 07.08.24 00:14, Pasha Tatashin wrote:
> >> nid is needed during memory hot-remove in order to account the
> >> information about the memmap overhead that is being removed.
> >>
> >> In addition, we cannot use page_pgdat(pfn_to_page(pfn)) during
> >> hotremove after remove_pfn_range_from_zone().
> >>
> >> We also cannot determine nid from walking through memblocks after
> >> remove_memory_block_devices() is called.
> >>
> >> Therefore, pass nid down from the beginning of hotremove to where
> >> it is used for the accounting purposes.
> >
> > I was happy to finally remove that nid parameter for good in:
> >
> > commit 65a2aa5f482ed0c1b5afb9e6b0b9e0b16bb8b616
> > Author: David Hildenbrand <david@redhat.com>
> > Date: Tue Sep 7 19:55:04 2021 -0700
> >
> > mm/memory_hotplug: remove nid parameter from arch_remove_memory()
> >
> > To ask the real question: Do we really need this counter per-nid at all?
> >
> > Seems to over-complicate things.
>
> Case in point: I think the handling is wrong?
>
> Just because some memory belongs to a nid doesn't mean that the vmemmap
> was allocated from that nid?
I believe when we hot-add we use nid for the memory that is being
added to account vmemmap, and when we do hot-remove we also use nid of
the memory that is being removed. But, you are correct, this does not
guarantee that the actual vmemmap memory is being allocated or removed
from the given nid.
> Wouldn't we want to look at the actual nid the vmemmap page belongs to
> that we are removing?
I am now looking into converting this counter to be system wide, i.e.
vm_event, it is all done under hotplug lock, so there is no
contention.
Pasha
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 2/2] mm: keep nid around during hot-remove
2024-08-07 14:40 ` Pasha Tatashin
@ 2024-08-07 15:23 ` David Hildenbrand
0 siblings, 0 replies; 7+ messages in thread
From: David Hildenbrand @ 2024-08-07 15:23 UTC (permalink / raw)
To: Pasha Tatashin
Cc: mark.rutland, muchun.song, luto, peterz, catalin.marinas,
dave.hansen, bjorn, linux-mm, souravpanda, rdunlap, hpa, kernel,
will, agordeev, namcao, linux-s390, arnd, bhe, chenhuacai,
christophe.leroy, ardb, mingo, rientjes, gerald.schaefer,
borntraeger, aou, ryan.roberts, alexghiti, gor, hca, dawei.li,
naveen, maobibo, chenjiahao16, bp, npiggin, loongarch,
paul.walmsley, tglx, linux-arm-kernel, osalvador, x86, philmd,
kent.overstreet, linux-kernel, mcgrof, linux-riscv, palmer, svens,
tzimmermann, akpm, linuxppc-dev, rppt
On 07.08.24 16:40, Pasha Tatashin wrote:
> On Wed, Aug 7, 2024 at 7:50 AM David Hildenbrand <david@redhat.com> wrote:
>>
>> On 07.08.24 13:32, David Hildenbrand wrote:
>>> On 07.08.24 00:14, Pasha Tatashin wrote:
>>>> nid is needed during memory hot-remove in order to account the
>>>> information about the memmap overhead that is being removed.
>>>>
>>>> In addition, we cannot use page_pgdat(pfn_to_page(pfn)) during
>>>> hotremove after remove_pfn_range_from_zone().
>>>>
>>>> We also cannot determine nid from walking through memblocks after
>>>> remove_memory_block_devices() is called.
>>>>
>>>> Therefore, pass nid down from the beginning of hotremove to where
>>>> it is used for the accounting purposes.
>>>
>>> I was happy to finally remove that nid parameter for good in:
>>>
>>> commit 65a2aa5f482ed0c1b5afb9e6b0b9e0b16bb8b616
>>> Author: David Hildenbrand <david@redhat.com>
>>> Date: Tue Sep 7 19:55:04 2021 -0700
>>>
>>> mm/memory_hotplug: remove nid parameter from arch_remove_memory()
>>>
>>> To ask the real question: Do we really need this counter per-nid at all?
>>>
>>> Seems to over-complicate things.
>>
>> Case in point: I think the handling is wrong?
>>
>> Just because some memory belongs to a nid doesn't mean that the vmemmap
>> was allocated from that nid?
>
> I believe when we hot-add we use nid for the memory that is being
> added to account vmemmap, and when we do hot-remove we also use nid of
> the memory that is being removed. But, you are correct, this does not
> guarantee that the actual vmemmap memory is being allocated or removed
> from the given nid.
Right. For boot memory that we might want to unplug later it might be
different. I recall that with "movable_node", we might end up allocating
the vmemmap from remote nodes, such that all memory of a node stays
movable. That's why __earlyonly_bootmem_alloc() ends up calling
memblock_alloc_try_nid_raw(), to fallback to other nodes if required.
>
>> Wouldn't we want to look at the actual nid the vmemmap page belongs to
>> that we are removing?
>
> I am now looking into converting this counter to be system wide, i.e.
> vm_event, it is all done under hotplug lock, so there is no
> contention.
That would be easiest, assuming per-node information is not strictly
required for now.
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2024-08-07 15:24 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-08-06 22:14 [PATCH 1/2] mm: update the memmap stat before page is freed Pasha Tatashin
2024-08-06 22:14 ` [PATCH 2/2] mm: keep nid around during hot-remove Pasha Tatashin
2024-08-07 11:32 ` David Hildenbrand
2024-08-07 11:50 ` David Hildenbrand
2024-08-07 14:40 ` Pasha Tatashin
2024-08-07 15:23 ` David Hildenbrand
2024-08-07 11:24 ` [PATCH 1/2] mm: update the memmap stat before page is freed David Hildenbrand
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).