* Re: [PATCH v4 24/26] arch_numa: switch over to numa_memblks [not found] <MW4PR12MB72616723E1A090E315681FF6A38B2@MW4PR12MB7261.namprd12.prod.outlook.com> @ 2024-08-27 8:52 ` Mike Rapoport 0 siblings, 0 replies; 7+ messages in thread From: Mike Rapoport @ 2024-08-27 8:52 UTC (permalink / raw) To: Bruno Faccini Cc: linux-kernel@vger.kernel.org, Alexander Gordeev, Andreas Larsson, Andrew Morton, Arnd Bergmann, Borislav Petkov, Catalin Marinas, Christophe Leroy, Dan Williams, Dave Hansen, David Hildenbrand, David S. Miller, Davidlohr Bueso, Greg Kroah-Hartman, Heiko Carstens, Huacai Chen, Ingo Molnar, Jiaxun Yang, John Paul Adrian Glaubitz, Jonathan Cameron, Jonathan Corbet, Michael Ellerman, Palmer Dabbelt, Rafael J. Wysocki, Rob Herring, Samuel Holland, Thomas Bogendoerfer, Thomas Gleixner, Vasily Gorbik, Will Deacon, devicetree@vger.kernel.org, linux-acpi@vger.kernel.org, linux-arch@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-cxl@vger.kernel.org, linux-doc@vger.kernel.org, linux-mips@vger.kernel.org, linux-mm@kvack.org, linux-riscv@lists.infradead.org, linux-s390@vger.kernel.org, linux-sh@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, loongarch@lists.linux.dev, nvdimm@lists.linux.dev, sparclinux@vger.kernel.org, x86@kernel.org, Zi Yan Hi, On Mon, Aug 26, 2024 at 06:17:22PM +0000, Bruno Faccini wrote: > > On 7 Aug 2024, at 2:41, Mike Rapoport wrote: > > > > From: "Mike Rapoport (Microsoft)" <rppt@kernel.org> > > > > Until now arch_numa was directly translating firmware NUMA information > > to memblock. 
> >
> > Using numa_memblks as an intermediate step has a few advantages:
> > * alignment with more battle tested x86 implementation
> > * availability of NUMA emulation
> > * maintaining node information for not yet populated memory
> >
> > Adjust a few places in numa_memblks to compile with 32-bit phys_addr_t
> > and replace current functionality related to numa_add_memblk() and
> > __node_distance() in arch_numa with the implementation based on
> > numa_memblks and add functions required by numa_emulation.
> >
> > Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
> > Tested-by: Zi Yan <ziy@nvidia.com> # for x86_64 and arm64
> > Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [arm64 + CXL via
> > QEMU]
> > Acked-by: Dan Williams <dan.j.williams@intel.com>
> > Acked-by: David Hildenbrand <david@redhat.com>
> > ---
> >  drivers/base/Kconfig       |   1 +
> >  drivers/base/arch_numa.c   | 201 +++++++++++--------------------------
> >  include/asm-generic/numa.h |   6 +-
> >  mm/numa_memblks.c          |  17 ++--
> >  4 files changed, 75 insertions(+), 150 deletions(-)
> >
> > <snip>
> >
> > +
> > +u64 __init numa_emu_dma_end(void)
> > +{
> > +	return PFN_PHYS(memblock_start_of_DRAM() + SZ_4G);
> > +}
> > +
>
> PFN_PHYS() translation is unnecessary here, as
> memblock_start_of_DRAM() + SZ_4G is already a
> physical address in bytes, not a page frame number.
>
> This should fix it:
>
> diff --git a/drivers/base/arch_numa.c b/drivers/base/arch_numa.c
> index 8d49893c0e94..e18701676426 100644
> --- a/drivers/base/arch_numa.c
> +++ b/drivers/base/arch_numa.c
> @@ -346,7 +346,7 @@ void __init numa_emu_update_cpu_to_node(int
> *emu_nid_to_phys,
>
>  u64 __init numa_emu_dma_end(void)
>  {
> -	return PFN_PHYS(memblock_start_of_DRAM() + SZ_4G);
> +	return memblock_start_of_DRAM() + SZ_4G;
>  }
>
>  void debug_cpumask_set_cpu(unsigned int cpu, int node, bool enable)

Right, I missed that. Thanks for the fix!
Andrew, can you please apply this (with fixed formatting):

diff --git a/drivers/base/arch_numa.c b/drivers/base/arch_numa.c
index 8d49893c0e94..e18701676426 100644
--- a/drivers/base/arch_numa.c
+++ b/drivers/base/arch_numa.c
@@ -346,7 +346,7 @@ void __init numa_emu_update_cpu_to_node(int *emu_nid_to_phys,
 
 u64 __init numa_emu_dma_end(void)
 {
-	return PFN_PHYS(memblock_start_of_DRAM() + SZ_4G);
+	return memblock_start_of_DRAM() + SZ_4G;
 }
 
 void debug_cpumask_set_cpu(unsigned int cpu, int node, bool enable)

-- 
Sincerely yours,
Mike.
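To make the unit mix-up concrete, here is a small userspace sketch (not kernel code) that mimics the kernel's PFN_PHYS() helper. The definitions of phys_addr_t, PAGE_SHIFT, PFN_PHYS(), PHYS_PFN() and SZ_4G below are local stand-ins, and the sketch assumes a 64-bit phys_addr_t and 4K pages (PAGE_SHIFT == 12). Because memblock_start_of_DRAM() already returns a byte address, wrapping the sum in PFN_PHYS() shifts it left a second time and inflates the DMA limit by a factor of 2^PAGE_SHIFT.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Userspace stand-ins for the kernel definitions (assumptions for this
 * sketch): a 64-bit phys_addr_t and 4K pages, i.e. PAGE_SHIFT == 12.
 */
typedef uint64_t phys_addr_t;
#define PAGE_SHIFT	12
#define PFN_PHYS(x)	((phys_addr_t)(x) << PAGE_SHIFT)
#define PHYS_PFN(x)	((unsigned long)((x) >> PAGE_SHIFT))
#define SZ_4G		0x100000000ULL

/* Compile-time sanity check: one PFN step is one 4 KiB page. */
static_assert(PFN_PHYS(1) == 4096, "PAGE_SHIFT is assumed to be 12");

/* v4 behaviour: treats a byte address as a PFN and shifts it again. */
static phys_addr_t dma_end_buggy(phys_addr_t dram_start)
{
	return PFN_PHYS(dram_start + SZ_4G);
}

/* Fixed behaviour: the sum is already a physical address in bytes. */
static phys_addr_t dma_end_fixed(phys_addr_t dram_start)
{
	return dram_start + SZ_4G;
}
```

With a hypothetical DRAM base of 0x80000000, dma_end_fixed() yields 0x180000000 while dma_end_buggy() yields 0x180000000000, i.e. 4096 times larger. PFN_PHYS() is only the right tool when its argument is a page frame number, as in the PFN_PHYS(pfn_align) >> 20 conversion to megabytes in numa_register_meminfo().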
* Re: [PATCH v4 24/26] arch_numa: switch over to numa_memblks @ 2024-08-26 22:46 Bruno Faccini 0 siblings, 0 replies; 7+ messages in thread From: Bruno Faccini @ 2024-08-26 22:46 UTC (permalink / raw) To: Mike Rapoport Cc: linux-kernel@vger.kernel.org, Alexander Gordeev, Andreas Larsson, Andrew Morton, Arnd Bergmann, Borislav Petkov, Catalin Marinas, Christophe Leroy, Dan Williams, Dave Hansen, David Hildenbrand, David S. Miller, Davidlohr Bueso, Greg Kroah-Hartman, Heiko Carstens, Huacai Chen, Ingo Molnar, Jiaxun Yang, John Paul Adrian Glaubitz, Jonathan Cameron, Jonathan Corbet, Michael Ellerman, Palmer Dabbelt, Rafael J. Wysocki, Rob Herring, Samuel Holland, Thomas Bogendoerfer, Thomas Gleixner, Vasily Gorbik, Will Deacon, devicetree@vger.kernel.org, linux-acpi@vger.kernel.org, linux-arch@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-cxl@vger.kernel.org, linux-doc@vger.kernel.org, linux-mips@vger.kernel.org, linux-mm@kvack.org, linux-riscv@lists.infradead.org, linux-s390@vger.kernel.org, linux-sh@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, loongarch@lists.linux.dev, nvdimm@lists.linux.dev, sparclinux@vger.kernel.org, x86@kernel.org, Zi Yan, Bruno Faccini On 7 Aug 2024, at 2:41, Mike Rapoport wrote: From: "Mike Rapoport (Microsoft)" <rppt@kernel.org> Until now arch_numa was directly translating firmware NUMA information to memblock. Using numa_memblks as an intermediate step has a few advantages: * alignment with more battle tested x86 implementation * availability of NUMA emulation * maintaining node information for not yet populated memory Adjust a few places in numa_memblks to compile with 32-bit phys_addr_t and replace current functionality related to numa_add_memblk() and __node_distance() in arch_numa with the implementation based on numa_memblks and add functions required by numa_emulation. 
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Tested-by: Zi Yan <ziy@nvidia.com> # for x86_64 and arm64
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [arm64 + CXL via QEMU]
Acked-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: David Hildenbrand <david@redhat.com>
---
 drivers/base/Kconfig       |   1 +
 drivers/base/arch_numa.c   | 201 +++++++++++--------------------------
 include/asm-generic/numa.h |   6 +-
 mm/numa_memblks.c          |  17 ++--
 4 files changed, 75 insertions(+), 150 deletions(-)

<snip>

+
+u64 __init numa_emu_dma_end(void)
+{
+	return PFN_PHYS(memblock_start_of_DRAM() + SZ_4G);
+}
+

PFN_PHYS() translation is unnecessary here, as memblock_start_of_DRAM() + SZ_4G is already a physical address in bytes, not a page frame number.

This should fix it:

====================================================
diff --git a/drivers/base/arch_numa.c b/drivers/base/arch_numa.c
index 8d49893c0e94..e18701676426 100644
--- a/drivers/base/arch_numa.c
+++ b/drivers/base/arch_numa.c
@@ -346,7 +346,7 @@ void __init numa_emu_update_cpu_to_node(int *emu_nid_to_phys,
 
 u64 __init numa_emu_dma_end(void)
 {
-	return PFN_PHYS(memblock_start_of_DRAM() + SZ_4G);
+	return memblock_start_of_DRAM() + SZ_4G;
 }
 
 void debug_cpumask_set_cpu(unsigned int cpu, int node, bool enable)
====================================================

!!! I had a lot of trouble sending plain text from Outlook on my Mac; sorry for the noise and the duplicate copies !!!
* [PATCH v4 00/26] mm: introduce numa_memblks @ 2024-08-07 6:40 Mike Rapoport 2024-08-07 6:41 ` [PATCH v4 24/26] arch_numa: switch over to numa_memblks Mike Rapoport 0 siblings, 1 reply; 7+ messages in thread From: Mike Rapoport @ 2024-08-07 6:40 UTC (permalink / raw) To: linux-kernel Cc: Alexander Gordeev, Andreas Larsson, Andrew Morton, Arnd Bergmann, Borislav Petkov, Catalin Marinas, Christophe Leroy, Dan Williams, Dave Hansen, David Hildenbrand, David S. Miller, Davidlohr Bueso, Greg Kroah-Hartman, Heiko Carstens, Huacai Chen, Ingo Molnar, Jiaxun Yang, John Paul Adrian Glaubitz, Jonathan Cameron, Jonathan Corbet, Michael Ellerman, Mike Rapoport, Palmer Dabbelt, Rafael J. Wysocki, Rob Herring, Samuel Holland, Thomas Bogendoerfer, Thomas Gleixner, Vasily Gorbik, Will Deacon, Zi Yan, devicetree, linux-acpi, linux-arch, linux-arm-kernel, linux-cxl, linux-doc, linux-mips, linux-mm, linux-riscv, linux-s390, linux-sh, linuxppc-dev, loongarch, nvdimm, sparclinux, x86

From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>

Hi,

Following the discussion about handling of CXL fixed memory windows on arm64 [1], I decided to bite the bullet and move numa_memblks from x86 to the generic code so they will be available on arm64/riscv and maybe on loongarch sometime later.

While it could be possible to use memblock to describe CXL memory windows, memblock currently lacks a notion of unpopulated memory ranges, which numa_memblks does implement.

Another reason to make numa_memblks generic is that both arch_numa (arm64 and riscv) and loongarch use a trimmed copy of the x86 code, although there is no fundamental reason why the same code cannot be used on all these platforms. Having numa_memblks in mm/ will make its interaction with ACPI and FDT more consistent, and I believe it will reduce the maintenance burden.

And with generic numa_memblks it is (almost) straightforward to enable NUMA emulation on arm64 and riscv.
The first 9 commits in this series are cleanups that are not strictly related to numa_memblks.

Commits 10-16 slightly reorder code in x86 to allow extracting numa_memblks and NUMA emulation to the generic code.

Commits 17-19 actually move the code from arch/x86/ to mm/, and commits 20-22 do some follow-up cleanups.

Commit 23 updates of_numa_init() to return an error if no NUMA nodes were found in the device tree.

Commit 24 switches arch_numa to numa_memblks.

Commit 25 enables usage of phys_to_target_node() and memory_add_physaddr_to_nid() with numa_memblks.

Commit 26 moves the description for numa=fake from x86 to admin-guide.

[1] https://lore.kernel.org/all/20240529171236.32002-1-Jonathan.Cameron@huawei.com/

v3: https://lore.kernel.org/all/20240801060826.559858-1-rppt@kernel.org
* update allocation of offline node, thanks Jonathan
* add comment about dependency of get_pfn_range_for_nid on memblock_set_node(), per Dan
* fix build errors with 32-bit phys_addr_t reported by kbuild
* add Acked- and Reviewed-by, thanks Dan and David

v2: https://lore.kernel.org/all/20240723064156.4009477-1-rppt@kernel.org
* rebase on v6.11-rc1
* fix dummy_numa_init() in arch_numa, thanks Zi Yan
* update of_numa_init() to return an error if no NUMA nodes were found
* add Tested-by, thanks Zi Yan

v1: https://lore.kernel.org/all/20240716111346.3676969-1-rppt@kernel.org
* add cleanup for arch_alloc_nodedata and HAVE_ARCH_NODEDATA_EXTENSION
* add patch that moves description of numa=fake kernel parameter from x86 to admin-guide
* reduce rounding up of node_data allocations from PAGE_SIZE to SMP_CACHE_BYTES
* restore single allocation attempt of numa_distance
* fix several comments
* added review tags

Mike Rapoport (Microsoft) (26):
  mm: move kernel/numa.c to mm/
  MIPS: sgi-ip27: make NODE_DATA() the same as on all other architectures
  MIPS: sgi-ip27: ensure node_possible_map only contains valid nodes
  MIPS: sgi-ip27: drop HAVE_ARCH_NODEDATA_EXTENSION
  MIPS: loongson64: rename __node_data to node_data
MIPS: loongson64: drop HAVE_ARCH_NODEDATA_EXTENSION arch, mm: move definition of node_data to generic code mm: drop CONFIG_HAVE_ARCH_NODEDATA_EXTENSION arch, mm: pull out allocation of NODE_DATA to generic code x86/numa: simplify numa_distance allocation x86/numa: use get_pfn_range_for_nid to verify that node spans memory x86/numa: move FAKE_NODE_* defines to numa_emu x86/numa_emu: simplify allocation of phys_dist x86/numa_emu: split __apicid_to_node update to a helper function x86/numa_emu: use a helper function to get MAX_DMA32_PFN x86/numa: numa_{add,remove}_cpu: make cpu parameter unsigned mm: introduce numa_memblks mm: move numa_distance and related code from x86 to numa_memblks mm: introduce numa_emulation mm: numa_memblks: introduce numa_memblks_init mm: numa_memblks: make several functions and variables static mm: numa_memblks: use memblock_{start,end}_of_DRAM() when sanitizing meminfo of, numa: return -EINVAL when no numa-node-id is found arch_numa: switch over to numa_memblks mm: make range-to-target_node lookup facility a part of numa_memblks docs: move numa=fake description to kernel-parameters.txt .../admin-guide/kernel-parameters.txt | 15 + .../arch/x86/x86_64/boot-options.rst | 12 - arch/arm64/include/asm/Kbuild | 1 + arch/arm64/include/asm/mmzone.h | 13 - arch/arm64/include/asm/topology.h | 1 + arch/loongarch/include/asm/Kbuild | 1 + arch/loongarch/include/asm/mmzone.h | 16 - arch/loongarch/include/asm/topology.h | 1 + arch/loongarch/kernel/numa.c | 21 - arch/mips/Kconfig | 5 - arch/mips/include/asm/mach-ip27/mmzone.h | 1 - .../mips/include/asm/mach-loongson64/mmzone.h | 4 - arch/mips/loongson64/numa.c | 28 +- arch/mips/sgi-ip27/ip27-memory.c | 12 +- arch/mips/sgi-ip27/ip27-smp.c | 2 + arch/powerpc/include/asm/mmzone.h | 6 - arch/powerpc/mm/numa.c | 26 +- arch/riscv/include/asm/Kbuild | 1 + arch/riscv/include/asm/mmzone.h | 13 - arch/riscv/include/asm/topology.h | 4 + arch/s390/include/asm/Kbuild | 1 + arch/s390/include/asm/mmzone.h | 17 - 
arch/s390/kernel/numa.c | 3 - arch/sh/include/asm/mmzone.h | 3 - arch/sh/mm/init.c | 7 +- arch/sh/mm/numa.c | 3 - arch/sparc/include/asm/mmzone.h | 4 - arch/sparc/mm/init_64.c | 11 +- arch/x86/Kconfig | 9 +- arch/x86/include/asm/Kbuild | 1 + arch/x86/include/asm/mmzone.h | 6 - arch/x86/include/asm/mmzone_32.h | 17 - arch/x86/include/asm/mmzone_64.h | 18 - arch/x86/include/asm/numa.h | 26 +- arch/x86/include/asm/sparsemem.h | 9 - arch/x86/mm/Makefile | 1 - arch/x86/mm/amdtopology.c | 1 + arch/x86/mm/numa.c | 622 +----------------- arch/x86/mm/numa_internal.h | 24 - drivers/acpi/numa/srat.c | 1 + drivers/base/Kconfig | 1 + drivers/base/arch_numa.c | 224 ++----- drivers/cxl/Kconfig | 2 +- drivers/dax/Kconfig | 2 +- drivers/of/of_numa.c | 5 +- include/asm-generic/mmzone.h | 5 + include/asm-generic/numa.h | 6 +- include/linux/memory_hotplug.h | 48 -- include/linux/numa.h | 8 + include/linux/numa_memblks.h | 58 ++ kernel/Makefile | 1 - kernel/numa.c | 26 - mm/Kconfig | 11 + mm/Makefile | 3 + mm/mm_init.c | 10 +- mm/numa.c | 69 ++ {arch/x86/mm => mm}/numa_emulation.c | 42 +- mm/numa_memblks.c | 571 ++++++++++++++++ 58 files changed, 893 insertions(+), 1166 deletions(-) delete mode 100644 arch/arm64/include/asm/mmzone.h delete mode 100644 arch/loongarch/include/asm/mmzone.h delete mode 100644 arch/riscv/include/asm/mmzone.h delete mode 100644 arch/s390/include/asm/mmzone.h delete mode 100644 arch/x86/include/asm/mmzone.h delete mode 100644 arch/x86/include/asm/mmzone_32.h delete mode 100644 arch/x86/include/asm/mmzone_64.h create mode 100644 include/asm-generic/mmzone.h create mode 100644 include/linux/numa_memblks.h delete mode 100644 kernel/numa.c create mode 100644 mm/numa.c rename {arch/x86/mm => mm}/numa_emulation.c (94%) create mode 100644 mm/numa_memblks.c base-commit: 8400291e289ee6b2bf9779ff1c83a291501f017b -- 2.43.0 ^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH v4 24/26] arch_numa: switch over to numa_memblks 2024-08-07 6:40 [PATCH v4 00/26] mm: introduce numa_memblks Mike Rapoport @ 2024-08-07 6:41 ` Mike Rapoport 2024-08-07 6:58 ` Arnd Bergmann 2024-11-27 19:32 ` Marc Zyngier 0 siblings, 2 replies; 7+ messages in thread From: Mike Rapoport @ 2024-08-07 6:41 UTC (permalink / raw) To: linux-kernel Cc: Alexander Gordeev, Andreas Larsson, Andrew Morton, Arnd Bergmann, Borislav Petkov, Catalin Marinas, Christophe Leroy, Dan Williams, Dave Hansen, David Hildenbrand, David S. Miller, Davidlohr Bueso, Greg Kroah-Hartman, Heiko Carstens, Huacai Chen, Ingo Molnar, Jiaxun Yang, John Paul Adrian Glaubitz, Jonathan Cameron, Jonathan Corbet, Michael Ellerman, Mike Rapoport, Palmer Dabbelt, Rafael J. Wysocki, Rob Herring, Samuel Holland, Thomas Bogendoerfer, Thomas Gleixner, Vasily Gorbik, Will Deacon, Zi Yan, devicetree, linux-acpi, linux-arch, linux-arm-kernel, linux-cxl, linux-doc, linux-mips, linux-mm, linux-riscv, linux-s390, linux-sh, linuxppc-dev, loongarch, nvdimm, sparclinux, x86, Jonathan Cameron From: "Mike Rapoport (Microsoft)" <rppt@kernel.org> Until now arch_numa was directly translating firmware NUMA information to memblock. Using numa_memblks as an intermediate step has a few advantages: * alignment with more battle tested x86 implementation * availability of NUMA emulation * maintaining node information for not yet populated memory Adjust a few places in numa_memblks to compile with 32-bit phys_addr_t and replace current functionality related to numa_add_memblk() and __node_distance() in arch_numa with the implementation based on numa_memblks and add functions required by numa_emulation. 
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Tested-by: Zi Yan <ziy@nvidia.com> # for x86_64 and arm64 Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [arm64 + CXL via QEMU] Acked-by: Dan Williams <dan.j.williams@intel.com> Acked-by: David Hildenbrand <david@redhat.com> --- drivers/base/Kconfig | 1 + drivers/base/arch_numa.c | 201 +++++++++++-------------------------- include/asm-generic/numa.h | 6 +- mm/numa_memblks.c | 17 ++-- 4 files changed, 75 insertions(+), 150 deletions(-) diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig index 2b8fd6bb7da0..064eb52ff7e2 100644 --- a/drivers/base/Kconfig +++ b/drivers/base/Kconfig @@ -226,6 +226,7 @@ config GENERIC_ARCH_TOPOLOGY config GENERIC_ARCH_NUMA bool + select NUMA_MEMBLKS help Enable support for generic NUMA implementation. Currently, RISC-V and ARM64 use it. diff --git a/drivers/base/arch_numa.c b/drivers/base/arch_numa.c index b6af7475ec44..8d49893c0e94 100644 --- a/drivers/base/arch_numa.c +++ b/drivers/base/arch_numa.c @@ -12,14 +12,12 @@ #include <linux/memblock.h> #include <linux/module.h> #include <linux/of.h> +#include <linux/numa_memblks.h> #include <asm/sections.h> -nodemask_t numa_nodes_parsed __initdata; static int cpu_to_node_map[NR_CPUS] = { [0 ... 
NR_CPUS-1] = NUMA_NO_NODE }; -static int numa_distance_cnt; -static u8 *numa_distance; bool numa_off; static __init int numa_parse_early_param(char *opt) @@ -28,6 +26,8 @@ static __init int numa_parse_early_param(char *opt) return -EINVAL; if (str_has_prefix(opt, "off")) numa_off = true; + if (!strncmp(opt, "fake=", 5)) + return numa_emu_cmdline(opt + 5); return 0; } @@ -59,6 +59,7 @@ EXPORT_SYMBOL(cpumask_of_node); #endif +#ifndef CONFIG_NUMA_EMU static void numa_update_cpu(unsigned int cpu, bool remove) { int nid = cpu_to_node(cpu); @@ -81,6 +82,7 @@ void numa_remove_cpu(unsigned int cpu) { numa_update_cpu(cpu, true); } +#endif void numa_clear_node(unsigned int cpu) { @@ -142,7 +144,7 @@ void __init early_map_cpu_to_node(unsigned int cpu, int nid) unsigned long __per_cpu_offset[NR_CPUS] __read_mostly; EXPORT_SYMBOL(__per_cpu_offset); -int __init early_cpu_to_node(int cpu) +int early_cpu_to_node(int cpu) { return cpu_to_node_map[cpu]; } @@ -187,30 +189,6 @@ void __init setup_per_cpu_areas(void) } #endif -/** - * numa_add_memblk() - Set node id to memblk - * @nid: NUMA node ID of the new memblk - * @start: Start address of the new memblk - * @end: End address of the new memblk - * - * RETURNS: - * 0 on success, -errno on failure. - */ -int __init numa_add_memblk(int nid, u64 start, u64 end) -{ - int ret; - - ret = memblock_set_node(start, (end - start), &memblock.memory, nid); - if (ret < 0) { - pr_err("memblock [0x%llx - 0x%llx] failed to add on node %d\n", - start, (end - 1), nid); - return ret; - } - - node_set(nid, numa_nodes_parsed); - return ret; -} - /* * Initialize NODE_DATA for a node on the local memory */ @@ -226,116 +204,9 @@ static void __init setup_node_data(int nid, u64 start_pfn, u64 end_pfn) NODE_DATA(nid)->node_spanned_pages = end_pfn - start_pfn; } -/* - * numa_free_distance - * - * The current table is freed. 
- */ -void __init numa_free_distance(void) -{ - size_t size; - - if (!numa_distance) - return; - - size = numa_distance_cnt * numa_distance_cnt * - sizeof(numa_distance[0]); - - memblock_free(numa_distance, size); - numa_distance_cnt = 0; - numa_distance = NULL; -} - -/* - * Create a new NUMA distance table. - */ -static int __init numa_alloc_distance(void) -{ - size_t size; - int i, j; - - size = nr_node_ids * nr_node_ids * sizeof(numa_distance[0]); - numa_distance = memblock_alloc(size, PAGE_SIZE); - if (WARN_ON(!numa_distance)) - return -ENOMEM; - - numa_distance_cnt = nr_node_ids; - - /* fill with the default distances */ - for (i = 0; i < numa_distance_cnt; i++) - for (j = 0; j < numa_distance_cnt; j++) - numa_distance[i * numa_distance_cnt + j] = i == j ? - LOCAL_DISTANCE : REMOTE_DISTANCE; - - pr_debug("Initialized distance table, cnt=%d\n", numa_distance_cnt); - - return 0; -} - -/** - * numa_set_distance() - Set inter node NUMA distance from node to node. - * @from: the 'from' node to set distance - * @to: the 'to' node to set distance - * @distance: NUMA distance - * - * Set the distance from node @from to @to to @distance. - * If distance table doesn't exist, a warning is printed. - * - * If @from or @to is higher than the highest known node or lower than zero - * or @distance doesn't make sense, the call is ignored. 
- */ -void __init numa_set_distance(int from, int to, int distance) -{ - if (!numa_distance) { - pr_warn_once("Warning: distance table not allocated yet\n"); - return; - } - - if (from >= numa_distance_cnt || to >= numa_distance_cnt || - from < 0 || to < 0) { - pr_warn_once("Warning: node ids are out of bound, from=%d to=%d distance=%d\n", - from, to, distance); - return; - } - - if ((u8)distance != distance || - (from == to && distance != LOCAL_DISTANCE)) { - pr_warn_once("Warning: invalid distance parameter, from=%d to=%d distance=%d\n", - from, to, distance); - return; - } - - numa_distance[from * numa_distance_cnt + to] = distance; -} - -/* - * Return NUMA distance @from to @to - */ -int __node_distance(int from, int to) -{ - if (from >= numa_distance_cnt || to >= numa_distance_cnt) - return from == to ? LOCAL_DISTANCE : REMOTE_DISTANCE; - return numa_distance[from * numa_distance_cnt + to]; -} -EXPORT_SYMBOL(__node_distance); - static int __init numa_register_nodes(void) { int nid; - struct memblock_region *mblk; - - /* Check that valid nid is set to memblks */ - for_each_mem_region(mblk) { - int mblk_nid = memblock_get_region_node(mblk); - phys_addr_t start = mblk->base; - phys_addr_t end = mblk->base + mblk->size - 1; - - if (mblk_nid == NUMA_NO_NODE || mblk_nid >= MAX_NUMNODES) { - pr_warn("Warning: invalid memblk node %d [mem %pap-%pap]\n", - mblk_nid, &start, &end); - return -EINVAL; - } - } /* Finally register nodes. 
*/ for_each_node_mask(nid, numa_nodes_parsed) { @@ -360,11 +231,7 @@ static int __init numa_init(int (*init_func)(void)) nodes_clear(node_possible_map); nodes_clear(node_online_map); - ret = numa_alloc_distance(); - if (ret < 0) - return ret; - - ret = init_func(); + ret = numa_memblks_init(init_func, /* memblock_force_top_down */ false); if (ret < 0) goto out_free_distance; @@ -382,7 +249,7 @@ static int __init numa_init(int (*init_func)(void)) return 0; out_free_distance: - numa_free_distance(); + numa_reset_distance(); return ret; } @@ -412,6 +279,7 @@ static int __init dummy_numa_init(void) pr_err("NUMA init failed\n"); return ret; } + node_set(0, numa_nodes_parsed); numa_off = true; return 0; @@ -454,3 +322,54 @@ void __init arch_numa_init(void) numa_init(dummy_numa_init); } + +#ifdef CONFIG_NUMA_EMU +void __init numa_emu_update_cpu_to_node(int *emu_nid_to_phys, + unsigned int nr_emu_nids) +{ + int i, j; + + /* + * Transform cpu_to_node_map table to use emulated nids by + * reverse-mapping phys_nid. The maps should always exist but fall + * back to zero just in case. + */ + for (i = 0; i < ARRAY_SIZE(cpu_to_node_map); i++) { + if (cpu_to_node_map[i] == NUMA_NO_NODE) + continue; + for (j = 0; j < nr_emu_nids; j++) + if (cpu_to_node_map[i] == emu_nid_to_phys[j]) + break; + cpu_to_node_map[i] = j < nr_emu_nids ? j : 0; + } +} + +u64 __init numa_emu_dma_end(void) +{ + return PFN_PHYS(memblock_start_of_DRAM() + SZ_4G); +} + +void debug_cpumask_set_cpu(unsigned int cpu, int node, bool enable) +{ + struct cpumask *mask; + + if (node == NUMA_NO_NODE) + return; + + mask = node_to_cpumask_map[node]; + if (!cpumask_available(mask)) { + pr_err("node_to_cpumask_map[%i] NULL\n", node); + dump_stack(); + return; + } + + if (enable) + cpumask_set_cpu(cpu, mask); + else + cpumask_clear_cpu(cpu, mask); + + pr_debug("%s cpu %d node %d: mask now %*pbl\n", + enable ? 
"numa_add_cpu" : "numa_remove_cpu", + cpu, node, cpumask_pr_args(mask)); +} +#endif /* CONFIG_NUMA_EMU */ diff --git a/include/asm-generic/numa.h b/include/asm-generic/numa.h index c32e0cf23c90..c2b046d1fd82 100644 --- a/include/asm-generic/numa.h +++ b/include/asm-generic/numa.h @@ -32,8 +32,6 @@ static inline const struct cpumask *cpumask_of_node(int node) void __init arch_numa_init(void); int __init numa_add_memblk(int nodeid, u64 start, u64 end); -void __init numa_set_distance(int from, int to, int distance); -void __init numa_free_distance(void); void __init early_map_cpu_to_node(unsigned int cpu, int nid); int __init early_cpu_to_node(int cpu); void numa_store_cpu_info(unsigned int cpu); @@ -51,4 +49,8 @@ static inline int early_cpu_to_node(int cpu) { return 0; } #endif /* CONFIG_NUMA */ +#ifdef CONFIG_NUMA_EMU +void debug_cpumask_set_cpu(unsigned int cpu, int node, bool enable); +#endif + #endif /* __ASM_GENERIC_NUMA_H */ diff --git a/mm/numa_memblks.c b/mm/numa_memblks.c index e4358ad92233..c4037faa438b 100644 --- a/mm/numa_memblks.c +++ b/mm/numa_memblks.c @@ -405,9 +405,12 @@ static int __init numa_register_meminfo(struct numa_meminfo *mi) unsigned long pfn_align = node_map_pfn_alignment(); if (pfn_align && pfn_align < PAGES_PER_SECTION) { - pr_warn("Node alignment %LuMB < min %LuMB, rejecting NUMA config\n", - PFN_PHYS(pfn_align) >> 20, - PFN_PHYS(PAGES_PER_SECTION) >> 20); + unsigned long node_align_mb = PFN_PHYS(pfn_align) >> 20; + + unsigned long sect_align_mb = PFN_PHYS(PAGES_PER_SECTION) >> 20; + + pr_warn("Node alignment %luMB < min %luMB, rejecting NUMA config\n", + node_align_mb, sect_align_mb); return -EINVAL; } } @@ -418,18 +421,18 @@ static int __init numa_register_meminfo(struct numa_meminfo *mi) int __init numa_memblks_init(int (*init_func)(void), bool memblock_force_top_down) { + phys_addr_t max_addr = (phys_addr_t)ULLONG_MAX; int ret; nodes_clear(numa_nodes_parsed); nodes_clear(node_possible_map); nodes_clear(node_online_map); 
memset(&numa_meminfo, 0, sizeof(numa_meminfo)); - WARN_ON(memblock_set_node(0, ULLONG_MAX, &memblock.memory, - NUMA_NO_NODE)); - WARN_ON(memblock_set_node(0, ULLONG_MAX, &memblock.reserved, + WARN_ON(memblock_set_node(0, max_addr, &memblock.memory, NUMA_NO_NODE)); + WARN_ON(memblock_set_node(0, max_addr, &memblock.reserved, NUMA_NO_NODE)); /* In case that parsing SRAT failed. */ - WARN_ON(memblock_clear_hotplug(0, ULLONG_MAX)); + WARN_ON(memblock_clear_hotplug(0, max_addr)); numa_reset_distance(); ret = init_func(); -- 2.43.0 ^ permalink raw reply related [flat|nested] 7+ messages in thread
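The CPU remapping added in numa_emu_update_cpu_to_node() can be exercised outside the kernel. The following is a userspace sketch of the same loop, not the patch's code: the function name emu_update_cpu_to_node(), its explicit nr_cpus parameter, and the local NUMA_NO_NODE definition are stand-ins introduced here (the kernel walks its static cpu_to_node_map[NR_CPUS] array). For each CPU with a known physical node, it picks the first emulated node whose backing physical node matches, and falls back to node 0 when none does.

```c
#include <assert.h>

#define NUMA_NO_NODE	(-1)

/*
 * Hypothetical userspace version of the reverse mapping done by
 * numa_emu_update_cpu_to_node(): emu_nid_to_phys[j] holds the physical
 * node backing emulated node j.
 */
static void emu_update_cpu_to_node(int *cpu_to_node_map, int nr_cpus,
				   const int *emu_nid_to_phys,
				   unsigned int nr_emu_nids)
{
	/* Sketch-only sanity check on the inputs. */
	assert(cpu_to_node_map && emu_nid_to_phys);

	for (int i = 0; i < nr_cpus; i++) {
		unsigned int j;

		/* CPUs that were never mapped stay unmapped. */
		if (cpu_to_node_map[i] == NUMA_NO_NODE)
			continue;
		/* Find the first emulated node backed by this physical node. */
		for (j = 0; j < nr_emu_nids; j++)
			if (cpu_to_node_map[i] == emu_nid_to_phys[j])
				break;
		/* Fall back to node 0 if no emulated node matches. */
		cpu_to_node_map[i] = j < nr_emu_nids ? (int)j : 0;
	}
}
```

For example, if two physical nodes are each split into two emulated nodes (emu_nid_to_phys = {0, 0, 1, 1}), every CPU on physical node 1 lands on emulated node 2, the first emulated node carved from it, while CPUs left at NUMA_NO_NODE are untouched. Because the inner loop stops at the first match, CPUs are never spread across the later emulated nodes of the same physical node.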
* Re: [PATCH v4 24/26] arch_numa: switch over to numa_memblks 2024-08-07 6:41 ` [PATCH v4 24/26] arch_numa: switch over to numa_memblks Mike Rapoport @ 2024-08-07 6:58 ` Arnd Bergmann 2024-08-07 18:18 ` Mike Rapoport 2024-11-27 19:32 ` Marc Zyngier 1 sibling, 1 reply; 7+ messages in thread From: Arnd Bergmann @ 2024-08-07 6:58 UTC (permalink / raw) To: Mike Rapoport, linux-kernel Cc: Alexander Gordeev, Andreas Larsson, Andrew Morton, Borislav Petkov, Catalin Marinas, Christophe Leroy, Dan Williams, Dave Hansen, David Hildenbrand, David S . Miller, Davidlohr Bueso, Greg Kroah-Hartman, Heiko Carstens, Huacai Chen, Ingo Molnar, Jiaxun Yang, John Paul Adrian Glaubitz, Jonathan Cameron, Jonathan Corbet, Michael Ellerman, Palmer Dabbelt, Rafael J . Wysocki, Rob Herring, Samuel Holland, Thomas Bogendoerfer, Thomas Gleixner, Vasily Gorbik, Will Deacon, Zi Yan, devicetree, linux-acpi, Linux-Arch, linux-arm-kernel, linux-cxl, linux-doc, linux-mips, linux-mm, linux-riscv, linux-s390, linux-sh, linuxppc-dev, loongarch, nvdimm, sparclinux, x86 On Wed, Aug 7, 2024, at 08:41, Mike Rapoport wrote: > From: "Mike Rapoport (Microsoft)" <rppt@kernel.org> > > Until now arch_numa was directly translating firmware NUMA information > to memblock. 

I get a link time warning from this:

WARNING: modpost: vmlinux: section mismatch in reference: numa_set_cpumask+0x24 (section: .text.unlikely) -> early_cpu_to_node (section: .init.text)

> @@ -142,7 +144,7 @@ void __init early_map_cpu_to_node(unsigned int cpu, int nid)
>  unsigned long __per_cpu_offset[NR_CPUS] __read_mostly;
>  EXPORT_SYMBOL(__per_cpu_offset);
>
> -int __init early_cpu_to_node(int cpu)
> +int early_cpu_to_node(int cpu)
>  {
>  	return cpu_to_node_map[cpu];
>  }

early_cpu_to_node() can no longer be __init here

> +#endif /* CONFIG_NUMA_EMU */
> diff --git a/include/asm-generic/numa.h b/include/asm-generic/numa.h
> index c32e0cf23c90..c2b046d1fd82 100644
> --- a/include/asm-generic/numa.h
> +++ b/include/asm-generic/numa.h
> @@ -32,8 +32,6 @@ static inline const struct cpumask *cpumask_of_node(int node)
>
>  void __init arch_numa_init(void);
>  int __init numa_add_memblk(int nodeid, u64 start, u64 end);
> -void __init numa_set_distance(int from, int to, int distance);
> -void __init numa_free_distance(void);
>  void __init early_map_cpu_to_node(unsigned int cpu, int nid);
>  int __init early_cpu_to_node(int cpu);
>  void numa_store_cpu_info(unsigned int cpu);

but is still declared as __init in the header, so it is
still put in that section and discarded after boot.

I was confused by this at first, since the 'early' name
seems to imply that you shouldn't call it once the system
is up, but now you do.

     Arnd
* Re: [PATCH v4 24/26] arch_numa: switch over to numa_memblks 2024-08-07 6:58 ` Arnd Bergmann @ 2024-08-07 18:18 ` Mike Rapoport 2024-08-07 18:53 ` Arnd Bergmann 0 siblings, 1 reply; 7+ messages in thread From: Mike Rapoport @ 2024-08-07 18:18 UTC (permalink / raw) To: Arnd Bergmann Cc: linux-kernel, Alexander Gordeev, Andreas Larsson, Andrew Morton, Borislav Petkov, Catalin Marinas, Christophe Leroy, Dan Williams, Dave Hansen, David Hildenbrand, David S . Miller, Davidlohr Bueso, Greg Kroah-Hartman, Heiko Carstens, Huacai Chen, Ingo Molnar, Jiaxun Yang, John Paul Adrian Glaubitz, Jonathan Cameron, Jonathan Corbet, Michael Ellerman, Palmer Dabbelt, Rafael J . Wysocki, Rob Herring, Samuel Holland, Thomas Bogendoerfer, Thomas Gleixner, Vasily Gorbik, Will Deacon, Zi Yan, devicetree, linux-acpi, Linux-Arch, linux-arm-kernel, linux-cxl, linux-doc, linux-mips, linux-mm, linux-riscv, linux-s390, linux-sh, linuxppc-dev, loongarch, nvdimm, sparclinux, x86 On Wed, Aug 07, 2024 at 08:58:37AM +0200, Arnd Bergmann wrote: > On Wed, Aug 7, 2024, at 08:41, Mike Rapoport wrote: > > From: "Mike Rapoport (Microsoft)" <rppt@kernel.org> > > > > Until now arch_numa was directly translating firmware NUMA information > > to memblock. 
>
> I get a link time warning from this:
>
> WARNING: modpost: vmlinux: section mismatch in reference: numa_set_cpumask+0x24 (section: .text.unlikely) -> early_cpu_to_node (section: .init.text)

I saw this neither in my build tests nor in the kbuild reports :/

> > @@ -142,7 +144,7 @@ void __init early_map_cpu_to_node(unsigned int cpu, int nid)
> >  unsigned long __per_cpu_offset[NR_CPUS] __read_mostly;
> >  EXPORT_SYMBOL(__per_cpu_offset);
> >
> > -int __init early_cpu_to_node(int cpu)
> > +int early_cpu_to_node(int cpu)
> >  {
> >  	return cpu_to_node_map[cpu];
> >  }
>
> early_cpu_to_node() can no longer be __init here
>
> > +#endif /* CONFIG_NUMA_EMU */
> > diff --git a/include/asm-generic/numa.h b/include/asm-generic/numa.h
> > index c32e0cf23c90..c2b046d1fd82 100644
> > --- a/include/asm-generic/numa.h
> > +++ b/include/asm-generic/numa.h
> > @@ -32,8 +32,6 @@ static inline const struct cpumask *cpumask_of_node(int node)
> >
> >  void __init arch_numa_init(void);
> >  int __init numa_add_memblk(int nodeid, u64 start, u64 end);
> > -void __init numa_set_distance(int from, int to, int distance);
> > -void __init numa_free_distance(void);
> >  void __init early_map_cpu_to_node(unsigned int cpu, int nid);
> >  int __init early_cpu_to_node(int cpu);
> >  void numa_store_cpu_info(unsigned int cpu);
>
> but is still declared as __init in the header, so it is
> still put in that section and discarded after boot.
I believe this should fix it:

diff --git a/include/asm-generic/numa.h b/include/asm-generic/numa.h
index c2b046d1fd82..e063d6487f66 100644
--- a/include/asm-generic/numa.h
+++ b/include/asm-generic/numa.h
@@ -33,7 +33,7 @@ static inline const struct cpumask *cpumask_of_node(int node)
 void __init arch_numa_init(void);
 int __init numa_add_memblk(int nodeid, u64 start, u64 end);
 void __init early_map_cpu_to_node(unsigned int cpu, int nid);
-int __init early_cpu_to_node(int cpu);
+int early_cpu_to_node(int cpu);
 void numa_store_cpu_info(unsigned int cpu);
 void numa_add_cpu(unsigned int cpu);
 void numa_remove_cpu(unsigned int cpu);

> I was confused by this at first, since the 'early' name
> seems to imply that you shouldn't call it once the system
> is up, but now you do.

I agree that this is confusing, but that's what x86 does and numa_emulation uses.

> Arnd
>

--
Sincerely yours,
Mike.
* Re: [PATCH v4 24/26] arch_numa: switch over to numa_memblks
From: Arnd Bergmann @ 2024-08-07 18:53 UTC
To: Mike Rapoport
Cc: linux-kernel, Alexander Gordeev, Andreas Larsson, Andrew Morton, Borislav Petkov, Catalin Marinas, Christophe Leroy, Dan Williams, Dave Hansen, David Hildenbrand, David S. Miller, Davidlohr Bueso, Greg Kroah-Hartman, Heiko Carstens, Huacai Chen, Ingo Molnar, Jiaxun Yang, John Paul Adrian Glaubitz, Jonathan Cameron, Jonathan Corbet, Michael Ellerman, Palmer Dabbelt, Rafael J. Wysocki, Rob Herring, Samuel Holland, Thomas Bogendoerfer, Thomas Gleixner, Vasily Gorbik, Will Deacon, Zi Yan, devicetree, linux-acpi, Linux-Arch, linux-arm-kernel, linux-cxl, linux-doc, linux-mips, linux-mm, linux-riscv, linux-s390, linux-sh, linuxppc-dev, loongarch, nvdimm, sparclinux, x86

On Wed, Aug 7, 2024, at 20:18, Mike Rapoport wrote:
> On Wed, Aug 07, 2024 at 08:58:37AM +0200, Arnd Bergmann wrote:
>> On Wed, Aug 7, 2024, at 08:41, Mike Rapoport wrote:
>> >
>> >  void __init arch_numa_init(void);
>> >  int __init numa_add_memblk(int nodeid, u64 start, u64 end);
>> > -void __init numa_set_distance(int from, int to, int distance);
>> > -void __init numa_free_distance(void);
>> >  void __init early_map_cpu_to_node(unsigned int cpu, int nid);
>> >  int __init early_cpu_to_node(int cpu);
>> >  void numa_store_cpu_info(unsigned int cpu);
>>
>> but is still declared as __init in the header, so it is
>> still put in that section and discarded after boot.
>
> I believe this should fix it

Yes, sorry, I should have posted the patch as well; this is what
I tested with locally.

     Arnd
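Arnd's point, that an __init left on the header prototype still decides where the function is placed even after the definition drops it, can be reproduced outside the kernel. The following is a minimal userspace sketch, assuming GCC or Clang (which carry a section attribute from an earlier declaration over to the definition, as the modpost warning above suggests) and a GNU-compatible linker that provides __start_<name>/__stop_<name> symbols for sections named like C identifiers. demo_init, demo_init_text and numa_set_cpumask_demo() are hypothetical stand-ins for __init, .init.text and the caller modpost flagged; the CPU-to-node mapping is made up.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's __init annotation: place the
 * function in a named section. The name is a valid C identifier, so a
 * GNU-compatible linker defines __start_/__stop_ symbols bounding it. */
#define demo_init __attribute__((section("demo_init_text")))

/* Linker-provided bounds of demo_init_text (assumption: GNU ld or lld). */
extern const char __start_demo_init_text[];
extern const char __stop_demo_init_text[];

/* "Header" prototype: still annotated, like the unfixed numa.h. */
int demo_init early_cpu_to_node(int cpu);

/* Definition without the annotation, as in the patch under review.
 * The section attribute from the prototype above still applies. */
int early_cpu_to_node(int cpu)
{
	return cpu % 2; /* hypothetical CPU-to-node map */
}

/* Ordinary (non-init) caller: the kind of reference modpost reports. */
int numa_set_cpumask_demo(int cpu)
{
	return early_cpu_to_node(cpu);
}

/* Return 1 if the code address lies inside demo_init_text, else 0. */
int in_demo_init_text(uintptr_t addr)
{
	uintptr_t lo = (uintptr_t)__start_demo_init_text;
	uintptr_t hi = (uintptr_t)__stop_demo_init_text;

	assert(lo <= hi);
	return addr >= lo && addr < hi;
}
```

Checking in_demo_init_text((uintptr_t)early_cpu_to_node) should report 1 even though the definition carries no annotation, while the unannotated caller reports 0. Dropping demo_init from the prototype as well, as Mike's header fix does for __init, puts the function back in ordinary .text (in this toy, the then-empty section and its bounds helper would have to go too).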
* Re: [PATCH v4 24/26] arch_numa: switch over to numa_memblks
From: Marc Zyngier @ 2024-11-27 19:32 UTC
To: Mike Rapoport
Cc: linux-kernel, Alexander Gordeev, Andreas Larsson, Andrew Morton, Arnd Bergmann, Borislav Petkov, Catalin Marinas, Christophe Leroy, Dan Williams, Dave Hansen, David Hildenbrand, David S. Miller, Davidlohr Bueso, Greg Kroah-Hartman, Heiko Carstens, Huacai Chen, Ingo Molnar, Jiaxun Yang, John Paul Adrian Glaubitz, Jonathan Cameron, Jonathan Corbet, Michael Ellerman, Palmer Dabbelt, Rafael J. Wysocki, Rob Herring, Samuel Holland, Thomas Bogendoerfer, Thomas Gleixner, Vasily Gorbik, Will Deacon, Zi Yan, devicetree, linux-acpi, linux-arch, linux-arm-kernel, linux-cxl, linux-doc, linux-mips, linux-mm, linux-riscv, linux-s390, linux-sh, linuxppc-dev, loongarch, nvdimm, sparclinux, x86, Jonathan Cameron

Hi Mike,

Sorry for reviving a rather old thread.

On Wed, 07 Aug 2024 07:41:08 +0100,
Mike Rapoport <rppt@kernel.org> wrote:
>
> From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>
>
> Until now arch_numa was directly translating firmware NUMA information
> to memblock.
>
> Using numa_memblks as an intermediate step has a few advantages:
> * alignment with more battle tested x86 implementation
> * availability of NUMA emulation
> * maintaining node information for not yet populated memory
>
> Adjust a few places in numa_memblks to compile with 32-bit phys_addr_t
> and replace current functionality related to numa_add_memblk() and
> __node_distance() in arch_numa with the implementation based on
> numa_memblks and add functions required by numa_emulation.
>
> Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
> Tested-by: Zi Yan <ziy@nvidia.com> # for x86_64 and arm64
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [arm64 + CXL via QEMU]
> Acked-by: Dan Williams <dan.j.williams@intel.com>
> Acked-by: David Hildenbrand <david@redhat.com>
> ---
>  drivers/base/Kconfig       |   1 +
>  drivers/base/arch_numa.c   | 201 +++++++++++-------------------------
>  include/asm-generic/numa.h |   6 +-
>  mm/numa_memblks.c          |  17 ++--
>  4 files changed, 75 insertions(+), 150 deletions(-)
>

[...]

>  static int __init numa_register_nodes(void)
>  {
>  	int nid;
> -	struct memblock_region *mblk;
> -
> -	/* Check that valid nid is set to memblks */
> -	for_each_mem_region(mblk) {
> -		int mblk_nid = memblock_get_region_node(mblk);
> -		phys_addr_t start = mblk->base;
> -		phys_addr_t end = mblk->base + mblk->size - 1;
> -
> -		if (mblk_nid == NUMA_NO_NODE || mblk_nid >= MAX_NUMNODES) {
> -			pr_warn("Warning: invalid memblk node %d [mem %pap-%pap]\n",
> -				mblk_nid, &start, &end);
> -			return -EINVAL;
> -		}
> -	}
>

This hunk has the unfortunate side effect of killing my ThunderX
extremely early at boot time, as this sorry excuse for a machine
really relies on the kernel recognising that whatever NUMA information
the FW offers is BS.

Reverting this hunk restores happiness (sort of). FWIW, I've posted a
patch with such a revert at [1].

Thanks,

	M.

[1] https://lore.kernel.org/r/20241127193000.3702637-1-maz@kernel.org

--
Without deviation from the norm, progress is not possible.
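The check that Marc's quoted hunk removes can be modelled in isolation. This is a minimal userspace sketch under stated assumptions: struct demo_region and demo_check_region_nids() are hypothetical simplifications of struct memblock_region and the removed loop, and NUMA_NO_NODE/MAX_NUMNODES are given demo values (in the kernel, MAX_NUMNODES is derived from CONFIG_NODES_SHIFT). As in the quoted code, a single region with an invalid node id makes the whole layout fail with -EINVAL, which is the signal that lets the kernel reject the firmware's NUMA description on a machine like Marc's ThunderX.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Demo values standing in for the kernel constants (assumptions). */
#define NUMA_NO_NODE (-1)
#define MAX_NUMNODES 4

/* Hypothetical, simplified memblock region. */
struct demo_region {
	uint64_t base;
	uint64_t size;
	int nid;
};

/* Sketch of the removed loop in numa_register_nodes(): reject the whole
 * layout as soon as one region has no node or an out-of-range node. */
static int demo_check_region_nids(const struct demo_region *r, size_t n)
{
	assert(n == 0 || r != NULL);

	for (size_t i = 0; i < n; i++) {
		uint64_t start = r[i].base;
		uint64_t end = r[i].base + r[i].size - 1;

		if (r[i].nid == NUMA_NO_NODE || r[i].nid >= MAX_NUMNODES) {
			fprintf(stderr,
				"Warning: invalid memblk node %d [mem %#llx-%#llx]\n",
				r[i].nid, (unsigned long long)start,
				(unsigned long long)end);
			return -EINVAL;
		}
	}
	return 0;
}
```

Returning on the first bad region rather than skipping it keeps the all-or-nothing behaviour of the quoted kernel loop; the sketch does not model what the caller does with the error.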