From: Yury Norov <yury.norov@gmail.com>
To: Huang Shijie <shijie@os.amperecomputing.com>
Cc: gregkh@linuxfoundation.org, patches@amperecomputing.com,
rafael@kernel.org, paul.walmsley@sifive.com, palmer@dabbelt.com,
aou@eecs.berkeley.edu, kuba@kernel.org, vschneid@redhat.com,
mingo@kernel.org, akpm@linux-foundation.org, vbabka@suse.cz,
rppt@kernel.org, tglx@linutronix.de, jpoimboe@kernel.org,
ndesaulniers@google.com, mikelley@microsoft.com,
mhiramat@kernel.org, arnd@arndb.de, linux-kernel@vger.kernel.org,
linux-riscv@lists.infradead.org,
linux-arm-kernel@lists.infradead.org, catalin.marinas@arm.com,
will@kernel.org, mark.rutland@arm.com, mpe@ellerman.id.au,
linuxppc-dev@lists.ozlabs.org, chenhuacai@kernel.org,
jiaxun.yang@flygoat.com, linux-mips@vger.kernel.org,
cl@os.amperecomputing.com
Subject: Re: [PATCH] NUMA: Early use of cpu_to_node() returns 0 instead of the correct node id
Date: Thu, 18 Jan 2024 20:42:25 -0800
Message-ID: <Zan9sb0vtSvVvQeA@yury-ThinkPad>
In-Reply-To: <20240119033227.14113-1-shijie@os.amperecomputing.com>
On Fri, Jan 19, 2024 at 11:32:27AM +0800, Huang Shijie wrote:
>
> During the kernel booting, the generic cpu_to_node() is called too early in
> arm64, powerpc and riscv when CONFIG_NUMA is enabled.
>
> There are at least four places in the common code where
> the generic cpu_to_node() is called before it is initialized:
> 1.) early_trace_init() in kernel/trace/trace.c
> 2.) sched_init() in kernel/sched/core.c
> 3.) init_sched_fair_class() in kernel/sched/fair.c
> 4.) workqueue_init_early() in kernel/workqueue.c
>
> In order to fix the bug, the patch changes generic cpu_to_node to
> function pointer, and export it for kernel modules.
> Introduce smp_prepare_boot_cpu_start() to wrap the original
> smp_prepare_boot_cpu(), and set cpu_to_node with early_cpu_to_node.
> Introduce smp_prepare_cpus_done() to wrap the original smp_prepare_cpus(),
> and set the cpu_to_node to formal _cpu_to_node().

This adds another level of indirection, I think. Currently cpu_to_node()
is a simple inline function. After the patch it would be a real function
with all the associated overhead. Can you share bloat-o-meter output here?
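
To make the overhead concern concrete, here is a minimal userspace sketch (not kernel code) that contrasts the two call shapes. The plain array standing in for the per-CPU numa_node variable, the NR_CPUS value, and the names cpu_to_node_inline, cpu_to_node_outofline, cpu_to_node_ptr and the two count_cpus_on_node_* helpers are all assumptions made for illustration.

```c
#include <assert.h>

#define NR_CPUS 4	/* hypothetical CPU count for this model */

/* Stand-in for the per-CPU numa_node variable (assumption: a plain array). */
static int numa_node[NR_CPUS] = { 0, 0, 1, 1 };

/* Current shape: a static inline that the compiler can fold into callers. */
static inline int cpu_to_node_inline(int cpu)
{
	return numa_node[cpu];
}

/* Patched shape: an out-of-line function reached through a pointer. */
static int cpu_to_node_outofline(int cpu)
{
	return numa_node[cpu];
}

/* Non-static, like an exported symbol, so it may be reassigned elsewhere. */
int (*cpu_to_node_ptr)(int cpu) = cpu_to_node_outofline;

/* Caller built on the inline accessor. */
static int count_cpus_on_node_inline(int node)
{
	int cpu, n = 0;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		if (cpu_to_node_inline(cpu) == node)
			n++;
	return n;
}

/* Caller built on the pointer: in general a pointer load plus an indirect call. */
static int count_cpus_on_node_indirect(int node)
{
	int cpu, n = 0;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		if (cpu_to_node_ptr(cpu) == node)
			n++;
	return n;
}
```

Both callers return the same counts; the difference is only in the generated code, which is what a size comparison of the two builds would show.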

Regardless, I don't think that the approach is correct. As per your
description, some initialization functions erroneously call
cpu_to_node() instead of early_cpu_to_node(), which exists specifically
for that case.

If the above is correct, it's clearly a caller problem, and the fix is to
simply switch all those callers to use the early version.
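
A minimal userspace sketch of that caller-side fix follows, assuming the early lookup reads a table filled in while firmware tables are parsed and the regular lookup reads a variable that is populated later. The arrays, the NR_CPUS value, and the helpers early_init_pick_node() and late_setup() are hypothetical stand-ins, not the kernel's implementation.

```c
#include <assert.h>

#define NR_CPUS 4	/* hypothetical CPU count for this model */

/* Assumption: this table is already valid early in boot. */
static int cpu_to_node_map[NR_CPUS] = { 0, 0, 1, 1 };

/* Stand-in for the per-CPU numa_node variable; zero until late_setup(). */
static int numa_node[NR_CPUS];

/* Early lookup: safe before the per-CPU state has been populated. */
static int early_cpu_to_node(int cpu)
{
	return cpu_to_node_map[cpu];
}

/* Regular lookup: only meaningful after late_setup() has run. */
static int cpu_to_node(int cpu)
{
	return numa_node[cpu];
}

/* Hypothetical early-boot caller, switched to the early accessor. */
static int early_init_pick_node(int cpu)
{
	return early_cpu_to_node(cpu);
}

/* Hypothetical later step that copies the table into the per-CPU state. */
static void late_setup(void)
{
	int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		numa_node[cpu] = cpu_to_node_map[cpu];
}
```

In this model cpu_to_node(2) reports node 0 before late_setup() runs, which mirrors the reported symptom, while early_init_pick_node(2) already reports node 1.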

I would also initialize numa_node with NUMA_NO_NODE at declaration,
so that if someone calls cpu_to_node() before the variable is properly
initialized at runtime, they'll get NUMA_NO_NODE, which is obviously an
error.

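
The effect of that default can be sketched in userspace as below. The plain array, the NR_CPUS value, and the helper node_lookup_is_valid() are assumptions for illustration; in the kernel numa_node is a per-CPU variable, and the initializer would go on its definition.

```c
#include <assert.h>

#define NR_CPUS 4		/* hypothetical CPU count for this model */
#define NUMA_NO_NODE (-1)	/* same sentinel value the kernel uses */

/* Stand-in for the per-CPU variable, defaulting to "no node" instead of 0. */
static int numa_node[NR_CPUS] = {
	NUMA_NO_NODE, NUMA_NO_NODE, NUMA_NO_NODE, NUMA_NO_NODE,
};

static int cpu_to_node(int cpu)
{
	return numa_node[cpu];
}

/* Runtime initialization, done once the real node id is known. */
static void set_cpu_numa_node(int cpu, int node)
{
	numa_node[cpu] = node;
}

/* Hypothetical check: a too-early lookup is now distinguishable from node 0. */
static int node_lookup_is_valid(int cpu)
{
	return cpu_to_node(cpu) != NUMA_NO_NODE;
}
```

With a zero default, a too-early call is indistinguishable from a CPU that really sits on node 0; with the sentinel, the caller (or a debug check) can detect it.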
Thanks,
Yury
> Signed-off-by: Huang Shijie <shijie@os.amperecomputing.com>
> ---
>  drivers/base/arch_numa.c | 11 +++++++++++
>  include/linux/topology.h |  6 ++----
>  init/main.c              | 29 +++++++++++++++++++++++++++--
>  3 files changed, 40 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/base/arch_numa.c b/drivers/base/arch_numa.c
> index 5b59d133b6af..867a477fa975 100644
> --- a/drivers/base/arch_numa.c
> +++ b/drivers/base/arch_numa.c
> @@ -61,6 +61,17 @@ EXPORT_SYMBOL(cpumask_of_node);
>
>  #endif
>
> +#ifdef CONFIG_USE_PERCPU_NUMA_NODE_ID
> +#ifndef cpu_to_node
> +int _cpu_to_node(int cpu)
> +{
> +	return per_cpu(numa_node, cpu);
> +}
> +int (*cpu_to_node)(int cpu);
> +EXPORT_SYMBOL(cpu_to_node);
> +#endif
> +#endif
> +
>  static void numa_update_cpu(unsigned int cpu, bool remove)
>  {
>  	int nid = cpu_to_node(cpu);
> diff --git a/include/linux/topology.h b/include/linux/topology.h
> index 52f5850730b3..e7ce2bae11dd 100644
> --- a/include/linux/topology.h
> +++ b/include/linux/topology.h
> @@ -91,10 +91,8 @@ static inline int numa_node_id(void)
>  #endif
>
>  #ifndef cpu_to_node
> -static inline int cpu_to_node(int cpu)
> -{
> -	return per_cpu(numa_node, cpu);
> -}
> +extern int (*cpu_to_node)(int cpu);
> +extern int _cpu_to_node(int cpu);
>  #endif
>
>  #ifndef set_numa_node
> diff --git a/init/main.c b/init/main.c
> index e24b0780fdff..b142e9c51161 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -870,6 +870,18 @@ static void __init print_unknown_bootoptions(void)
>  	memblock_free(unknown_options, len);
>  }
>
> +static void __init smp_prepare_boot_cpu_start(void)
> +{
> +	smp_prepare_boot_cpu(); /* arch-specific boot-cpu hooks */
> +
> +#ifdef CONFIG_USE_PERCPU_NUMA_NODE_ID
> +#ifndef cpu_to_node
> +	/* The early_cpu_to_node should be ready now. */
> +	cpu_to_node = early_cpu_to_node;
> +#endif
> +#endif
> +}
> +
>  asmlinkage __visible __init __no_sanitize_address __noreturn __no_stack_protector
>  void start_kernel(void)
>  {
> @@ -899,7 +911,7 @@ void start_kernel(void)
>  	setup_command_line(command_line);
>  	setup_nr_cpu_ids();
>  	setup_per_cpu_areas();
> -	smp_prepare_boot_cpu(); /* arch-specific boot-cpu hooks */
> +	smp_prepare_boot_cpu_start();
>  	boot_cpu_hotplug_init();
>
>  	pr_notice("Kernel command line: %s\n", saved_command_line);
> @@ -1519,6 +1531,19 @@ void __init console_on_rootfs(void)
>  	fput(file);
>  }
>
> +static void __init smp_prepare_cpus_done(unsigned int setup_max_cpus)
> +{
> +	/* Different ARCHs may override smp_prepare_cpus() */
> +	smp_prepare_cpus(setup_max_cpus);
> +
> +#ifdef CONFIG_USE_PERCPU_NUMA_NODE_ID
> +#ifndef cpu_to_node
> +	/* Change to the formal function. */
> +	cpu_to_node = _cpu_to_node;
> +#endif
> +#endif
> +}
> +
>  static noinline void __init kernel_init_freeable(void)
>  {
>  	/* Now the scheduler is fully set up and can do blocking allocations */
> @@ -1531,7 +1556,7 @@ static noinline void __init kernel_init_freeable(void)
>
>  	cad_pid = get_pid(t