From: Anshuman Khandual <anshuman.khandual@arm.com>
To: Puranjay Mohan <puranjay@kernel.org>,
Catalin Marinas <catalin.marinas@arm.com>,
Will Deacon <will@kernel.org>, Sumit Garg <sumit.garg@linaro.org>,
Stephen Boyd <swboyd@chromium.org>,
Douglas Anderson <dianders@chromium.org>,
"Peter Zijlstra (Intel)" <peterz@infradead.org>,
Thomas Gleixner <tglx@linutronix.de>,
Mark Rutland <mark.rutland@arm.com>,
linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, bpf@vger.kernel.org
Cc: puranjay12@gmail.com
Subject: Re: [PATCH v2 2/2] arm64: implement raw_smp_processor_id() using thread_info
Date: Mon, 6 May 2024 12:27:02 +0530 [thread overview]
Message-ID: <7ceccfb1-e710-4189-8d1a-a8b4c2d184bc@arm.com> (raw)
In-Reply-To: <20240502123449.2690-2-puranjay@kernel.org>
On 5/2/24 18:04, Puranjay Mohan wrote:
> Historically, arm64 implemented raw_smp_processor_id() as a read of
> current_thread_info()->cpu. This changed when arm64 moved thread_info
> into task struct, as at the time CONFIG_THREAD_INFO_IN_TASK made core
> code use thread_struct::cpu for the cpu number, and due to header
> dependencies prevented using this in raw_smp_processor_id(). As a
> workaround, we moved to using a percpu variable in commit:
>
> commit 57c82954e77f ("arm64: make cpu number a percpu variable")
>
> Since then, thread_info::cpu was reintroduced, and core code was made to
> use this in commits:
>
> commit 001430c1910d ("arm64: add CPU field to struct thread_info")
> commit bcf9033e5449 ("sched: move CPU field back into thread_info if
> THREAD_INFO_IN_TASK=y")
>
> Consequently it is possible to use current_thread_info()->cpu again.
>
> This decreases the number of emitted instructions like in the following
> example:
>
> Dump of assembler code for function bpf_get_smp_processor_id:
> 0xffff8000802cd608 <+0>: nop
> 0xffff8000802cd60c <+4>: nop
> 0xffff8000802cd610 <+8>: adrp x0, 0xffff800082138000
> 0xffff8000802cd614 <+12>: mrs x1, tpidr_el1
> 0xffff8000802cd618 <+16>: add x0, x0, #0x8
> 0xffff8000802cd61c <+20>: ldrsw x0, [x0, x1]
> 0xffff8000802cd620 <+24>: ret
>
> After this patch:
>
> Dump of assembler code for function bpf_get_smp_processor_id:
> 0xffff8000802c9130 <+0>: nop
> 0xffff8000802c9134 <+4>: nop
> 0xffff8000802c9138 <+8>: mrs x0, sp_el0
> 0xffff8000802c913c <+12>: ldr w0, [x0, #24]
> 0xffff8000802c9140 <+16>: ret
>
> A microbenchmark[1] was built to measure the performance improvement
> provided by this change. It calls the following function given number of
> times and finds the runtime overhead:
>
> static noinline int get_cpu_id(void)
> {
> return smp_processor_id();
> }
>
> Run the benchmark like:
> modprobe smp_processor_id nr_function_calls=1000000000
>
> +--------------------------+------------------------+
> | | Number of Calls | Time taken |
> +--------+-----------------+------------------------+
> | Before | 1000000000 | 1602888401ns |
> +--------+-----------------+------------------------+
> | After | 1000000000 | 1206212658ns |
> +--------+-----------------+------------------------+
> | Difference (decrease) | 396675743ns (24.74%) |
> +---------------------------------------------------+
>
> Remove the percpu variable cpu_number as it is used only in
> set_smp_ipi_range() as a dummy variable to be passed to ipi_handler().
> Use irq_stat in place of cpu_number here.
>
> [1] https://github.com/puranjaymohan/linux/commit/77d3fdd
>
> Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
> ---
> Changes in v1 -> v2:
> v1: https://lore.kernel.org/all/20240501154236.10236-1-puranjay@kernel.org/
> - Remove the percpu variable cpu_number
> - Add more information to the commit message.
> ---
> arch/arm64/include/asm/smp.h | 13 +------------
> arch/arm64/kernel/smp.c | 9 ++-------
> 2 files changed, 3 insertions(+), 19 deletions(-)
>
> diff --git a/arch/arm64/include/asm/smp.h b/arch/arm64/include/asm/smp.h
> index efb13112b408..2510eec026f7 100644
> --- a/arch/arm64/include/asm/smp.h
> +++ b/arch/arm64/include/asm/smp.h
> @@ -25,22 +25,11 @@
>
> #ifndef __ASSEMBLY__
>
> -#include <asm/percpu.h>
> -
> #include <linux/threads.h>
> #include <linux/cpumask.h>
> #include <linux/thread_info.h>
>
> -DECLARE_PER_CPU_READ_MOSTLY(int, cpu_number);
> -
> -/*
> - * We don't use this_cpu_read(cpu_number) as that has implicit writes to
> - * preempt_count, and associated (compiler) barriers, that we'd like to avoid
> - * the expense of. If we're preemptible, the value can be stale at use anyway.
> - * And we can't use this_cpu_ptr() either, as that winds up recursing back
> - * here under CONFIG_DEBUG_PREEMPT=y.
> - */
> -#define raw_smp_processor_id() (*raw_cpu_ptr(&cpu_number))
> +#define raw_smp_processor_id() (current_thread_info()->cpu)
>
> /*
> * Logical CPU mapping.
> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> index 4ced34f62dab..98d4e352c3d0 100644
> --- a/arch/arm64/kernel/smp.c
> +++ b/arch/arm64/kernel/smp.c
> @@ -55,9 +55,6 @@
>
> #include <trace/events/ipi.h>
>
> -DEFINE_PER_CPU_READ_MOSTLY(int, cpu_number);
> -EXPORT_PER_CPU_SYMBOL(cpu_number);
> -
> /*
> * as from 2.5, kernels no longer have an init_tasks structure
> * so we need some other way of telling a new secondary core
> @@ -742,8 +739,6 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
> */
> for_each_possible_cpu(cpu) {
>
> - per_cpu(cpu_number, cpu) = cpu;
> -
> if (cpu == smp_processor_id())
> continue;
>
> @@ -1021,12 +1016,12 @@ void __init set_smp_ipi_range(int ipi_base, int n)
>
> if (ipi_should_be_nmi(i)) {
> err = request_percpu_nmi(ipi_base + i, ipi_handler,
> - "IPI", &cpu_number);
> + "IPI", &irq_stat);
> WARN(err, "Could not request IPI %d as NMI, err=%d\n",
> i, err);
> } else {
> err = request_percpu_irq(ipi_base + i, ipi_handler,
> - "IPI", &cpu_number);
> + "IPI", &irq_stat);
> WARN(err, "Could not request IPI %d as IRQ, err=%d\n",
> i, err);
> }
WARNING: multiple messages have this Message-ID (diff)
From: Anshuman Khandual <anshuman.khandual@arm.com>
To: Puranjay Mohan <puranjay@kernel.org>,
Catalin Marinas <catalin.marinas@arm.com>,
Will Deacon <will@kernel.org>, Sumit Garg <sumit.garg@linaro.org>,
Stephen Boyd <swboyd@chromium.org>,
Douglas Anderson <dianders@chromium.org>,
"Peter Zijlstra (Intel)" <peterz@infradead.org>,
Thomas Gleixner <tglx@linutronix.de>,
Mark Rutland <mark.rutland@arm.com>,
linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, bpf@vger.kernel.org
Cc: puranjay12@gmail.com
Subject: Re: [PATCH v2 2/2] arm64: implement raw_smp_processor_id() using thread_info
Date: Mon, 6 May 2024 12:27:02 +0530 [thread overview]
Message-ID: <7ceccfb1-e710-4189-8d1a-a8b4c2d184bc@arm.com> (raw)
In-Reply-To: <20240502123449.2690-2-puranjay@kernel.org>
On 5/2/24 18:04, Puranjay Mohan wrote:
> Historically, arm64 implemented raw_smp_processor_id() as a read of
> current_thread_info()->cpu. This changed when arm64 moved thread_info
> into task struct, as at the time CONFIG_THREAD_INFO_IN_TASK made core
> code use thread_struct::cpu for the cpu number, and due to header
> dependencies prevented using this in raw_smp_processor_id(). As a
> workaround, we moved to using a percpu variable in commit:
>
> commit 57c82954e77f ("arm64: make cpu number a percpu variable")
>
> Since then, thread_info::cpu was reintroduced, and core code was made to
> use this in commits:
>
> commit 001430c1910d ("arm64: add CPU field to struct thread_info")
> commit bcf9033e5449 ("sched: move CPU field back into thread_info if
> THREAD_INFO_IN_TASK=y")
>
> Consequently it is possible to use current_thread_info()->cpu again.
>
> This decreases the number of emitted instructions like in the following
> example:
>
> Dump of assembler code for function bpf_get_smp_processor_id:
> 0xffff8000802cd608 <+0>: nop
> 0xffff8000802cd60c <+4>: nop
> 0xffff8000802cd610 <+8>: adrp x0, 0xffff800082138000
> 0xffff8000802cd614 <+12>: mrs x1, tpidr_el1
> 0xffff8000802cd618 <+16>: add x0, x0, #0x8
> 0xffff8000802cd61c <+20>: ldrsw x0, [x0, x1]
> 0xffff8000802cd620 <+24>: ret
>
> After this patch:
>
> Dump of assembler code for function bpf_get_smp_processor_id:
> 0xffff8000802c9130 <+0>: nop
> 0xffff8000802c9134 <+4>: nop
> 0xffff8000802c9138 <+8>: mrs x0, sp_el0
> 0xffff8000802c913c <+12>: ldr w0, [x0, #24]
> 0xffff8000802c9140 <+16>: ret
>
> A microbenchmark[1] was built to measure the performance improvement
> provided by this change. It calls the following function given number of
> times and finds the runtime overhead:
>
> static noinline int get_cpu_id(void)
> {
> return smp_processor_id();
> }
>
> Run the benchmark like:
> modprobe smp_processor_id nr_function_calls=1000000000
>
> +--------------------------+------------------------+
> | | Number of Calls | Time taken |
> +--------+-----------------+------------------------+
> | Before | 1000000000 | 1602888401ns |
> +--------+-----------------+------------------------+
> | After | 1000000000 | 1206212658ns |
> +--------+-----------------+------------------------+
> | Difference (decrease) | 396675743ns (24.74%) |
> +---------------------------------------------------+
>
> Remove the percpu variable cpu_number as it is used only in
> set_smp_ipi_range() as a dummy variable to be passed to ipi_handler().
> Use irq_stat in place of cpu_number here.
>
> [1] https://github.com/puranjaymohan/linux/commit/77d3fdd
>
> Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
> ---
> Changes in v1 -> v2:
> v1: https://lore.kernel.org/all/20240501154236.10236-1-puranjay@kernel.org/
> - Remove the percpu variable cpu_number
> - Add more information to the commit message.
> ---
> arch/arm64/include/asm/smp.h | 13 +------------
> arch/arm64/kernel/smp.c | 9 ++-------
> 2 files changed, 3 insertions(+), 19 deletions(-)
>
> diff --git a/arch/arm64/include/asm/smp.h b/arch/arm64/include/asm/smp.h
> index efb13112b408..2510eec026f7 100644
> --- a/arch/arm64/include/asm/smp.h
> +++ b/arch/arm64/include/asm/smp.h
> @@ -25,22 +25,11 @@
>
> #ifndef __ASSEMBLY__
>
> -#include <asm/percpu.h>
> -
> #include <linux/threads.h>
> #include <linux/cpumask.h>
> #include <linux/thread_info.h>
>
> -DECLARE_PER_CPU_READ_MOSTLY(int, cpu_number);
> -
> -/*
> - * We don't use this_cpu_read(cpu_number) as that has implicit writes to
> - * preempt_count, and associated (compiler) barriers, that we'd like to avoid
> - * the expense of. If we're preemptible, the value can be stale at use anyway.
> - * And we can't use this_cpu_ptr() either, as that winds up recursing back
> - * here under CONFIG_DEBUG_PREEMPT=y.
> - */
> -#define raw_smp_processor_id() (*raw_cpu_ptr(&cpu_number))
> +#define raw_smp_processor_id() (current_thread_info()->cpu)
>
> /*
> * Logical CPU mapping.
> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> index 4ced34f62dab..98d4e352c3d0 100644
> --- a/arch/arm64/kernel/smp.c
> +++ b/arch/arm64/kernel/smp.c
> @@ -55,9 +55,6 @@
>
> #include <trace/events/ipi.h>
>
> -DEFINE_PER_CPU_READ_MOSTLY(int, cpu_number);
> -EXPORT_PER_CPU_SYMBOL(cpu_number);
> -
> /*
> * as from 2.5, kernels no longer have an init_tasks structure
> * so we need some other way of telling a new secondary core
> @@ -742,8 +739,6 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
> */
> for_each_possible_cpu(cpu) {
>
> - per_cpu(cpu_number, cpu) = cpu;
> -
> if (cpu == smp_processor_id())
> continue;
>
> @@ -1021,12 +1016,12 @@ void __init set_smp_ipi_range(int ipi_base, int n)
>
> if (ipi_should_be_nmi(i)) {
> err = request_percpu_nmi(ipi_base + i, ipi_handler,
> - "IPI", &cpu_number);
> + "IPI", &irq_stat);
> WARN(err, "Could not request IPI %d as NMI, err=%d\n",
> i, err);
> } else {
> err = request_percpu_irq(ipi_base + i, ipi_handler,
> - "IPI", &cpu_number);
> + "IPI", &irq_stat);
> WARN(err, "Could not request IPI %d as IRQ, err=%d\n",
> i, err);
> }
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
next prev parent reply other threads:[~2024-05-06 6:57 UTC|newest]
Thread overview: 22+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-05-02 12:34 [PATCH v2 1/2] arm64/arch_timer: include <linux/percpu.h> Puranjay Mohan
2024-05-02 12:34 ` Puranjay Mohan
2024-05-02 12:34 ` [PATCH v2 2/2] arm64: implement raw_smp_processor_id() using thread_info Puranjay Mohan
2024-05-02 12:34 ` Puranjay Mohan
2024-05-03 9:14 ` Anshuman Khandual
2024-05-03 9:14 ` Anshuman Khandual
2024-05-03 15:30 ` Mark Rutland
2024-05-03 15:30 ` Mark Rutland
2024-05-06 6:57 ` Anshuman Khandual [this message]
2024-05-06 6:57 ` Anshuman Khandual
2024-05-03 9:07 ` [PATCH v2 1/2] arm64/arch_timer: include <linux/percpu.h> Anshuman Khandual
2024-05-03 9:07 ` Anshuman Khandual
2024-05-03 9:44 ` Puranjay Mohan
2024-05-03 9:44 ` Puranjay Mohan
2024-05-06 6:10 ` Anshuman Khandual
2024-05-06 6:10 ` Anshuman Khandual
2024-05-03 15:21 ` Mark Rutland
2024-05-03 15:21 ` Mark Rutland
2024-05-03 15:14 ` Mark Rutland
2024-05-03 15:14 ` Mark Rutland
2024-05-06 6:55 ` Anshuman Khandual
2024-05-06 6:55 ` Anshuman Khandual
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=7ceccfb1-e710-4189-8d1a-a8b4c2d184bc@arm.com \
--to=anshuman.khandual@arm.com \
--cc=bpf@vger.kernel.org \
--cc=catalin.marinas@arm.com \
--cc=dianders@chromium.org \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mark.rutland@arm.com \
--cc=peterz@infradead.org \
--cc=puranjay12@gmail.com \
--cc=puranjay@kernel.org \
--cc=sumit.garg@linaro.org \
--cc=swboyd@chromium.org \
--cc=tglx@linutronix.de \
--cc=will@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.