From: Puranjay Mohan <puranjay@kernel.org>
To: Catalin Marinas <catalin.marinas@arm.com>,
Will Deacon <will@kernel.org>, Sumit Garg <sumit.garg@linaro.org>,
Stephen Boyd <swboyd@chromium.org>,
Douglas Anderson <dianders@chromium.org>,
"Peter Zijlstra (Intel)" <peterz@infradead.org>,
Thomas Gleixner <tglx@linutronix.de>,
Mark Rutland <mark.rutland@arm.com>,
linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, bpf@vger.kernel.org
Cc: puranjay12@gmail.com
Subject: [PATCH] arm64: implement raw_smp_processor_id() using thread_info
Date: Wed, 1 May 2024 15:42:36 +0000 [thread overview]
Message-ID: <20240501154236.10236-1-puranjay@kernel.org> (raw)
ARM64 defines THREAD_INFO_IN_TASK which means the cpu id can be found
from current_thread_info()->cpu.
Implement raw_smp_processor_id() using the above. This decreases the
number of emitted instructions like in the following example:
Dump of assembler code for function bpf_get_smp_processor_id:
0xffff8000802cd608 <+0>: nop
0xffff8000802cd60c <+4>: nop
0xffff8000802cd610 <+8>: adrp x0, 0xffff800082138000
0xffff8000802cd614 <+12>: mrs x1, tpidr_el1
0xffff8000802cd618 <+16>: add x0, x0, #0x8
0xffff8000802cd61c <+20>: ldrsw x0, [x0, x1]
0xffff8000802cd620 <+24>: ret
After this patch:
Dump of assembler code for function bpf_get_smp_processor_id:
0xffff8000802c9130 <+0>: nop
0xffff8000802c9134 <+4>: nop
0xffff8000802c9138 <+8>: mrs x0, sp_el0
0xffff8000802c913c <+12>: ldr w0, [x0, #24]
0xffff8000802c9140 <+16>: ret
A microbenchmark[1] was built to measure the performance improvement
provided by this change. It calls the following function given number of
times and finds the runtime overhead:
static noinline int get_cpu_id(void)
{
return smp_processor_id();
}
Run the benchmark like:
modprobe smp_processor_id nr_function_calls=1000000000
+--------------------------+------------------------+
| | Number of Calls | Time taken |
+--------+-----------------+------------------------+
| Before | 1000000000 | 1602888401ns |
+--------+-----------------+------------------------+
| After | 1000000000 | 1206212658ns |
+--------+-----------------+------------------------+
| Difference (decrease) | 396675743ns (24.74%) |
+---------------------------------------------------+
This improvement is in this very specific microbenchmark but it proves
the point.
The percpu variable cpu_number is left as it is because it is used in
set_smp_ipi_range()
[1] https://github.com/puranjaymohan/linux/commit/77d3fdd
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
---
arch/arm64/include/asm/smp.h | 8 ++------
1 file changed, 2 insertions(+), 6 deletions(-)
diff --git a/arch/arm64/include/asm/smp.h b/arch/arm64/include/asm/smp.h
index efb13112b408..88fd2ab805ec 100644
--- a/arch/arm64/include/asm/smp.h
+++ b/arch/arm64/include/asm/smp.h
@@ -34,13 +34,9 @@
DECLARE_PER_CPU_READ_MOSTLY(int, cpu_number);
/*
- * We don't use this_cpu_read(cpu_number) as that has implicit writes to
- * preempt_count, and associated (compiler) barriers, that we'd like to avoid
- * the expense of. If we're preemptible, the value can be stale at use anyway.
- * And we can't use this_cpu_ptr() either, as that winds up recursing back
- * here under CONFIG_DEBUG_PREEMPT=y.
+ * This relies on THREAD_INFO_IN_TASK, but arm64 defines that unconditionally.
*/
-#define raw_smp_processor_id() (*raw_cpu_ptr(&cpu_number))
+#define raw_smp_processor_id() (current_thread_info()->cpu)
/*
* Logical CPU mapping.
--
2.40.1
WARNING: multiple messages have this Message-ID (diff)
From: Puranjay Mohan <puranjay@kernel.org>
To: Catalin Marinas <catalin.marinas@arm.com>,
Will Deacon <will@kernel.org>, Sumit Garg <sumit.garg@linaro.org>,
Stephen Boyd <swboyd@chromium.org>,
Douglas Anderson <dianders@chromium.org>,
"Peter Zijlstra (Intel)" <peterz@infradead.org>,
Thomas Gleixner <tglx@linutronix.de>,
Mark Rutland <mark.rutland@arm.com>,
linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, bpf@vger.kernel.org
Cc: puranjay12@gmail.com
Subject: [PATCH] arm64: implement raw_smp_processor_id() using thread_info
Date: Wed, 1 May 2024 15:42:36 +0000 [thread overview]
Message-ID: <20240501154236.10236-1-puranjay@kernel.org> (raw)
ARM64 defines THREAD_INFO_IN_TASK which means the cpu id can be found
from current_thread_info()->cpu.
Implement raw_smp_processor_id() using the above. This decreases the
number of emitted instructions like in the following example:
Dump of assembler code for function bpf_get_smp_processor_id:
0xffff8000802cd608 <+0>: nop
0xffff8000802cd60c <+4>: nop
0xffff8000802cd610 <+8>: adrp x0, 0xffff800082138000
0xffff8000802cd614 <+12>: mrs x1, tpidr_el1
0xffff8000802cd618 <+16>: add x0, x0, #0x8
0xffff8000802cd61c <+20>: ldrsw x0, [x0, x1]
0xffff8000802cd620 <+24>: ret
After this patch:
Dump of assembler code for function bpf_get_smp_processor_id:
0xffff8000802c9130 <+0>: nop
0xffff8000802c9134 <+4>: nop
0xffff8000802c9138 <+8>: mrs x0, sp_el0
0xffff8000802c913c <+12>: ldr w0, [x0, #24]
0xffff8000802c9140 <+16>: ret
A microbenchmark[1] was built to measure the performance improvement
provided by this change. It calls the following function given number of
times and finds the runtime overhead:
static noinline int get_cpu_id(void)
{
return smp_processor_id();
}
Run the benchmark like:
modprobe smp_processor_id nr_function_calls=1000000000
+--------------------------+------------------------+
| | Number of Calls | Time taken |
+--------+-----------------+------------------------+
| Before | 1000000000 | 1602888401ns |
+--------+-----------------+------------------------+
| After | 1000000000 | 1206212658ns |
+--------+-----------------+------------------------+
| Difference (decrease) | 396675743ns (24.74%) |
+---------------------------------------------------+
This improvement is in this very specific microbenchmark but it proves
the point.
The percpu variable cpu_number is left as it is because it is used in
set_smp_ipi_range()
[1] https://github.com/puranjaymohan/linux/commit/77d3fdd
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
---
arch/arm64/include/asm/smp.h | 8 ++------
1 file changed, 2 insertions(+), 6 deletions(-)
diff --git a/arch/arm64/include/asm/smp.h b/arch/arm64/include/asm/smp.h
index efb13112b408..88fd2ab805ec 100644
--- a/arch/arm64/include/asm/smp.h
+++ b/arch/arm64/include/asm/smp.h
@@ -34,13 +34,9 @@
DECLARE_PER_CPU_READ_MOSTLY(int, cpu_number);
/*
- * We don't use this_cpu_read(cpu_number) as that has implicit writes to
- * preempt_count, and associated (compiler) barriers, that we'd like to avoid
- * the expense of. If we're preemptible, the value can be stale at use anyway.
- * And we can't use this_cpu_ptr() either, as that winds up recursing back
- * here under CONFIG_DEBUG_PREEMPT=y.
+ * This relies on THREAD_INFO_IN_TASK, but arm64 defines that unconditionally.
*/
-#define raw_smp_processor_id() (*raw_cpu_ptr(&cpu_number))
+#define raw_smp_processor_id() (current_thread_info()->cpu)
/*
* Logical CPU mapping.
--
2.40.1
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
next reply other threads:[~2024-05-01 15:43 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-05-01 15:42 Puranjay Mohan [this message]
2024-05-01 15:42 ` [PATCH] arm64: implement raw_smp_processor_id() using thread_info Puranjay Mohan
2024-05-01 16:23 ` Christoph Lameter (Ampere)
2024-05-01 16:23 ` Christoph Lameter (Ampere)
2024-05-01 16:41 ` Mark Rutland
2024-05-01 16:41 ` Mark Rutland
2024-05-01 17:12 ` Puranjay Mohan
2024-05-01 17:12 ` Puranjay Mohan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240501154236.10236-1-puranjay@kernel.org \
--to=puranjay@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=catalin.marinas@arm.com \
--cc=dianders@chromium.org \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mark.rutland@arm.com \
--cc=peterz@infradead.org \
--cc=puranjay12@gmail.com \
--cc=sumit.garg@linaro.org \
--cc=swboyd@chromium.org \
--cc=tglx@linutronix.de \
--cc=will@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.