public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Uros Bizjak <ubizjak@gmail.com>
To: x86@kernel.org, linux-kernel@vger.kernel.org
Cc: Uros Bizjak <ubizjak@gmail.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Nadav Amit <namit@vmware.com>, Ingo Molnar <mingo@kernel.org>,
	Andy Lutomirski <luto@kernel.org>,
	Brian Gerst <brgerst@gmail.com>,
	Denys Vlasenko <dvlasenk@redhat.com>,
	"H . Peter Anvin" <hpa@zytor.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Josh Poimboeuf <jpoimboe@redhat.com>
Subject: [PATCH tip] x86/percpu: Rewrite arch_raw_cpu_ptr()
Date: Wed, 11 Oct 2023 22:40:36 +0200	[thread overview]
Message-ID: <20231011204150.51166-1-ubizjak@gmail.com> (raw)

Implement arch_raw_cpu_ptr() as a load from this_cpu_off and then
add the ptr value to the base. This way, the compiler can propagate
addend to the following instruction and simplify address calculation.

E.g.: address calcuation in amd_pmu_enable_virt() improves from:

    48 c7 c0 00 00 00 00 	mov    $0x0,%rax
	87b7: R_X86_64_32S	cpu_hw_events

    65 48 03 05 00 00 00 	add    %gs:0x0(%rip),%rax
    00
	87bf: R_X86_64_PC32	this_cpu_off-0x4

    48 c7 80 28 13 00 00 	movq   $0x0,0x1328(%rax)
    00 00 00 00

to:

    65 48 8b 05 00 00 00 	mov    %gs:0x0(%rip),%rax
    00
	8798: R_X86_64_PC32	this_cpu_off-0x4
    48 c7 80 00 00 00 00 	movq   $0x0,0x0(%rax)
    00 00 00 00
	87a6: R_X86_64_32S	cpu_hw_events+0x1328

The compiler can also eliminate redundant loads from this_cpu_off,
reducing the number of percpu offset reads (either from this_cpu_off
or with rdgsbase) from 1663 to 1571.

Additionaly, the patch introduces 'rdgsbase' alternative for CPUs with
X86_FEATURE_FSGSBASE. The rdgsbase instruction *probably* will end up
only decoding in the first decoder etc. But we're talking single-cycle
kind of effects, and the rdgsbase case should be much better from
a cache perspective and might use fewer memory pipeline resources to
offset the fact that it uses an unusual front end decoder resource...

The only drawback of the patch is larger binary size:

  text    data     bss     dec     hex filename
25546594        4387686  808452 30742732        1d518cc vmlinux-new.o
25515256        4387814  808452 30711522        1d49ee2 vmlinux-old.o

that increases by 31k (0.123%), due to 1578 rdgsbase altinstructions
that are placed in the text section. The increase in text-size is not
"real" - the 'rdgsbase' instruction should be smaller than a 'mov %gs';
binary size increases because we obviously have two instructions, but
the actual *executable* part likely stays the same, and it's just that
we grow the altinstruction metadata.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Cc: Nadav Amit <namit@vmware.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/include/asm/percpu.h | 26 ++++++++++++++++++++------
 1 file changed, 20 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index 60ea7755c0fe..e047a0bc5554 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -49,18 +49,32 @@
 #define __force_percpu_prefix	"%%"__stringify(__percpu_seg)":"
 #define __my_cpu_offset		this_cpu_read(this_cpu_off)
 
-/*
- * Compared to the generic __my_cpu_offset version, the following
- * saves one instruction and avoids clobbering a temp register.
- */
+#ifdef CONFIG_X86_64
+#define arch_raw_cpu_ptr(ptr)					\
+({								\
+	unsigned long tcp_ptr__;				\
+	asm (ALTERNATIVE("movq " __percpu_arg(1) ", %0",	\
+			 "rdgsbase %0",				\
+			 X86_FEATURE_FSGSBASE)			\
+	     : "=r" (tcp_ptr__)					\
+	     : "m" (__my_cpu_var(this_cpu_off)));		\
+								\
+	tcp_ptr__ += (unsigned long)(ptr);			\
+	(typeof(*(ptr)) __kernel __force *)tcp_ptr__;		\
+})
+#else /* CONFIG_X86_64 */
 #define arch_raw_cpu_ptr(ptr)					\
 ({								\
 	unsigned long tcp_ptr__;				\
-	asm ("add " __percpu_arg(1) ", %0"			\
+	asm ("movl " __percpu_arg(1) ", %0"			\
 	     : "=r" (tcp_ptr__)					\
-	     : "m" (__my_cpu_var(this_cpu_off)), "0" (ptr));	\
+	     : "m" (__my_cpu_var(this_cpu_off)));		\
+								\
+	tcp_ptr__ += (unsigned long)(ptr);			\
 	(typeof(*(ptr)) __kernel __force *)tcp_ptr__;		\
 })
+#endif /* CONFIG_X86_64 */
+
 #else /* CONFIG_SMP */
 #define __percpu_seg_override
 #define __percpu_prefix		""
-- 
2.41.0


             reply	other threads:[~2023-10-11 20:42 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-10-11 20:40 Uros Bizjak [this message]
2023-10-13 16:04 ` [PATCH tip] x86/percpu: Rewrite arch_raw_cpu_ptr() Sean Christopherson
2023-10-13 19:30   ` Uros Bizjak
2023-10-13 20:07     ` Linus Torvalds
2023-10-13 21:02     ` Sean Christopherson
2023-10-14 10:04 ` Ingo Molnar
2023-10-14 10:34   ` Uros Bizjak

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20231011204150.51166-1-ubizjak@gmail.com \
    --to=ubizjak@gmail.com \
    --cc=brgerst@gmail.com \
    --cc=dvlasenk@redhat.com \
    --cc=hpa@zytor.com \
    --cc=jpoimboe@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@kernel.org \
    --cc=mingo@kernel.org \
    --cc=namit@vmware.com \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox