public inbox for linux-kernel@vger.kernel.org
* [PATCH -tip v2] x86/percpu: Introduce const-qualified const_pcpu_hot
@ 2023-10-21 14:36 Uros Bizjak
  2023-10-21 14:39 ` Nadav Amit
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Uros Bizjak @ 2023-10-21 14:36 UTC
  To: x86, linux-kernel
  Cc: Uros Bizjak, Andy Lutomirski, Brian Gerst, Denys Vlasenko,
	Ingo Molnar, H . Peter Anvin, Linus Torvalds, Peter Zijlstra,
	Thomas Gleixner, Josh Poimboeuf, Nadav Amit

Some variables in pcpu_hot, currently current_task and top_of_stack,
are actually per-thread variables implemented as per-cpu variables
and thus stable for the duration of the respective task.  There is
already an attempt to eliminate redundant reads from these variables
using the this_cpu_read_stable() asm macro, which hides the dependency
on the read memory address. However, the compiler has limited ability
to eliminate asm common subexpressions, so this approach has had only
limited success.

The solution is to allow more aggressive elimination by aliasing
pcpu_hot into a const-qualified const_pcpu_hot, and to read stable
per-cpu variables from this constant copy.

The current per-cpu infrastructure does not support reads from
const-qualified variables. However, when the compiler supports segment
qualifiers, it is possible to declare the const-aliased variable in
the relevant named address space. The compiler treats an access to a
variable declared in this way as a read from a constant location and
optimizes reads from it accordingly.
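
As a rough userspace analogy (not the kernel mechanism itself), the
sketch below contrasts reads from a const-qualified object with reads
forced through a volatile-qualified pointer. It assumes GCC or Clang;
struct hot_sketch, const_hot_sketch and opaque_call_sketch() are
hypothetical names invented for illustration, and no segment qualifier
or per-cpu offset is involved.

```c
#include <stdint.h>

/* Hypothetical stand-in for struct pcpu_hot; only two fields are modelled. */
struct hot_sketch {
	uintptr_t current_task;
	uintptr_t top_of_stack;
};

/* Const-qualified object: the compiler may assume it never changes. */
const struct hot_sketch const_hot_sketch = {
	.current_task = 0x1234,
	.top_of_stack = 0x8000,
};

/* Compiler barrier in a non-inlined function; stands in for any opaque call. */
__attribute__((noinline)) void opaque_call_sketch(void)
{
	__asm__ volatile("" ::: "memory");
}

/* Plain reads of a const object: the second read may reuse the first. */
uintptr_t read_twice_const(void)
{
	uintptr_t a = const_hot_sketch.current_task;

	opaque_call_sketch();

	uintptr_t b = const_hot_sketch.current_task;

	return a + b;
}

/* Volatile-qualified reads: each access has to be performed separately. */
uintptr_t read_twice_volatile(void)
{
	const volatile struct hot_sketch *p = &const_hot_sketch;
	uintptr_t a = p->current_task;

	opaque_call_sketch();

	uintptr_t b = p->current_task;

	return a + b;
}
```

Both functions return the same value; they differ only in how many
loads the compiler is allowed to drop, which can be inspected by
comparing the generated assembly at -O2.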

With the const-qualified const_pcpu_hot in place, the compiler can
eliminate redundant reads from the constant variables, reducing the
number of loads from current_task from 3766 to 3217 on a test build,
a 14.6% reduction.

The reduction of loads translates to the following code savings:

    text     data     bss       dec      hex  filename
25477353  4389456  808452  30675261  1d4113d  vmlinux-old.o
25476074  4389440  808452  30673966  1d40c2e  vmlinux-new.o

representing a code size reduction of 1279 bytes.
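
The figures above can be re-derived with a few lines of C. This is only
an arithmetic sketch; reduction_percent() and bytes_saved() are
hypothetical helper names, and the inputs are the numbers quoted in
this message.

```c
/* Percentage by which `after` is smaller than `before`. */
double reduction_percent(long before, long after)
{
	return 100.0 * (double)(before - after) / (double)before;
}

/* Difference between two text-section sizes, in bytes. */
long bytes_saved(long old_text, long new_text)
{
	return old_text - new_text;
}
```

For the numbers in this message, reduction_percent(3766, 3217)
evaluates to roughly 14.6 and bytes_saved(25477353, 25476074) to 1279.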
---
v2: Export const_pcpu_hot symbol.

Cc: Andy Lutomirski <luto@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Co-developed-by: Nadav Amit <namit@vmware.com>
Signed-off-by: Nadav Amit <namit@vmware.com>
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
---
 arch/x86/include/asm/current.h   | 7 +++++++
 arch/x86/include/asm/percpu.h    | 6 +++---
 arch/x86/include/asm/processor.h | 3 +++
 arch/x86/kernel/cpu/common.c     | 1 +
 arch/x86/kernel/vmlinux.lds.S    | 1 +
 include/linux/compiler.h         | 2 +-
 6 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/current.h b/arch/x86/include/asm/current.h
index a1168e7b69e5..0538d2436673 100644
--- a/arch/x86/include/asm/current.h
+++ b/arch/x86/include/asm/current.h
@@ -36,8 +36,15 @@ static_assert(sizeof(struct pcpu_hot) == 64);
 
 DECLARE_PER_CPU_ALIGNED(struct pcpu_hot, pcpu_hot);
 
+/* const-qualified alias to pcpu_hot, aliased by linker. */
+DECLARE_PER_CPU_ALIGNED(const struct pcpu_hot __percpu_seg_override,
+			const_pcpu_hot);
+
 static __always_inline struct task_struct *get_current(void)
 {
+	if (IS_ENABLED(CONFIG_USE_X86_SEG_SUPPORT))
+		return const_pcpu_hot.current_task;
+
 	return this_cpu_read_stable(pcpu_hot.current_task);
 }
 
diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index bbcc1ca737f0..630bb912a46b 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -413,9 +413,9 @@ do {									\
  * accessed while this_cpu_read_stable() allows the value to be cached.
  * this_cpu_read_stable() is more efficient and can be used if its value
  * is guaranteed to be valid across cpus.  The current users include
- * get_current() and get_thread_info() both of which are actually
- * per-thread variables implemented as per-cpu variables and thus
- * stable for the duration of the respective task.
+ * pcpu_hot.current_task and pcpu_hot.top_of_stack, both of which are
+ * actually per-thread variables implemented as per-cpu variables and
+ * thus stable for the duration of the respective task.
  */
 #define this_cpu_read_stable_1(pcp)	percpu_stable_op(1, "mov", pcp)
 #define this_cpu_read_stable_2(pcp)	percpu_stable_op(2, "mov", pcp)
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index ae81a7191c1c..a807025a4dee 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -533,6 +533,9 @@ static __always_inline unsigned long current_top_of_stack(void)
 	 *  and around vm86 mode and sp0 on x86_64 is special because of the
 	 *  entry trampoline.
 	 */
+	if (IS_ENABLED(CONFIG_USE_X86_SEG_SUPPORT))
+		return const_pcpu_hot.top_of_stack;
+
 	return this_cpu_read_stable(pcpu_hot.top_of_stack);
 }
 
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index b14fc8c1c953..9058da9ae011 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -2049,6 +2049,7 @@ DEFINE_PER_CPU_ALIGNED(struct pcpu_hot, pcpu_hot) = {
 	.top_of_stack	= TOP_OF_INIT_STACK,
 };
 EXPORT_PER_CPU_SYMBOL(pcpu_hot);
+EXPORT_PER_CPU_SYMBOL(const_pcpu_hot);
 
 #ifdef CONFIG_X86_64
 DEFINE_PER_CPU_FIRST(struct fixed_percpu_data,
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 54a5596adaa6..1239be7cc8d8 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -46,6 +46,7 @@ ENTRY(phys_startup_64)
 #endif
 
 jiffies = jiffies_64;
+const_pcpu_hot = pcpu_hot;
 
 #if defined(CONFIG_X86_64)
 /*
diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index d7779a18b24f..bf9815eaf4aa 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -212,7 +212,7 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
  */
 #define ___ADDRESSABLE(sym, __attrs) \
 	static void * __used __attrs \
-		__UNIQUE_ID(__PASTE(__addressable_,sym)) = (void *)&sym;
+	__UNIQUE_ID(__PASTE(__addressable_,sym)) = (void *)(uintptr_t)&sym;
 #define __ADDRESSABLE(sym) \
 	___ADDRESSABLE(sym, __section(".discard.addressable"))
 
-- 
2.41.0



* Re: [PATCH -tip v2] x86/percpu: Introduce const-qualified const_pcpu_hot
  2023-10-21 14:36 [PATCH -tip v2] x86/percpu: Introduce const-qualified const_pcpu_hot Uros Bizjak
@ 2023-10-21 14:39 ` Nadav Amit
  2023-10-21 15:00 ` Ingo Molnar
  2023-10-23  9:29 ` Ingo Molnar
  2 siblings, 0 replies; 5+ messages in thread
From: Nadav Amit @ 2023-10-21 14:39 UTC
  To: Uros Bizjak
  Cc: the arch/x86 maintainers, Linux Kernel Mailing List,
	Andy Lutomirski, Brian Gerst, Denys Vlasenko, Ingo Molnar,
	H . Peter Anvin, Linus Torvalds, Peter Zijlstra, Thomas Gleixner,
	Josh Poimboeuf



> On Oct 21, 2023, at 5:36 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
> 
> Some variables in pcpu_hot, currently current_task and top_of_stack,
> are actually per-thread variables implemented as per-cpu variables
> and thus stable for the duration of the respective task

LGTM!



* Re: [PATCH -tip v2] x86/percpu: Introduce const-qualified const_pcpu_hot
  2023-10-21 14:36 [PATCH -tip v2] x86/percpu: Introduce const-qualified const_pcpu_hot Uros Bizjak
  2023-10-21 14:39 ` Nadav Amit
@ 2023-10-21 15:00 ` Ingo Molnar
  2023-10-21 15:11   ` Uros Bizjak
  2023-10-23  9:29 ` Ingo Molnar
  2 siblings, 1 reply; 5+ messages in thread
From: Ingo Molnar @ 2023-10-21 15:00 UTC
  To: Uros Bizjak
  Cc: x86, linux-kernel, Andy Lutomirski, Brian Gerst, Denys Vlasenko,
	H . Peter Anvin, Linus Torvalds, Peter Zijlstra, Thomas Gleixner,
	Josh Poimboeuf, Nadav Amit


* Uros Bizjak <ubizjak@gmail.com> wrote:

>  arch/x86/include/asm/current.h   | 7 +++++++
>  arch/x86/include/asm/percpu.h    | 6 +++---
>  arch/x86/include/asm/processor.h | 3 +++
>  arch/x86/kernel/cpu/common.c     | 1 +
>  arch/x86/kernel/vmlinux.lds.S    | 1 +
>  include/linux/compiler.h         | 2 +-
>  6 files changed, 16 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/include/asm/current.h b/arch/x86/include/asm/current.h
> index a1168e7b69e5..0538d2436673 100644
> --- a/arch/x86/include/asm/current.h
> +++ b/arch/x86/include/asm/current.h
> @@ -36,8 +36,15 @@ static_assert(sizeof(struct pcpu_hot) == 64);
>  
>  DECLARE_PER_CPU_ALIGNED(struct pcpu_hot, pcpu_hot);
>  
> +/* const-qualified alias to pcpu_hot, aliased by linker. */
> +DECLARE_PER_CPU_ALIGNED(const struct pcpu_hot __percpu_seg_override,
> +			const_pcpu_hot);

The aliasing makes me a bit nervous. Could we at least prefix it a bit more 
prominently, like const__pcpu_hot? That way it's immediately obvious at all 
usage sites that it's "special".

Thanks,

	Ingo


* Re: [PATCH -tip v2] x86/percpu: Introduce const-qualified const_pcpu_hot
  2023-10-21 15:00 ` Ingo Molnar
@ 2023-10-21 15:11   ` Uros Bizjak
  0 siblings, 0 replies; 5+ messages in thread
From: Uros Bizjak @ 2023-10-21 15:11 UTC
  To: Ingo Molnar
  Cc: x86, linux-kernel, Andy Lutomirski, Brian Gerst, Denys Vlasenko,
	H . Peter Anvin, Linus Torvalds, Peter Zijlstra, Thomas Gleixner,
	Josh Poimboeuf, Nadav Amit

On Sat, Oct 21, 2023 at 5:00 PM Ingo Molnar <mingo@kernel.org> wrote:
>
>
> * Uros Bizjak <ubizjak@gmail.com> wrote:
>
> >  arch/x86/include/asm/current.h   | 7 +++++++
> >  arch/x86/include/asm/percpu.h    | 6 +++---
> >  arch/x86/include/asm/processor.h | 3 +++
> >  arch/x86/kernel/cpu/common.c     | 1 +
> >  arch/x86/kernel/vmlinux.lds.S    | 1 +
> >  include/linux/compiler.h         | 2 +-
> >  6 files changed, 16 insertions(+), 4 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/current.h b/arch/x86/include/asm/current.h
> > index a1168e7b69e5..0538d2436673 100644
> > --- a/arch/x86/include/asm/current.h
> > +++ b/arch/x86/include/asm/current.h
> > @@ -36,8 +36,15 @@ static_assert(sizeof(struct pcpu_hot) == 64);
> >
> >  DECLARE_PER_CPU_ALIGNED(struct pcpu_hot, pcpu_hot);
> >
> > +/* const-qualified alias to pcpu_hot, aliased by linker. */
> > +DECLARE_PER_CPU_ALIGNED(const struct pcpu_hot __percpu_seg_override,
> > +                     const_pcpu_hot);
>
> The aliasing makes me a bit nervous. Could we at least prefix it a bit more
> prominently, like const__pcpu_hot? That way it's immediately obvious at all
> usage sites that it's "special".

Sure, it can be renamed. The symbol - although aliased - may be used
in a general way. It is const-qualified and placed in __seg_gs address
space, so all the rules for const and __seg_gs qualifications apply.
However, the values are not that constant, and can be changed behind
the scenes via the pcpu_hot R/W alias.

Uros.


* Re: [PATCH -tip v2] x86/percpu: Introduce const-qualified const_pcpu_hot
  2023-10-21 14:36 [PATCH -tip v2] x86/percpu: Introduce const-qualified const_pcpu_hot Uros Bizjak
  2023-10-21 14:39 ` Nadav Amit
  2023-10-21 15:00 ` Ingo Molnar
@ 2023-10-23  9:29 ` Ingo Molnar
  2 siblings, 0 replies; 5+ messages in thread
From: Ingo Molnar @ 2023-10-23  9:29 UTC
  To: Uros Bizjak
  Cc: x86, linux-kernel, Andy Lutomirski, Brian Gerst, Denys Vlasenko,
	H . Peter Anvin, Linus Torvalds, Peter Zijlstra, Thomas Gleixner,
	Josh Poimboeuf, Nadav Amit


* Uros Bizjak <ubizjak@gmail.com> wrote:

>  DECLARE_PER_CPU_ALIGNED(struct pcpu_hot, pcpu_hot);
>  
> +/* const-qualified alias to pcpu_hot, aliased by linker. */
> +DECLARE_PER_CPU_ALIGNED(const struct pcpu_hot __percpu_seg_override,
> +			const_pcpu_hot);

I added the fix below - just like pcpu_hot, const_pcpu_hot needs to be 
exported too, as it's (indirectly) used by various kernel modules.

Thanks,

	Ingo

---
 arch/x86/kernel/cpu/common.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 382d4e6b848d..4cc0ab0dfbb5 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -2051,6 +2051,7 @@ DEFINE_PER_CPU_ALIGNED(struct pcpu_hot, pcpu_hot) = {
 	.top_of_stack	= TOP_OF_INIT_STACK,
 };
 EXPORT_PER_CPU_SYMBOL(pcpu_hot);
+EXPORT_PER_CPU_SYMBOL(const_pcpu_hot);
 
 #ifdef CONFIG_X86_64
 DEFINE_PER_CPU_FIRST(struct fixed_percpu_data,

