* [PATCH 1/2] KVM: x86: Use asm_inline() instead of asm() in kvm_hypercall[0-4]()
@ 2025-04-14 8:10 Uros Bizjak
2025-04-14 8:10 ` [PATCH 2/2] KVM: VMX: Use LEAVE in vmx_do_interrupt_irqoff() Uros Bizjak
` (2 more replies)
0 siblings, 3 replies; 6+ messages in thread
From: Uros Bizjak @ 2025-04-14 8:10 UTC (permalink / raw)
To: kvm, x86, linux-kernel
Cc: Uros Bizjak, Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
H. Peter Anvin
Use asm_inline() to instruct the compiler that the size of the asm()
statement is the minimum size of one instruction, regardless of how many
instructions the compiler estimates it to be. The ALTERNATIVE macro,
which expands to several pseudo directives, causes the compiler's
instruction length estimate to count more than 20 instructions.
bloat-o-meter reports a minimal code size increase
(x86_64 defconfig, gcc-14.2.1):
add/remove: 0/0 grow/shrink: 1/0 up/down: 10/0 (10)
Function old new delta
-----------------------------------------------------
__send_ipi_mask 525 535 +10
Total: Before=23751224, After=23751234, chg +0.00%
due to different compiler decisions with more precise size
estimations.
No functional change intended.
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
---
arch/x86/include/asm/kvm_para.h | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 57bc74e112f2..519ab5aee250 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -38,7 +38,7 @@ static inline long kvm_hypercall0(unsigned int nr)
if (cpu_feature_enabled(X86_FEATURE_TDX_GUEST))
return tdx_kvm_hypercall(nr, 0, 0, 0, 0);
- asm volatile(KVM_HYPERCALL
+ asm_inline volatile(KVM_HYPERCALL
: "=a"(ret)
: "a"(nr)
: "memory");
@@ -52,7 +52,7 @@ static inline long kvm_hypercall1(unsigned int nr, unsigned long p1)
if (cpu_feature_enabled(X86_FEATURE_TDX_GUEST))
return tdx_kvm_hypercall(nr, p1, 0, 0, 0);
- asm volatile(KVM_HYPERCALL
+ asm_inline volatile(KVM_HYPERCALL
: "=a"(ret)
: "a"(nr), "b"(p1)
: "memory");
@@ -67,7 +67,7 @@ static inline long kvm_hypercall2(unsigned int nr, unsigned long p1,
if (cpu_feature_enabled(X86_FEATURE_TDX_GUEST))
return tdx_kvm_hypercall(nr, p1, p2, 0, 0);
- asm volatile(KVM_HYPERCALL
+ asm_inline volatile(KVM_HYPERCALL
: "=a"(ret)
: "a"(nr), "b"(p1), "c"(p2)
: "memory");
@@ -82,7 +82,7 @@ static inline long kvm_hypercall3(unsigned int nr, unsigned long p1,
if (cpu_feature_enabled(X86_FEATURE_TDX_GUEST))
return tdx_kvm_hypercall(nr, p1, p2, p3, 0);
- asm volatile(KVM_HYPERCALL
+ asm_inline volatile(KVM_HYPERCALL
: "=a"(ret)
: "a"(nr), "b"(p1), "c"(p2), "d"(p3)
: "memory");
@@ -98,7 +98,7 @@ static inline long kvm_hypercall4(unsigned int nr, unsigned long p1,
if (cpu_feature_enabled(X86_FEATURE_TDX_GUEST))
return tdx_kvm_hypercall(nr, p1, p2, p3, p4);
- asm volatile(KVM_HYPERCALL
+ asm_inline volatile(KVM_HYPERCALL
: "=a"(ret)
: "a"(nr), "b"(p1), "c"(p2), "d"(p3), "S"(p4)
: "memory");
--
2.49.0
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH 2/2] KVM: VMX: Use LEAVE in vmx_do_interrupt_irqoff()
2025-04-14 8:10 [PATCH 1/2] KVM: x86: Use asm_inline() instead of asm() in kvm_hypercall[0-4]() Uros Bizjak
@ 2025-04-14 8:10 ` Uros Bizjak
2025-04-15 1:05 ` Sean Christopherson
2025-04-15 1:04 ` [PATCH 1/2] KVM: x86: Use asm_inline() instead of asm() in kvm_hypercall[0-4]() Sean Christopherson
2025-04-25 23:23 ` Sean Christopherson
2 siblings, 1 reply; 6+ messages in thread
From: Uros Bizjak @ 2025-04-14 8:10 UTC (permalink / raw)
To: kvm, x86, linux-kernel
Cc: Uros Bizjak, Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
H. Peter Anvin
Micro-optimize vmx_do_interrupt_irqoff() by replacing the
MOV %RBP,%RSP; POP %RBP instruction sequence with the equivalent
LEAVE instruction. GCC does this by default for generic tuning
and for all modern processors:
DEF_TUNE (X86_TUNE_USE_LEAVE, "use_leave",
m_386 | m_CORE_ALL | m_K6_GEODE | m_AMD_MULTIPLE | m_ZHAOXIN
| m_TREMONT | m_CORE_HYBRID | m_CORE_ATOM | m_GENERIC)
The new code also saves a couple of bytes, from:
27: 48 89 ec mov %rbp,%rsp
2a: 5d pop %rbp
to:
27: c9 leave
No functional change intended.
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
---
arch/x86/kvm/vmx/vmenter.S | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/arch/x86/kvm/vmx/vmenter.S b/arch/x86/kvm/vmx/vmenter.S
index f6986dee6f8c..0a6cf5bff2aa 100644
--- a/arch/x86/kvm/vmx/vmenter.S
+++ b/arch/x86/kvm/vmx/vmenter.S
@@ -59,8 +59,7 @@
* without the explicit restore, thinks the stack is getting walloped.
* Using an unwind hint is problematic due to x86-64's dynamic alignment.
*/
- mov %_ASM_BP, %_ASM_SP
- pop %_ASM_BP
+ leave
RET
.endm
--
2.49.0
* Re: [PATCH 1/2] KVM: x86: Use asm_inline() instead of asm() in kvm_hypercall[0-4]()
2025-04-14 8:10 [PATCH 1/2] KVM: x86: Use asm_inline() instead of asm() in kvm_hypercall[0-4]() Uros Bizjak
2025-04-14 8:10 ` [PATCH 2/2] KVM: VMX: Use LEAVE in vmx_do_interrupt_irqoff() Uros Bizjak
@ 2025-04-15 1:04 ` Sean Christopherson
2025-04-25 23:23 ` Sean Christopherson
2 siblings, 0 replies; 6+ messages in thread
From: Sean Christopherson @ 2025-04-15 1:04 UTC (permalink / raw)
To: Uros Bizjak
Cc: kvm, x86, linux-kernel, Paolo Bonzini, Vitaly Kuznetsov,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
H. Peter Anvin
Nit, this is guest code, i.e. should use "kvm/x86:" for the scope. No need to
send a new version just for that.
On Mon, Apr 14, 2025, Uros Bizjak wrote:
> Use asm_inline() to instruct the compiler that the size of asm()
> is the minimum size of one instruction, ignoring how many instructions
> the compiler thinks it is. ALTERNATIVE macro that expands to several
> pseudo directives causes instruction length estimate to count
> more than 20 instructions.
>
> bloat-o-meter reports minimal code size increase
> (x86_64 defconfig, gcc-14.2.1):
>
> add/remove: 0/0 grow/shrink: 1/0 up/down: 10/0 (10)
>
> Function old new delta
> -----------------------------------------------------
> __send_ipi_mask 525 535 +10
>
> Total: Before=23751224, After=23751234, chg +0.00%
>
> due to different compiler decisions with more precise size
> estimations.
>
> No functional change intended.
>
> Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
> Cc: Sean Christopherson <seanjc@google.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> ---
> arch/x86/include/asm/kvm_para.h | 10 +++++-----
> 1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
> index 57bc74e112f2..519ab5aee250 100644
> --- a/arch/x86/include/asm/kvm_para.h
> +++ b/arch/x86/include/asm/kvm_para.h
> @@ -38,7 +38,7 @@ static inline long kvm_hypercall0(unsigned int nr)
> if (cpu_feature_enabled(X86_FEATURE_TDX_GUEST))
> return tdx_kvm_hypercall(nr, 0, 0, 0, 0);
>
> - asm volatile(KVM_HYPERCALL
> + asm_inline volatile(KVM_HYPERCALL
> : "=a"(ret)
> : "a"(nr)
> : "memory");
> @@ -52,7 +52,7 @@ static inline long kvm_hypercall1(unsigned int nr, unsigned long p1)
> if (cpu_feature_enabled(X86_FEATURE_TDX_GUEST))
> return tdx_kvm_hypercall(nr, p1, 0, 0, 0);
>
> - asm volatile(KVM_HYPERCALL
> + asm_inline volatile(KVM_HYPERCALL
> : "=a"(ret)
> : "a"(nr), "b"(p1)
> : "memory");
> @@ -67,7 +67,7 @@ static inline long kvm_hypercall2(unsigned int nr, unsigned long p1,
> if (cpu_feature_enabled(X86_FEATURE_TDX_GUEST))
> return tdx_kvm_hypercall(nr, p1, p2, 0, 0);
>
> - asm volatile(KVM_HYPERCALL
> + asm_inline volatile(KVM_HYPERCALL
> : "=a"(ret)
> : "a"(nr), "b"(p1), "c"(p2)
> : "memory");
> @@ -82,7 +82,7 @@ static inline long kvm_hypercall3(unsigned int nr, unsigned long p1,
> if (cpu_feature_enabled(X86_FEATURE_TDX_GUEST))
> return tdx_kvm_hypercall(nr, p1, p2, p3, 0);
>
> - asm volatile(KVM_HYPERCALL
> + asm_inline volatile(KVM_HYPERCALL
> : "=a"(ret)
> : "a"(nr), "b"(p1), "c"(p2), "d"(p3)
> : "memory");
> @@ -98,7 +98,7 @@ static inline long kvm_hypercall4(unsigned int nr, unsigned long p1,
> if (cpu_feature_enabled(X86_FEATURE_TDX_GUEST))
> return tdx_kvm_hypercall(nr, p1, p2, p3, p4);
>
> - asm volatile(KVM_HYPERCALL
> + asm_inline volatile(KVM_HYPERCALL
> : "=a"(ret)
> : "a"(nr), "b"(p1), "c"(p2), "d"(p3), "S"(p4)
> : "memory");
> --
> 2.49.0
>
* Re: [PATCH 2/2] KVM: VMX: Use LEAVE in vmx_do_interrupt_irqoff()
2025-04-14 8:10 ` [PATCH 2/2] KVM: VMX: Use LEAVE in vmx_do_interrupt_irqoff() Uros Bizjak
@ 2025-04-15 1:05 ` Sean Christopherson
2025-04-15 7:42 ` Uros Bizjak
0 siblings, 1 reply; 6+ messages in thread
From: Sean Christopherson @ 2025-04-15 1:05 UTC (permalink / raw)
To: Uros Bizjak
Cc: kvm, x86, linux-kernel, Paolo Bonzini, Vitaly Kuznetsov,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
H. Peter Anvin
On Mon, Apr 14, 2025, Uros Bizjak wrote:
> Micro-optimize vmx_do_interrupt_irqoff() by substituting
> MOV %RBP,%RSP; POP %RBP instruction sequence with equivalent
> LEAVE instruction. GCC compiler does this by default for
> a generic tuning and for all modern processors:
Out of curiosity, is LEAVE actually a performance win, or is the benefit essentially
just the few code bytes saved?
> DEF_TUNE (X86_TUNE_USE_LEAVE, "use_leave",
> m_386 | m_CORE_ALL | m_K6_GEODE | m_AMD_MULTIPLE | m_ZHAOXIN
> | m_TREMONT | m_CORE_HYBRID | m_CORE_ATOM | m_GENERIC)
>
> The new code also saves a couple of bytes, from:
>
> 27: 48 89 ec mov %rbp,%rsp
> 2a: 5d pop %rbp
>
> to:
>
> 27: c9 leave
>
> No functional change intended.
>
> Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
> Cc: Sean Christopherson <seanjc@google.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> ---
> arch/x86/kvm/vmx/vmenter.S | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/vmenter.S b/arch/x86/kvm/vmx/vmenter.S
> index f6986dee6f8c..0a6cf5bff2aa 100644
> --- a/arch/x86/kvm/vmx/vmenter.S
> +++ b/arch/x86/kvm/vmx/vmenter.S
> @@ -59,8 +59,7 @@
> * without the explicit restore, thinks the stack is getting walloped.
> * Using an unwind hint is problematic due to x86-64's dynamic alignment.
> */
> - mov %_ASM_BP, %_ASM_SP
> - pop %_ASM_BP
> + leave
> RET
> .endm
>
> --
> 2.49.0
>
* Re: [PATCH 2/2] KVM: VMX: Use LEAVE in vmx_do_interrupt_irqoff()
2025-04-15 1:05 ` Sean Christopherson
@ 2025-04-15 7:42 ` Uros Bizjak
0 siblings, 0 replies; 6+ messages in thread
From: Uros Bizjak @ 2025-04-15 7:42 UTC (permalink / raw)
To: Sean Christopherson
Cc: kvm, x86, linux-kernel, Paolo Bonzini, Vitaly Kuznetsov,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
H. Peter Anvin
On Tue, Apr 15, 2025 at 3:05 AM Sean Christopherson <seanjc@google.com> wrote:
>
> On Mon, Apr 14, 2025, Uros Bizjak wrote:
> > Micro-optimize vmx_do_interrupt_irqoff() by substituting
> > MOV %RBP,%RSP; POP %RBP instruction sequence with equivalent
> > LEAVE instruction. GCC compiler does this by default for
> > a generic tuning and for all modern processors:
>
> Out of curiosity, is LEAVE actually a performance win, or is the benefit essentially
> just the few code bytes saved?
It is hard to say for out-of-order execution cores, especially when
the stack engine is thrown into the mix (these two instructions, plus
the following RET, all update %rsp).
The pragmatic solution was to follow the compiler's choice, based on
the tuning below.
> > DEF_TUNE (X86_TUNE_USE_LEAVE, "use_leave",
> > m_386 | m_CORE_ALL | m_K6_GEODE | m_AMD_MULTIPLE | m_ZHAOXIN
> > | m_TREMONT | m_CORE_HYBRID | m_CORE_ATOM | m_GENERIC)
The tuning is updated when a new target is introduced to the compiler
and is based on various measurements by the processor manufacturer.
The above covers the majority of recent processors (plus generic
tuning), so I guess we won't fail by following suit. OTOH, any
performance difference will be negligible.
> > The new code also saves a couple of bytes, from:
> >
> > 27: 48 89 ec mov %rbp,%rsp
> > 2a: 5d pop %rbp
> >
> > to:
> >
> > 27: c9 leave
Thanks,
Uros.
* Re: [PATCH 1/2] KVM: x86: Use asm_inline() instead of asm() in kvm_hypercall[0-4]()
2025-04-14 8:10 [PATCH 1/2] KVM: x86: Use asm_inline() instead of asm() in kvm_hypercall[0-4]() Uros Bizjak
2025-04-14 8:10 ` [PATCH 2/2] KVM: VMX: Use LEAVE in vmx_do_interrupt_irqoff() Uros Bizjak
2025-04-15 1:04 ` [PATCH 1/2] KVM: x86: Use asm_inline() instead of asm() in kvm_hypercall[0-4]() Sean Christopherson
@ 2025-04-25 23:23 ` Sean Christopherson
2 siblings, 0 replies; 6+ messages in thread
From: Sean Christopherson @ 2025-04-25 23:23 UTC (permalink / raw)
To: Sean Christopherson, kvm, x86, linux-kernel, Uros Bizjak
Cc: Paolo Bonzini, Vitaly Kuznetsov, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, H. Peter Anvin
On Mon, 14 Apr 2025 10:10:50 +0200, Uros Bizjak wrote:
> Use asm_inline() to instruct the compiler that the size of asm()
> is the minimum size of one instruction, ignoring how many instructions
> the compiler thinks it is. ALTERNATIVE macro that expands to several
> pseudo directives causes instruction length estimate to count
> more than 20 instructions.
>
> bloat-o-meter reports minimal code size increase
> (x86_64 defconfig, gcc-14.2.1):
>
> [...]
Applied patch 2 to kvm-x86 vmx (I'll let Paolo grab the guest change). Thanks!
[1/2] KVM: x86: Use asm_inline() instead of asm() in kvm_hypercall[0-4]()
(no commit info)
[2/2] KVM: VMX: Use LEAVE in vmx_do_interrupt_irqoff()
https://github.com/kvm-x86/linux/commit/798b9b1cb0e5
--
https://github.com/kvm-x86/linux/tree/next