From: David Laight <david.laight.linux@gmail.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>,
	tglx@kernel.org, mingo@redhat.com, bp@alien8.de,
	Nathan Chancellor <nathan@kernel.org>,
	Calvin Owens <calvin@wbinvd.org>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	torvalds@linux-foundation.org, x86-ML <x86@kernel.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: 8aeb879baf12 - significant system call latency regression, bisected
Date: Tue, 16 Jun 2026 14:53:34 +0100
Message-ID: <20260616145334.693c043a@pumpkin>
In-Reply-To: <20260616082814.GQ48970@noisy.programming.kicks-ass.net>

On Tue, 16 Jun 2026 10:28:14 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Sat, Jun 13, 2026 at 06:50:24PM -0700, H. Peter Anvin wrote:
> 
> > OK, I have, I believe root-caused this.
> > 
> > It is a padding issue; removing the code changes __pfx_x64_sys_call to be
> > 32-byte aligned, with the result that x64_sys_call gets *mis*aligned.
> > 
> > Reverting the patch but adding an alignment statement to x64_sys_call
> > re-introduces the performance regression.
> > 
> > I am concerned because this could mean that the __pfx stubs add substantial
> > overhead elsewhere, unless this just happens to be a particularly sensitive
> > case...  
> 
> So what is the actual alignment requirement these days then? We're
> building the (x86_64) kernel with 16 byte function and 1 byte jump
> alignment.
> 
> So ISTR the Intel I-fetch window was 16 bytes, so the above things would
> make sense. However, Gemini, or whatever AI sits in google search, is
> trying to tell me Intel moved to 32 byte I-fetch with Alderlake.
> 
> That same thing is saying AMD switched to 32 byte I-fetch with Zen (1)
> and later.

Basically you can't win.
I was looking at why a patch didn't give the expected performance gain
on a different base kernel build.
The result seems to depend on whether the function (actually strlen) was
aligned to an odd or an even multiple of 16 bytes.
If aligned to an even boundary the loop inside the function crossed a
'significant' boundary and the code ran measurably slower.
If you start aligning loop tops and labels in general you probably lose
due to code bloat.
(Here the loop didn't need aligning, it just needed not to contain
the relevant boundary.)
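
As a rough illustration of the point above (a sketch, not anything from the
kernel tree): the helper below checks whether a block of code spans more than
one aligned block. The name crosses_boundary, the loop offset 0x14 and the
loop length 0x10 are made up for the example, and treating 32 bytes as the
'significant' boundary is an assumption; the real value depends on the CPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if the byte range [start, start + len) spans more than one
 * 'boundary'-aligned block. 'boundary' must be a power of two. Which
 * boundary actually matters (32, 64, ...) is CPU dependent and is an
 * assumption of this sketch.
 */
static bool crosses_boundary(uint64_t start, uint64_t len, uint64_t boundary)
{
	assert(boundary != 0 && (boundary & (boundary - 1)) == 0);

	if (len == 0)
		return false;

	/* Compare the aligned block of the first byte with that of the last. */
	return (start / boundary) != ((start + len - 1) / boundary);
}
```

With a hypothetical loop at function offset 0x14 and 0x10 bytes long,
crosses_boundary(0x1000 + 0x14, 0x10, 32) is true (function on an even
multiple of 16) while crosses_boundary(0x1010 + 0x14, 0x10, 32) is false
(function on an odd multiple of 16), so the same code is fine or not
depending only on where the function starts.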

In this case the extra padding will change the alignment of everything that
follows - and some of those shifts might make a difference as well.

You'd need to add extra code further down the function to keep the size
the same (and hope the compiler keeps the functions in the same order).
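
To make the knock-on effect concrete, here is a small sketch that lays out
functions back to back at a fixed alignment, roughly the way a linker would
place them. The names align_up and layout and the sizes used below are
invented for the example; a real link also depends on section ordering and
on whatever the compiler decides to emit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round 'addr' up to the next multiple of 'align' (a power of two). */
static uint64_t align_up(uint64_t addr, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (addr + align - 1) & ~(align - 1);
}

/*
 * Place 'n' functions of the given sizes one after another starting at
 * 'base', each aligned to 'align' bytes, and store the start address of
 * each one in 'starts'.
 */
static void layout(uint64_t base, const uint64_t *sizes, size_t n,
		   uint64_t align, uint64_t *starts)
{
	uint64_t addr = base;

	for (size_t i = 0; i < n; i++) {
		addr = align_up(addr, align);
		starts[i] = addr;
		addr += sizes[i];
	}
}
```

With 16 byte alignment and made-up sizes of 40, 20 and 10 bytes the functions
start at 0, 48 and 80. Growing the first one by 16 bytes moves the others to
64 and 96, so both flip from an odd to an even multiple of 16 even though
their own code is unchanged.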

	David


> 
> This all seems to suggest we do something like so, hmm?
> 
> 
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index b9f5a4a3cc2a..65fff65271d0 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -329,7 +329,9 @@ config X86
>  	select HAVE_ARCH_KCSAN			if X86_64
>  	select PROC_PID_ARCH_STATUS		if PROC_FS
>  	select HAVE_ARCH_NODE_DEV_GROUP		if X86_SGX
> -	select FUNCTION_ALIGNMENT_16B		if X86_64 || X86_ALIGNMENT_16
> +	# AMD-Zen+ and Intel-Alderlake+ moved to 32 byte I-fetch
> +	select FUNCTION_ALIGNMENT_32B		if X86_64
> +	select FUNCTION_ALIGNMENT_16B		if X86_ALIGNMENT_16
>  	select FUNCTION_ALIGNMENT_4B
>  	imply IMA_SECURE_AND_OR_TRUSTED_BOOT    if EFI
>  	select HAVE_DYNAMIC_FTRACE_NO_PATCHABLE
> 

