From: "Chang S. Bae" <chang.seok.bae@intel.com>
To: linux-kernel@vger.kernel.org
Cc: x86@kernel.org, tglx@linutronix.de, mingo@redhat.com,
	bp@alien8.de, dave.hansen@linux.intel.com,
	chang.seok.bae@intel.com
Subject: [DISCUSSION] x86: In-Kernel Use of Extended General-Purpose Registers
Date: Mon, 24 Nov 2025 21:32:23 +0000
Message-ID: <20251124213227.123779-1-chang.seok.bae@intel.com>

Hi all,

I’d like to initiate a discussion on this topic. The attached patchset
is *not* intended for upstream at this point. Instead, its purpose is
simply to serve as an example of how the kernel might use these
registers. Beyond a quick look, a deep review of the attached patches
would likely be a waste of your time.

== Background ==

Advanced Performance Extensions (APX) introduces additional GPRs: R16–R31
(EGPRs) [1]. These EGPRs are accessible via new prefix encodings on
legacy instructions. Their state is handled through XSAVE, and support
for this new XSTATE component was merged in v6.16 [2]. So far, APX is
primarily targeted toward userspace enablement.

However, in-kernel use still needs to be explored. Ingo previously noted
that EGPRs may help reduce kernel stack pressure [3], and this topic is
on the agenda of the x86 microconference at LPC [4]. I hope this posting
can circulate some thoughts, along with an example, ahead of that.

== Possible Approaches ==

(1) Selective and Limited Use

This follows how vector registers are used today in places like crypto
routines. AVX state usage is bracketed by kernel_fpu_begin() /
kernel_fpu_end(). EGPRs could be similarly used in a small bounded
region.

Under this model:

  * No changes are needed to the existing XSTATE management API.

  * Preemption and softirqs would be disabled while EGPRs are live,
    which in turn limits usage to small regions.

  * This lends itself mostly to hand-written assembly, which is less
    scalable for broader adoption.

PATCH3 in the attached set shows an example of this kind of usage.
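
To make the bracketing model concrete, here is a minimal userspace
sketch. It is not taken from the attached patches. The mock_* functions
are stand-ins I made up for kernel_fpu_begin() / kernel_fpu_end(), and
sum_words() is a hypothetical placeholder for what would be a
hand-written assembly loop using R16-R31 in practice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical bookkeeping for this sketch only. The real
 * kernel_fpu_begin()/kernel_fpu_end() disable preemption and manage
 * the extended register state; these mocks only track nesting.
 */
static int egpr_section_depth;

static void mock_kernel_fpu_begin(void) { egpr_section_depth++; }
static void mock_kernel_fpu_end(void)   { egpr_section_depth--; }

/*
 * Placeholder for the bounded region. A real version would be
 * hand-written assembly that keeps its working set in EGPRs.
 */
static uint64_t sum_words(const uint32_t *p, size_t n)
{
	uint64_t sum = 0;

	/* EGPRs must only be live inside the bracketed section. */
	assert(egpr_section_depth > 0);

	for (size_t i = 0; i < n; i++)
		sum += p[i];
	return sum;
}

/* Caller-visible entry point: the bracket stays short and local. */
uint64_t bracketed_sum(const uint32_t *p, size_t n)
{
	uint64_t sum;

	mock_kernel_fpu_begin();
	sum = sum_words(p, n);
	mock_kernel_fpu_end();

	return sum;
}
```

The point of the sketch is only the shape: everything that touches the
extra registers sits between the begin/end pair, and nothing that could
sleep or run for long belongs there.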

(2) Broader or Tree-wide Adoption

If the goal is to substantially reduce stack pressure or improve
performance more broadly, EGPR usage would need to expand to larger
regions. This raises some considerations:

  * The usage window would become too large to keep preemption disabled.
    In that case, the wrapper-based approach becomes infeasible.

  * The EGPR state would then need to be switched on entry to ensure a
    clean separation as APX usage becomes more pervasive. This could be
    handled by extending struct pt_regs or another structure.

  * The kernel must be able to select between legacy mode and APX,
    since APX remains optional for backward compatibility. In other
    words, an APX-only kernel image won't be distributed.

  * This suggests some level of code duplication or alternate code paths
    as an unavoidable trade-off. As the usage grows, so does image size,
    which raises the bar for demonstrating a measurable benefit.

  * At that scale, adoption will likely rely on compiler support.
    Compiler code generation and optimization behavior would need to be
    examined and validated in advance.
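
The legacy/APX selection and the resulting duplication can be sketched
roughly as below. This is a userspace illustration under my own
assumptions, not a proposal for the actual mechanism: the cpu_has_apx
flag, select_sum(), and both sum_* variants are hypothetical names, and
the kernel would more likely patch call sites at boot than go through a
function pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t (*sum_fn)(const uint32_t *p, size_t n);

/* Baseline variant: must run on every x86-64 CPU. */
static uint64_t sum_legacy(const uint32_t *p, size_t n)
{
	uint64_t sum = 0;

	assert(p != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		sum += p[i];
	return sum;
}

/*
 * Hypothetical APX variant. Here it is plain C with the same result;
 * the real counterpart would be generated or written to use EGPRs.
 * Note that the body is duplicated, which is the image-size cost.
 */
static uint64_t sum_apx(const uint32_t *p, size_t n)
{
	uint64_t sum = 0;

	assert(p != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		sum += p[i];
	return sum;
}

/* Pick one variant once, based on a (hypothetical) feature flag. */
static sum_fn select_sum(bool cpu_has_apx)
{
	return cpu_has_apx ? sum_apx : sum_legacy;
}
```

Every function treated this way ships twice in the same image, so the
benefit of the APX variant has to outweigh that cost.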

== Discussions ==

Given the above, a staged adoption may make sense. EGPR usage could
begin in self-contained libraries or performance-critical paths and be
evaluated incrementally as hardware becomes more broadly available.

Here are some preliminary questions to discuss:

  * Does this overall framing make sense?
  * Are there alternative or more pragmatic approaches for adoption?
  * Which kernel subsystems or hot paths might benefit most from early
    experimentation with EGPRs?

Thanks,
Chang

[1] https://cdrdv2.intel.com/v1/dl/getContent/784266
[2] https://lore.kernel.org/lkml/aDL35MA4vH0wQ6Gb@gmail.com/
[3] https://lore.kernel.org/lkml/Z8C57rzRt90obAFg@gmail.com/
[4] https://lpc.events/event/19/contributions/2028/

Chang S. Bae (3):
  x86/lib: Refactor csum_partial_copy_generic() into a macro
  x86/lib: Convert repeated asm sequences in checksum copy into macros
  x86/lib: Use EGPRs in 64-bit checksum copy loop

 arch/x86/Kconfig                   |   6 +
 arch/x86/Kconfig.assembler         |   6 +
 arch/x86/include/asm/checksum_64.h |  24 ++-
 arch/x86/lib/csum-copy_64.S        | 282 +++++++++++++++++------------
 4 files changed, 206 insertions(+), 112 deletions(-)


base-commit: ac3fd01e4c1efce8f2c054cdeb2ddd2fc0fb150d
-- 
2.51.0

