From: "Chang S. Bae"
To: linux-kernel@vger.kernel.org
Cc: x86@kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
    dave.hansen@linux.intel.com, chang.seok.bae@intel.com
Subject: [DISCUSSION] x86: In-Kernel Use of Extended General-Purpose Registers
Date: Mon, 24 Nov 2025 21:32:23 +0000
Message-ID: <20251124213227.123779-1-chang.seok.bae@intel.com>

Hi all,

I’d like to initiate a discussion on this topic. The attached patchset
is *not* intended for upstream at this point. Instead, its purpose is
simply to serve as an example of how the kernel might use these
registers. A quick look is enough; reviewing the attached patches in
depth would likely be a waste of your time.

== Background ==

Advanced Performance Extensions (APX) introduces additional GPRs:
R16–R31 (EGPRs) [1]. These EGPRs are accessible via new prefix
encodings on legacy instructions.
Their state is handled through XSAVE, and support for this new XSTATE
component was merged in v6.16 [2].

So far, APX enablement has primarily targeted userspace. However,
in-kernel use still needs to be explored. Ingo previously noted that
EGPRs may help reduce kernel stack pressure [3], and this topic is
coming up at the x86 microconference at LPC [4]. I hope this posting
helps circulate some thoughts, along with an example, ahead of that.

== Possible Approaches ==

(1) Selective and Limited Use

This follows how vector registers are used today in places like crypto
routines. AVX state usage is bracketed by kernel_fpu_begin() /
kernel_fpu_end(). EGPRs could be similarly used in a small bounded
region. Under this model:

  * No changes are needed to the existing XSTATE management API.

  * Preemption and softirqs would be disabled while EGPRs are live,
    which in turn limits usage to small regions.

  * This lends itself mostly to hand-written assembly, which is less
    scalable for broader adoption.

PATCH3 in the attached set shows an example of this kind of usage.

(2) Broader or Tree-wide Adoption

If the goal is to substantially reduce stack pressure or improve
performance more broadly, EGPR usage would need to expand to larger
regions. This raises some considerations:

  * The usage window would become too large to keep preemption
    disabled. In that case, the wrapper-based approach becomes
    infeasible.

  * The EGPR state would then need to be switched on entry to ensure a
    clean separation as APX usage becomes more pervasive. This could be
    handled by extending struct pt_regs or another structure.

  * The kernel must be able to select between legacy mode and APX,
    since APX remains optional for backward compatibility. In other
    words, an APX-only kernel image won't be distributed.

  * This suggests some level of code duplication or alternate code
    paths as an unavoidable trade-off. As the usage grows, so does
    image size, which raises the bar for demonstrating a measurable
    benefit.
  * At that scale, adoption will likely rely on compiler support.
    Compiler code generation and optimization behavior would need to be
    examined and validated in advance.

== Discussions ==

Given the above, a staged adoption may make sense. EGPR usage could
begin in self-contained libraries or performance-critical paths and be
evaluated incrementally as hardware becomes more broadly available.

Here are some preliminary questions to discuss:

  * Does this overall framing make sense?

  * Are there alternative or more pragmatic approaches for adoption?

  * Which kernel subsystems or hot paths might benefit most from early
    experimentation with EGPRs?

Thanks,
Chang

[1] https://cdrdv2.intel.com/v1/dl/getContent/784266
[2] https://lore.kernel.org/lkml/aDL35MA4vH0wQ6Gb@gmail.com/
[3] https://lore.kernel.org/lkml/Z8C57rzRt90obAFg@gmail.com/
[4] https://lpc.events/event/19/contributions/2028/

Chang S. Bae (3):
  x86/lib: Refactor csum_partial_copy_generic() into a macro
  x86/lib: Convert repeated asm sequences in checksum copy into macros
  x86/lib: Use EGPRs in 64-bit checksum copy loop

 arch/x86/Kconfig                   |   6 +
 arch/x86/Kconfig.assembler         |   6 +
 arch/x86/include/asm/checksum_64.h |  24 ++-
 arch/x86/lib/csum-copy_64.S        | 282 +++++++++++++++++------------
 4 files changed, 206 insertions(+), 112 deletions(-)

base-commit: ac3fd01e4c1efce8f2c054cdeb2ddd2fc0fb150d
-- 
2.51.0