* [PATCHv8 00/17] x86: Enable Linear Address Space Separation support
@ 2025-07-01  9:58 Kirill A. Shutemov
  2025-07-01  9:58 ` [PATCHv8 01/17] x86/cpu: Enumerate the LASS feature bits Kirill A. Shutemov
                   ` (16 more replies)
  0 siblings, 17 replies; 60+ messages in thread
From: Kirill A. Shutemov @ 2025-07-01  9:58 UTC (permalink / raw)
  To: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra, Ard Biesheuvel,
	Paul E. McKenney, Josh Poimboeuf, Xiongwei Song, Xin Li,
	Mike Rapoport (IBM), Brijesh Singh, Michael Roth, Tony Luck,
	Alexey Kardashevskiy, Alexander Shishkin
  Cc: Jonathan Corbet, Sohil Mehta, Ingo Molnar, Pawan Gupta,
	Daniel Sneddon, Kai Huang, Sandipan Das, Breno Leitao,
	Rick Edgecombe, Alexei Starovoitov, Hou Tao, Juergen Gross,
	Vegard Nossum, Kees Cook, Eric Biggers, Jason Gunthorpe,
	Masami Hiramatsu (Google), Andrew Morton, Luis Chamberlain,
	Yuntao Wang, Rasmus Villemoes, Christophe Leroy, Tejun Heo,
	Changbin Du, Huang Shijie, Geert Uytterhoeven, Namhyung Kim,
	Arnaldo Carvalho de Melo, linux-doc, linux-kernel, linux-efi,
	linux-mm

Linear Address Space Separation (LASS) is a security feature that intends to
prevent malicious virtual address space accesses across user/kernel mode.

Such mode based access protection already exists today with paging and features
such as SMEP and SMAP. However, to enforce these protections, the processor
must traverse the paging structures in memory.  Malicious software can use
timing information resulting from this traversal to determine details about the
paging structures, and these details may also be used to determine the layout
of kernel memory.

The LASS mechanism provides the same mode-based protections as paging but
without traversing the paging structures. Because the protections enforced by
LASS are applied before paging, software will not be able to derive
paging-based timing information from the various caching structures such as the
TLBs, mid-level caches, page walker, data caches, etc. LASS also prevents
probing via double page faults, TLB flush and reload, and software prefetch
instructions.
See [2], [3] and [4] for some research on the related attack vectors.

Had it been available, LASS alone would have mitigated Meltdown. (Hindsight is
20/20 :)

In addition, LASS prevents an attack vector described in a Spectre LAM (SLAM)
whitepaper [7].

LASS enforcement relies on the typical kernel implementation to divide the
64-bit virtual address space into two halves:
  Addr[63]=0 -> User address space
  Addr[63]=1 -> Kernel address space
Any data access or code execution across address spaces typically results in a
#GP fault.
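
As a rough illustration (hypothetical helper, not part of this series), the
check LASS performs on a data access can be sketched as:

static inline bool lass_data_access_faults(unsigned long addr,
					   bool supervisor, bool eflags_ac)
{
	/* Addr[63] selects the half of the address space */
	bool kernel_half = addr & (1UL << 63);

	if (supervisor)
		/* Kernel touching the user half: #GP unless EFLAGS.AC is set */
		return !kernel_half && !eflags_ac;

	/* User mode touching the kernel half always faults */
	return kernel_half;
}

Instruction fetches are stricter: EFLAGS.AC does not let the kernel fetch from
the user half, which matters for the EFI case below.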

Kernel accesses usually only happen to the kernel address space. However, there
are valid reasons for the kernel to access memory in the user half. For these
cases (such as text poking and EFI runtime accesses), the kernel can
temporarily suspend LASS enforcement by toggling EFLAGS.AC with the
stac()/clac() instructions, just as it does for SMAP (Supervisor Mode Access
Prevention). In one instance, an EFI runtime call, LASS has to be disabled
outright via CR4.
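
A minimal sketch of that first pattern, modelled on the text poking change in
patch 03 (lass_stac()/lass_clac() are introduced there; the wrapper name here
is made up):

static void poke_lower_half(void *dst, const void *src, size_t len)
{
	lass_stac();			/* suspend LASS by setting EFLAGS.AC */
	__inline_memcpy(dst, src, len);	/* no out-of-line call inside the guard */
	lass_clac();			/* restore enforcement */
}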

User space cannot access any kernel address while LASS is enabled.
Unfortunately, legacy vsyscall functions are located in the address range
0xffffffffff600000 - 0xffffffffff601000 and emulated in the kernel.  To avoid
breaking user applications when LASS is enabled, extend the vsyscall emulation
in execute (XONLY) mode to the #GP fault handler.

In contrast, the vsyscall EMULATE mode is deprecated and not expected to be
used by anyone.  Supporting EMULATE mode with LASS would need complex
instruction decoding in the #GP fault handler and is probably not worth the
hassle. Disable LASS in the rare case when someone absolutely needs
vsyscall=emulate and enables it via the command line.

Changes from v7[11]:
- Fix __inline_memset();
- Rename lass_disable/enable_enforcement() back to lass_clac/stac();
- Generalize #GP address decode and hint code. Rename stuff to be
  non-GP-centric;
- Update commit messages and comments;

Changes from v6[10]:
- Rework the #SS handler to work properly on FRED;
- Do not require X86_PF_INSTR to emulate vsyscall;
- Move lass_clac()/stac() definition to the patch where they are used;
- Rename lass_clac/stac() to lass_disable/enable_enforcement();
- Fix several build issues around inline memcpy and memset;
- Fix sparse warning;
- Adjust comments and commit messages;
- Drop "x86/efi: Move runtime service initialization to arch/x86" patch
  as it got applied;

Changes from v5[9]:
- Report LASS violation as NULL pointer dereference if the address is in the
  first page frame;
- Provide helpful error message on #SS due to LASS violation;
- Fold patch for vsyscall=emulate documentation into patch
  that disables LASS with vsyscall=emulate;
- Rewrite __inline_memset() and __inline_memcpy();
- Adjust comments and commit messages;

Changes from v4[8]:
- Added PeterZ's Originally-by and SoB to 2/16
- Added lass_clac()/lass_stac() to differentiate from SMAP necessitated
  clac()/stac() and to be NOPs on CPUs that don't support LASS
- Moved the LASS enabling patch to the end of the series to avoid rendering
  machines unbootable in the window before the patch that disables LASS
  around EFI initialization
- Reverted Pawan's LAM disabling commit

Changes from v3[6]:
- Made LAM dependent on LASS
- Moved EFI runtime initialization to x86 side of things
- Suspended LASS validation around EFI set_virtual_address_map call
- Added a message for the case of kernel side LASS violation
- Moved inline memset/memcpy versions to the common string.h

Changes from v2[5]:
- Added myself to the SoB chain

Changes from v1[1]:
- Emulate vsyscall violations in execute mode in the #GP fault handler
- Use inline memcpy and memset while patching alternatives
- Remove CONFIG_X86_LASS
- Make LASS depend on SMAP
- Dropped the minimal KVM enabling patch


[1] https://lore.kernel.org/lkml/20230110055204.3227669-1-yian.chen@intel.com/
[2] “Practical Timing Side Channel Attacks against Kernel Space ASLR”,
https://www.ieee-security.org/TC/SP2013/papers/4977a191.pdf
[3] “Prefetch Side-Channel Attacks: Bypassing SMAP and Kernel ASLR”, http://doi.acm.org/10.1145/2976749.2978356
[4] “Harmful prefetch on Intel”, https://ioactive.com/harmful-prefetch-on-intel/ (H/T Anders)
[5] https://lore.kernel.org/all/20230530114247.21821-1-alexander.shishkin@linux.intel.com/
[6] https://lore.kernel.org/all/20230609183632.48706-1-alexander.shishkin@linux.intel.com/
[7] https://download.vusec.net/papers/slam_sp24.pdf
[8] https://lore.kernel.org/all/20240710160655.3402786-1-alexander.shishkin@linux.intel.com/
[9] https://lore.kernel.org/all/20241028160917.1380714-1-alexander.shishkin@linux.intel.com
[10] https://lore.kernel.org/all/20250620135325.3300848-1-kirill.shutemov@linux.intel.com/
[11] https://lore.kernel.org/all/20250625125112.3943745-1-kirill.shutemov@linux.intel.com/


Alexander Shishkin (4):
  x86/cpu: Defer CR pinning setup until after EFI initialization
  efi: Disable LASS around set_virtual_address_map() EFI call
  x86/traps: Communicate a LASS violation in #GP message
  x86/cpu: Make LAM depend on LASS

Kirill A. Shutemov (5):
  x86/asm: Introduce inline memcpy and memset
  x86/vsyscall: Do not require X86_PF_INSTR to emulate vsyscall
  x86/traps: Generalize #GP address decode and hint code
  x86/traps: Handle LASS thrown #SS
  x86: Re-enable Linear Address Masking

Sohil Mehta (7):
  x86/cpu: Enumerate the LASS feature bits
  x86/alternatives: Disable LASS when patching kernel alternatives
  x86/vsyscall: Reorganize the #PF emulation code
  x86/traps: Consolidate user fixups in exc_general_protection()
  x86/vsyscall: Add vsyscall emulation for #GP
  x86/vsyscall: Disable LASS if vsyscall mode is set to EMULATE
  x86/cpu: Enable LASS during CPU initialization

Yian Chen (1):
  x86/cpu: Set LASS CR4 bit as pinning sensitive

 .../admin-guide/kernel-parameters.txt         |   4 +-
 arch/x86/Kconfig                              |   1 -
 arch/x86/Kconfig.cpufeatures                  |   4 +
 arch/x86/entry/vsyscall/vsyscall_64.c         |  69 +++++++----
 arch/x86/include/asm/cpufeatures.h            |   1 +
 arch/x86/include/asm/smap.h                   |  33 ++++-
 arch/x86/include/asm/string.h                 |  46 +++++++
 arch/x86/include/asm/uaccess_64.h             |  38 ++----
 arch/x86/include/asm/vsyscall.h               |  14 ++-
 arch/x86/include/uapi/asm/processor-flags.h   |   2 +
 arch/x86/kernel/alternative.c                 |  14 ++-
 arch/x86/kernel/cpu/common.c                  |  21 ++--
 arch/x86/kernel/cpu/cpuid-deps.c              |   2 +
 arch/x86/kernel/traps.c                       | 113 ++++++++++++------
 arch/x86/kernel/umip.c                        |   3 +
 arch/x86/lib/clear_page_64.S                  |  13 +-
 arch/x86/mm/fault.c                           |   2 +-
 arch/x86/platform/efi/efi.c                   |  15 +++
 tools/arch/x86/include/asm/cpufeatures.h      |   1 +
 19 files changed, 292 insertions(+), 104 deletions(-)

-- 
2.47.2



* [PATCHv8 01/17] x86/cpu: Enumerate the LASS feature bits
  2025-07-01  9:58 [PATCHv8 00/17] x86: Enable Linear Address Space Separation support Kirill A. Shutemov
@ 2025-07-01  9:58 ` Kirill A. Shutemov
  2025-07-01  9:58 ` [PATCHv8 02/17] x86/asm: Introduce inline memcpy and memset Kirill A. Shutemov
                   ` (15 subsequent siblings)
  16 siblings, 0 replies; 60+ messages in thread
From: Kirill A. Shutemov @ 2025-07-01  9:58 UTC (permalink / raw)
  To: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra, Ard Biesheuvel,
	Paul E. McKenney, Josh Poimboeuf, Xiongwei Song, Xin Li,
	Mike Rapoport (IBM), Brijesh Singh, Michael Roth, Tony Luck,
	Alexey Kardashevskiy, Alexander Shishkin
  Cc: Jonathan Corbet, Sohil Mehta, Ingo Molnar, Pawan Gupta,
	Daniel Sneddon, Kai Huang, Sandipan Das, Breno Leitao,
	Rick Edgecombe, Alexei Starovoitov, Hou Tao, Juergen Gross,
	Vegard Nossum, Kees Cook, Eric Biggers, Jason Gunthorpe,
	Masami Hiramatsu (Google), Andrew Morton, Luis Chamberlain,
	Yuntao Wang, Rasmus Villemoes, Christophe Leroy, Tejun Heo,
	Changbin Du, Huang Shijie, Geert Uytterhoeven, Namhyung Kim,
	Arnaldo Carvalho de Melo, linux-doc, linux-kernel, linux-efi,
	linux-mm, Kirill A. Shutemov

From: Sohil Mehta <sohil.mehta@intel.com>

Linear Address Space Separation (LASS) is a security feature that
intends to prevent malicious virtual address space accesses across
user/kernel mode.

Such mode based access protection already exists today with paging and
features such as SMEP and SMAP. However, to enforce these protections,
the processor must traverse the paging structures in memory.  Malicious
software can use timing information resulting from this traversal to
determine details about the paging structures, and these details may
also be used to determine the layout of kernel memory.

The LASS mechanism provides the same mode-based protections as paging
but without traversing the paging structures. Because the protections
enforced by LASS are applied before paging, software will not be able to
derive paging-based timing information from the various caching
structures such as the TLBs, mid-level caches, page walker, data caches,
etc.

LASS enforcement relies on the typical kernel implementation to divide
the 64-bit virtual address space into two halves:
  Addr[63]=0 -> User address space
  Addr[63]=1 -> Kernel address space

Any data access or code execution across address spaces typically
results in a #GP fault.

The LASS enforcement for kernel data access is dependent on CR4.SMAP
being set. The enforcement can be disabled by toggling the RFLAGS.AC bit,
just as with SMAP.

Define the CPU feature bits to enumerate this feature and add the
corresponding feature dependency on SMAP.

LASS provides protection against a class of speculative attacks, such as
SLAM[1]. Add the "lass" flag to /proc/cpuinfo to indicate that the feature
is supported by hardware and enabled by the kernel. This allows userspace
to determine if the setup is secure against such attacks.

[1] https://download.vusec.net/papers/slam_sp24.pdf
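
For instance, a userspace program could check for the flag roughly like this
(illustration only, naive string matching; not part of the patch):

#include <stdio.h>
#include <string.h>

int main(void)
{
	char line[4096];
	FILE *f = fopen("/proc/cpuinfo", "r");

	if (!f)
		return 1;

	while (fgets(line, sizeof(line), f)) {
		/* The "flags" line lists space-separated feature names */
		if (!strncmp(line, "flags", 5) && strstr(line, " lass")) {
			puts("lass: enabled");
			fclose(f);
			return 0;
		}
	}

	fclose(f);
	puts("lass: not reported");
	return 1;
}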

Co-developed-by: Yian Chen <yian.chen@intel.com>
Signed-off-by: Yian Chen <yian.chen@intel.com>
Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Xin Li (Intel) <xin@zytor.com>
---
 arch/x86/Kconfig.cpufeatures                | 4 ++++
 arch/x86/include/asm/cpufeatures.h          | 1 +
 arch/x86/include/uapi/asm/processor-flags.h | 2 ++
 arch/x86/kernel/cpu/cpuid-deps.c            | 1 +
 tools/arch/x86/include/asm/cpufeatures.h    | 1 +
 5 files changed, 9 insertions(+)

diff --git a/arch/x86/Kconfig.cpufeatures b/arch/x86/Kconfig.cpufeatures
index 250c10627ab3..733d5aff2456 100644
--- a/arch/x86/Kconfig.cpufeatures
+++ b/arch/x86/Kconfig.cpufeatures
@@ -124,6 +124,10 @@ config X86_DISABLED_FEATURE_PCID
 	def_bool y
 	depends on !X86_64
 
+config X86_DISABLED_FEATURE_LASS
+	def_bool y
+	depends on X86_32
+
 config X86_DISABLED_FEATURE_PKU
 	def_bool y
 	depends on !X86_INTEL_MEMORY_PROTECTION_KEYS
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index b78af55aa22e..8eef1ad7aca2 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -313,6 +313,7 @@
 #define X86_FEATURE_SM4			(12*32+ 2) /* SM4 instructions */
 #define X86_FEATURE_AVX_VNNI		(12*32+ 4) /* "avx_vnni" AVX VNNI instructions */
 #define X86_FEATURE_AVX512_BF16		(12*32+ 5) /* "avx512_bf16" AVX512 BFLOAT16 instructions */
+#define X86_FEATURE_LASS		(12*32+ 6) /* "lass" Linear Address Space Separation */
 #define X86_FEATURE_CMPCCXADD           (12*32+ 7) /* CMPccXADD instructions */
 #define X86_FEATURE_ARCH_PERFMON_EXT	(12*32+ 8) /* Intel Architectural PerfMon Extension */
 #define X86_FEATURE_FZRM		(12*32+10) /* Fast zero-length REP MOVSB */
diff --git a/arch/x86/include/uapi/asm/processor-flags.h b/arch/x86/include/uapi/asm/processor-flags.h
index f1a4adc78272..81d0c8bf1137 100644
--- a/arch/x86/include/uapi/asm/processor-flags.h
+++ b/arch/x86/include/uapi/asm/processor-flags.h
@@ -136,6 +136,8 @@
 #define X86_CR4_PKE		_BITUL(X86_CR4_PKE_BIT)
 #define X86_CR4_CET_BIT		23 /* enable Control-flow Enforcement Technology */
 #define X86_CR4_CET		_BITUL(X86_CR4_CET_BIT)
+#define X86_CR4_LASS_BIT	27 /* enable Linear Address Space Separation support */
+#define X86_CR4_LASS		_BITUL(X86_CR4_LASS_BIT)
 #define X86_CR4_LAM_SUP_BIT	28 /* LAM for supervisor pointers */
 #define X86_CR4_LAM_SUP		_BITUL(X86_CR4_LAM_SUP_BIT)
 
diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
index 46efcbd6afa4..98d0cdd82574 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -89,6 +89,7 @@ static const struct cpuid_dep cpuid_deps[] = {
 	{ X86_FEATURE_SHSTK,			X86_FEATURE_XSAVES    },
 	{ X86_FEATURE_FRED,			X86_FEATURE_LKGS      },
 	{ X86_FEATURE_SPEC_CTRL_SSBD,		X86_FEATURE_SPEC_CTRL },
+	{ X86_FEATURE_LASS,			X86_FEATURE_SMAP      },
 	{}
 };
 
diff --git a/tools/arch/x86/include/asm/cpufeatures.h b/tools/arch/x86/include/asm/cpufeatures.h
index ee176236c2be..4473a6f7800b 100644
--- a/tools/arch/x86/include/asm/cpufeatures.h
+++ b/tools/arch/x86/include/asm/cpufeatures.h
@@ -313,6 +313,7 @@
 #define X86_FEATURE_SM4			(12*32+ 2) /* SM4 instructions */
 #define X86_FEATURE_AVX_VNNI		(12*32+ 4) /* "avx_vnni" AVX VNNI instructions */
 #define X86_FEATURE_AVX512_BF16		(12*32+ 5) /* "avx512_bf16" AVX512 BFLOAT16 instructions */
+#define X86_FEATURE_LASS		(12*32+ 6) /* "lass" Linear Address Space Separation */
 #define X86_FEATURE_CMPCCXADD           (12*32+ 7) /* CMPccXADD instructions */
 #define X86_FEATURE_ARCH_PERFMON_EXT	(12*32+ 8) /* Intel Architectural PerfMon Extension */
 #define X86_FEATURE_FZRM		(12*32+10) /* Fast zero-length REP MOVSB */
-- 
2.47.2



* [PATCHv8 02/17] x86/asm: Introduce inline memcpy and memset
  2025-07-01  9:58 [PATCHv8 00/17] x86: Enable Linear Address Space Separation support Kirill A. Shutemov
  2025-07-01  9:58 ` [PATCHv8 01/17] x86/cpu: Enumerate the LASS feature bits Kirill A. Shutemov
@ 2025-07-01  9:58 ` Kirill A. Shutemov
  2025-07-03  8:44   ` David Laight
  2025-07-03 17:13   ` Dave Hansen
  2025-07-01  9:58 ` [PATCHv8 03/17] x86/alternatives: Disable LASS when patching kernel alternatives Kirill A. Shutemov
                   ` (14 subsequent siblings)
  16 siblings, 2 replies; 60+ messages in thread
From: Kirill A. Shutemov @ 2025-07-01  9:58 UTC (permalink / raw)
  To: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra, Ard Biesheuvel,
	Paul E. McKenney, Josh Poimboeuf, Xiongwei Song, Xin Li,
	Mike Rapoport (IBM), Brijesh Singh, Michael Roth, Tony Luck,
	Alexey Kardashevskiy, Alexander Shishkin
  Cc: Jonathan Corbet, Sohil Mehta, Ingo Molnar, Pawan Gupta,
	Daniel Sneddon, Kai Huang, Sandipan Das, Breno Leitao,
	Rick Edgecombe, Alexei Starovoitov, Hou Tao, Juergen Gross,
	Vegard Nossum, Kees Cook, Eric Biggers, Jason Gunthorpe,
	Masami Hiramatsu (Google), Andrew Morton, Luis Chamberlain,
	Yuntao Wang, Rasmus Villemoes, Christophe Leroy, Tejun Heo,
	Changbin Du, Huang Shijie, Geert Uytterhoeven, Namhyung Kim,
	Arnaldo Carvalho de Melo, linux-doc, linux-kernel, linux-efi,
	linux-mm, Kirill A. Shutemov

Extract memcpy and memset functions from copy_user_generic() and
__clear_user().

They can be used as inline memcpy and memset instead of the GCC builtins
whenever necessary. LASS requires them for text poking.
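
A sketch of the intended use, based on how copy_user_generic() ends up looking
in this patch (the wrapper name is made up). The helpers return "dst + number
of uncopied bytes", so the caller can recover how much was left if the
exception table fixup fired:

static __always_inline unsigned long copy_sketch(void *dst, const void *src,
						 size_t len)
{
	void *ret;

	stac();				/* relax enforcement around the access */
	ret = __inline_memcpy(dst, src, len);
	clac();

	return ret - dst;		/* 0 on success, else bytes not copied */
}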

Originally-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/all/20241029184840.GJ14555@noisy.programming.kicks-ass.net/
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/include/asm/string.h     | 46 +++++++++++++++++++++++++++++++
 arch/x86/include/asm/uaccess_64.h | 38 +++++++------------------
 arch/x86/lib/clear_page_64.S      | 13 +++++++--
 3 files changed, 67 insertions(+), 30 deletions(-)

diff --git a/arch/x86/include/asm/string.h b/arch/x86/include/asm/string.h
index c3c2c1914d65..17f6b5bfa8c1 100644
--- a/arch/x86/include/asm/string.h
+++ b/arch/x86/include/asm/string.h
@@ -1,6 +1,52 @@
 /* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_STRING_H
+#define _ASM_X86_STRING_H
+
+#include <asm/asm.h>
+#include <asm/alternative.h>
+#include <asm/cpufeatures.h>
+
 #ifdef CONFIG_X86_32
 # include <asm/string_32.h>
 #else
 # include <asm/string_64.h>
 #endif
+
+#ifdef CONFIG_X86_64
+#define ALT_64(orig, alt, feat) ALTERNATIVE(orig, alt, feat)
+#else
+#define ALT_64(orig, alt, feat) orig "\n"
+#endif
+
+static __always_inline void *__inline_memcpy(void *to, const void *from, size_t len)
+{
+	void *ret = to;
+
+	asm volatile("1:\n\t"
+		     ALT_64("rep movsb",
+			    "call rep_movs_alternative", ALT_NOT(X86_FEATURE_FSRM))
+		     "2:\n\t"
+		     _ASM_EXTABLE_UA(1b, 2b)
+		     : "+c" (len), "+D" (to), "+S" (from), ASM_CALL_CONSTRAINT
+		     : : "memory", _ASM_AX);
+
+	return ret + len;
+}
+
+static __always_inline void *__inline_memset(void *addr, int v, size_t len)
+{
+	void *ret = addr;
+
+	asm volatile("1:\n\t"
+		     ALT_64("rep stosb",
+			    "call rep_stos_alternative", ALT_NOT(X86_FEATURE_FSRS))
+		     "2:\n\t"
+		     _ASM_EXTABLE_UA(1b, 2b)
+		     : "+c" (len), "+D" (addr), ASM_CALL_CONSTRAINT
+		     : "a" ((uint8_t)v)
+		     : "memory", _ASM_SI, _ASM_DX);
+
+	return ret + len;
+}
+
+#endif /* _ASM_X86_STRING_H */
diff --git a/arch/x86/include/asm/uaccess_64.h b/arch/x86/include/asm/uaccess_64.h
index c8a5ae35c871..eb531e13e659 100644
--- a/arch/x86/include/asm/uaccess_64.h
+++ b/arch/x86/include/asm/uaccess_64.h
@@ -13,6 +13,7 @@
 #include <asm/page.h>
 #include <asm/percpu.h>
 #include <asm/runtime-const.h>
+#include <asm/string.h>
 
 /*
  * Virtual variable: there's no actual backing store for this,
@@ -118,21 +119,12 @@ rep_movs_alternative(void *to, const void *from, unsigned len);
 static __always_inline __must_check unsigned long
 copy_user_generic(void *to, const void *from, unsigned long len)
 {
+	void *ret;
+
 	stac();
-	/*
-	 * If CPU has FSRM feature, use 'rep movs'.
-	 * Otherwise, use rep_movs_alternative.
-	 */
-	asm volatile(
-		"1:\n\t"
-		ALTERNATIVE("rep movsb",
-			    "call rep_movs_alternative", ALT_NOT(X86_FEATURE_FSRM))
-		"2:\n"
-		_ASM_EXTABLE_UA(1b, 2b)
-		:"+c" (len), "+D" (to), "+S" (from), ASM_CALL_CONSTRAINT
-		: : "memory", "rax");
+	ret = __inline_memcpy(to, from, len);
 	clac();
-	return len;
+	return ret - to;
 }
 
 static __always_inline __must_check unsigned long
@@ -178,25 +170,15 @@ rep_stos_alternative(void __user *addr, unsigned long len);
 
 static __always_inline __must_check unsigned long __clear_user(void __user *addr, unsigned long size)
 {
+	void *ptr = (__force void *)addr;
+	void *ret;
+
 	might_fault();
 	stac();
-
-	/*
-	 * No memory constraint because it doesn't change any memory gcc
-	 * knows about.
-	 */
-	asm volatile(
-		"1:\n\t"
-		ALTERNATIVE("rep stosb",
-			    "call rep_stos_alternative", ALT_NOT(X86_FEATURE_FSRS))
-		"2:\n"
-	       _ASM_EXTABLE_UA(1b, 2b)
-	       : "+c" (size), "+D" (addr), ASM_CALL_CONSTRAINT
-	       : "a" (0));
-
+	ret = __inline_memset(ptr, 0, size);
 	clac();
 
-	return size;
+	return ret - ptr;
 }
 
 static __always_inline unsigned long clear_user(void __user *to, unsigned long n)
diff --git a/arch/x86/lib/clear_page_64.S b/arch/x86/lib/clear_page_64.S
index a508e4a8c66a..47b613690f84 100644
--- a/arch/x86/lib/clear_page_64.S
+++ b/arch/x86/lib/clear_page_64.S
@@ -55,17 +55,26 @@ SYM_FUNC_END(clear_page_erms)
 EXPORT_SYMBOL_GPL(clear_page_erms)
 
 /*
- * Default clear user-space.
+ * Default memset.
  * Input:
  * rdi destination
+ * rsi scratch
  * rcx count
- * rax is zero
+ * al is value
  *
  * Output:
  * rcx: uncleared bytes or 0 if successful.
+ * rdx: clobbered
  */
 SYM_FUNC_START(rep_stos_alternative)
 	ANNOTATE_NOENDBR
+
+	movzbq %al, %rsi
+	movabs $0x0101010101010101, %rax
+
+	/* RDX:RAX = RAX * RSI */
+	mulq %rsi
+
 	cmpq $64,%rcx
 	jae .Lunrolled
 
-- 
2.47.2



* [PATCHv8 03/17] x86/alternatives: Disable LASS when patching kernel alternatives
  2025-07-01  9:58 [PATCHv8 00/17] x86: Enable Linear Address Space Separation support Kirill A. Shutemov
  2025-07-01  9:58 ` [PATCHv8 01/17] x86/cpu: Enumerate the LASS feature bits Kirill A. Shutemov
  2025-07-01  9:58 ` [PATCHv8 02/17] x86/asm: Introduce inline memcpy and memset Kirill A. Shutemov
@ 2025-07-01  9:58 ` Kirill A. Shutemov
  2025-07-01 18:44   ` Sohil Mehta
  2025-07-01  9:58 ` [PATCHv8 04/17] x86/cpu: Defer CR pinning setup until after EFI initialization Kirill A. Shutemov
                   ` (13 subsequent siblings)
  16 siblings, 1 reply; 60+ messages in thread
From: Kirill A. Shutemov @ 2025-07-01  9:58 UTC (permalink / raw)
  To: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra, Ard Biesheuvel,
	Paul E. McKenney, Josh Poimboeuf, Xiongwei Song, Xin Li,
	Mike Rapoport (IBM), Brijesh Singh, Michael Roth, Tony Luck,
	Alexey Kardashevskiy, Alexander Shishkin
  Cc: Jonathan Corbet, Sohil Mehta, Ingo Molnar, Pawan Gupta,
	Daniel Sneddon, Kai Huang, Sandipan Das, Breno Leitao,
	Rick Edgecombe, Alexei Starovoitov, Hou Tao, Juergen Gross,
	Vegard Nossum, Kees Cook, Eric Biggers, Jason Gunthorpe,
	Masami Hiramatsu (Google), Andrew Morton, Luis Chamberlain,
	Yuntao Wang, Rasmus Villemoes, Christophe Leroy, Tejun Heo,
	Changbin Du, Huang Shijie, Geert Uytterhoeven, Namhyung Kim,
	Arnaldo Carvalho de Melo, linux-doc, linux-kernel, linux-efi,
	linux-mm, Kirill A. Shutemov

From: Sohil Mehta <sohil.mehta@intel.com>

For patching, the kernel initializes a temporary mm area in the lower
half of the address range. See commit 4fc19708b165 ("x86/alternatives:
Initialize temporary mm for patching").

Disable LASS enforcement during patching to avoid triggering a #GP
fault.

Objtool warns about a call to a function that is not on its allowed list, or
about an indirect call through a dynamic function pointer, inside a
stac()/clac() guard. See the Objtool warnings section #9 in the document
tools/objtool/Documentation/objtool.txt.

To avoid these warnings, and considering that the patched text is usually
small, replace the memcpy and memset calls in the text poking functions with
their inline versions.

Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/include/asm/smap.h   | 33 +++++++++++++++++++++++++++++++--
 arch/x86/kernel/alternative.c | 14 ++++++++++++--
 2 files changed, 43 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/smap.h b/arch/x86/include/asm/smap.h
index 4f84d421d1cf..d0cc24348641 100644
--- a/arch/x86/include/asm/smap.h
+++ b/arch/x86/include/asm/smap.h
@@ -23,18 +23,47 @@
 
 #else /* __ASSEMBLER__ */
 
+/*
+ * The CLAC/STAC instructions toggle the enforcement of X86_FEATURE_SMAP and
+ * X86_FEATURE_LASS.
+ *
+ * SMAP enforcement is based on the _PAGE_BIT_USER bit in the page tables: the
+ * kernel is not allowed to touch pages with the bit set unless the AC bit is
+ * set.
+ *
+ * LASS enforcement is based on bit 63 of the virtual address. The kernel is
+ * not allowed to touch memory in the lower half of the virtual address space
+ * unless the AC bit is set.
+ *
+ * Use stac()/clac() when accessing userspace (_PAGE_USER) mappings,
+ * regardless of location.
+ *
+ * Use lass_stac()/lass_clac() when accessing kernel mappings (!_PAGE_USER)
+ * in the lower half of the address space.
+ *
+ * Note: a barrier is implicit in alternative().
+ */
+
 static __always_inline void clac(void)
 {
-	/* Note: a barrier is implicit in alternative() */
 	alternative("", "clac", X86_FEATURE_SMAP);
 }
 
 static __always_inline void stac(void)
 {
-	/* Note: a barrier is implicit in alternative() */
 	alternative("", "stac", X86_FEATURE_SMAP);
 }
 
+static __always_inline void lass_clac(void)
+{
+	alternative("", "clac", X86_FEATURE_LASS);
+}
+
+static __always_inline void lass_stac(void)
+{
+	alternative("", "stac", X86_FEATURE_LASS);
+}
+
 static __always_inline unsigned long smap_save(void)
 {
 	unsigned long flags;
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index ea1d984166cd..3d2bcb7682eb 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -2447,16 +2447,26 @@ void __init_or_module text_poke_early(void *addr, const void *opcode,
 __ro_after_init struct mm_struct *text_poke_mm;
 __ro_after_init unsigned long text_poke_mm_addr;
 
+/*
+ * Text poking creates and uses a mapping in the lower half of the
+ * address space. Relax LASS enforcement when accessing the poking
+ * address.
+ */
+
 static void text_poke_memcpy(void *dst, const void *src, size_t len)
 {
-	memcpy(dst, src, len);
+	lass_stac();
+	__inline_memcpy(dst, src, len);
+	lass_clac();
 }
 
 static void text_poke_memset(void *dst, const void *src, size_t len)
 {
 	int c = *(const int *)src;
 
-	memset(dst, c, len);
+	lass_stac();
+	__inline_memset(dst, c, len);
+	lass_clac();
 }
 
 typedef void text_poke_f(void *dst, const void *src, size_t len);
-- 
2.47.2



* [PATCHv8 04/17] x86/cpu: Defer CR pinning setup until after EFI initialization
  2025-07-01  9:58 [PATCHv8 00/17] x86: Enable Linear Address Space Separation support Kirill A. Shutemov
                   ` (2 preceding siblings ...)
  2025-07-01  9:58 ` [PATCHv8 03/17] x86/alternatives: Disable LASS when patching kernel alternatives Kirill A. Shutemov
@ 2025-07-01  9:58 ` Kirill A. Shutemov
  2025-07-01 19:03   ` Sohil Mehta
  2025-07-01 23:10   ` Dave Hansen
  2025-07-01  9:58 ` [PATCHv8 05/17] efi: Disable LASS around set_virtual_address_map() EFI call Kirill A. Shutemov
                   ` (12 subsequent siblings)
  16 siblings, 2 replies; 60+ messages in thread
From: Kirill A. Shutemov @ 2025-07-01  9:58 UTC (permalink / raw)
  To: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra, Ard Biesheuvel,
	Paul E. McKenney, Josh Poimboeuf, Xiongwei Song, Xin Li,
	Mike Rapoport (IBM), Brijesh Singh, Michael Roth, Tony Luck,
	Alexey Kardashevskiy, Alexander Shishkin
  Cc: Jonathan Corbet, Sohil Mehta, Ingo Molnar, Pawan Gupta,
	Daniel Sneddon, Kai Huang, Sandipan Das, Breno Leitao,
	Rick Edgecombe, Alexei Starovoitov, Hou Tao, Juergen Gross,
	Vegard Nossum, Kees Cook, Eric Biggers, Jason Gunthorpe,
	Masami Hiramatsu (Google), Andrew Morton, Luis Chamberlain,
	Yuntao Wang, Rasmus Villemoes, Christophe Leroy, Tejun Heo,
	Changbin Du, Huang Shijie, Geert Uytterhoeven, Namhyung Kim,
	Arnaldo Carvalho de Melo, linux-doc, linux-kernel, linux-efi,
	linux-mm, Kirill A. Shutemov

From: Alexander Shishkin <alexander.shishkin@linux.intel.com>

In order to map the EFI runtime services, set_virtual_address_map()
needs to be called, which resides in the lower half of the address
space. This means that LASS needs to be temporarily disabled around
this call. This can only be done before the CR pinning is set up.

Move CR pinning setup behind the EFI initialization.

Wrapping efi_enter_virtual_mode() in lass_stac()/lass_clac() is not enough
because the AC flag gates data accesses, but not instruction fetches.
Clearing the CR4 bit is required.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Suggested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/kernel/cpu/common.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 4f430be285de..9918121e0adc 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -2081,7 +2081,6 @@ static __init void identify_boot_cpu(void)
 	enable_sep_cpu();
 #endif
 	cpu_detect_tlb(&boot_cpu_data);
-	setup_cr_pinning();
 
 	tsx_init();
 	tdx_init();
@@ -2532,10 +2531,14 @@ void __init arch_cpu_finalize_init(void)
 
 	/*
 	 * This needs to follow the FPU initializtion, since EFI depends on it.
+	 *
+	 * EFI twiddles CR4.LASS. Do it before CR pinning.
 	 */
 	if (efi_enabled(EFI_RUNTIME_SERVICES))
 		efi_enter_virtual_mode();
 
+	setup_cr_pinning();
+
 	/*
 	 * Ensure that access to the per CPU representation has the initial
 	 * boot CPU configuration.
-- 
2.47.2



* [PATCHv8 05/17] efi: Disable LASS around set_virtual_address_map() EFI call
  2025-07-01  9:58 [PATCHv8 00/17] x86: Enable Linear Address Space Separation support Kirill A. Shutemov
                   ` (3 preceding siblings ...)
  2025-07-01  9:58 ` [PATCHv8 04/17] x86/cpu: Defer CR pinning setup until after EFI initialization Kirill A. Shutemov
@ 2025-07-01  9:58 ` Kirill A. Shutemov
  2025-07-01  9:58 ` [PATCHv8 06/17] x86/vsyscall: Do not require X86_PF_INSTR to emulate vsyscall Kirill A. Shutemov
                   ` (11 subsequent siblings)
  16 siblings, 0 replies; 60+ messages in thread
From: Kirill A. Shutemov @ 2025-07-01  9:58 UTC (permalink / raw)
  To: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra, Ard Biesheuvel,
	Paul E. McKenney, Josh Poimboeuf, Xiongwei Song, Xin Li,
	Mike Rapoport (IBM), Brijesh Singh, Michael Roth, Tony Luck,
	Alexey Kardashevskiy, Alexander Shishkin
  Cc: Jonathan Corbet, Sohil Mehta, Ingo Molnar, Pawan Gupta,
	Daniel Sneddon, Kai Huang, Sandipan Das, Breno Leitao,
	Rick Edgecombe, Alexei Starovoitov, Hou Tao, Juergen Gross,
	Vegard Nossum, Kees Cook, Eric Biggers, Jason Gunthorpe,
	Masami Hiramatsu (Google), Andrew Morton, Luis Chamberlain,
	Yuntao Wang, Rasmus Villemoes, Christophe Leroy, Tejun Heo,
	Changbin Du, Huang Shijie, Geert Uytterhoeven, Namhyung Kim,
	Arnaldo Carvalho de Melo, linux-doc, linux-kernel, linux-efi,
	linux-mm, Kirill A. Shutemov

From: Alexander Shishkin <alexander.shishkin@linux.intel.com>

Of all the EFI runtime services, set_virtual_address_map() is the only
one that is called at its lower mapping, which LASS prohibits regardless
of EFLAGS.AC setting. The only way to allow this to happen is to disable
LASS in the CR4 register.

Disable LASS around this low address EFI call.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/platform/efi/efi.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index 463b784499a8..5b23c0daedef 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -787,6 +787,7 @@ static void __init __efi_enter_virtual_mode(void)
 	int count = 0, pg_shift = 0;
 	void *new_memmap = NULL;
 	efi_status_t status;
+	unsigned long lass;
 	unsigned long pa;
 
 	if (efi_alloc_page_tables()) {
@@ -825,11 +826,25 @@ static void __init __efi_enter_virtual_mode(void)
 
 	efi_sync_low_kernel_mappings();
 
+	/*
+	 * set_virtual_address_map() is the only service located at lower
+	 * addresses, so LASS has to be disabled around it.
+	 *
+	 * Note that flipping RFLAGS.AC is not sufficient for this, as it only
+	 * permits data accesses and not instruction fetch. The entire LASS
+	 * needs to be disabled.
+	 */
+	lass = cr4_read_shadow() & X86_CR4_LASS;
+	cr4_clear_bits(lass);
+
 	status = efi_set_virtual_address_map(efi.memmap.desc_size * count,
 					     efi.memmap.desc_size,
 					     efi.memmap.desc_version,
 					     (efi_memory_desc_t *)pa,
 					     efi_systab_phys);
+
+	cr4_set_bits(lass);
+
 	if (status != EFI_SUCCESS) {
 		pr_err("Unable to switch EFI into virtual mode (status=%lx)!\n",
 		       status);
-- 
2.47.2



* [PATCHv8 06/17] x86/vsyscall: Do not require X86_PF_INSTR to emulate vsyscall
  2025-07-01  9:58 [PATCHv8 00/17] x86: Enable Linear Address Space Separation support Kirill A. Shutemov
                   ` (4 preceding siblings ...)
  2025-07-01  9:58 ` [PATCHv8 05/17] efi: Disable LASS around set_virtual_address_map() EFI call Kirill A. Shutemov
@ 2025-07-01  9:58 ` Kirill A. Shutemov
  2025-07-01  9:58 ` [PATCHv8 07/17] x86/vsyscall: Reorganize the #PF emulation code Kirill A. Shutemov
                   ` (10 subsequent siblings)
  16 siblings, 0 replies; 60+ messages in thread
From: Kirill A. Shutemov @ 2025-07-01  9:58 UTC (permalink / raw)
  To: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra, Ard Biesheuvel,
	Paul E. McKenney, Josh Poimboeuf, Xiongwei Song, Xin Li,
	Mike Rapoport (IBM), Brijesh Singh, Michael Roth, Tony Luck,
	Alexey Kardashevskiy, Alexander Shishkin
  Cc: Jonathan Corbet, Sohil Mehta, Ingo Molnar, Pawan Gupta,
	Daniel Sneddon, Kai Huang, Sandipan Das, Breno Leitao,
	Rick Edgecombe, Alexei Starovoitov, Hou Tao, Juergen Gross,
	Vegard Nossum, Kees Cook, Eric Biggers, Jason Gunthorpe,
	Masami Hiramatsu (Google), Andrew Morton, Luis Chamberlain,
	Yuntao Wang, Rasmus Villemoes, Christophe Leroy, Tejun Heo,
	Changbin Du, Huang Shijie, Geert Uytterhoeven, Namhyung Kim,
	Arnaldo Carvalho de Melo, linux-doc, linux-kernel, linux-efi,
	linux-mm, Kirill A. Shutemov

emulate_vsyscall() expects to see X86_PF_INSTR in PFEC on a vsyscall
page fault, but the CPU does not report X86_PF_INSTR if neither
X86_FEATURE_NX nor X86_FEATURE_SMEP is enabled.

X86_FEATURE_NX should be enabled on nearly all 64-bit CPUs, except for
early P4 processors that did not support this feature.

Instead of explicitly checking for X86_PF_INSTR, compare the fault
address against RIP.

On machines with X86_FEATURE_NX enabled, issue a warning if RIP is equal
to the fault address but X86_PF_INSTR is absent.

Originally-by: Dave Hansen <dave.hansen@intel.com>
Link: https://lore.kernel.org/all/bd81a98b-f8d4-4304-ac55-d4151a1a77ab@intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
 arch/x86/entry/vsyscall/vsyscall_64.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/arch/x86/entry/vsyscall/vsyscall_64.c b/arch/x86/entry/vsyscall/vsyscall_64.c
index c9103a6fa06e..0b0e0283994f 100644
--- a/arch/x86/entry/vsyscall/vsyscall_64.c
+++ b/arch/x86/entry/vsyscall/vsyscall_64.c
@@ -124,7 +124,8 @@ bool emulate_vsyscall(unsigned long error_code,
 	if ((error_code & (X86_PF_WRITE | X86_PF_USER)) != X86_PF_USER)
 		return false;
 
-	if (!(error_code & X86_PF_INSTR)) {
+	/* Avoid emulation unless userspace was executing from vsyscall page: */
+	if (address != regs->ip) {
 		/* Failed vsyscall read */
 		if (vsyscall_mode == EMULATE)
 			return false;
@@ -136,13 +137,16 @@ bool emulate_vsyscall(unsigned long error_code,
 		return false;
 	}
 
+
+	/* X86_PF_INSTR is only set when NX is supported: */
+	if (cpu_feature_enabled(X86_FEATURE_NX))
+		WARN_ON_ONCE(!(error_code & X86_PF_INSTR));
+
 	/*
 	 * No point in checking CS -- the only way to get here is a user mode
 	 * trap to a high address, which means that we're in 64-bit user code.
 	 */
 
-	WARN_ON_ONCE(address != regs->ip);
-
 	if (vsyscall_mode == NONE) {
 		warn_bad_vsyscall(KERN_INFO, regs,
 				  "vsyscall attempted with vsyscall=none");
-- 
2.47.2



* [PATCHv8 07/17] x86/vsyscall: Reorganize the #PF emulation code
  2025-07-01  9:58 [PATCHv8 00/17] x86: Enable Linear Address Space Separation support Kirill A. Shutemov
                   ` (5 preceding siblings ...)
  2025-07-01  9:58 ` [PATCHv8 06/17] x86/vsyscall: Do not require X86_PF_INSTR to emulate vsyscall Kirill A. Shutemov
@ 2025-07-01  9:58 ` Kirill A. Shutemov
  2025-07-01  9:58 ` [PATCHv8 08/17] x86/traps: Consolidate user fixups in exc_general_protection() Kirill A. Shutemov
                   ` (9 subsequent siblings)
  16 siblings, 0 replies; 60+ messages in thread
From: Kirill A. Shutemov @ 2025-07-01  9:58 UTC (permalink / raw)
  To: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra, Ard Biesheuvel,
	Paul E. McKenney, Josh Poimboeuf, Xiongwei Song, Xin Li,
	Mike Rapoport (IBM), Brijesh Singh, Michael Roth, Tony Luck,
	Alexey Kardashevskiy, Alexander Shishkin
  Cc: Jonathan Corbet, Sohil Mehta, Ingo Molnar, Pawan Gupta,
	Daniel Sneddon, Kai Huang, Sandipan Das, Breno Leitao,
	Rick Edgecombe, Alexei Starovoitov, Hou Tao, Juergen Gross,
	Vegard Nossum, Kees Cook, Eric Biggers, Jason Gunthorpe,
	Masami Hiramatsu (Google), Andrew Morton, Luis Chamberlain,
	Yuntao Wang, Rasmus Villemoes, Christophe Leroy, Tejun Heo,
	Changbin Du, Huang Shijie, Geert Uytterhoeven, Namhyung Kim,
	Arnaldo Carvalho de Melo, linux-doc, linux-kernel, linux-efi,
	linux-mm, Kirill A. Shutemov

From: Sohil Mehta <sohil.mehta@intel.com>

Separate out the actual vsyscall emulation from the page fault specific
handling in preparation for the upcoming #GP fault emulation.

No functional change intended.

Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
---
 arch/x86/entry/vsyscall/vsyscall_64.c | 52 ++++++++++++++-------------
 arch/x86/include/asm/vsyscall.h       |  8 ++---
 arch/x86/mm/fault.c                   |  2 +-
 3 files changed, 33 insertions(+), 29 deletions(-)

diff --git a/arch/x86/entry/vsyscall/vsyscall_64.c b/arch/x86/entry/vsyscall/vsyscall_64.c
index 0b0e0283994f..25f94ac5fd35 100644
--- a/arch/x86/entry/vsyscall/vsyscall_64.c
+++ b/arch/x86/entry/vsyscall/vsyscall_64.c
@@ -112,36 +112,13 @@ static bool write_ok_or_segv(unsigned long ptr, size_t size)
 	}
 }
 
-bool emulate_vsyscall(unsigned long error_code,
-		      struct pt_regs *regs, unsigned long address)
+static bool __emulate_vsyscall(struct pt_regs *regs, unsigned long address)
 {
 	unsigned long caller;
 	int vsyscall_nr, syscall_nr, tmp;
 	long ret;
 	unsigned long orig_dx;
 
-	/* Write faults or kernel-privilege faults never get fixed up. */
-	if ((error_code & (X86_PF_WRITE | X86_PF_USER)) != X86_PF_USER)
-		return false;
-
-	/* Avoid emulation unless userspace was executing from vsyscall page: */
-	if (address != regs->ip) {
-		/* Failed vsyscall read */
-		if (vsyscall_mode == EMULATE)
-			return false;
-
-		/*
-		 * User code tried and failed to read the vsyscall page.
-		 */
-		warn_bad_vsyscall(KERN_INFO, regs, "vsyscall read attempt denied -- look up the vsyscall kernel parameter if you need a workaround");
-		return false;
-	}
-
-
-	/* X86_PF_INSTR is only set when NX is supported: */
-	if (cpu_feature_enabled(X86_FEATURE_NX))
-		WARN_ON_ONCE(!(error_code & X86_PF_INSTR));
-
 	/*
 	 * No point in checking CS -- the only way to get here is a user mode
 	 * trap to a high address, which means that we're in 64-bit user code.
@@ -274,6 +251,33 @@ bool emulate_vsyscall(unsigned long error_code,
 	return true;
 }
 
+bool emulate_vsyscall_pf(unsigned long error_code, struct pt_regs *regs,
+			 unsigned long address)
+{
+	/* Write faults or kernel-privilege faults never get fixed up. */
+	if ((error_code & (X86_PF_WRITE | X86_PF_USER)) != X86_PF_USER)
+		return false;
+
+	if (address == regs->ip) {
+		/* X86_PF_INSTR is only set when NX is supported: */
+		if (cpu_feature_enabled(X86_FEATURE_NX))
+			WARN_ON_ONCE(!(error_code & X86_PF_INSTR));
+
+		return __emulate_vsyscall(regs, address);
+	}
+
+	/* Failed vsyscall read */
+	if (vsyscall_mode == EMULATE)
+		return false;
+
+	/*
+	 * User code tried and failed to read the vsyscall page.
+	 */
+	warn_bad_vsyscall(KERN_INFO, regs,
+			  "vsyscall read attempt denied -- look up the vsyscall kernel parameter if you need a workaround");
+	return false;
+}
+
 /*
  * A pseudo VMA to allow ptrace access for the vsyscall page.  This only
  * covers the 64bit vsyscall page now. 32bit has a real VMA now and does
diff --git a/arch/x86/include/asm/vsyscall.h b/arch/x86/include/asm/vsyscall.h
index 472f0263dbc6..214977f4fa11 100644
--- a/arch/x86/include/asm/vsyscall.h
+++ b/arch/x86/include/asm/vsyscall.h
@@ -14,12 +14,12 @@ extern void set_vsyscall_pgtable_user_bits(pgd_t *root);
  * Called on instruction fetch fault in vsyscall page.
  * Returns true if handled.
  */
-extern bool emulate_vsyscall(unsigned long error_code,
-			     struct pt_regs *regs, unsigned long address);
+extern bool emulate_vsyscall_pf(unsigned long error_code,
+				struct pt_regs *regs, unsigned long address);
 #else
 static inline void map_vsyscall(void) {}
-static inline bool emulate_vsyscall(unsigned long error_code,
-				    struct pt_regs *regs, unsigned long address)
+static inline bool emulate_vsyscall_pf(unsigned long error_code,
+				       struct pt_regs *regs, unsigned long address)
 {
 	return false;
 }
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 998bd807fc7b..fbcc2da75fd6 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1316,7 +1316,7 @@ void do_user_addr_fault(struct pt_regs *regs,
 	 * to consider the PF_PK bit.
 	 */
 	if (is_vsyscall_vaddr(address)) {
-		if (emulate_vsyscall(error_code, regs, address))
+		if (emulate_vsyscall_pf(error_code, regs, address))
 			return;
 	}
 #endif
-- 
2.47.2



* [PATCHv8 08/17] x86/traps: Consolidate user fixups in exc_general_protection()
  2025-07-01  9:58 [PATCHv8 00/17] x86: Enable Linear Address Space Separation support Kirill A. Shutemov
                   ` (6 preceding siblings ...)
  2025-07-01  9:58 ` [PATCHv8 07/17] x86/vsyscall: Reorganize the #PF emulation code Kirill A. Shutemov
@ 2025-07-01  9:58 ` Kirill A. Shutemov
  2025-07-01  9:58 ` [PATCHv8 09/17] x86/vsyscall: Add vsyscall emulation for #GP Kirill A. Shutemov
                   ` (8 subsequent siblings)
  16 siblings, 0 replies; 60+ messages in thread
From: Kirill A. Shutemov @ 2025-07-01  9:58 UTC (permalink / raw)
  To: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra, Ard Biesheuvel,
	Paul E. McKenney, Josh Poimboeuf, Xiongwei Song, Xin Li,
	Mike Rapoport (IBM), Brijesh Singh, Michael Roth, Tony Luck,
	Alexey Kardashevskiy, Alexander Shishkin
  Cc: Jonathan Corbet, Sohil Mehta, Ingo Molnar, Pawan Gupta,
	Daniel Sneddon, Kai Huang, Sandipan Das, Breno Leitao,
	Rick Edgecombe, Alexei Starovoitov, Hou Tao, Juergen Gross,
	Vegard Nossum, Kees Cook, Eric Biggers, Jason Gunthorpe,
	Masami Hiramatsu (Google), Andrew Morton, Luis Chamberlain,
	Yuntao Wang, Rasmus Villemoes, Christophe Leroy, Tejun Heo,
	Changbin Du, Huang Shijie, Geert Uytterhoeven, Namhyung Kim,
	Arnaldo Carvalho de Melo, linux-doc, linux-kernel, linux-efi,
	linux-mm, Kirill A. Shutemov

From: Sohil Mehta <sohil.mehta@intel.com>

Move the UMIP exception fixup along with the other user mode fixups,
that is, under the common "if (user_mode(regs))" condition where the
rest of the fixups reside.

No functional change intended.

Suggested-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
---
 arch/x86/kernel/traps.c | 8 +++-----
 arch/x86/kernel/umip.c  | 3 +++
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index c5c897a86418..10856e0ac46c 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -800,11 +800,6 @@ DEFINE_IDTENTRY_ERRORCODE(exc_general_protection)
 
 	cond_local_irq_enable(regs);
 
-	if (static_cpu_has(X86_FEATURE_UMIP)) {
-		if (user_mode(regs) && fixup_umip_exception(regs))
-			goto exit;
-	}
-
 	if (v8086_mode(regs)) {
 		local_irq_enable();
 		handle_vm86_fault((struct kernel_vm86_regs *) regs, error_code);
@@ -819,6 +814,9 @@ DEFINE_IDTENTRY_ERRORCODE(exc_general_protection)
 		if (fixup_vdso_exception(regs, X86_TRAP_GP, error_code, 0))
 			goto exit;
 
+		if (fixup_umip_exception(regs))
+			goto exit;
+
 		gp_user_force_sig_segv(regs, X86_TRAP_GP, error_code, desc);
 		goto exit;
 	}
diff --git a/arch/x86/kernel/umip.c b/arch/x86/kernel/umip.c
index 5a4b21389b1d..80f2ad26363c 100644
--- a/arch/x86/kernel/umip.c
+++ b/arch/x86/kernel/umip.c
@@ -343,6 +343,9 @@ bool fixup_umip_exception(struct pt_regs *regs)
 	void __user *uaddr;
 	struct insn insn;
 
+	if (!cpu_feature_enabled(X86_FEATURE_UMIP))
+		return false;
+
 	if (!regs)
 		return false;
 
-- 
2.47.2



* [PATCHv8 09/17] x86/vsyscall: Add vsyscall emulation for #GP
  2025-07-01  9:58 [PATCHv8 00/17] x86: Enable Linear Address Space Separation support Kirill A. Shutemov
                   ` (7 preceding siblings ...)
  2025-07-01  9:58 ` [PATCHv8 08/17] x86/traps: Consolidate user fixups in exc_general_protection() Kirill A. Shutemov
@ 2025-07-01  9:58 ` Kirill A. Shutemov
  2025-07-01  9:58 ` [PATCHv8 10/17] x86/vsyscall: Disable LASS if vsyscall mode is set to EMULATE Kirill A. Shutemov
                   ` (7 subsequent siblings)
  16 siblings, 0 replies; 60+ messages in thread
From: Kirill A. Shutemov @ 2025-07-01  9:58 UTC (permalink / raw)
  To: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra, Ard Biesheuvel,
	Paul E. McKenney, Josh Poimboeuf, Xiongwei Song, Xin Li,
	Mike Rapoport (IBM), Brijesh Singh, Michael Roth, Tony Luck,
	Alexey Kardashevskiy, Alexander Shishkin
  Cc: Jonathan Corbet, Sohil Mehta, Ingo Molnar, Pawan Gupta,
	Daniel Sneddon, Kai Huang, Sandipan Das, Breno Leitao,
	Rick Edgecombe, Alexei Starovoitov, Hou Tao, Juergen Gross,
	Vegard Nossum, Kees Cook, Eric Biggers, Jason Gunthorpe,
	Masami Hiramatsu (Google), Andrew Morton, Luis Chamberlain,
	Yuntao Wang, Rasmus Villemoes, Christophe Leroy, Tejun Heo,
	Changbin Du, Huang Shijie, Geert Uytterhoeven, Namhyung Kim,
	Arnaldo Carvalho de Melo, linux-doc, linux-kernel, linux-efi,
	linux-mm, Kirill A. Shutemov

From: Sohil Mehta <sohil.mehta@intel.com>

The legacy vsyscall page is mapped at a fixed address in the kernel
address range 0xffffffffff600000-0xffffffffff601000. Prior to LASS being
introduced, a legacy vsyscall page access from userspace would always
generate a page fault. The kernel emulates the execute (XONLY) accesses
in the page fault handler and returns back to userspace with the
appropriate register values.

Since LASS intercepts these accesses before the paging structures are
traversed, it generates a general protection fault instead of a page
fault. The #GP fault doesn't provide much information in the error
code. So, use the faulting RIP, which is preserved in the user
registers, to emulate the vsyscall access without going through complex
instruction decoding.

Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/entry/vsyscall/vsyscall_64.c | 14 +++++++++++++-
 arch/x86/include/asm/vsyscall.h       |  6 ++++++
 arch/x86/kernel/traps.c               |  4 ++++
 3 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/arch/x86/entry/vsyscall/vsyscall_64.c b/arch/x86/entry/vsyscall/vsyscall_64.c
index 25f94ac5fd35..be77385b311e 100644
--- a/arch/x86/entry/vsyscall/vsyscall_64.c
+++ b/arch/x86/entry/vsyscall/vsyscall_64.c
@@ -23,7 +23,7 @@
  * soon be no new userspace code that will ever use a vsyscall.
  *
  * The code in this file emulates vsyscalls when notified of a page
- * fault to a vsyscall address.
+ * fault or a general protection fault to a vsyscall address.
  */
 
 #include <linux/kernel.h>
@@ -278,6 +278,18 @@ bool emulate_vsyscall_pf(unsigned long error_code, struct pt_regs *regs,
 	return false;
 }
 
+bool emulate_vsyscall_gp(struct pt_regs *regs)
+{
+	if (!cpu_feature_enabled(X86_FEATURE_LASS))
+		return false;
+
+	/* Emulate only if the RIP points to the vsyscall address */
+	if (!is_vsyscall_vaddr(regs->ip))
+		return false;
+
+	return __emulate_vsyscall(regs, regs->ip);
+}
+
 /*
  * A pseudo VMA to allow ptrace access for the vsyscall page.  This only
  * covers the 64bit vsyscall page now. 32bit has a real VMA now and does
diff --git a/arch/x86/include/asm/vsyscall.h b/arch/x86/include/asm/vsyscall.h
index 214977f4fa11..4eb8d3673223 100644
--- a/arch/x86/include/asm/vsyscall.h
+++ b/arch/x86/include/asm/vsyscall.h
@@ -16,6 +16,7 @@ extern void set_vsyscall_pgtable_user_bits(pgd_t *root);
  */
 extern bool emulate_vsyscall_pf(unsigned long error_code,
 				struct pt_regs *regs, unsigned long address);
+extern bool emulate_vsyscall_gp(struct pt_regs *regs);
 #else
 static inline void map_vsyscall(void) {}
 static inline bool emulate_vsyscall_pf(unsigned long error_code,
@@ -23,6 +24,11 @@ static inline bool emulate_vsyscall_pf(unsigned long error_code,
 {
 	return false;
 }
+
+static inline bool emulate_vsyscall_gp(struct pt_regs *regs)
+{
+	return false;
+}
 #endif
 
 /*
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 10856e0ac46c..40e34bb66d7c 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -69,6 +69,7 @@
 #include <asm/tdx.h>
 #include <asm/cfi.h>
 #include <asm/msr.h>
+#include <asm/vsyscall.h>
 
 #ifdef CONFIG_X86_64
 #include <asm/x86_init.h>
@@ -817,6 +818,9 @@ DEFINE_IDTENTRY_ERRORCODE(exc_general_protection)
 		if (fixup_umip_exception(regs))
 			goto exit;
 
+		if (emulate_vsyscall_gp(regs))
+			goto exit;
+
 		gp_user_force_sig_segv(regs, X86_TRAP_GP, error_code, desc);
 		goto exit;
 	}
-- 
2.47.2



* [PATCHv8 10/17] x86/vsyscall: Disable LASS if vsyscall mode is set to EMULATE
  2025-07-01  9:58 [PATCHv8 00/17] x86: Enable Linear Address Space Separation support Kirill A. Shutemov
                   ` (8 preceding siblings ...)
  2025-07-01  9:58 ` [PATCHv8 09/17] x86/vsyscall: Add vsyscall emulation for #GP Kirill A. Shutemov
@ 2025-07-01  9:58 ` Kirill A. Shutemov
  2025-07-01  9:58 ` [PATCHv8 11/17] x86/cpu: Set LASS CR4 bit as pinning sensitive Kirill A. Shutemov
                   ` (6 subsequent siblings)
  16 siblings, 0 replies; 60+ messages in thread
From: Kirill A. Shutemov @ 2025-07-01  9:58 UTC (permalink / raw)
  To: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra, Ard Biesheuvel,
	Paul E. McKenney, Josh Poimboeuf, Xiongwei Song, Xin Li,
	Mike Rapoport (IBM), Brijesh Singh, Michael Roth, Tony Luck,
	Alexey Kardashevskiy, Alexander Shishkin
  Cc: Jonathan Corbet, Sohil Mehta, Ingo Molnar, Pawan Gupta,
	Daniel Sneddon, Kai Huang, Sandipan Das, Breno Leitao,
	Rick Edgecombe, Alexei Starovoitov, Hou Tao, Juergen Gross,
	Vegard Nossum, Kees Cook, Eric Biggers, Jason Gunthorpe,
	Masami Hiramatsu (Google), Andrew Morton, Luis Chamberlain,
	Yuntao Wang, Rasmus Villemoes, Christophe Leroy, Tejun Heo,
	Changbin Du, Huang Shijie, Geert Uytterhoeven, Namhyung Kim,
	Arnaldo Carvalho de Melo, linux-doc, linux-kernel, linux-efi,
	linux-mm, Kirill A. Shutemov

From: Sohil Mehta <sohil.mehta@intel.com>

The EMULATE mode of vsyscall maps the vsyscall page into user address
space which can be read directly by the user application. This mode has
been deprecated recently and can only be enabled from a special command
line parameter vsyscall=emulate. See commit bf00745e7791 ("x86/vsyscall:
Remove CONFIG_LEGACY_VSYSCALL_EMULATE")

Fixing the LASS violations during the EMULATE mode would need complex
instruction decoding since the resulting #GP fault does not include any
useful error information and the vsyscall address is not readily
available in the RIP.

At this point, no one is expected to be using the insecure and
deprecated EMULATE mode. The rare usages that need support probably
don't care much about security anyway. Disable LASS when EMULATE mode is
requested during command line parsing to avoid breaking user software.
LASS will be supported if vsyscall mode is set to XONLY or NONE.

Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 Documentation/admin-guide/kernel-parameters.txt | 4 +++-
 arch/x86/entry/vsyscall/vsyscall_64.c           | 7 +++++++
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index f1f2c0874da9..796c987372df 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -7926,7 +7926,9 @@
 
 			emulate     Vsyscalls turn into traps and are emulated
 			            reasonably safely.  The vsyscall page is
-				    readable.
+				    readable.  This disables the Linear
+				    Address Space Separation (LASS) security
+				    feature and makes the system less secure.
 
 			xonly       [default] Vsyscalls turn into traps and are
 			            emulated reasonably safely.  The vsyscall
diff --git a/arch/x86/entry/vsyscall/vsyscall_64.c b/arch/x86/entry/vsyscall/vsyscall_64.c
index be77385b311e..d37df40bfb26 100644
--- a/arch/x86/entry/vsyscall/vsyscall_64.c
+++ b/arch/x86/entry/vsyscall/vsyscall_64.c
@@ -63,6 +63,13 @@ static int __init vsyscall_setup(char *str)
 		else
 			return -EINVAL;
 
+		if (cpu_feature_enabled(X86_FEATURE_LASS) &&
+		    vsyscall_mode == EMULATE) {
+			cr4_clear_bits(X86_CR4_LASS);
+			setup_clear_cpu_cap(X86_FEATURE_LASS);
+			pr_warn_once("x86/cpu: Disabling LASS support due to vsyscall=emulate\n");
+		}
+
 		return 0;
 	}
 
-- 
2.47.2



* [PATCHv8 11/17] x86/cpu: Set LASS CR4 bit as pinning sensitive
  2025-07-01  9:58 [PATCHv8 00/17] x86: Enable Linear Address Space Separation support Kirill A. Shutemov
                   ` (9 preceding siblings ...)
  2025-07-01  9:58 ` [PATCHv8 10/17] x86/vsyscall: Disable LASS if vsyscall mode is set to EMULATE Kirill A. Shutemov
@ 2025-07-01  9:58 ` Kirill A. Shutemov
  2025-07-01 22:51   ` Sohil Mehta
  2025-07-01  9:58 ` [PATCHv8 12/17] x86/traps: Communicate a LASS violation in #GP message Kirill A. Shutemov
                   ` (5 subsequent siblings)
  16 siblings, 1 reply; 60+ messages in thread
From: Kirill A. Shutemov @ 2025-07-01  9:58 UTC (permalink / raw)
  To: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra, Ard Biesheuvel,
	Paul E. McKenney, Josh Poimboeuf, Xiongwei Song, Xin Li,
	Mike Rapoport (IBM), Brijesh Singh, Michael Roth, Tony Luck,
	Alexey Kardashevskiy, Alexander Shishkin
  Cc: Jonathan Corbet, Sohil Mehta, Ingo Molnar, Pawan Gupta,
	Daniel Sneddon, Kai Huang, Sandipan Das, Breno Leitao,
	Rick Edgecombe, Alexei Starovoitov, Hou Tao, Juergen Gross,
	Vegard Nossum, Kees Cook, Eric Biggers, Jason Gunthorpe,
	Masami Hiramatsu (Google), Andrew Morton, Luis Chamberlain,
	Yuntao Wang, Rasmus Villemoes, Christophe Leroy, Tejun Heo,
	Changbin Du, Huang Shijie, Geert Uytterhoeven, Namhyung Kim,
	Arnaldo Carvalho de Melo, linux-doc, linux-kernel, linux-efi,
	linux-mm, Kirill A. Shutemov

From: Yian Chen <yian.chen@intel.com>

Security features such as LASS are not expected to be disabled once
initialized. Add LASS to the CR4 pinned mask.
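
For reference, this is roughly (paraphrased, not verbatim) how the pinning is
enforced on CR4 writes in arch/x86/kernel/cpu/common.c: once X86_CR4_LASS is
part of cr4_pinned_mask, any attempt to clear the bit after boot is undone
and warned about.

	void native_write_cr4(unsigned long val)
	{
		unsigned long bits_changed = 0;

	set_register:
		asm volatile("mov %0,%%cr4" : "+r" (val) : : "memory");

		if (static_branch_likely(&cr_pinning)) {
			if (unlikely((val & cr4_pinned_mask) != cr4_pinned_bits)) {
				bits_changed = (val & cr4_pinned_mask) ^
					       cr4_pinned_bits;
				val = (val & ~cr4_pinned_mask) | cr4_pinned_bits;
				goto set_register;
			}
			/* Warn after the pinned bits have been restored. */
			WARN_ONCE(bits_changed,
				  "pinned CR4 bits changed: 0x%lx!?\n",
				  bits_changed);
		}
	}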

Signed-off-by: Yian Chen <yian.chen@intel.com>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/kernel/cpu/common.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 9918121e0adc..1552c7510380 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -403,7 +403,8 @@ static __always_inline void setup_umip(struct cpuinfo_x86 *c)
 
 /* These bits should not change their value after CPU init is finished. */
 static const unsigned long cr4_pinned_mask = X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_UMIP |
-					     X86_CR4_FSGSBASE | X86_CR4_CET | X86_CR4_FRED;
+					     X86_CR4_FSGSBASE | X86_CR4_CET | X86_CR4_FRED |
+					     X86_CR4_LASS;
 static DEFINE_STATIC_KEY_FALSE_RO(cr_pinning);
 static unsigned long cr4_pinned_bits __ro_after_init;
 
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCHv8 12/17] x86/traps: Communicate a LASS violation in #GP message
  2025-07-01  9:58 [PATCHv8 00/17] x86: Enable Linear Address Space Separation support Kirill A. Shutemov
                   ` (10 preceding siblings ...)
  2025-07-01  9:58 ` [PATCHv8 11/17] x86/cpu: Set LASS CR4 bit as pinning sensitive Kirill A. Shutemov
@ 2025-07-01  9:58 ` Kirill A. Shutemov
  2025-07-02  0:36   ` Sohil Mehta
  2025-07-01  9:58 ` [PATCHv8 13/17] x86/traps: Generalize #GP address decode and hint code Kirill A. Shutemov
                   ` (4 subsequent siblings)
  16 siblings, 1 reply; 60+ messages in thread
From: Kirill A. Shutemov @ 2025-07-01  9:58 UTC (permalink / raw)
  To: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra, Ard Biesheuvel,
	Paul E. McKenney, Josh Poimboeuf, Xiongwei Song, Xin Li,
	Mike Rapoport (IBM), Brijesh Singh, Michael Roth, Tony Luck,
	Alexey Kardashevskiy, Alexander Shishkin
  Cc: Jonathan Corbet, Sohil Mehta, Ingo Molnar, Pawan Gupta,
	Daniel Sneddon, Kai Huang, Sandipan Das, Breno Leitao,
	Rick Edgecombe, Alexei Starovoitov, Hou Tao, Juergen Gross,
	Vegard Nossum, Kees Cook, Eric Biggers, Jason Gunthorpe,
	Masami Hiramatsu (Google), Andrew Morton, Luis Chamberlain,
	Yuntao Wang, Rasmus Villemoes, Christophe Leroy, Tejun Heo,
	Changbin Du, Huang Shijie, Geert Uytterhoeven, Namhyung Kim,
	Arnaldo Carvalho de Melo, linux-doc, linux-kernel, linux-efi,
	linux-mm, Kirill A. Shutemov

From: Alexander Shishkin <alexander.shishkin@linux.intel.com>

Provide a more helpful message on #GP when a kernel-side LASS violation
is detected.

Report a NULL pointer dereference instead if the LASS violation occurs
due to an access to the first page frame.
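
For illustration (addresses are made up), the desc string handed to
die_addr() would now read along the lines of:

	general protection fault, LASS prevented access to address 0x7f1234567000
	general protection fault, kernel NULL pointer dereference 0x28

instead of the generic "maybe for address" hint used so far for canonical
addresses.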

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/kernel/traps.c | 24 +++++++++++++++++++-----
 1 file changed, 19 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 40e34bb66d7c..5206eb0ab01a 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -636,7 +636,16 @@ DEFINE_IDTENTRY(exc_bounds)
 enum kernel_gp_hint {
 	GP_NO_HINT,
 	GP_NON_CANONICAL,
-	GP_CANONICAL
+	GP_CANONICAL,
+	GP_LASS_VIOLATION,
+	GP_NULL_POINTER,
+};
+
+static const char * const kernel_gp_hint_help[] = {
+	[GP_NON_CANONICAL]	= "probably for non-canonical address",
+	[GP_CANONICAL]		= "maybe for address",
+	[GP_LASS_VIOLATION]	= "LASS prevented access to address",
+	[GP_NULL_POINTER]	= "kernel NULL pointer dereference",
 };
 
 /*
@@ -672,6 +681,12 @@ static enum kernel_gp_hint get_kernel_gp_address(struct pt_regs *regs,
 	if (*addr < ~__VIRTUAL_MASK &&
 	    *addr + insn.opnd_bytes - 1 > __VIRTUAL_MASK)
 		return GP_NON_CANONICAL;
+	else if (*addr < ~__VIRTUAL_MASK &&
+		 cpu_feature_enabled(X86_FEATURE_LASS)) {
+		if (*addr < PAGE_SIZE)
+			return GP_NULL_POINTER;
+		return GP_LASS_VIOLATION;
+	}
 #endif
 
 	return GP_CANONICAL;
@@ -833,11 +848,10 @@ DEFINE_IDTENTRY_ERRORCODE(exc_general_protection)
 	else
 		hint = get_kernel_gp_address(regs, &gp_addr);
 
-	if (hint != GP_NO_HINT)
+	if (hint != GP_NO_HINT) {
 		snprintf(desc, sizeof(desc), GPFSTR ", %s 0x%lx",
-			 (hint == GP_NON_CANONICAL) ? "probably for non-canonical address"
-						    : "maybe for address",
-			 gp_addr);
+			 kernel_gp_hint_help[hint], gp_addr);
+	}
 
 	/*
 	 * KASAN is interested only in the non-canonical case, clear it
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCHv8 13/17] x86/traps: Generalize #GP address decode and hint code
  2025-07-01  9:58 [PATCHv8 00/17] x86: Enable Linear Address Space Separation support Kirill A. Shutemov
                   ` (11 preceding siblings ...)
  2025-07-01  9:58 ` [PATCHv8 12/17] x86/traps: Communicate a LASS violation in #GP message Kirill A. Shutemov
@ 2025-07-01  9:58 ` Kirill A. Shutemov
  2025-07-02  0:54   ` Sohil Mehta
  2025-07-01  9:58 ` [PATCHv8 14/17] x86/traps: Handle LASS thrown #SS Kirill A. Shutemov
                   ` (3 subsequent siblings)
  16 siblings, 1 reply; 60+ messages in thread
From: Kirill A. Shutemov @ 2025-07-01  9:58 UTC (permalink / raw)
  To: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra, Ard Biesheuvel,
	Paul E. McKenney, Josh Poimboeuf, Xiongwei Song, Xin Li,
	Mike Rapoport (IBM), Brijesh Singh, Michael Roth, Tony Luck,
	Alexey Kardashevskiy, Alexander Shishkin
  Cc: Jonathan Corbet, Sohil Mehta, Ingo Molnar, Pawan Gupta,
	Daniel Sneddon, Kai Huang, Sandipan Das, Breno Leitao,
	Rick Edgecombe, Alexei Starovoitov, Hou Tao, Juergen Gross,
	Vegard Nossum, Kees Cook, Eric Biggers, Jason Gunthorpe,
	Masami Hiramatsu (Google), Andrew Morton, Luis Chamberlain,
	Yuntao Wang, Rasmus Villemoes, Christophe Leroy, Tejun Heo,
	Changbin Du, Huang Shijie, Geert Uytterhoeven, Namhyung Kim,
	Arnaldo Carvalho de Melo, linux-doc, linux-kernel, linux-efi,
	linux-mm, Kirill A. Shutemov

Handlers for #GP and #SS will now share code to decode the exception
address and retrieve the exception hint string.

The helper, enum, and array should be renamed as they are no longer
specific to #GP.

No functional change intended.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/kernel/traps.c | 62 ++++++++++++++++++++---------------------
 1 file changed, 31 insertions(+), 31 deletions(-)

diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 5206eb0ab01a..ceb091f17a5b 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -633,28 +633,28 @@ DEFINE_IDTENTRY(exc_bounds)
 	cond_local_irq_disable(regs);
 }
 
-enum kernel_gp_hint {
-	GP_NO_HINT,
-	GP_NON_CANONICAL,
-	GP_CANONICAL,
-	GP_LASS_VIOLATION,
-	GP_NULL_POINTER,
+enum kernel_exc_hint {
+	EXC_NO_HINT,
+	EXC_NON_CANONICAL,
+	EXC_CANONICAL,
+	EXC_LASS_VIOLATION,
+	EXC_NULL_POINTER,
 };
 
-static const char * const kernel_gp_hint_help[] = {
-	[GP_NON_CANONICAL]	= "probably for non-canonical address",
-	[GP_CANONICAL]		= "maybe for address",
-	[GP_LASS_VIOLATION]	= "LASS prevented access to address",
-	[GP_NULL_POINTER]	= "kernel NULL pointer dereference",
+static const char * const kernel_exc_hint_help[] = {
+	[EXC_NON_CANONICAL]	= "probably for non-canonical address",
+	[EXC_CANONICAL]		= "maybe for address",
+	[EXC_LASS_VIOLATION]	= "LASS prevented access to address",
+	[EXC_NULL_POINTER]	= "kernel NULL pointer dereference",
 };
 
 /*
- * When an uncaught #GP occurs, try to determine the memory address accessed by
- * the instruction and return that address to the caller. Also, try to figure
- * out whether any part of the access to that address was non-canonical.
+ * When an uncaught #GP/#SS occurs, try to determine the memory address accessed
+ * by the instruction and return that address to the caller. Also, try to
+ * figure out whether any part of the access to that address was non-canonical.
  */
-static enum kernel_gp_hint get_kernel_gp_address(struct pt_regs *regs,
-						 unsigned long *addr)
+static enum kernel_exc_hint get_kernel_exc_address(struct pt_regs *regs,
+						   unsigned long *addr)
 {
 	u8 insn_buf[MAX_INSN_SIZE];
 	struct insn insn;
@@ -662,15 +662,15 @@ static enum kernel_gp_hint get_kernel_gp_address(struct pt_regs *regs,
 
 	if (copy_from_kernel_nofault(insn_buf, (void *)regs->ip,
 			MAX_INSN_SIZE))
-		return GP_NO_HINT;
+		return EXC_NO_HINT;
 
 	ret = insn_decode_kernel(&insn, insn_buf);
 	if (ret < 0)
-		return GP_NO_HINT;
+		return EXC_NO_HINT;
 
 	*addr = (unsigned long)insn_get_addr_ref(&insn, regs);
 	if (*addr == -1UL)
-		return GP_NO_HINT;
+		return EXC_NO_HINT;
 
 #ifdef CONFIG_X86_64
 	/*
@@ -680,16 +680,16 @@ static enum kernel_gp_hint get_kernel_gp_address(struct pt_regs *regs,
 	 */
 	if (*addr < ~__VIRTUAL_MASK &&
 	    *addr + insn.opnd_bytes - 1 > __VIRTUAL_MASK)
-		return GP_NON_CANONICAL;
+		return EXC_NON_CANONICAL;
 	else if (*addr < ~__VIRTUAL_MASK &&
 		 cpu_feature_enabled(X86_FEATURE_LASS)) {
 		if (*addr < PAGE_SIZE)
-			return GP_NULL_POINTER;
-		return GP_LASS_VIOLATION;
+			return EXC_NULL_POINTER;
+		return EXC_LASS_VIOLATION;
 	}
 #endif
 
-	return GP_CANONICAL;
+	return EXC_CANONICAL;
 }
 
 #define GPFSTR "general protection fault"
@@ -808,8 +808,8 @@ static void gp_user_force_sig_segv(struct pt_regs *regs, int trapnr,
 DEFINE_IDTENTRY_ERRORCODE(exc_general_protection)
 {
 	char desc[sizeof(GPFSTR) + 50 + 2*sizeof(unsigned long) + 1] = GPFSTR;
-	enum kernel_gp_hint hint = GP_NO_HINT;
-	unsigned long gp_addr;
+	enum kernel_exc_hint hint = EXC_NO_HINT;
+	unsigned long exc_addr;
 
 	if (user_mode(regs) && try_fixup_enqcmd_gp())
 		return;
@@ -846,21 +846,21 @@ DEFINE_IDTENTRY_ERRORCODE(exc_general_protection)
 	if (error_code)
 		snprintf(desc, sizeof(desc), "segment-related " GPFSTR);
 	else
-		hint = get_kernel_gp_address(regs, &gp_addr);
+		hint = get_kernel_exc_address(regs, &exc_addr);
 
-	if (hint != GP_NO_HINT) {
+	if (hint != EXC_NO_HINT) {
 		snprintf(desc, sizeof(desc), GPFSTR ", %s 0x%lx",
-			 kernel_gp_hint_help[hint], gp_addr);
+			 kernel_exc_hint_help[hint], exc_addr);
 	}
 
 	/*
 	 * KASAN is interested only in the non-canonical case, clear it
 	 * otherwise.
 	 */
-	if (hint != GP_NON_CANONICAL)
-		gp_addr = 0;
+	if (hint != EXC_NON_CANONICAL)
+		exc_addr = 0;
 
-	die_addr(desc, regs, error_code, gp_addr);
+	die_addr(desc, regs, error_code, exc_addr);
 
 exit:
 	cond_local_irq_disable(regs);
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCHv8 14/17] x86/traps: Handle LASS thrown #SS
  2025-07-01  9:58 [PATCHv8 00/17] x86: Enable Linear Address Space Separation support Kirill A. Shutemov
                   ` (12 preceding siblings ...)
  2025-07-01  9:58 ` [PATCHv8 13/17] x86/traps: Generalize #GP address decode and hint code Kirill A. Shutemov
@ 2025-07-01  9:58 ` Kirill A. Shutemov
  2025-07-02  1:35   ` Sohil Mehta
  2025-07-01  9:58 ` [PATCHv8 15/17] x86/cpu: Make LAM depend on LASS Kirill A. Shutemov
                   ` (2 subsequent siblings)
  16 siblings, 1 reply; 60+ messages in thread
From: Kirill A. Shutemov @ 2025-07-01  9:58 UTC (permalink / raw)
  To: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra, Ard Biesheuvel,
	Paul E. McKenney, Josh Poimboeuf, Xiongwei Song, Xin Li,
	Mike Rapoport (IBM), Brijesh Singh, Michael Roth, Tony Luck,
	Alexey Kardashevskiy, Alexander Shishkin
  Cc: Jonathan Corbet, Sohil Mehta, Ingo Molnar, Pawan Gupta,
	Daniel Sneddon, Kai Huang, Sandipan Das, Breno Leitao,
	Rick Edgecombe, Alexei Starovoitov, Hou Tao, Juergen Gross,
	Vegard Nossum, Kees Cook, Eric Biggers, Jason Gunthorpe,
	Masami Hiramatsu (Google), Andrew Morton, Luis Chamberlain,
	Yuntao Wang, Rasmus Villemoes, Christophe Leroy, Tejun Heo,
	Changbin Du, Huang Shijie, Geert Uytterhoeven, Namhyung Kim,
	Arnaldo Carvalho de Melo, linux-doc, linux-kernel, linux-efi,
	linux-mm, Kirill A. Shutemov

LASS throws a #GP for any violations except for stack register accesses,
in which case it throws a #SS instead. Handle this similarly to how other
LASS violations are handled.

In case of FRED, before handling #SS as LASS violation, kernel has to
check if there's a fixup for the exception. It can address #SS due to
invalid user context on ERETU. See 5105e7687ad3 ("x86/fred: Fixup
fault on ERETU by jumping to fred_entrypoint_user") for more details.

Co-developed-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/kernel/traps.c | 39 +++++++++++++++++++++++++++++++++------
 1 file changed, 33 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index ceb091f17a5b..f9ca5b911141 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -418,12 +418,6 @@ DEFINE_IDTENTRY_ERRORCODE(exc_segment_not_present)
 		      SIGBUS, 0, NULL);
 }
 
-DEFINE_IDTENTRY_ERRORCODE(exc_stack_segment)
-{
-	do_error_trap(regs, error_code, "stack segment", X86_TRAP_SS, SIGBUS,
-		      0, NULL);
-}
-
 DEFINE_IDTENTRY_ERRORCODE(exc_alignment_check)
 {
 	char *str = "alignment check";
@@ -866,6 +860,39 @@ DEFINE_IDTENTRY_ERRORCODE(exc_general_protection)
 	cond_local_irq_disable(regs);
 }
 
+#define SSFSTR "stack segment fault"
+
+DEFINE_IDTENTRY_ERRORCODE(exc_stack_segment)
+{
+	if (user_mode(regs))
+		goto error_trap;
+
+	if (cpu_feature_enabled(X86_FEATURE_FRED) &&
+	    fixup_exception(regs, X86_TRAP_SS, error_code, 0))
+		return;
+
+	if (cpu_feature_enabled(X86_FEATURE_LASS)) {
+		enum kernel_exc_hint hint;
+		unsigned long exc_addr;
+
+		hint = get_kernel_exc_address(regs, &exc_addr);
+		if (hint != EXC_NO_HINT) {
+			printk(SSFSTR ", %s 0x%lx", kernel_exc_hint_help[hint],
+			       exc_addr);
+		}
+
+		if (hint != EXC_NON_CANONICAL)
+			exc_addr = 0;
+
+		die_addr(SSFSTR, regs, error_code, exc_addr);
+		return;
+	}
+
+error_trap:
+	do_error_trap(regs, error_code, "stack segment", X86_TRAP_SS, SIGBUS,
+		      0, NULL);
+}
+
 static bool do_int3(struct pt_regs *regs)
 {
 	int res;
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCHv8 15/17] x86/cpu: Make LAM depend on LASS
  2025-07-01  9:58 [PATCHv8 00/17] x86: Enable Linear Address Space Separation support Kirill A. Shutemov
                   ` (13 preceding siblings ...)
  2025-07-01  9:58 ` [PATCHv8 14/17] x86/traps: Handle LASS thrown #SS Kirill A. Shutemov
@ 2025-07-01  9:58 ` Kirill A. Shutemov
  2025-07-01 23:03   ` Sohil Mehta
  2025-07-01  9:58 ` [PATCHv8 16/17] x86/cpu: Enable LASS during CPU initialization Kirill A. Shutemov
  2025-07-01  9:58 ` [PATCHv8 17/17] x86: Re-enable Linear Address Masking Kirill A. Shutemov
  16 siblings, 1 reply; 60+ messages in thread
From: Kirill A. Shutemov @ 2025-07-01  9:58 UTC (permalink / raw)
  To: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra, Ard Biesheuvel,
	Paul E. McKenney, Josh Poimboeuf, Xiongwei Song, Xin Li,
	Mike Rapoport (IBM), Brijesh Singh, Michael Roth, Tony Luck,
	Alexey Kardashevskiy, Alexander Shishkin
  Cc: Jonathan Corbet, Sohil Mehta, Ingo Molnar, Pawan Gupta,
	Daniel Sneddon, Kai Huang, Sandipan Das, Breno Leitao,
	Rick Edgecombe, Alexei Starovoitov, Hou Tao, Juergen Gross,
	Vegard Nossum, Kees Cook, Eric Biggers, Jason Gunthorpe,
	Masami Hiramatsu (Google), Andrew Morton, Luis Chamberlain,
	Yuntao Wang, Rasmus Villemoes, Christophe Leroy, Tejun Heo,
	Changbin Du, Huang Shijie, Geert Uytterhoeven, Namhyung Kim,
	Arnaldo Carvalho de Melo, linux-doc, linux-kernel, linux-efi,
	linux-mm, Kirill A. Shutemov

From: Alexander Shishkin <alexander.shishkin@linux.intel.com>

To prevent Spectre exploits based on LAM (SLAM), as demonstrated in the
whitepaper [1], make LAM depend on LASS, which mitigates this class of
attack.

[1] https://download.vusec.net/papers/slam_sp24.pdf
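
For reference, a simplified sketch (paraphrased from cpuid-deps.c, not
verbatim) of how this one-line table entry is enforced: clearing a feature
walks the dependency table until a stable state is reached, so clearing
X86_FEATURE_LASS (e.g. due to vsyscall=emulate or missing SMAP) also clears
X86_FEATURE_LAM.

	/* Paraphrased dependency walk from do_clear_cpu_cap() */
	do {
		changed = false;
		for (d = cpuid_deps; d->feature; d++) {
			/* Is the feature this entry depends on being disabled? */
			if (!test_bit(d->depends, disable))
				continue;
			/* Already scheduled for clearing? */
			if (__test_and_set_bit(d->feature, disable))
				continue;
			changed = true;
			clear_feature(c, d->feature);
		}
	} while (changed);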

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/kernel/cpu/cpuid-deps.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
index 98d0cdd82574..11bb9ed40140 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -90,6 +90,7 @@ static const struct cpuid_dep cpuid_deps[] = {
 	{ X86_FEATURE_FRED,			X86_FEATURE_LKGS      },
 	{ X86_FEATURE_SPEC_CTRL_SSBD,		X86_FEATURE_SPEC_CTRL },
 	{ X86_FEATURE_LASS,			X86_FEATURE_SMAP      },
+	{ X86_FEATURE_LAM,			X86_FEATURE_LASS      },
 	{}
 };
 
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCHv8 16/17] x86/cpu: Enable LASS during CPU initialization
  2025-07-01  9:58 [PATCHv8 00/17] x86: Enable Linear Address Space Separation support Kirill A. Shutemov
                   ` (14 preceding siblings ...)
  2025-07-01  9:58 ` [PATCHv8 15/17] x86/cpu: Make LAM depend on LASS Kirill A. Shutemov
@ 2025-07-01  9:58 ` Kirill A. Shutemov
  2025-07-01  9:58 ` [PATCHv8 17/17] x86: Re-enable Linear Address Masking Kirill A. Shutemov
  16 siblings, 0 replies; 60+ messages in thread
From: Kirill A. Shutemov @ 2025-07-01  9:58 UTC (permalink / raw)
  To: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra, Ard Biesheuvel,
	Paul E. McKenney, Josh Poimboeuf, Xiongwei Song, Xin Li,
	Mike Rapoport (IBM), Brijesh Singh, Michael Roth, Tony Luck,
	Alexey Kardashevskiy, Alexander Shishkin
  Cc: Jonathan Corbet, Sohil Mehta, Ingo Molnar, Pawan Gupta,
	Daniel Sneddon, Kai Huang, Sandipan Das, Breno Leitao,
	Rick Edgecombe, Alexei Starovoitov, Hou Tao, Juergen Gross,
	Vegard Nossum, Kees Cook, Eric Biggers, Jason Gunthorpe,
	Masami Hiramatsu (Google), Andrew Morton, Luis Chamberlain,
	Yuntao Wang, Rasmus Villemoes, Christophe Leroy, Tejun Heo,
	Changbin Du, Huang Shijie, Geert Uytterhoeven, Namhyung Kim,
	Arnaldo Carvalho de Melo, linux-doc, linux-kernel, linux-efi,
	linux-mm, Kirill A. Shutemov

From: Sohil Mehta <sohil.mehta@intel.com>

LASS is a security feature, so enable it by default if the platform
supports it.

While at it, get rid of the comment above the SMAP/SMEP/UMIP/LASS setup
instead of updating it to mention LASS as well, since the whole sequence
is quite self-explanatory.

Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/kernel/cpu/common.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 1552c7510380..97a228f917a9 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -401,6 +401,12 @@ static __always_inline void setup_umip(struct cpuinfo_x86 *c)
 	cr4_clear_bits(X86_CR4_UMIP);
 }
 
+static __always_inline void setup_lass(struct cpuinfo_x86 *c)
+{
+	if (cpu_feature_enabled(X86_FEATURE_LASS))
+		cr4_set_bits(X86_CR4_LASS);
+}
+
 /* These bits should not change their value after CPU init is finished. */
 static const unsigned long cr4_pinned_mask = X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_UMIP |
 					     X86_CR4_FSGSBASE | X86_CR4_CET | X86_CR4_FRED |
@@ -1975,10 +1981,10 @@ static void identify_cpu(struct cpuinfo_x86 *c)
 	/* Disable the PN if appropriate */
 	squash_the_stupid_serial_number(c);
 
-	/* Set up SMEP/SMAP/UMIP */
 	setup_smep(c);
 	setup_smap(c);
 	setup_umip(c);
+	setup_lass(c);
 
 	/* Enable FSGSBASE instructions if available. */
 	if (cpu_has(c, X86_FEATURE_FSGSBASE)) {
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCHv8 17/17] x86: Re-enable Linear Address Masking
  2025-07-01  9:58 [PATCHv8 00/17] x86: Enable Linear Address Space Separation support Kirill A. Shutemov
                   ` (15 preceding siblings ...)
  2025-07-01  9:58 ` [PATCHv8 16/17] x86/cpu: Enable LASS during CPU initialization Kirill A. Shutemov
@ 2025-07-01  9:58 ` Kirill A. Shutemov
  2025-07-01 23:13   ` Sohil Mehta
  16 siblings, 1 reply; 60+ messages in thread
From: Kirill A. Shutemov @ 2025-07-01  9:58 UTC (permalink / raw)
  To: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra, Ard Biesheuvel,
	Paul E. McKenney, Josh Poimboeuf, Xiongwei Song, Xin Li,
	Mike Rapoport (IBM), Brijesh Singh, Michael Roth, Tony Luck,
	Alexey Kardashevskiy, Alexander Shishkin
  Cc: Jonathan Corbet, Sohil Mehta, Ingo Molnar, Pawan Gupta,
	Daniel Sneddon, Kai Huang, Sandipan Das, Breno Leitao,
	Rick Edgecombe, Alexei Starovoitov, Hou Tao, Juergen Gross,
	Vegard Nossum, Kees Cook, Eric Biggers, Jason Gunthorpe,
	Masami Hiramatsu (Google), Andrew Morton, Luis Chamberlain,
	Yuntao Wang, Rasmus Villemoes, Christophe Leroy, Tejun Heo,
	Changbin Du, Huang Shijie, Geert Uytterhoeven, Namhyung Kim,
	Arnaldo Carvalho de Melo, linux-doc, linux-kernel, linux-efi,
	linux-mm, Kirill A. Shutemov

This reverts commit 3267cb6d3a174ff83d6287dcd5b0047bbd912452.

LASS mitigates the Spectre based on LAM (SLAM) [1] and the previous
commit made LAM depend on LASS, so we no longer need to disable LAM at
compile time, so revert the commit that disables LAM.

Adjust USER_PTR_MAX if LAM enabled, allowing tag bits to be set for
userspace pointers. The value for the constant is defined in a way to
avoid overflow compiler warning on 32-bit config.
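
For reference, on 64-bit the new expression is equivalent to the previously
commented-out constant (assuming PAGE_SIZE == 4096):

	(-1UL >> 1)              == 0x7fffffffffffffff
	(-1UL >> 1) & PAGE_MASK  == 0x7ffffffffffff000
	                         == (1UL << 63) - PAGE_SIZE

but, unlike "1ul << 63", it does not trigger a shift-count warning when
unsigned long is only 32 bits wide.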

[1] https://download.vusec.net/papers/slam_sp24.pdf

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
---
 arch/x86/Kconfig             | 1 -
 arch/x86/kernel/cpu/common.c | 5 +----
 2 files changed, 1 insertion(+), 5 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 71019b3b54ea..2b48e916b754 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2181,7 +2181,6 @@ config RANDOMIZE_MEMORY_PHYSICAL_PADDING
 config ADDRESS_MASKING
 	bool "Linear Address Masking support"
 	depends on X86_64
-	depends on COMPILE_TEST || !CPU_MITIGATIONS # wait for LASS
 	help
 	  Linear Address Masking (LAM) modifies the checking that is applied
 	  to 64-bit linear addresses, allowing software to use of the
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 97a228f917a9..6f2ae9e702bc 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -2558,11 +2558,8 @@ void __init arch_cpu_finalize_init(void)
 	if (IS_ENABLED(CONFIG_X86_64)) {
 		unsigned long USER_PTR_MAX = TASK_SIZE_MAX;
 
-		/*
-		 * Enable this when LAM is gated on LASS support
 		if (cpu_feature_enabled(X86_FEATURE_LAM))
-			USER_PTR_MAX = (1ul << 63) - PAGE_SIZE;
-		 */
+			USER_PTR_MAX = (-1UL >> 1) & PAGE_MASK;
 		runtime_const_init(ptr, USER_PTR_MAX);
 
 		/*
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* Re: [PATCHv8 03/17] x86/alternatives: Disable LASS when patching kernel alternatives
  2025-07-01  9:58 ` [PATCHv8 03/17] x86/alternatives: Disable LASS when patching kernel alternatives Kirill A. Shutemov
@ 2025-07-01 18:44   ` Sohil Mehta
  0 siblings, 0 replies; 60+ messages in thread
From: Sohil Mehta @ 2025-07-01 18:44 UTC (permalink / raw)
  To: Kirill A. Shutemov, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra,
	Ard Biesheuvel, Paul E. McKenney, Josh Poimboeuf, Xiongwei Song,
	Xin Li, Mike Rapoport (IBM), Brijesh Singh, Michael Roth,
	Tony Luck, Alexey Kardashevskiy, Alexander Shishkin
  Cc: Jonathan Corbet, Ingo Molnar, Pawan Gupta, Daniel Sneddon,
	Kai Huang, Sandipan Das, Breno Leitao, Rick Edgecombe,
	Alexei Starovoitov, Hou Tao, Juergen Gross, Vegard Nossum,
	Kees Cook, Eric Biggers, Jason Gunthorpe,
	Masami Hiramatsu (Google), Andrew Morton, Luis Chamberlain,
	Yuntao Wang, Rasmus Villemoes, Christophe Leroy, Tejun Heo,
	Changbin Du, Huang Shijie, Geert Uytterhoeven, Namhyung Kim,
	Arnaldo Carvalho de Melo, linux-doc, linux-kernel, linux-efi,
	linux-mm

On 7/1/2025 2:58 AM, Kirill A. Shutemov wrote:
>  
> +/*
> + * The CLAC/STAC instructions toggle the enforcement of X86_FEATURE_SMAP and
> + * X86_FEATURE_LASS.
> + *
> + * SMAP enforcement is based on the _PAGE_BIT_USER bit in the page tables: the
> + * kernel is not allowed to touch pages with the bit set unless the AC bit is
> + * set.
> + *
> + * LASS enforcement is based on bit 63 of the virtual address. The kernel is
> + * not allowed to touch memory in the lower half of the virtual address space
> + * unless the AC bit is set.
> + *
> + * Use stac()/clac() when accessing userspace (_PAGE_USER) mappings,
> + * regardless of location.
> + *
> + * Use lass_stac()/lass_clac() when accessing kernel mappings (!_PAGE_USER)
> + * in the lower half of the address space.
> + *
> + * Note: a barrier is implicit in alternative().
> + */
> +
Thank you for incorporating my feedback. I like the updated wording.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCHv8 04/17] x86/cpu: Defer CR pinning setup until after EFI initialization
  2025-07-01  9:58 ` [PATCHv8 04/17] x86/cpu: Defer CR pinning setup until after EFI initialization Kirill A. Shutemov
@ 2025-07-01 19:03   ` Sohil Mehta
  2025-07-02  9:47     ` Kirill A. Shutemov
  2025-07-01 23:10   ` Dave Hansen
  1 sibling, 1 reply; 60+ messages in thread
From: Sohil Mehta @ 2025-07-01 19:03 UTC (permalink / raw)
  To: Kirill A. Shutemov, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra,
	Ard Biesheuvel, Paul E. McKenney, Josh Poimboeuf, Xiongwei Song,
	Xin Li, Mike Rapoport (IBM), Brijesh Singh, Michael Roth,
	Tony Luck, Alexey Kardashevskiy, Alexander Shishkin
  Cc: Jonathan Corbet, Ingo Molnar, Pawan Gupta, Daniel Sneddon,
	Kai Huang, Sandipan Das, Breno Leitao, Rick Edgecombe,
	Alexei Starovoitov, Hou Tao, Juergen Gross, Vegard Nossum,
	Kees Cook, Eric Biggers, Jason Gunthorpe,
	Masami Hiramatsu (Google), Andrew Morton, Luis Chamberlain,
	Yuntao Wang, Rasmus Villemoes, Christophe Leroy, Tejun Heo,
	Changbin Du, Huang Shijie, Geert Uytterhoeven, Namhyung Kim,
	Arnaldo Carvalho de Melo, linux-doc, linux-kernel, linux-efi,
	linux-mm

On 7/1/2025 2:58 AM, Kirill A. Shutemov wrote:
> From: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> 
> In order to map the EFI runtime services, set_virtual_address_map()
> needs to be called, which resides in the lower half of the address
> space. This means that LASS needs to be temporarily disabled around
> this call. This can only be done before the CR pinning is set up.
> 
> Move CR pinning setup behind the EFI initialization.
> 
> Wrapping efi_enter_virtual_mode() into lass_disable/enable_enforcement()

I believe this should be lass_stac()/clac() since we reverted to the
original naming.

> is not enough because AC flag gates data accesses, but not instruction
> fetch. Clearing the CR4 bit is required.
> 
> Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> Suggested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---
>  arch/x86/kernel/cpu/common.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> index 4f430be285de..9918121e0adc 100644
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -2081,7 +2081,6 @@ static __init void identify_boot_cpu(void)
>  	enable_sep_cpu();
>  #endif
>  	cpu_detect_tlb(&boot_cpu_data);
> -	setup_cr_pinning();
>  
>  	tsx_init();
>  	tdx_init();
> @@ -2532,10 +2531,14 @@ void __init arch_cpu_finalize_init(void)
>  
>  	/*
>  	 * This needs to follow the FPU initializtion, since EFI depends on it.
> +	 *
> +	 * EFI twiddles CR4.LASS. Do it before CR pinning.
>  	 */
>  	if (efi_enabled(EFI_RUNTIME_SERVICES))
>  		efi_enter_virtual_mode();
>  
> +	setup_cr_pinning();
> +

Instead of EFI toggling CR4.LASS, why not defer the first LASS
activation itself?

i.e.

	if (efi_enabled(EFI_RUNTIME_SERVICES))
		efi_enter_virtual_mode();

	setup_lass();

	setup_cr_pinning();


This way, we can avoid the following patch (#5) altogether.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCHv8 11/17] x86/cpu: Set LASS CR4 bit as pinning sensitive
  2025-07-01  9:58 ` [PATCHv8 11/17] x86/cpu: Set LASS CR4 bit as pinning sensitive Kirill A. Shutemov
@ 2025-07-01 22:51   ` Sohil Mehta
  0 siblings, 0 replies; 60+ messages in thread
From: Sohil Mehta @ 2025-07-01 22:51 UTC (permalink / raw)
  To: Kirill A. Shutemov, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra,
	Ard Biesheuvel, Paul E. McKenney, Josh Poimboeuf, Xiongwei Song,
	Xin Li, Mike Rapoport (IBM), Brijesh Singh, Michael Roth,
	Tony Luck, Alexey Kardashevskiy, Alexander Shishkin
  Cc: Jonathan Corbet, Ingo Molnar, Pawan Gupta, Daniel Sneddon,
	Kai Huang, Sandipan Das, Breno Leitao, Rick Edgecombe,
	Alexei Starovoitov, Hou Tao, Juergen Gross, Vegard Nossum,
	Kees Cook, Eric Biggers, Jason Gunthorpe,
	Masami Hiramatsu (Google), Andrew Morton, Luis Chamberlain,
	Yuntao Wang, Rasmus Villemoes, Christophe Leroy, Tejun Heo,
	Changbin Du, Huang Shijie, Geert Uytterhoeven, Namhyung Kim,
	Arnaldo Carvalho de Melo, linux-doc, linux-kernel, linux-efi,
	linux-mm

On 7/1/2025 2:58 AM, Kirill A. Shutemov wrote:
> From: Yian Chen <yian.chen@intel.com>
> 
> Security features such as LASS are not expected to be disabled once
> initialized. Add LASS to the CR4 pinned mask.
> 
> Signed-off-by: Yian Chen <yian.chen@intel.com>
> Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> Reviewed-by: Tony Luck <tony.luck@intel.com>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---
>  arch/x86/kernel/cpu/common.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 

I think this CR4 pinning change can be merged with the other CR pinning
related patch (#4). At a minimum, this should be placed close to that
patch to make logical sense.

1) Add LASS to the CR4 pinned mask
2) Defer CR pinning since it would cause XYZ issue.

Or the other way around. Anyway,

Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>

> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> index 9918121e0adc..1552c7510380 100644
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -403,7 +403,8 @@ static __always_inline void setup_umip(struct cpuinfo_x86 *c)
>  
>  /* These bits should not change their value after CPU init is finished. */
>  static const unsigned long cr4_pinned_mask = X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_UMIP |
> -					     X86_CR4_FSGSBASE | X86_CR4_CET | X86_CR4_FRED;
> +					     X86_CR4_FSGSBASE | X86_CR4_CET | X86_CR4_FRED |
> +					     X86_CR4_LASS;
>  static DEFINE_STATIC_KEY_FALSE_RO(cr_pinning);
>  static unsigned long cr4_pinned_bits __ro_after_init;
>  


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCHv8 15/17] x86/cpu: Make LAM depend on LASS
  2025-07-01  9:58 ` [PATCHv8 15/17] x86/cpu: Make LAM depend on LASS Kirill A. Shutemov
@ 2025-07-01 23:03   ` Sohil Mehta
  0 siblings, 0 replies; 60+ messages in thread
From: Sohil Mehta @ 2025-07-01 23:03 UTC (permalink / raw)
  To: Kirill A. Shutemov, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra,
	Ard Biesheuvel, Paul E. McKenney, Josh Poimboeuf, Xiongwei Song,
	Xin Li, Mike Rapoport (IBM), Brijesh Singh, Michael Roth,
	Tony Luck, Alexey Kardashevskiy, Alexander Shishkin
  Cc: Jonathan Corbet, Ingo Molnar, Pawan Gupta, Daniel Sneddon,
	Kai Huang, Sandipan Das, Breno Leitao, Rick Edgecombe,
	Alexei Starovoitov, Hou Tao, Juergen Gross, Vegard Nossum,
	Kees Cook, Eric Biggers, Jason Gunthorpe,
	Masami Hiramatsu (Google), Andrew Morton, Luis Chamberlain,
	Yuntao Wang, Rasmus Villemoes, Christophe Leroy, Tejun Heo,
	Changbin Du, Huang Shijie, Geert Uytterhoeven, Namhyung Kim,
	Arnaldo Carvalho de Melo, linux-doc, linux-kernel, linux-efi,
	linux-mm

On 7/1/2025 2:58 AM, Kirill A. Shutemov wrote:
> From: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> 
> To prevent Spectre exploits based on LAM (SLAM), as demonstrated in the
> whitepaper [1], make LAM depend on LASS, which mitigates this class of
> attack.
> 
> [1] https://download.vusec.net/papers/slam_sp24.pdf
> 
> Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---
>  arch/x86/kernel/cpu/cpuid-deps.c | 1 +
>  1 file changed, 1 insertion(+)
> 

In terms of patch ordering, wouldn't it make more sense to introduce LAM
related changes after LASS has been fully enabled? This patch should
probably be after Patch #16 which enables LASS.

Logically, the LAM re-enabling stuff can be a separate series, but since
it's only a few changed lines, having it at the end seems okay.

Patch 1-15  => Enable LASS
Patch 16-17 => Re-enable LAM

Other than that,

Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>

> diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
> index 98d0cdd82574..11bb9ed40140 100644
> --- a/arch/x86/kernel/cpu/cpuid-deps.c
> +++ b/arch/x86/kernel/cpu/cpuid-deps.c
> @@ -90,6 +90,7 @@ static const struct cpuid_dep cpuid_deps[] = {
>  	{ X86_FEATURE_FRED,			X86_FEATURE_LKGS      },
>  	{ X86_FEATURE_SPEC_CTRL_SSBD,		X86_FEATURE_SPEC_CTRL },
>  	{ X86_FEATURE_LASS,			X86_FEATURE_SMAP      },
> +	{ X86_FEATURE_LAM,			X86_FEATURE_LASS      },
>  	{}
>  };
>  


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCHv8 04/17] x86/cpu: Defer CR pinning setup until after EFI initialization
  2025-07-01  9:58 ` [PATCHv8 04/17] x86/cpu: Defer CR pinning setup until after EFI initialization Kirill A. Shutemov
  2025-07-01 19:03   ` Sohil Mehta
@ 2025-07-01 23:10   ` Dave Hansen
  2025-07-02 10:05     ` Kirill A. Shutemov
  1 sibling, 1 reply; 60+ messages in thread
From: Dave Hansen @ 2025-07-01 23:10 UTC (permalink / raw)
  To: Kirill A. Shutemov, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra,
	Ard Biesheuvel, Paul E. McKenney, Josh Poimboeuf, Xiongwei Song,
	Xin Li, Mike Rapoport (IBM), Brijesh Singh, Michael Roth,
	Tony Luck, Alexey Kardashevskiy, Alexander Shishkin
  Cc: Jonathan Corbet, Sohil Mehta, Ingo Molnar, Pawan Gupta,
	Daniel Sneddon, Kai Huang, Sandipan Das, Breno Leitao,
	Rick Edgecombe, Alexei Starovoitov, Hou Tao, Juergen Gross,
	Vegard Nossum, Kees Cook, Eric Biggers, Jason Gunthorpe,
	Masami Hiramatsu (Google), Andrew Morton, Luis Chamberlain,
	Yuntao Wang, Rasmus Villemoes, Christophe Leroy, Tejun Heo,
	Changbin Du, Huang Shijie, Geert Uytterhoeven, Namhyung Kim,
	Arnaldo Carvalho de Melo, linux-doc, linux-kernel, linux-efi,
	linux-mm

On 7/1/25 02:58, Kirill A. Shutemov wrote:
> Move CR pinning setup behind the EFI initialization.

I kinda grumble about these one-off solutions. Could we just do this
once and for all and defer CR pinning as long as possible? For instance,
could we do it in a late_initcall()?

Do we need pinning before userspace comes up?

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCHv8 17/17] x86: Re-enable Linear Address Masking
  2025-07-01  9:58 ` [PATCHv8 17/17] x86: Re-enable Linear Address Masking Kirill A. Shutemov
@ 2025-07-01 23:13   ` Sohil Mehta
  0 siblings, 0 replies; 60+ messages in thread
From: Sohil Mehta @ 2025-07-01 23:13 UTC (permalink / raw)
  To: Kirill A. Shutemov, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra,
	Ard Biesheuvel, Paul E. McKenney, Josh Poimboeuf, Xiongwei Song,
	Xin Li, Mike Rapoport (IBM), Brijesh Singh, Michael Roth,
	Tony Luck, Alexey Kardashevskiy, Alexander Shishkin
  Cc: Jonathan Corbet, Ingo Molnar, Pawan Gupta, Daniel Sneddon,
	Kai Huang, Sandipan Das, Breno Leitao, Rick Edgecombe,
	Alexei Starovoitov, Hou Tao, Juergen Gross, Vegard Nossum,
	Kees Cook, Eric Biggers, Jason Gunthorpe,
	Masami Hiramatsu (Google), Andrew Morton, Luis Chamberlain,
	Yuntao Wang, Rasmus Villemoes, Christophe Leroy, Tejun Heo,
	Changbin Du, Huang Shijie, Geert Uytterhoeven, Namhyung Kim,
	Arnaldo Carvalho de Melo, linux-doc, linux-kernel, linux-efi,
	linux-mm

On 7/1/2025 2:58 AM, Kirill A. Shutemov wrote:
> This reverts commit 3267cb6d3a174ff83d6287dcd5b0047bbd912452.
> 

This patch isn't truly a revert. This line can be skipped since the
additional changes going in are more than the reverted single line.

> LASS mitigates the Spectre based on LAM (SLAM) [1] and the previous
> commit made LAM depend on LASS, so we no longer need to disable LAM at
> compile time, so revert the commit that disables LAM.

Also, wording such as "previous commit" should be avoided since it can be
misleading. For example, in this series "the previous commit" enables
LASS, while the commit before that adds the actual dependency between
LAM and LASS.

Other than that, the code changes look good to me.

Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>

> 
> Adjust USER_PTR_MAX if LAM enabled, allowing tag bits to be set for
> userspace pointers. The value for the constant is defined in a way to
> avoid overflow compiler warning on 32-bit config.
> 
> [1] https://download.vusec.net/papers/slam_sp24.pdf
> 
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
> ---
>  arch/x86/Kconfig             | 1 -
>  arch/x86/kernel/cpu/common.c | 5 +----
>  2 files changed, 1 insertion(+), 5 deletions(-)
> 
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 71019b3b54ea..2b48e916b754 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -2181,7 +2181,6 @@ config RANDOMIZE_MEMORY_PHYSICAL_PADDING
>  config ADDRESS_MASKING
>  	bool "Linear Address Masking support"
>  	depends on X86_64
> -	depends on COMPILE_TEST || !CPU_MITIGATIONS # wait for LASS
>  	help
>  	  Linear Address Masking (LAM) modifies the checking that is applied
>  	  to 64-bit linear addresses, allowing software to use of the
> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> index 97a228f917a9..6f2ae9e702bc 100644
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -2558,11 +2558,8 @@ void __init arch_cpu_finalize_init(void)
>  	if (IS_ENABLED(CONFIG_X86_64)) {
>  		unsigned long USER_PTR_MAX = TASK_SIZE_MAX;
>  
> -		/*
> -		 * Enable this when LAM is gated on LASS support
>  		if (cpu_feature_enabled(X86_FEATURE_LAM))
> -			USER_PTR_MAX = (1ul << 63) - PAGE_SIZE;
> -		 */
> +			USER_PTR_MAX = (-1UL >> 1) & PAGE_MASK;
>  		runtime_const_init(ptr, USER_PTR_MAX);
>  
>  		/*


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCHv8 12/17] x86/traps: Communicate a LASS violation in #GP message
  2025-07-01  9:58 ` [PATCHv8 12/17] x86/traps: Communicate a LASS violation in #GP message Kirill A. Shutemov
@ 2025-07-02  0:36   ` Sohil Mehta
  2025-07-02 10:10     ` Kirill A. Shutemov
  0 siblings, 1 reply; 60+ messages in thread
From: Sohil Mehta @ 2025-07-02  0:36 UTC (permalink / raw)
  To: Kirill A. Shutemov, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra,
	Ard Biesheuvel, Paul E. McKenney, Josh Poimboeuf, Xiongwei Song,
	Xin Li, Mike Rapoport (IBM), Brijesh Singh, Michael Roth,
	Tony Luck, Alexey Kardashevskiy, Alexander Shishkin
  Cc: Jonathan Corbet, Ingo Molnar, Pawan Gupta, Daniel Sneddon,
	Kai Huang, Sandipan Das, Breno Leitao, Rick Edgecombe,
	Alexei Starovoitov, Hou Tao, Juergen Gross, Vegard Nossum,
	Kees Cook, Eric Biggers, Jason Gunthorpe,
	Masami Hiramatsu (Google), Andrew Morton, Luis Chamberlain,
	Yuntao Wang, Rasmus Villemoes, Christophe Leroy, Tejun Heo,
	Changbin Du, Huang Shijie, Geert Uytterhoeven, Namhyung Kim,
	Arnaldo Carvalho de Melo, linux-doc, linux-kernel, linux-efi,
	linux-mm

On 7/1/2025 2:58 AM, Kirill A. Shutemov wrote:
>  /*
> @@ -672,6 +681,12 @@ static enum kernel_gp_hint get_kernel_gp_address(struct pt_regs *regs,
>  	if (*addr < ~__VIRTUAL_MASK &&
>  	    *addr + insn.opnd_bytes - 1 > __VIRTUAL_MASK)
>  		return GP_NON_CANONICAL;
> +	else if (*addr < ~__VIRTUAL_MASK &&
> +		 cpu_feature_enabled(X86_FEATURE_LASS)) {
> +		if (*addr < PAGE_SIZE)
> +			return GP_NULL_POINTER;
> +		return GP_LASS_VIOLATION;
> +	}

The comments above this section of code say:

/*
 * Check that:
 *  - the operand is not in the kernel half
 *  - the last byte of the operand is not in the user canonical half
 */

They should be updated since we are updating the logic.

Also, below is easier to read than above:

	if (*addr < ~__VIRTUAL_MASK) {

		if (*addr + insn.opnd_bytes - 1 > __VIRTUAL_MASK)
			return EXC_NON_CANONICAL;

		if (cpu_feature_enabled(X86_FEATURE_LASS)) {
			if (*addr < PAGE_SIZE)
				return EXC_NULL_POINTER;
			return EXC_LASS_VIOLATION;
		}
	}

I am wondering if the NULL pointer exception should be made
unconditional, even if it is unlikely to reach here without LASS. So
maybe something like this:

	if (*addr < ~__VIRTUAL_MASK) {

		if (*addr + insn.opnd_bytes - 1 > __VIRTUAL_MASK)
			return EXC_NON_CANONICAL;

		if (*addr < PAGE_SIZE)
			return EXC_NULL_POINTER;

		if (cpu_feature_enabled(X86_FEATURE_LASS))
			return EXC_LASS_VIOLATION;
	}



^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCHv8 13/17] x86/traps: Generalize #GP address decode and hint code
  2025-07-01  9:58 ` [PATCHv8 13/17] x86/traps: Generalize #GP address decode and hint code Kirill A. Shutemov
@ 2025-07-02  0:54   ` Sohil Mehta
  0 siblings, 0 replies; 60+ messages in thread
From: Sohil Mehta @ 2025-07-02  0:54 UTC (permalink / raw)
  To: Kirill A. Shutemov, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra,
	Ard Biesheuvel, Paul E. McKenney, Josh Poimboeuf, Xiongwei Song,
	Xin Li, Mike Rapoport (IBM), Brijesh Singh, Michael Roth,
	Tony Luck, Alexey Kardashevskiy, Alexander Shishkin
  Cc: Jonathan Corbet, Ingo Molnar, Pawan Gupta, Daniel Sneddon,
	Kai Huang, Sandipan Das, Breno Leitao, Rick Edgecombe,
	Alexei Starovoitov, Hou Tao, Juergen Gross, Vegard Nossum,
	Kees Cook, Eric Biggers, Jason Gunthorpe,
	Masami Hiramatsu (Google), Andrew Morton, Luis Chamberlain,
	Yuntao Wang, Rasmus Villemoes, Christophe Leroy, Tejun Heo,
	Changbin Du, Huang Shijie, Geert Uytterhoeven, Namhyung Kim,
	Arnaldo Carvalho de Melo, linux-doc, linux-kernel, linux-efi,
	linux-mm

On 7/1/2025 2:58 AM, Kirill A. Shutemov wrote:
> Handlers for #GP and #SS will now share code to decode the exception
> address and retrieve the exception hint string.
> 

This is missing an essential "why". Why do the #GP and #SS handlers need to
share code? None of the patches prior to this one have hinted at it.

It can probably be deduced from a later patch, but it needs to be
clarified in this one. Maybe a simplified version of the text from the SDM:

"In most cases, an access causing a LASS violation results in a general
protection exception (#GP); for stack accesses (those due to
stack-oriented instructions, as well as accesses that implicitly or
explicitly use the SS segment register), a stack fault (#SS) is generated."

> The helper, enum, and array should be renamed as they are no longer
> specific to #GP.
> 
> No functional change intended.
> 
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---
>  arch/x86/kernel/traps.c | 62 ++++++++++++++++++++---------------------
>  1 file changed, 31 insertions(+), 31 deletions(-)
> 

The code changes look okay to me except a minor nit below.

>  #define GPFSTR "general protection fault"
> @@ -808,8 +808,8 @@ static void gp_user_force_sig_segv(struct pt_regs *regs, int trapnr,
>  DEFINE_IDTENTRY_ERRORCODE(exc_general_protection)
>  {
>  	char desc[sizeof(GPFSTR) + 50 + 2*sizeof(unsigned long) + 1] = GPFSTR;
> -	enum kernel_gp_hint hint = GP_NO_HINT;
> -	unsigned long gp_addr;

gp_addr is a variable local to the #GP handler. It can probably stay the
same.

> +	enum kernel_exc_hint hint = EXC_NO_HINT;
> +	unsigned long exc_addr;
>  
>  	if (user_mode(regs) && try_fixup_enqcmd_gp())
>  		return;
> @@ -846,21 +846,21 @@ DEFINE_IDTENTRY_ERRORCODE(exc_general_protection)
>  	if (error_code)
>  		snprintf(desc, sizeof(desc), "segment-related " GPFSTR);
>  	else
> -		hint = get_kernel_gp_address(regs, &gp_addr);
> +		hint = get_kernel_exc_address(regs, &exc_addr);
>  
> -	if (hint != GP_NO_HINT) {
> +	if (hint != EXC_NO_HINT) {
>  		snprintf(desc, sizeof(desc), GPFSTR ", %s 0x%lx",
> -			 kernel_gp_hint_help[hint], gp_addr);
> +			 kernel_exc_hint_help[hint], exc_addr);
>  	}
>  
>  	/*
>  	 * KASAN is interested only in the non-canonical case, clear it
>  	 * otherwise.
>  	 */
> -	if (hint != GP_NON_CANONICAL)
> -		gp_addr = 0;
> +	if (hint != EXC_NON_CANONICAL)
> +		exc_addr = 0;
>  
> -	die_addr(desc, regs, error_code, gp_addr);
> +	die_addr(desc, regs, error_code, exc_addr);
>  
>  exit:
>  	cond_local_irq_disable(regs);


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCHv8 14/17] x86/traps: Handle LASS thrown #SS
  2025-07-01  9:58 ` [PATCHv8 14/17] x86/traps: Handle LASS thrown #SS Kirill A. Shutemov
@ 2025-07-02  1:35   ` Sohil Mehta
  2025-07-02  2:00     ` H. Peter Anvin
                       ` (2 more replies)
  0 siblings, 3 replies; 60+ messages in thread
From: Sohil Mehta @ 2025-07-02  1:35 UTC (permalink / raw)
  To: Kirill A. Shutemov, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra,
	Ard Biesheuvel, Paul E. McKenney, Josh Poimboeuf, Xiongwei Song,
	Xin Li, Mike Rapoport (IBM), Brijesh Singh, Michael Roth,
	Tony Luck, Alexey Kardashevskiy, Alexander Shishkin
  Cc: Jonathan Corbet, Ingo Molnar, Pawan Gupta, Daniel Sneddon,
	Kai Huang, Sandipan Das, Breno Leitao, Rick Edgecombe,
	Alexei Starovoitov, Hou Tao, Juergen Gross, Vegard Nossum,
	Kees Cook, Eric Biggers, Jason Gunthorpe,
	Masami Hiramatsu (Google), Andrew Morton, Luis Chamberlain,
	Yuntao Wang, Rasmus Villemoes, Christophe Leroy, Tejun Heo,
	Changbin Du, Huang Shijie, Geert Uytterhoeven, Namhyung Kim,
	Arnaldo Carvalho de Melo, linux-doc, linux-kernel, linux-efi,
	linux-mm

On 7/1/2025 2:58 AM, Kirill A. Shutemov wrote:
> LASS throws a #GP for any violations except for stack register accesses,
> in which case it throws a #SS instead. Handle this similarly to how other
> LASS violations are handled.
> 

Maybe I've misunderstood something:

Is the underlying assumption here that #SS were previously only
generated by userspace, but now they can also be generated by the
kernel? And we want the kernel generated #SS to behave the same as the #GP?

> In case of FRED, before handling #SS as LASS violation, kernel has to
> check if there's a fixup for the exception. It can address #SS due to
> invalid user context on ERETU. See 5105e7687ad3 ("x86/fred: Fixup
> fault on ERETU by jumping to fred_entrypoint_user") for more details.
> 
> Co-developed-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---
>  arch/x86/kernel/traps.c | 39 +++++++++++++++++++++++++++++++++------
>  1 file changed, 33 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
> index ceb091f17a5b..f9ca5b911141 100644
> --- a/arch/x86/kernel/traps.c
> +++ b/arch/x86/kernel/traps.c
> @@ -418,12 +418,6 @@ DEFINE_IDTENTRY_ERRORCODE(exc_segment_not_present)
>  		      SIGBUS, 0, NULL);
>  }
>  
> -DEFINE_IDTENTRY_ERRORCODE(exc_stack_segment)
> -{
> -	do_error_trap(regs, error_code, "stack segment", X86_TRAP_SS, SIGBUS,
> -		      0, NULL);
> -}
> -
>  DEFINE_IDTENTRY_ERRORCODE(exc_alignment_check)
>  {
>  	char *str = "alignment check";
> @@ -866,6 +860,39 @@ DEFINE_IDTENTRY_ERRORCODE(exc_general_protection)
>  	cond_local_irq_disable(regs);
>  }
>  
> +#define SSFSTR "stack segment fault"
> +
> +DEFINE_IDTENTRY_ERRORCODE(exc_stack_segment)
> +{
> +	if (user_mode(regs))
> +		goto error_trap;
> +
> +	if (cpu_feature_enabled(X86_FEATURE_FRED) &&
> +	    fixup_exception(regs, X86_TRAP_SS, error_code, 0))
> +		return;
> +
> +	if (cpu_feature_enabled(X86_FEATURE_LASS)) {
> +		enum kernel_exc_hint hint;
> +		unsigned long exc_addr;
> +
> +		hint = get_kernel_exc_address(regs, &exc_addr);
> +		if (hint != EXC_NO_HINT) {

The brackets are not needed for singular statements. Also the max line
length is longer now. You can fit this all in a single line.

> +			printk(SSFSTR ", %s 0x%lx", kernel_exc_hint_help[hint],
> +			       exc_addr);
> +		}
> +

> +		if (hint != EXC_NON_CANONICAL)
> +			exc_addr = 0;
> +
> +		die_addr(SSFSTR, regs, error_code, exc_addr);

The variable names in die_addr() should be generalized as well. They
seem to assume the caller to be a #GP handler.

> +		return;
> +	}
> +
> +error_trap:
> +	do_error_trap(regs, error_code, "stack segment", X86_TRAP_SS, SIGBUS,
> +		      0, NULL);
> +}
> +
>  static bool do_int3(struct pt_regs *regs)
>  {
>  	int res;


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCHv8 14/17] x86/traps: Handle LASS thrown #SS
  2025-07-02  1:35   ` Sohil Mehta
@ 2025-07-02  2:00     ` H. Peter Anvin
  2025-07-02  2:06     ` H. Peter Anvin
  2025-07-02 13:27     ` Kirill A. Shutemov
  2 siblings, 0 replies; 60+ messages in thread
From: H. Peter Anvin @ 2025-07-02  2:00 UTC (permalink / raw)
  To: Sohil Mehta, Kirill A. Shutemov, Andy Lutomirski, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, x86, Peter Zijlstra,
	Ard Biesheuvel, Paul E. McKenney, Josh Poimboeuf, Xiongwei Song,
	Xin Li, Mike Rapoport (IBM), Brijesh Singh, Michael Roth,
	Tony Luck, Alexey Kardashevskiy, Alexander Shishkin
  Cc: Jonathan Corbet, Ingo Molnar, Pawan Gupta, Daniel Sneddon,
	Kai Huang, Sandipan Das, Breno Leitao, Rick Edgecombe,
	Alexei Starovoitov, Hou Tao, Juergen Gross, Vegard Nossum,
	Kees Cook, Eric Biggers, Jason Gunthorpe,
	Masami Hiramatsu (Google), Andrew Morton, Luis Chamberlain,
	Yuntao Wang, Rasmus Villemoes, Christophe Leroy, Tejun Heo,
	Changbin Du, Huang Shijie, Geert Uytterhoeven, Namhyung Kim,
	Arnaldo Carvalho de Melo, linux-doc, linux-kernel, linux-efi,
	linux-mm

On July 1, 2025 6:35:40 PM PDT, Sohil Mehta <sohil.mehta@intel.com> wrote:
>On 7/1/2025 2:58 AM, Kirill A. Shutemov wrote:
>> LASS throws a #GP for any violations except for stack register accesses,
>> in which case it throws a #SS instead. Handle this similarly to how other
>> LASS violations are handled.
>> 
>
>Maybe I've misunderstood something:
>
>Is the underlying assumption here that #SS were previously only
>generated by userspace, but now they can also be generated by the
>kernel? And we want the kernel generated #SS to behave the same as the #GP?
>
>> In case of FRED, before handling #SS as LASS violation, kernel has to
>> check if there's a fixup for the exception. It can address #SS due to
>> invalid user context on ERETU. See 5105e7687ad3 ("x86/fred: Fixup
>> fault on ERETU by jumping to fred_entrypoint_user") for more details.
>> 
>> Co-developed-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
>> Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
>> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
>> ---
>>  arch/x86/kernel/traps.c | 39 +++++++++++++++++++++++++++++++++------
>>  1 file changed, 33 insertions(+), 6 deletions(-)
>> 
>> diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
>> index ceb091f17a5b..f9ca5b911141 100644
>> --- a/arch/x86/kernel/traps.c
>> +++ b/arch/x86/kernel/traps.c
>> @@ -418,12 +418,6 @@ DEFINE_IDTENTRY_ERRORCODE(exc_segment_not_present)
>>  		      SIGBUS, 0, NULL);
>>  }
>>  
>> -DEFINE_IDTENTRY_ERRORCODE(exc_stack_segment)
>> -{
>> -	do_error_trap(regs, error_code, "stack segment", X86_TRAP_SS, SIGBUS,
>> -		      0, NULL);
>> -}
>> -
>>  DEFINE_IDTENTRY_ERRORCODE(exc_alignment_check)
>>  {
>>  	char *str = "alignment check";
>> @@ -866,6 +860,39 @@ DEFINE_IDTENTRY_ERRORCODE(exc_general_protection)
>>  	cond_local_irq_disable(regs);
>>  }
>>  
>> +#define SSFSTR "stack segment fault"
>> +
>> +DEFINE_IDTENTRY_ERRORCODE(exc_stack_segment)
>> +{
>> +	if (user_mode(regs))
>> +		goto error_trap;
>> +
>> +	if (cpu_feature_enabled(X86_FEATURE_FRED) &&
>> +	    fixup_exception(regs, X86_TRAP_SS, error_code, 0))
>> +		return;
>> +
>> +	if (cpu_feature_enabled(X86_FEATURE_LASS)) {
>> +		enum kernel_exc_hint hint;
>> +		unsigned long exc_addr;
>> +
>> +		hint = get_kernel_exc_address(regs, &exc_addr);
>> +		if (hint != EXC_NO_HINT) {
>
>The brackets are not needed for singular statements. Also the max line
>length is longer now. You can fit this all in a single line.
>
>> +			printk(SSFSTR ", %s 0x%lx", kernel_exc_hint_help[hint],
>> +			       exc_addr);
>> +		}
>> +
>
>> +		if (hint != EXC_NON_CANONICAL)
>> +			exc_addr = 0;
>> +
>> +		die_addr(SSFSTR, regs, error_code, exc_addr);
>
>The variable names in die_addr() should be generalized as well. They
>seem to assume the caller to be a #GP handler.
>
>> +		return;
>> +	}
>> +
>> +error_trap:
>> +	do_error_trap(regs, error_code, "stack segment", X86_TRAP_SS, SIGBUS,
>> +		      0, NULL);
>> +}
>> +
>>  static bool do_int3(struct pt_regs *regs)
>>  {
>>  	int res;
>

An #SS can be generated by the kernel if RSP is corrupted. This is fatal, but as always we want to get a message out.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCHv8 14/17] x86/traps: Handle LASS thrown #SS
  2025-07-02  1:35   ` Sohil Mehta
  2025-07-02  2:00     ` H. Peter Anvin
@ 2025-07-02  2:06     ` H. Peter Anvin
  2025-07-02 10:17       ` Kirill A. Shutemov
                         ` (2 more replies)
  2025-07-02 13:27     ` Kirill A. Shutemov
  2 siblings, 3 replies; 60+ messages in thread
From: H. Peter Anvin @ 2025-07-02  2:06 UTC (permalink / raw)
  To: Sohil Mehta, Kirill A. Shutemov, Andy Lutomirski, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, x86, Peter Zijlstra,
	Ard Biesheuvel, Paul E. McKenney, Josh Poimboeuf, Xiongwei Song,
	Xin Li, Mike Rapoport (IBM), Brijesh Singh, Michael Roth,
	Tony Luck, Alexey Kardashevskiy, Alexander Shishkin
  Cc: Jonathan Corbet, Ingo Molnar, Pawan Gupta, Daniel Sneddon,
	Kai Huang, Sandipan Das, Breno Leitao, Rick Edgecombe,
	Alexei Starovoitov, Hou Tao, Juergen Gross, Vegard Nossum,
	Kees Cook, Eric Biggers, Jason Gunthorpe,
	Masami Hiramatsu (Google), Andrew Morton, Luis Chamberlain,
	Yuntao Wang, Rasmus Villemoes, Christophe Leroy, Tejun Heo,
	Changbin Du, Huang Shijie, Geert Uytterhoeven, Namhyung Kim,
	Arnaldo Carvalho de Melo, linux-doc, linux-kernel, linux-efi,
	linux-mm

On July 1, 2025 6:35:40 PM PDT, Sohil Mehta <sohil.mehta@intel.com> wrote:
>On 7/1/2025 2:58 AM, Kirill A. Shutemov wrote:
>> LASS throws a #GP for any violations except for stack register accesses,
>> in which case it throws a #SS instead. Handle this similarly to how other
>> LASS violations are handled.
>> 
>
>Maybe I've misunderstood something:
>
>Is the underlying assumption here that #SS were previously only
>generated by userspace, but now they can also be generated by the
>kernel? And we want the kernel generated #SS to behave the same as the #GP?
>
>> In case of FRED, before handling #SS as LASS violation, kernel has to
>> check if there's a fixup for the exception. It can address #SS due to
>> invalid user context on ERETU. See 5105e7687ad3 ("x86/fred: Fixup
>> fault on ERETU by jumping to fred_entrypoint_user") for more details.
>> 
>> Co-developed-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
>> Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
>> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
>> ---
>>  arch/x86/kernel/traps.c | 39 +++++++++++++++++++++++++++++++++------
>>  1 file changed, 33 insertions(+), 6 deletions(-)
>> 
>> diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
>> index ceb091f17a5b..f9ca5b911141 100644
>> --- a/arch/x86/kernel/traps.c
>> +++ b/arch/x86/kernel/traps.c
>> @@ -418,12 +418,6 @@ DEFINE_IDTENTRY_ERRORCODE(exc_segment_not_present)
>>  		      SIGBUS, 0, NULL);
>>  }
>>  
>> -DEFINE_IDTENTRY_ERRORCODE(exc_stack_segment)
>> -{
>> -	do_error_trap(regs, error_code, "stack segment", X86_TRAP_SS, SIGBUS,
>> -		      0, NULL);
>> -}
>> -
>>  DEFINE_IDTENTRY_ERRORCODE(exc_alignment_check)
>>  {
>>  	char *str = "alignment check";
>> @@ -866,6 +860,39 @@ DEFINE_IDTENTRY_ERRORCODE(exc_general_protection)
>>  	cond_local_irq_disable(regs);
>>  }
>>  
>> +#define SSFSTR "stack segment fault"
>> +
>> +DEFINE_IDTENTRY_ERRORCODE(exc_stack_segment)
>> +{
>> +	if (user_mode(regs))
>> +		goto error_trap;
>> +
>> +	if (cpu_feature_enabled(X86_FEATURE_FRED) &&
>> +	    fixup_exception(regs, X86_TRAP_SS, error_code, 0))
>> +		return;
>> +
>> +	if (cpu_feature_enabled(X86_FEATURE_LASS)) {
>> +		enum kernel_exc_hint hint;
>> +		unsigned long exc_addr;
>> +
>> +		hint = get_kernel_exc_address(regs, &exc_addr);
>> +		if (hint != EXC_NO_HINT) {
>
>The brackets are not needed for singular statements. Also the max line
>length is longer now. You can fit this all in a single line.
>
>> +			printk(SSFSTR ", %s 0x%lx", kernel_exc_hint_help[hint],
>> +			       exc_addr);
>> +		}
>> +
>
>> +		if (hint != EXC_NON_CANONICAL)
>> +			exc_addr = 0;
>> +
>> +		die_addr(SSFSTR, regs, error_code, exc_addr);
>
>The variable names in die_addr() should be generalized as well. They
>seem to assume the caller to be a #GP handler.
>
>> +		return;
>> +	}
>> +
>> +error_trap:
>> +	do_error_trap(regs, error_code, "stack segment", X86_TRAP_SS, SIGBUS,
>> +		      0, NULL);
>> +}
>> +
>>  static bool do_int3(struct pt_regs *regs)
>>  {
>>  	int res;
>

Note: for a FRED system, ERETU can generate #SS for a non-canonical user space RSP even in the absence of LASS, so if that is not currently handled that is an active bug.


* Re: [PATCHv8 04/17] x86/cpu: Defer CR pinning setup until after EFI initialization
  2025-07-01 19:03   ` Sohil Mehta
@ 2025-07-02  9:47     ` Kirill A. Shutemov
  0 siblings, 0 replies; 60+ messages in thread
From: Kirill A. Shutemov @ 2025-07-02  9:47 UTC (permalink / raw)
  To: Sohil Mehta
  Cc: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra, Ard Biesheuvel,
	Paul E. McKenney, Josh Poimboeuf, Xiongwei Song, Xin Li,
	Mike Rapoport (IBM), Brijesh Singh, Michael Roth, Tony Luck,
	Alexey Kardashevskiy, Alexander Shishkin, Jonathan Corbet,
	Ingo Molnar, Pawan Gupta, Daniel Sneddon, Kai Huang, Sandipan Das,
	Breno Leitao, Rick Edgecombe, Alexei Starovoitov, Hou Tao,
	Juergen Gross, Vegard Nossum, Kees Cook, Eric Biggers,
	Jason Gunthorpe, Masami Hiramatsu (Google), Andrew Morton,
	Luis Chamberlain, Yuntao Wang, Rasmus Villemoes, Christophe Leroy,
	Tejun Heo, Changbin Du, Huang Shijie, Geert Uytterhoeven,
	Namhyung Kim, Arnaldo Carvalho de Melo, linux-doc, linux-kernel,
	linux-efi, linux-mm

On Tue, Jul 01, 2025 at 12:03:01PM -0700, Sohil Mehta wrote:
> On 7/1/2025 2:58 AM, Kirill A. Shutemov wrote:
> > From: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> > 
> > In order to map the EFI runtime services, set_virtual_address_map()
> > needs to be called, which resides in the lower half of the address
> > space. This means that LASS needs to be temporarily disabled around
> > this call. This can only be done before the CR pinning is set up.
> > 
> > Move CR pinning setup behind the EFI initialization.
> > 
> > Wrapping efi_enter_virtual_mode() into lass_disable/enable_enforcement()
> 
> I believe this should be lass_stac()/clac() since we reverted to the
> original naming.

Doh. Will fix.

> > is not enough because AC flag gates data accesses, but not instruction
> > fetch. Clearing the CR4 bit is required.
> > 
> > Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> > Suggested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > ---
> >  arch/x86/kernel/cpu/common.c | 5 ++++-
> >  1 file changed, 4 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> > index 4f430be285de..9918121e0adc 100644
> > --- a/arch/x86/kernel/cpu/common.c
> > +++ b/arch/x86/kernel/cpu/common.c
> > @@ -2081,7 +2081,6 @@ static __init void identify_boot_cpu(void)
> >  	enable_sep_cpu();
> >  #endif
> >  	cpu_detect_tlb(&boot_cpu_data);
> > -	setup_cr_pinning();
> >  
> >  	tsx_init();
> >  	tdx_init();
> > @@ -2532,10 +2531,14 @@ void __init arch_cpu_finalize_init(void)
> >  
> >  	/*
> >  	 * This needs to follow the FPU initializtion, since EFI depends on it.
> > +	 *
> > +	 * EFI twiddles CR4.LASS. Do it before CR pinning.
> >  	 */
> >  	if (efi_enabled(EFI_RUNTIME_SERVICES))
> >  		efi_enter_virtual_mode();
> >  
> > +	setup_cr_pinning();
> > +
> 
> Instead of EFI toggling CR4.LASS, why not defer the first LASS
> activation itself?
> 
> i.e.
> 
> 	if (efi_enabled(EFI_RUNTIME_SERVICES))
> 		efi_enter_virtual_mode();
> 
> 	setup_lass();
> 
> 	setup_cr_pinning();
> 
> 
> This way, we can avoid the following patch (#5) altogether.

That's definitely an option.

The benefit of the current approach is that the enforcement is enabled
earlier and covers more boot code, providing a marginal protection
improvement.

I also like that related security features (SMEP/SMAP/UMIP/LASS) are
enabled in the same place.

In the end it is a judgement call.

Maintainers, any preference?

-- 
  Kiryl Shutsemau / Kirill A. Shutemov


* Re: [PATCHv8 04/17] x86/cpu: Defer CR pinning setup until after EFI initialization
  2025-07-01 23:10   ` Dave Hansen
@ 2025-07-02 10:05     ` Kirill A. Shutemov
  2025-07-04 12:23       ` Kirill A. Shutemov
  0 siblings, 1 reply; 60+ messages in thread
From: Kirill A. Shutemov @ 2025-07-02 10:05 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra, Ard Biesheuvel,
	Paul E. McKenney, Josh Poimboeuf, Xiongwei Song, Xin Li,
	Mike Rapoport (IBM), Brijesh Singh, Michael Roth, Tony Luck,
	Alexey Kardashevskiy, Alexander Shishkin, Jonathan Corbet,
	Sohil Mehta, Ingo Molnar, Pawan Gupta, Daniel Sneddon, Kai Huang,
	Sandipan Das, Breno Leitao, Rick Edgecombe, Alexei Starovoitov,
	Hou Tao, Juergen Gross, Vegard Nossum, Kees Cook, Eric Biggers,
	Jason Gunthorpe, Masami Hiramatsu (Google), Andrew Morton,
	Luis Chamberlain, Yuntao Wang, Rasmus Villemoes, Christophe Leroy,
	Tejun Heo, Changbin Du, Huang Shijie, Geert Uytterhoeven,
	Namhyung Kim, Arnaldo Carvalho de Melo, linux-doc, linux-kernel,
	linux-efi, linux-mm

On Tue, Jul 01, 2025 at 04:10:19PM -0700, Dave Hansen wrote:
> On 7/1/25 02:58, Kirill A. Shutemov wrote:
> > Move CR pinning setup behind the EFI initialization.
> 
> I kinda grumble about these one-off solutions. Could we just do this
> once and for all and defer CR pinning as long as possible? For instance,
> could we do it in a late_initcall()?
> 
> Do we need pinning before userspace comes up?

Hm. I operated from the assumption that we want to pin control registers
as early as possible to get the most benefit from it.

I guess we can defer it until later. But I am not sure late_initcall() is
the right place. Do we want a random driver to twiddle control registers?

-- 
  Kiryl Shutsemau / Kirill A. Shutemov

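For illustration only, a minimal sketch of what Dave's suggestion would look
like mechanically: deferring CR pinning to a late initcall instead of ordering
it by hand around EFI. This is not part of the series; it assumes
setup_cr_pinning() stays in arch/x86/kernel/cpu/common.c so the initcall can
sit next to it, and it does not answer the open question of whether pinning
that late is acceptable.

/*
 * Hypothetical sketch (not from the patch set): pin control registers from a
 * late initcall, after EFI and the rest of early boot have finished
 * twiddling CR4.
 */
static int __init cr_pinning_late_init(void)
{
	setup_cr_pinning();
	return 0;
}
late_initcall(cr_pinning_late_init);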

* Re: [PATCHv8 12/17] x86/traps: Communicate a LASS violation in #GP message
  2025-07-02  0:36   ` Sohil Mehta
@ 2025-07-02 10:10     ` Kirill A. Shutemov
  0 siblings, 0 replies; 60+ messages in thread
From: Kirill A. Shutemov @ 2025-07-02 10:10 UTC (permalink / raw)
  To: Sohil Mehta
  Cc: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra, Ard Biesheuvel,
	Paul E. McKenney, Josh Poimboeuf, Xiongwei Song, Xin Li,
	Mike Rapoport (IBM), Brijesh Singh, Michael Roth, Tony Luck,
	Alexey Kardashevskiy, Alexander Shishkin, Jonathan Corbet,
	Ingo Molnar, Pawan Gupta, Daniel Sneddon, Kai Huang, Sandipan Das,
	Breno Leitao, Rick Edgecombe, Alexei Starovoitov, Hou Tao,
	Juergen Gross, Vegard Nossum, Kees Cook, Eric Biggers,
	Jason Gunthorpe, Masami Hiramatsu (Google), Andrew Morton,
	Luis Chamberlain, Yuntao Wang, Rasmus Villemoes, Christophe Leroy,
	Tejun Heo, Changbin Du, Huang Shijie, Geert Uytterhoeven,
	Namhyung Kim, Arnaldo Carvalho de Melo, linux-doc, linux-kernel,
	linux-efi, linux-mm

On Tue, Jul 01, 2025 at 05:36:06PM -0700, Sohil Mehta wrote:
> On 7/1/2025 2:58 AM, Kirill A. Shutemov wrote:
> >  /*
> > @@ -672,6 +681,12 @@ static enum kernel_gp_hint get_kernel_gp_address(struct pt_regs *regs,
> >  	if (*addr < ~__VIRTUAL_MASK &&
> >  	    *addr + insn.opnd_bytes - 1 > __VIRTUAL_MASK)
> >  		return GP_NON_CANONICAL;
> > +	else if (*addr < ~__VIRTUAL_MASK &&
> > +		 cpu_feature_enabled(X86_FEATURE_LASS)) {
> > +		if (*addr < PAGE_SIZE)
> > +			return GP_NULL_POINTER;
> > +		return GP_LASS_VIOLATION;
> > +	}
> 
> The comments above this section of code say:
> 
> /*
>  * Check that:
>  *  - the operand is not in the kernel half
>  *  - the last byte of the operand is not in the user canonical half
>  */
> 
> They should be updated since we are updating the logic.

Okay.

> Also, below is easier to read than above:
> 
> 	if (*addr < ~__VIRTUAL_MASK) {
> 
> 		if (*addr + insn.opnd_bytes - 1 > __VIRTUAL_MASK)
> 			return EXC_NON_CANONICAL;
> 
> 		if (cpu_feature_enabled(X86_FEATURE_LASS)) {
> 			if (*addr < PAGE_SIZE)
> 				return EXC_NULL_POINTER;
> 			return EXC_LASS_VIOLATION;
> 		}
> 	}
> 
> I am wondering if the NULL pointer exception should be made
> unconditional, even if it is unlikely to reach here without LASS. So
> maybe something like this:
> 
> 	if (*addr < ~__VIRTUAL_MASK) {
> 
> 		if (*addr + insn.opnd_bytes - 1 > __VIRTUAL_MASK)
> 			return EXC_NON_CANONICAL;
> 
> 		if (*addr < PAGE_SIZE)
> 			return EXC_NULL_POINTER;
> 
> 		if (cpu_feature_enabled(X86_FEATURE_LASS))
> 			return EXC_LASS_VIOLATION;
> 	}

That's cleaner.

-- 
  Kiryl Shutsemau / Kirill A. Shutemov


* Re: [PATCHv8 14/17] x86/traps: Handle LASS thrown #SS
  2025-07-02  2:06     ` H. Peter Anvin
@ 2025-07-02 10:17       ` Kirill A. Shutemov
  2025-07-02 14:37         ` H. Peter Anvin
  2025-07-02 23:42       ` Andrew Cooper
  2025-07-06  9:22       ` David Laight
  2 siblings, 1 reply; 60+ messages in thread
From: Kirill A. Shutemov @ 2025-07-02 10:17 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Sohil Mehta, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, Peter Zijlstra, Ard Biesheuvel,
	Paul E. McKenney, Josh Poimboeuf, Xiongwei Song, Xin Li,
	Mike Rapoport (IBM), Brijesh Singh, Michael Roth, Tony Luck,
	Alexey Kardashevskiy, Alexander Shishkin, Jonathan Corbet,
	Ingo Molnar, Pawan Gupta, Daniel Sneddon, Kai Huang, Sandipan Das,
	Breno Leitao, Rick Edgecombe, Alexei Starovoitov, Hou Tao,
	Juergen Gross, Vegard Nossum, Kees Cook, Eric Biggers,
	Jason Gunthorpe, Masami Hiramatsu (Google), Andrew Morton,
	Luis Chamberlain, Yuntao Wang, Rasmus Villemoes, Christophe Leroy,
	Tejun Heo, Changbin Du, Huang Shijie, Geert Uytterhoeven,
	Namhyung Kim, Arnaldo Carvalho de Melo, linux-doc, linux-kernel,
	linux-efi, linux-mm

On Tue, Jul 01, 2025 at 07:06:10PM -0700, H. Peter Anvin wrote:
> On July 1, 2025 6:35:40 PM PDT, Sohil Mehta <sohil.mehta@intel.com> wrote:
> >On 7/1/2025 2:58 AM, Kirill A. Shutemov wrote:
> >> LASS throws a #GP for any violations except for stack register accesses,
> >> in which case it throws a #SS instead. Handle this similarly to how other
> >> LASS violations are handled.
> >> 
> >
> >Maybe I've misunderstood something:
> >
> >Is the underlying assumption here that #SS were previously only
> >generated by userspace, but now they can also be generated by the
> >kernel? And we want the kernel generated #SS to behave the same as the #GP?
> >
> >> In case of FRED, before handling #SS as LASS violation, kernel has to
> >> check if there's a fixup for the exception. It can address #SS due to
> >> invalid user context on ERETU. See 5105e7687ad3 ("x86/fred: Fixup
> >> fault on ERETU by jumping to fred_entrypoint_user") for more details.
> >> 
> >> Co-developed-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> >> Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> >> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> >> ---
> >>  arch/x86/kernel/traps.c | 39 +++++++++++++++++++++++++++++++++------
> >>  1 file changed, 33 insertions(+), 6 deletions(-)
> >> 
> >> diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
> >> index ceb091f17a5b..f9ca5b911141 100644
> >> --- a/arch/x86/kernel/traps.c
> >> +++ b/arch/x86/kernel/traps.c
> >> @@ -418,12 +418,6 @@ DEFINE_IDTENTRY_ERRORCODE(exc_segment_not_present)
> >>  		      SIGBUS, 0, NULL);
> >>  }
> >>  
> >> -DEFINE_IDTENTRY_ERRORCODE(exc_stack_segment)
> >> -{
> >> -	do_error_trap(regs, error_code, "stack segment", X86_TRAP_SS, SIGBUS,
> >> -		      0, NULL);
> >> -}
> >> -
> >>  DEFINE_IDTENTRY_ERRORCODE(exc_alignment_check)
> >>  {
> >>  	char *str = "alignment check";
> >> @@ -866,6 +860,39 @@ DEFINE_IDTENTRY_ERRORCODE(exc_general_protection)
> >>  	cond_local_irq_disable(regs);
> >>  }
> >>  
> >> +#define SSFSTR "stack segment fault"
> >> +
> >> +DEFINE_IDTENTRY_ERRORCODE(exc_stack_segment)
> >> +{
> >> +	if (user_mode(regs))
> >> +		goto error_trap;
> >> +
> >> +	if (cpu_feature_enabled(X86_FEATURE_FRED) &&
> >> +	    fixup_exception(regs, X86_TRAP_SS, error_code, 0))
> >> +		return;
> >> +
> >> +	if (cpu_feature_enabled(X86_FEATURE_LASS)) {
> >> +		enum kernel_exc_hint hint;
> >> +		unsigned long exc_addr;
> >> +
> >> +		hint = get_kernel_exc_address(regs, &exc_addr);
> >> +		if (hint != EXC_NO_HINT) {
> >
> >The brackets are not needed for singular statements. Also the max line
> >length is longer now. You can fit this all in a single line.
> >
> >> +			printk(SSFSTR ", %s 0x%lx", kernel_exc_hint_help[hint],
> >> +			       exc_addr);
> >> +		}
> >> +
> >
> >> +		if (hint != EXC_NON_CANONICAL)
> >> +			exc_addr = 0;
> >> +
> >> +		die_addr(SSFSTR, regs, error_code, exc_addr);
> >
> >The variable names in die_addr() should be generalized as well. They
> >seem to assume the caller to be a #GP handler.
> >
> >> +		return;
> >> +	}
> >> +
> >> +error_trap:
> >> +	do_error_trap(regs, error_code, "stack segment", X86_TRAP_SS, SIGBUS,
> >> +		      0, NULL);
> >> +}
> >> +
> >>  static bool do_int3(struct pt_regs *regs)
> >>  {
> >>  	int res;
> >
> 
> Note: for a FRED system, ERETU can generate #SS for a non-canonical user space RSP even in the absence of LASS, so if that is not currently handled that is an active bug.

It is handled by the fixup code inside do_error_trap(). We need to add an
explicit fixup before LASS handling to avoid treating a bad userspace RSP
as a kernel LASS violation.

-- 
  Kiryl Shutsemau / Kirill A. Shutemov


* Re: [PATCHv8 14/17] x86/traps: Handle LASS thrown #SS
  2025-07-02  1:35   ` Sohil Mehta
  2025-07-02  2:00     ` H. Peter Anvin
  2025-07-02  2:06     ` H. Peter Anvin
@ 2025-07-02 13:27     ` Kirill A. Shutemov
  2025-07-02 17:56       ` Sohil Mehta
  2025-07-02 20:05       ` Sohil Mehta
  2 siblings, 2 replies; 60+ messages in thread
From: Kirill A. Shutemov @ 2025-07-02 13:27 UTC (permalink / raw)
  To: Sohil Mehta
  Cc: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra, Ard Biesheuvel,
	Paul E. McKenney, Josh Poimboeuf, Xiongwei Song, Xin Li,
	Mike Rapoport (IBM), Brijesh Singh, Michael Roth, Tony Luck,
	Alexey Kardashevskiy, Alexander Shishkin, Jonathan Corbet,
	Ingo Molnar, Pawan Gupta, Daniel Sneddon, Kai Huang, Sandipan Das,
	Breno Leitao, Rick Edgecombe, Alexei Starovoitov, Hou Tao,
	Juergen Gross, Vegard Nossum, Kees Cook, Eric Biggers,
	Jason Gunthorpe, Masami Hiramatsu (Google), Andrew Morton,
	Luis Chamberlain, Yuntao Wang, Rasmus Villemoes, Christophe Leroy,
	Tejun Heo, Changbin Du, Huang Shijie, Geert Uytterhoeven,
	Namhyung Kim, Arnaldo Carvalho de Melo, linux-doc, linux-kernel,
	linux-efi, linux-mm

On Tue, Jul 01, 2025 at 06:35:40PM -0700, Sohil Mehta wrote:
> On 7/1/2025 2:58 AM, Kirill A. Shutemov wrote:
> > LASS throws a #GP for any violations except for stack register accesses,
> > in which case it throws a #SS instead. Handle this similarly to how other
> > LASS violations are handled.
> > 
> 
> Maybe I've misunderstood something:
> 
> Is the underlying assumption here that #SS were previously only
> generated by userspace, but now they can also be generated by the
> kernel? And we want the kernel generated #SS to behave the same as the #GP?

It can be generated by both kernel and userspace if RSP gets corrupted.

So far, do_error_trap() did the trick, handling what has to be handled.
LASS requires a bit more, though.

> 
> > In case of FRED, before handling #SS as LASS violation, kernel has to
> > check if there's a fixup for the exception. It can address #SS due to
> > invalid user context on ERETU. See 5105e7687ad3 ("x86/fred: Fixup
> > fault on ERETU by jumping to fred_entrypoint_user") for more details.
> > 
> > Co-developed-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> > Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > ---
> >  arch/x86/kernel/traps.c | 39 +++++++++++++++++++++++++++++++++------
> >  1 file changed, 33 insertions(+), 6 deletions(-)
> > 
> > diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
> > index ceb091f17a5b..f9ca5b911141 100644
> > --- a/arch/x86/kernel/traps.c
> > +++ b/arch/x86/kernel/traps.c
> > @@ -418,12 +418,6 @@ DEFINE_IDTENTRY_ERRORCODE(exc_segment_not_present)
> >  		      SIGBUS, 0, NULL);
> >  }
> >  
> > -DEFINE_IDTENTRY_ERRORCODE(exc_stack_segment)
> > -{
> > -	do_error_trap(regs, error_code, "stack segment", X86_TRAP_SS, SIGBUS,
> > -		      0, NULL);
> > -}
> > -
> >  DEFINE_IDTENTRY_ERRORCODE(exc_alignment_check)
> >  {
> >  	char *str = "alignment check";
> > @@ -866,6 +860,39 @@ DEFINE_IDTENTRY_ERRORCODE(exc_general_protection)
> >  	cond_local_irq_disable(regs);
> >  }
> >  
> > +#define SSFSTR "stack segment fault"
> > +
> > +DEFINE_IDTENTRY_ERRORCODE(exc_stack_segment)
> > +{
> > +	if (user_mode(regs))
> > +		goto error_trap;
> > +
> > +	if (cpu_feature_enabled(X86_FEATURE_FRED) &&
> > +	    fixup_exception(regs, X86_TRAP_SS, error_code, 0))
> > +		return;
> > +
> > +	if (cpu_feature_enabled(X86_FEATURE_LASS)) {
> > +		enum kernel_exc_hint hint;
> > +		unsigned long exc_addr;
> > +
> > +		hint = get_kernel_exc_address(regs, &exc_addr);
> > +		if (hint != EXC_NO_HINT) {
> 
> The brackets are not needed for singular statements. Also the max line
> length is longer now. You can fit this all in a single line.

I think the line split is justified. It is 120 characters long otherwise.
And with a multi-line statement, brackets help readability.

I don't see a reason to change it.

> > +			printk(SSFSTR ", %s 0x%lx", kernel_exc_hint_help[hint],
> > +			       exc_addr);
> > +		}
> > +
> 
> > +		if (hint != EXC_NON_CANONICAL)
> > +			exc_addr = 0;
> > +
> > +		die_addr(SSFSTR, regs, error_code, exc_addr);
> 
> The variable names in die_addr() should be generalized as well. They
> seem to assume the caller to be a #GP handler.

Okay, will fold into "x86/traps: Generalize #GP address decode and hint
code".

> > +		return;
> > +	}
> > +
> > +error_trap:
> > +	do_error_trap(regs, error_code, "stack segment", X86_TRAP_SS, SIGBUS,
> > +		      0, NULL);
> > +}
> > +
> >  static bool do_int3(struct pt_regs *regs)
> >  {
> >  	int res;
> 

-- 
  Kiryl Shutsemau / Kirill A. Shutemov


* Re: [PATCHv8 14/17] x86/traps: Handle LASS thrown #SS
  2025-07-02 10:17       ` Kirill A. Shutemov
@ 2025-07-02 14:37         ` H. Peter Anvin
  2025-07-02 14:47           ` Kirill A. Shutemov
  0 siblings, 1 reply; 60+ messages in thread
From: H. Peter Anvin @ 2025-07-02 14:37 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Sohil Mehta, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, Peter Zijlstra, Ard Biesheuvel,
	Paul E. McKenney, Josh Poimboeuf, Xiongwei Song, Xin Li,
	Mike Rapoport (IBM), Brijesh Singh, Michael Roth, Tony Luck,
	Alexey Kardashevskiy, Alexander Shishkin, Jonathan Corbet,
	Ingo Molnar, Pawan Gupta, Daniel Sneddon, Kai Huang, Sandipan Das,
	Breno Leitao, Rick Edgecombe, Alexei Starovoitov, Hou Tao,
	Juergen Gross, Vegard Nossum, Kees Cook, Eric Biggers,
	Jason Gunthorpe, Masami Hiramatsu (Google), Andrew Morton,
	Luis Chamberlain, Yuntao Wang, Rasmus Villemoes, Christophe Leroy,
	Tejun Heo, Changbin Du, Huang Shijie, Geert Uytterhoeven,
	Namhyung Kim, Arnaldo Carvalho de Melo, linux-doc, linux-kernel,
	linux-efi, linux-mm

On July 2, 2025 3:17:10 AM PDT, "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> wrote:
>On Tue, Jul 01, 2025 at 07:06:10PM -0700, H. Peter Anvin wrote:
>> On July 1, 2025 6:35:40 PM PDT, Sohil Mehta <sohil.mehta@intel.com> wrote:
>> >On 7/1/2025 2:58 AM, Kirill A. Shutemov wrote:
>> >> LASS throws a #GP for any violations except for stack register accesses,
>> >> in which case it throws a #SS instead. Handle this similarly to how other
>> >> LASS violations are handled.
>> >> 
>> >
>> >Maybe I've misunderstood something:
>> >
>> >Is the underlying assumption here that #SS were previously only
>> >generated by userspace, but now they can also be generated by the
>> >kernel? And we want the kernel generated #SS to behave the same as the #GP?
>> >
>> >> In case of FRED, before handling #SS as LASS violation, kernel has to
>> >> check if there's a fixup for the exception. It can address #SS due to
>> >> invalid user context on ERETU. See 5105e7687ad3 ("x86/fred: Fixup
>> >> fault on ERETU by jumping to fred_entrypoint_user") for more details.
>> >> 
>> >> Co-developed-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
>> >> Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
>> >> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
>> >> ---
>> >>  arch/x86/kernel/traps.c | 39 +++++++++++++++++++++++++++++++++------
>> >>  1 file changed, 33 insertions(+), 6 deletions(-)
>> >> 
>> >> diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
>> >> index ceb091f17a5b..f9ca5b911141 100644
>> >> --- a/arch/x86/kernel/traps.c
>> >> +++ b/arch/x86/kernel/traps.c
>> >> @@ -418,12 +418,6 @@ DEFINE_IDTENTRY_ERRORCODE(exc_segment_not_present)
>> >>  		      SIGBUS, 0, NULL);
>> >>  }
>> >>  
>> >> -DEFINE_IDTENTRY_ERRORCODE(exc_stack_segment)
>> >> -{
>> >> -	do_error_trap(regs, error_code, "stack segment", X86_TRAP_SS, SIGBUS,
>> >> -		      0, NULL);
>> >> -}
>> >> -
>> >>  DEFINE_IDTENTRY_ERRORCODE(exc_alignment_check)
>> >>  {
>> >>  	char *str = "alignment check";
>> >> @@ -866,6 +860,39 @@ DEFINE_IDTENTRY_ERRORCODE(exc_general_protection)
>> >>  	cond_local_irq_disable(regs);
>> >>  }
>> >>  
>> >> +#define SSFSTR "stack segment fault"
>> >> +
>> >> +DEFINE_IDTENTRY_ERRORCODE(exc_stack_segment)
>> >> +{
>> >> +	if (user_mode(regs))
>> >> +		goto error_trap;
>> >> +
>> >> +	if (cpu_feature_enabled(X86_FEATURE_FRED) &&
>> >> +	    fixup_exception(regs, X86_TRAP_SS, error_code, 0))
>> >> +		return;
>> >> +
>> >> +	if (cpu_feature_enabled(X86_FEATURE_LASS)) {
>> >> +		enum kernel_exc_hint hint;
>> >> +		unsigned long exc_addr;
>> >> +
>> >> +		hint = get_kernel_exc_address(regs, &exc_addr);
>> >> +		if (hint != EXC_NO_HINT) {
>> >
>> >The brackets are not needed for singular statements. Also the max line
>> >length is longer now. You can fit this all in a single line.
>> >
>> >> +			printk(SSFSTR ", %s 0x%lx", kernel_exc_hint_help[hint],
>> >> +			       exc_addr);
>> >> +		}
>> >> +
>> >
>> >> +		if (hint != EXC_NON_CANONICAL)
>> >> +			exc_addr = 0;
>> >> +
>> >> +		die_addr(SSFSTR, regs, error_code, exc_addr);
>> >
>> >The variable names in die_addr() should be generalized as well. They
>> >seem to assume the caller to be a #GP handler.
>> >
>> >> +		return;
>> >> +	}
>> >> +
>> >> +error_trap:
>> >> +	do_error_trap(regs, error_code, "stack segment", X86_TRAP_SS, SIGBUS,
>> >> +		      0, NULL);
>> >> +}
>> >> +
>> >>  static bool do_int3(struct pt_regs *regs)
>> >>  {
>> >>  	int res;
>> >
>> 
>> Note: for a FRED system, ERETU can generate #SS for a non-canonical user space RSP even in the absence of LASS, so if that is not currently handled that is an active bug.
>
>It is handled by the fixup code inside do_error_trap(). We need to add an
>explicit fixup before LASS handling to avoid treating a bad userspace RSP
>as a kernel LASS violation.
>

Great. I was pretty sure, but I wanted to address Sohil's question directly. Thanks for verifying.

A LASS violation of any kind in the kernel (unless handled by fixup, including user access fixup) ought to be fatal, correct?


* Re: [PATCHv8 14/17] x86/traps: Handle LASS thrown #SS
  2025-07-02 14:37         ` H. Peter Anvin
@ 2025-07-02 14:47           ` Kirill A. Shutemov
  2025-07-02 17:10             ` H. Peter Anvin
  0 siblings, 1 reply; 60+ messages in thread
From: Kirill A. Shutemov @ 2025-07-02 14:47 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Sohil Mehta, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, Peter Zijlstra, Ard Biesheuvel,
	Paul E. McKenney, Josh Poimboeuf, Xiongwei Song, Xin Li,
	Mike Rapoport (IBM), Brijesh Singh, Michael Roth, Tony Luck,
	Alexey Kardashevskiy, Alexander Shishkin, Jonathan Corbet,
	Ingo Molnar, Pawan Gupta, Daniel Sneddon, Kai Huang, Sandipan Das,
	Breno Leitao, Rick Edgecombe, Alexei Starovoitov, Hou Tao,
	Juergen Gross, Vegard Nossum, Kees Cook, Eric Biggers,
	Jason Gunthorpe, Masami Hiramatsu (Google), Andrew Morton,
	Luis Chamberlain, Yuntao Wang, Rasmus Villemoes, Christophe Leroy,
	Tejun Heo, Changbin Du, Huang Shijie, Geert Uytterhoeven,
	Namhyung Kim, Arnaldo Carvalho de Melo, linux-doc, linux-kernel,
	linux-efi, linux-mm

On Wed, Jul 02, 2025 at 07:37:12AM -0700, H. Peter Anvin wrote:
> A LASS violation of any kind in the kernel (unless handled by fixup,
> including user access fixup) ought to be fatal, correct?

Yes, LASS violation is fatal for !user_mode(regs), unless addressed by
fixups.

For user_mode(regs), emulate_vsyscall_gp() is the notable exception.

-- 
  Kiryl Shutsemau / Kirill A. Shutemov


* Re: [PATCHv8 14/17] x86/traps: Handle LASS thrown #SS
  2025-07-02 14:47           ` Kirill A. Shutemov
@ 2025-07-02 17:10             ` H. Peter Anvin
  0 siblings, 0 replies; 60+ messages in thread
From: H. Peter Anvin @ 2025-07-02 17:10 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Sohil Mehta, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, Peter Zijlstra, Ard Biesheuvel,
	Paul E. McKenney, Josh Poimboeuf, Xiongwei Song, Xin Li,
	Mike Rapoport (IBM), Brijesh Singh, Michael Roth, Tony Luck,
	Alexey Kardashevskiy, Alexander Shishkin, Jonathan Corbet,
	Ingo Molnar, Pawan Gupta, Daniel Sneddon, Kai Huang, Sandipan Das,
	Breno Leitao, Rick Edgecombe, Alexei Starovoitov, Hou Tao,
	Juergen Gross, Vegard Nossum, Kees Cook, Eric Biggers,
	Jason Gunthorpe, Masami Hiramatsu (Google), Andrew Morton,
	Luis Chamberlain, Yuntao Wang, Rasmus Villemoes, Christophe Leroy,
	Tejun Heo, Changbin Du, Huang Shijie, Geert Uytterhoeven,
	Namhyung Kim, Arnaldo Carvalho de Melo, linux-doc, linux-kernel,
	linux-efi, linux-mm

On July 2, 2025 7:47:30 AM PDT, "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> wrote:
>On Wed, Jul 02, 2025 at 07:37:12AM -0700, H. Peter Anvin wrote:
>> A LASS violation of any kind in the kernel (unless handled by fixup,
>> including user access fixup) ought to be fatal, correct?
>
>Yes, LASS violation is fatal for !user_mode(regs), unless addressed by
>fixups.
>
>For user_mode(regs), emulate_vsyscall_gp() is the notable exception.
>

Note also that for FRED we can have separate kernel and user space paths basically "for free". I'm not sure if we do that yet (if the infrastructure is there), but we could. 

Not that it matters for this case. This is a slow path.


* Re: [PATCHv8 14/17] x86/traps: Handle LASS thrown #SS
  2025-07-02 13:27     ` Kirill A. Shutemov
@ 2025-07-02 17:56       ` Sohil Mehta
  2025-07-03 10:40         ` Kirill A. Shutemov
  2025-07-02 20:05       ` Sohil Mehta
  1 sibling, 1 reply; 60+ messages in thread
From: Sohil Mehta @ 2025-07-02 17:56 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra, Ard Biesheuvel,
	Paul E. McKenney, Josh Poimboeuf, Xiongwei Song, Xin Li,
	Mike Rapoport (IBM), Brijesh Singh, Michael Roth, Tony Luck,
	Alexey Kardashevskiy, Alexander Shishkin, Jonathan Corbet,
	Ingo Molnar, Pawan Gupta, Daniel Sneddon, Kai Huang, Sandipan Das,
	Breno Leitao, Rick Edgecombe, Alexei Starovoitov, Hou Tao,
	Juergen Gross, Vegard Nossum, Kees Cook, Eric Biggers,
	Jason Gunthorpe, Masami Hiramatsu (Google), Andrew Morton,
	Luis Chamberlain, Yuntao Wang, Rasmus Villemoes, Christophe Leroy,
	Tejun Heo, Changbin Du, Huang Shijie, Geert Uytterhoeven,
	Namhyung Kim, Arnaldo Carvalho de Melo, linux-doc, linux-kernel,
	linux-efi, linux-mm

On 7/2/2025 6:27 AM, Kirill A. Shutemov wrote:
>>> +
>>> +	if (cpu_feature_enabled(X86_FEATURE_LASS)) {
>>> +		enum kernel_exc_hint hint;
>>> +		unsigned long exc_addr;
>>> +
>>> +		hint = get_kernel_exc_address(regs, &exc_addr);
>>> +		if (hint != EXC_NO_HINT) {
>>
>> The brackets are not needed for singular statements. Also the max line
>> length is longer now. You can fit this all in a single line.
> 
> I think the line split is justified. It is 120 characters long otherwise.
> And with a multi-line statement, brackets help readability.
> 

Are you sure? Below ends at 90 characters for me, including the three
8-char tabs:

printk(SSFSTR ", %s 0x%lx", kernel_exc_hint_help[hint], exc_addr);

> I don't see a reason to change it.

To reduce indentation, you could also do:

	if (!cpu_feature_enabled(X86_FEATURE_LASS))
		goto error_trap;

> 
>>> +			printk(SSFSTR ", %s 0x%lx", kernel_exc_hint_help[hint],
>>> +			       exc_addr);
>>> +		}
>>> +
>>




* Re: [PATCHv8 14/17] x86/traps: Handle LASS thrown #SS
  2025-07-02 13:27     ` Kirill A. Shutemov
  2025-07-02 17:56       ` Sohil Mehta
@ 2025-07-02 20:05       ` Sohil Mehta
  2025-07-03 11:31         ` Kirill A. Shutemov
  1 sibling, 1 reply; 60+ messages in thread
From: Sohil Mehta @ 2025-07-02 20:05 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra, Ard Biesheuvel,
	Paul E. McKenney, Josh Poimboeuf, Xiongwei Song, Xin Li,
	Mike Rapoport (IBM), Brijesh Singh, Michael Roth, Tony Luck,
	Alexey Kardashevskiy, Alexander Shishkin, Jonathan Corbet,
	Ingo Molnar, Pawan Gupta, Daniel Sneddon, Kai Huang, Sandipan Das,
	Breno Leitao, Rick Edgecombe, Alexei Starovoitov, Hou Tao,
	Juergen Gross, Vegard Nossum, Kees Cook, Eric Biggers,
	Jason Gunthorpe, Masami Hiramatsu (Google), Andrew Morton,
	Luis Chamberlain, Yuntao Wang, Rasmus Villemoes, Christophe Leroy,
	Tejun Heo, Changbin Du, Huang Shijie, Geert Uytterhoeven,
	Namhyung Kim, Arnaldo Carvalho de Melo, linux-doc, linux-kernel,
	linux-efi, linux-mm

On 7/2/2025 6:27 AM, Kirill A. Shutemov wrote:

>>
>> Maybe I've misunderstood something:
>>
>> Is the underlying assumption here that #SS were previously only
>> generated by userspace, but now they can also be generated by the
>> kernel? And we want the kernel generated #SS to behave the same as the #GP?
> 
> It can be generated by both kernel and userspace if RSP gets corrupted.
> 
> So far, do_error_trap() did the trick, handling what has to be handled.
> LASS requires a bit more, though.
> 
Thank you for the information! The discussion in the other thread helped
clarify my confusion about the new FRED-specific fixup outside the LASS
check.

IIUC, for kernel-generated #SS, the prior code in do_error_trap()
would've done a few things such as notify_die() and
cond_local_irq_enable() before calling die().

The new code now directly calls die_addr(). Are we changing the behavior
for legacy kernel #SS? Also, why don't we need those calls for the new
LASS #SS?

I apologize if the questions seem too naive. I am finding the exception
handling code a bit convoluted to understand. In general, I would
suggest adding some code comments at least for the new code to help
ignorant folks like me looking at this in the future.


* Re: [PATCHv8 14/17] x86/traps: Handle LASS thrown #SS
  2025-07-02  2:06     ` H. Peter Anvin
  2025-07-02 10:17       ` Kirill A. Shutemov
@ 2025-07-02 23:42       ` Andrew Cooper
  2025-07-03  0:44         ` H. Peter Anvin
  2025-07-06  9:22       ` David Laight
  2 siblings, 1 reply; 60+ messages in thread
From: Andrew Cooper @ 2025-07-02 23:42 UTC (permalink / raw)
  To: hpa
  Cc: acme, aik, akpm, alexander.shishkin, ardb, ast, bp, brijesh.singh,
	changbin.du, christophe.leroy, corbet, daniel.sneddon,
	dave.hansen, ebiggers, geert+renesas, houtao1, jgg, jgross,
	jpoimboe, kai.huang, kees, kirill.shutemov, leitao, linux-doc,
	linux-efi, linux-kernel, linux-mm, linux, luto, mcgrof, mhiramat,
	michael.roth, mingo, mingo, namhyung, paulmck, pawan.kumar.gupta,
	peterz, rick.p.edgecombe, rppt, sandipan.das, shijie, sohil.mehta,
	tglx, tj, tony.luck, vegard.nossum, x86, xin3.li, xiongwei.song,
	ytcoode

> Note: for a FRED system, ERETU can generate #SS for a non-canonical user space RSP

How?  Or to phrase it differently, I hope not.

%rsp is a 64bit value and does not have canonical restrictions elsewhere
in the architecture, so far as I'm aware.  IRET really can restore a
non-canonical %rsp, and userspace can run for an indeterminate period of
time with a non-canonical %rsp as long as there are no stack accesses.

Accesses relative to the stack using a non-canonical pointer will
suffer #SS, but ERETU doesn't modify the userspace stack AFAICT.  I
can't see anything in the ERETU pseudocode in the FRED spec that
mentions a canonical check or memory access using %rsp.

~Andrew


* Re: [PATCHv8 14/17] x86/traps: Handle LASS thrown #SS
  2025-07-02 23:42       ` Andrew Cooper
@ 2025-07-03  0:44         ` H. Peter Anvin
  0 siblings, 0 replies; 60+ messages in thread
From: H. Peter Anvin @ 2025-07-03  0:44 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: acme, aik, akpm, alexander.shishkin, ardb, ast, bp, brijesh.singh,
	changbin.du, christophe.leroy, corbet, daniel.sneddon,
	dave.hansen, ebiggers, geert+renesas, houtao1, jgg, jgross,
	jpoimboe, kai.huang, kees, kirill.shutemov, leitao, linux-doc,
	linux-efi, linux-kernel, linux-mm, linux, luto, mcgrof, mhiramat,
	michael.roth, mingo, mingo, namhyung, paulmck, pawan.kumar.gupta,
	peterz, rick.p.edgecombe, rppt, sandipan.das, shijie, sohil.mehta,
	tglx, tj, tony.luck, vegard.nossum, x86, xin3.li, xiongwei.song,
	ytcoode

On July 2, 2025 4:42:27 PM PDT, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>> Note: for a FRED system, ERETU can generate #SS for a non-canonical user space RSP
>
>How?  Or to phrase it differently, I hope not.
>
>%rsp is a 64bit value and does not have canonical restrictions elsewhere
>in the architecture, so far as I'm aware.  IRET really can restore a
>non-canonical %rsp, and userspace can run for an indeterminate period of
>time with a non-canonical %rsp as long as there are no stack accesses.
>
>Accesses relative to the stack using a non-canonical pointer will
>suffer #SS, but ERETU doesn't modify the userspace stack AFAICT.  I
>can't see anything in the ERETU pseudocode in the FRED spec that
>mentions a canonical check or memory access using %rsp.
>
>~Andrew

You are right of course. Brainfart on my part.


* Re: [PATCHv8 02/17] x86/asm: Introduce inline memcpy and memset
  2025-07-01  9:58 ` [PATCHv8 02/17] x86/asm: Introduce inline memcpy and memset Kirill A. Shutemov
@ 2025-07-03  8:44   ` David Laight
  2025-07-03 10:39     ` Kirill A. Shutemov
  2025-07-03 17:13   ` Dave Hansen
  1 sibling, 1 reply; 60+ messages in thread
From: David Laight @ 2025-07-03  8:44 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra, Ard Biesheuvel,
	Paul E. McKenney, Josh Poimboeuf, Xiongwei Song, Xin Li,
	Mike Rapoport (IBM), Brijesh Singh, Michael Roth, Tony Luck,
	Alexey Kardashevskiy, Alexander Shishkin, Jonathan Corbet,
	Sohil Mehta, Ingo Molnar, Pawan Gupta, Daniel Sneddon, Kai Huang,
	Sandipan Das, Breno Leitao, Rick Edgecombe, Alexei Starovoitov,
	Hou Tao, Juergen Gross, Vegard Nossum, Kees Cook, Eric Biggers,
	Jason Gunthorpe, Masami Hiramatsu (Google), Andrew Morton,
	Luis Chamberlain, Yuntao Wang, Rasmus Villemoes, Christophe Leroy,
	Tejun Heo, Changbin Du, Huang Shijie, Geert Uytterhoeven,
	Namhyung Kim, Arnaldo Carvalho de Melo, linux-doc, linux-kernel,
	linux-efi, linux-mm

On Tue,  1 Jul 2025 12:58:31 +0300
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> wrote:

> Extract memcpy and memset functions from copy_user_generic() and
> __clear_user().
> 
> They can be used as inline memcpy and memset instead of the GCC builtins
> whenever necessary. LASS requires them to handle text_poke.

Except they contain the fault handlers so aren't generic calls.

> 
> Originally-by: Peter Zijlstra <peterz@infradead.org>
> Link: https://lore.kernel.org/all/20241029184840.GJ14555@noisy.programming.kicks-ass.net/
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---
>  arch/x86/include/asm/string.h     | 46 +++++++++++++++++++++++++++++++
>  arch/x86/include/asm/uaccess_64.h | 38 +++++++------------------
>  arch/x86/lib/clear_page_64.S      | 13 +++++++--
>  3 files changed, 67 insertions(+), 30 deletions(-)
> 
> diff --git a/arch/x86/include/asm/string.h b/arch/x86/include/asm/string.h
> index c3c2c1914d65..17f6b5bfa8c1 100644
> --- a/arch/x86/include/asm/string.h
> +++ b/arch/x86/include/asm/string.h
> @@ -1,6 +1,52 @@
>  /* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _ASM_X86_STRING_H
> +#define _ASM_X86_STRING_H
> +
> +#include <asm/asm.h>
> +#include <asm/alternative.h>
> +#include <asm/cpufeatures.h>
> +
>  #ifdef CONFIG_X86_32
>  # include <asm/string_32.h>
>  #else
>  # include <asm/string_64.h>
>  #endif
> +
> +#ifdef CONFIG_X86_64
> +#define ALT_64(orig, alt, feat) ALTERNATIVE(orig, alt, feat)
> +#else
> +#define ALT_64(orig, alt, feat) orig "\n"
> +#endif
> +
> +static __always_inline void *__inline_memcpy(void *to, const void *from, size_t len)
> +{
> +	void *ret = to;
> +
> +	asm volatile("1:\n\t"
> +		     ALT_64("rep movsb",
> +			    "call rep_movs_alternative", ALT_NOT(X86_FEATURE_FSRM))
> +		     "2:\n\t"
> +		     _ASM_EXTABLE_UA(1b, 2b)
> +		     : "+c" (len), "+D" (to), "+S" (from), ASM_CALL_CONSTRAINT
> +		     : : "memory", _ASM_AX);
> +
> +	return ret + len;
> +}
> +
> +static __always_inline void *__inline_memset(void *addr, int v, size_t len)
> +{
> +	void *ret = addr;
> +
> +	asm volatile("1:\n\t"
> +		     ALT_64("rep stosb",
> +			    "call rep_stos_alternative", ALT_NOT(X86_FEATURE_FSRM))
> +		     "2:\n\t"
> +		     _ASM_EXTABLE_UA(1b, 2b)
> +		     : "+c" (len), "+D" (addr), ASM_CALL_CONSTRAINT
> +		     : "a" ((uint8_t)v)

You shouldn't need the (uint8_t) cast (should that be (u8) anyway).
At best it doesn't matter, at worst it will add code to mask with 0xff.

> +		     : "memory", _ASM_SI, _ASM_DX);
> +
> +	return ret + len;
> +}
> +
> +#endif /* _ASM_X86_STRING_H */
> diff --git a/arch/x86/include/asm/uaccess_64.h b/arch/x86/include/asm/uaccess_64.h
> index c8a5ae35c871..eb531e13e659 100644
> --- a/arch/x86/include/asm/uaccess_64.h
> +++ b/arch/x86/include/asm/uaccess_64.h
> @@ -13,6 +13,7 @@
>  #include <asm/page.h>
>  #include <asm/percpu.h>
>  #include <asm/runtime-const.h>
> +#include <asm/string.h>
>  
>  /*
>   * Virtual variable: there's no actual backing store for this,
> @@ -118,21 +119,12 @@ rep_movs_alternative(void *to, const void *from, unsigned len);
>  static __always_inline __must_check unsigned long
>  copy_user_generic(void *to, const void *from, unsigned long len)
>  {
> +	void *ret;
> +
>  	stac();
> -	/*
> -	 * If CPU has FSRM feature, use 'rep movs'.
> -	 * Otherwise, use rep_movs_alternative.
> -	 */
> -	asm volatile(
> -		"1:\n\t"
> -		ALTERNATIVE("rep movsb",
> -			    "call rep_movs_alternative", ALT_NOT(X86_FEATURE_FSRM))
> -		"2:\n"
> -		_ASM_EXTABLE_UA(1b, 2b)
> -		:"+c" (len), "+D" (to), "+S" (from), ASM_CALL_CONSTRAINT
> -		: : "memory", "rax");
> +	ret = __inline_memcpy(to, from, len);
>  	clac();
> -	return len;
> +	return ret - to;
>  }
>  
>  static __always_inline __must_check unsigned long
> @@ -178,25 +170,15 @@ rep_stos_alternative(void __user *addr, unsigned long len);
>  
>  static __always_inline __must_check unsigned long __clear_user(void __user *addr, unsigned long size)
>  {
> +	void *ptr = (__force void *)addr;
> +	void *ret;
> +
>  	might_fault();
>  	stac();
> -
> -	/*
> -	 * No memory constraint because it doesn't change any memory gcc
> -	 * knows about.
> -	 */
> -	asm volatile(
> -		"1:\n\t"
> -		ALTERNATIVE("rep stosb",
> -			    "call rep_stos_alternative", ALT_NOT(X86_FEATURE_FSRS))
> -		"2:\n"
> -	       _ASM_EXTABLE_UA(1b, 2b)
> -	       : "+c" (size), "+D" (addr), ASM_CALL_CONSTRAINT
> -	       : "a" (0));
> -
> +	ret = __inline_memset(ptr, 0, size);
>  	clac();
>  
> -	return size;
> +	return ret - ptr;
>  }
>  
>  static __always_inline unsigned long clear_user(void __user *to, unsigned long n)
> diff --git a/arch/x86/lib/clear_page_64.S b/arch/x86/lib/clear_page_64.S
> index a508e4a8c66a..47b613690f84 100644
> --- a/arch/x86/lib/clear_page_64.S
> +++ b/arch/x86/lib/clear_page_64.S
> @@ -55,17 +55,26 @@ SYM_FUNC_END(clear_page_erms)
>  EXPORT_SYMBOL_GPL(clear_page_erms)
>  
>  /*
> - * Default clear user-space.
> + * Default memset.
>   * Input:
>   * rdi destination
> + * rsi scratch
>   * rcx count
> - * rax is zero
> + * al is value
>   *
>   * Output:
>   * rcx: uncleared bytes or 0 if successful.
> + * rdx: clobbered
>   */
>  SYM_FUNC_START(rep_stos_alternative)
>  	ANNOTATE_NOENDBR
> +
> +	movzbq %al, %rsi
> +	movabs $0x0101010101010101, %rax
> +
> +	/* RDX:RAX = RAX * RSI */
> +	mulq %rsi

NAK - you can't do that here.
Neither %rsi nor %rdx can be trashed.
The function has a very explicit calling convention.

It is also almost certainly a waste of time.
Pretty much all the calls will be for a constant 0x00.
Rename it all memzero() ...

	David

> +
>  	cmpq $64,%rcx
>  	jae .Lunrolled
>  



* Re: [PATCHv8 02/17] x86/asm: Introduce inline memcpy and memset
  2025-07-03  8:44   ` David Laight
@ 2025-07-03 10:39     ` Kirill A. Shutemov
  2025-07-03 12:15       ` David Laight
  0 siblings, 1 reply; 60+ messages in thread
From: Kirill A. Shutemov @ 2025-07-03 10:39 UTC (permalink / raw)
  To: David Laight
  Cc: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra, Ard Biesheuvel,
	Paul E. McKenney, Josh Poimboeuf, Xiongwei Song, Xin Li,
	Mike Rapoport (IBM), Brijesh Singh, Michael Roth, Tony Luck,
	Alexey Kardashevskiy, Alexander Shishkin, Jonathan Corbet,
	Sohil Mehta, Ingo Molnar, Pawan Gupta, Daniel Sneddon, Kai Huang,
	Sandipan Das, Breno Leitao, Rick Edgecombe, Alexei Starovoitov,
	Hou Tao, Juergen Gross, Vegard Nossum, Kees Cook, Eric Biggers,
	Jason Gunthorpe, Masami Hiramatsu (Google), Andrew Morton,
	Luis Chamberlain, Yuntao Wang, Rasmus Villemoes, Christophe Leroy,
	Tejun Heo, Changbin Du, Huang Shijie, Geert Uytterhoeven,
	Namhyung Kim, Arnaldo Carvalho de Melo, linux-doc, linux-kernel,
	linux-efi, linux-mm

On Thu, Jul 03, 2025 at 09:44:17AM +0100, David Laight wrote:
> On Tue,  1 Jul 2025 12:58:31 +0300
> "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> wrote:
> 
> > Extract memcpy and memset functions from copy_user_generic() and
> > __clear_user().
> > 
> > They can be used as inline memcpy and memset instead of the GCC builtins
> > whenever necessary. LASS requires them to handle text_poke.
> 
> Except they contain the fault handlers so aren't generic calls.

That's true. I will add a comment to clarify it.

> > Originally-by: Peter Zijlstra <peterz@infradead.org>
> > Link: https://lore.kernel.org/all/20241029184840.GJ14555@noisy.programming.kicks-ass.net/
> > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > ---
> >  arch/x86/include/asm/string.h     | 46 +++++++++++++++++++++++++++++++
> >  arch/x86/include/asm/uaccess_64.h | 38 +++++++------------------
> >  arch/x86/lib/clear_page_64.S      | 13 +++++++--
> >  3 files changed, 67 insertions(+), 30 deletions(-)
> > 
> > diff --git a/arch/x86/include/asm/string.h b/arch/x86/include/asm/string.h
> > index c3c2c1914d65..17f6b5bfa8c1 100644
> > --- a/arch/x86/include/asm/string.h
> > +++ b/arch/x86/include/asm/string.h
> > @@ -1,6 +1,52 @@
> >  /* SPDX-License-Identifier: GPL-2.0 */
> > +#ifndef _ASM_X86_STRING_H
> > +#define _ASM_X86_STRING_H
> > +
> > +#include <asm/asm.h>
> > +#include <asm/alternative.h>
> > +#include <asm/cpufeatures.h>
> > +
> >  #ifdef CONFIG_X86_32
> >  # include <asm/string_32.h>
> >  #else
> >  # include <asm/string_64.h>
> >  #endif
> > +
> > +#ifdef CONFIG_X86_64
> > +#define ALT_64(orig, alt, feat) ALTERNATIVE(orig, alt, feat)
> > +#else
> > +#define ALT_64(orig, alt, feat) orig "\n"
> > +#endif
> > +
> > +static __always_inline void *__inline_memcpy(void *to, const void *from, size_t len)
> > +{
> > +	void *ret = to;
> > +
> > +	asm volatile("1:\n\t"
> > +		     ALT_64("rep movsb",
> > +			    "call rep_movs_alternative", ALT_NOT(X86_FEATURE_FSRM))
> > +		     "2:\n\t"
> > +		     _ASM_EXTABLE_UA(1b, 2b)
> > +		     : "+c" (len), "+D" (to), "+S" (from), ASM_CALL_CONSTRAINT
> > +		     : : "memory", _ASM_AX);
> > +
> > +	return ret + len;
> > +}
> > +
> > +static __always_inline void *__inline_memset(void *addr, int v, size_t len)
> > +{
> > +	void *ret = addr;
> > +
> > +	asm volatile("1:\n\t"
> > +		     ALT_64("rep stosb",
> > +			    "call rep_stos_alternative", ALT_NOT(X86_FEATURE_FSRM))
> > +		     "2:\n\t"
> > +		     _ASM_EXTABLE_UA(1b, 2b)
> > +		     : "+c" (len), "+D" (addr), ASM_CALL_CONSTRAINT
> > +		     : "a" ((uint8_t)v)
> 
> You shouldn't need the (uint8_t) cast (should that be (u8) anyway).
> At best it doesn't matter, at worst it will add code to mask with 0xff.

Right, will drop.

> > +		     : "memory", _ASM_SI, _ASM_DX);
> > +
> > +	return ret + len;
> > +}
> > +
> > +#endif /* _ASM_X86_STRING_H */
> > diff --git a/arch/x86/include/asm/uaccess_64.h b/arch/x86/include/asm/uaccess_64.h
> > index c8a5ae35c871..eb531e13e659 100644
> > --- a/arch/x86/include/asm/uaccess_64.h
> > +++ b/arch/x86/include/asm/uaccess_64.h
> > @@ -13,6 +13,7 @@
> >  #include <asm/page.h>
> >  #include <asm/percpu.h>
> >  #include <asm/runtime-const.h>
> > +#include <asm/string.h>
> >  
> >  /*
> >   * Virtual variable: there's no actual backing store for this,
> > @@ -118,21 +119,12 @@ rep_movs_alternative(void *to, const void *from, unsigned len);
> >  static __always_inline __must_check unsigned long
> >  copy_user_generic(void *to, const void *from, unsigned long len)
> >  {
> > +	void *ret;
> > +
> >  	stac();
> > -	/*
> > -	 * If CPU has FSRM feature, use 'rep movs'.
> > -	 * Otherwise, use rep_movs_alternative.
> > -	 */
> > -	asm volatile(
> > -		"1:\n\t"
> > -		ALTERNATIVE("rep movsb",
> > -			    "call rep_movs_alternative", ALT_NOT(X86_FEATURE_FSRM))
> > -		"2:\n"
> > -		_ASM_EXTABLE_UA(1b, 2b)
> > -		:"+c" (len), "+D" (to), "+S" (from), ASM_CALL_CONSTRAINT
> > -		: : "memory", "rax");
> > +	ret = __inline_memcpy(to, from, len);
> >  	clac();
> > -	return len;
> > +	return ret - to;
> >  }
> >  
> >  static __always_inline __must_check unsigned long
> > @@ -178,25 +170,15 @@ rep_stos_alternative(void __user *addr, unsigned long len);
> >  
> >  static __always_inline __must_check unsigned long __clear_user(void __user *addr, unsigned long size)
> >  {
> > +	void *ptr = (__force void *)addr;
> > +	void *ret;
> > +
> >  	might_fault();
> >  	stac();
> > -
> > -	/*
> > -	 * No memory constraint because it doesn't change any memory gcc
> > -	 * knows about.
> > -	 */
> > -	asm volatile(
> > -		"1:\n\t"
> > -		ALTERNATIVE("rep stosb",
> > -			    "call rep_stos_alternative", ALT_NOT(X86_FEATURE_FSRS))
> > -		"2:\n"
> > -	       _ASM_EXTABLE_UA(1b, 2b)
> > -	       : "+c" (size), "+D" (addr), ASM_CALL_CONSTRAINT
> > -	       : "a" (0));
> > -
> > +	ret = __inline_memset(ptr, 0, size);
> >  	clac();
> >  
> > -	return size;
> > +	return ret - ptr;
> >  }
> >  
> >  static __always_inline unsigned long clear_user(void __user *to, unsigned long n)
> > diff --git a/arch/x86/lib/clear_page_64.S b/arch/x86/lib/clear_page_64.S
> > index a508e4a8c66a..47b613690f84 100644
> > --- a/arch/x86/lib/clear_page_64.S
> > +++ b/arch/x86/lib/clear_page_64.S
> > @@ -55,17 +55,26 @@ SYM_FUNC_END(clear_page_erms)
> >  EXPORT_SYMBOL_GPL(clear_page_erms)
> >  
> >  /*
> > - * Default clear user-space.
> > + * Default memset.
> >   * Input:
> >   * rdi destination
> > + * rsi scratch
> >   * rcx count
> > - * rax is zero
> > + * al is value
> >   *
> >   * Output:
> >   * rcx: uncleared bytes or 0 if successful.
> > + * rdx: clobbered
> >   */
> >  SYM_FUNC_START(rep_stos_alternative)
> >  	ANNOTATE_NOENDBR
> > +
> > +	movzbq %al, %rsi
> > +	movabs $0x0101010101010101, %rax
> > +
> > +	/* RDX:RAX = RAX * RSI */
> > +	mulq %rsi
> 
> NAK - you can't do that here.
> Neither %rsi nor %rdx can be trashed.
> The function has a very explicit calling convention.

What calling convention? We change the only caller to conform to this.

> It is also almost certainly a waste of time.
> Pretty much all the calls will be for a constant 0x00.
> Rename it all memzero() ...

text_poke_memset() is not limited to zeroing.

-- 
  Kiryl Shutsemau / Kirill A. Shutemov

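As an aside, a minimal sketch of the kind of caller these inline helpers are
meant for. It assumes the lass_stac()/lass_clac() naming discussed earlier in
the thread and a hypothetical poke_copy() wrapper; the real text_poke()
plumbing (temporary mm, poking address) is not shown.

/*
 * Illustrative only: text poking writes through a mapping in the user half,
 * so LASS/SMAP enforcement is suspended around the access, and the copy must
 * stay inline rather than becoming a call to an instrumented memcpy().
 */
static void poke_copy(void *dst, const void *src, size_t len)
{
	lass_stac();
	__inline_memcpy(dst, src, len);
	lass_clac();
}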

* Re: [PATCHv8 14/17] x86/traps: Handle LASS thrown #SS
  2025-07-02 17:56       ` Sohil Mehta
@ 2025-07-03 10:40         ` Kirill A. Shutemov
  0 siblings, 0 replies; 60+ messages in thread
From: Kirill A. Shutemov @ 2025-07-03 10:40 UTC (permalink / raw)
  To: Sohil Mehta
  Cc: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra, Ard Biesheuvel,
	Paul E. McKenney, Josh Poimboeuf, Xiongwei Song, Xin Li,
	Mike Rapoport (IBM), Brijesh Singh, Michael Roth, Tony Luck,
	Alexey Kardashevskiy, Alexander Shishkin, Jonathan Corbet,
	Ingo Molnar, Pawan Gupta, Daniel Sneddon, Kai Huang, Sandipan Das,
	Breno Leitao, Rick Edgecombe, Alexei Starovoitov, Hou Tao,
	Juergen Gross, Vegard Nossum, Kees Cook, Eric Biggers,
	Jason Gunthorpe, Masami Hiramatsu (Google), Andrew Morton,
	Luis Chamberlain, Yuntao Wang, Rasmus Villemoes, Christophe Leroy,
	Tejun Heo, Changbin Du, Huang Shijie, Geert Uytterhoeven,
	Namhyung Kim, Arnaldo Carvalho de Melo, linux-doc, linux-kernel,
	linux-efi, linux-mm

On Wed, Jul 02, 2025 at 10:56:25AM -0700, Sohil Mehta wrote:
> To reduce indentation, you could also do:
> 
> 	if (!cpu_feature_enabled(X86_FEATURE_LASS))
> 		goto error_trap;

Okay, will do this.

-- 
  Kiryl Shutsemau / Kirill A. Shutemov

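For reference, a minimal sketch of how exc_stack_segment() might look with the
feedback above folded in (early goto when LASS is not enabled, single-line
printk). It reuses the names from the quoted patch (get_kernel_exc_address(),
kernel_exc_hint_help[], SSFSTR) and is illustrative only, not the final patch.

DEFINE_IDTENTRY_ERRORCODE(exc_stack_segment)
{
	enum kernel_exc_hint hint;
	unsigned long exc_addr;

	if (user_mode(regs))
		goto error_trap;

	/* ERETU can fault on an invalid user context; let the fixup run first. */
	if (cpu_feature_enabled(X86_FEATURE_FRED) &&
	    fixup_exception(regs, X86_TRAP_SS, error_code, 0))
		return;

	if (!cpu_feature_enabled(X86_FEATURE_LASS))
		goto error_trap;

	hint = get_kernel_exc_address(regs, &exc_addr);
	if (hint != EXC_NO_HINT)
		printk(SSFSTR ", %s 0x%lx", kernel_exc_hint_help[hint], exc_addr);

	if (hint != EXC_NON_CANONICAL)
		exc_addr = 0;

	die_addr(SSFSTR, regs, error_code, exc_addr);
	return;

error_trap:
	do_error_trap(regs, error_code, "stack segment", X86_TRAP_SS, SIGBUS,
		      0, NULL);
}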

* Re: [PATCHv8 14/17] x86/traps: Handle LASS thrown #SS
  2025-07-02 20:05       ` Sohil Mehta
@ 2025-07-03 11:31         ` Kirill A. Shutemov
  2025-07-03 20:12           ` Sohil Mehta
  0 siblings, 1 reply; 60+ messages in thread
From: Kirill A. Shutemov @ 2025-07-03 11:31 UTC (permalink / raw)
  To: Sohil Mehta
  Cc: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra, Ard Biesheuvel,
	Paul E. McKenney, Josh Poimboeuf, Xiongwei Song, Xin Li,
	Mike Rapoport (IBM), Brijesh Singh, Michael Roth, Tony Luck,
	Alexey Kardashevskiy, Alexander Shishkin, Jonathan Corbet,
	Ingo Molnar, Pawan Gupta, Daniel Sneddon, Kai Huang, Sandipan Das,
	Breno Leitao, Rick Edgecombe, Alexei Starovoitov, Hou Tao,
	Juergen Gross, Vegard Nossum, Kees Cook, Eric Biggers,
	Jason Gunthorpe, Masami Hiramatsu (Google), Andrew Morton,
	Luis Chamberlain, Yuntao Wang, Rasmus Villemoes, Christophe Leroy,
	Tejun Heo, Changbin Du, Huang Shijie, Geert Uytterhoeven,
	Namhyung Kim, Arnaldo Carvalho de Melo, linux-doc, linux-kernel,
	linux-efi, linux-mm

On Wed, Jul 02, 2025 at 01:05:17PM -0700, Sohil Mehta wrote:
> On 7/2/2025 6:27 AM, Kirill A. Shutemov wrote:
> 
> >>
> >> Maybe I've misunderstood something:
> >>
> >> Is the underlying assumption here that #SS were previously only
> >> generated by userspace, but now they can also be generated by the
> >> kernel? And we want the kernel generated #SS to behave the same as the #GP?
> > 
> > It can be generated by both kernel and userspace if RSP gets corrupted.
> > 
> > So far, do_error_trap() did the trick, handling what has to be handled.
> > LASS requires a bit more, though.
> > 
> Thank you for the information! The discussion in the other thread helped
> clarify my confusion about the new FRED specific fixup outside the LASS
> check.
> 
> IIUC, for kernel generated #SS, the prior code in do_error_trap()
> would've done a few things such as notify_die() and
> cond_local_irq_enable() before calling die().

cond_local_irq_enable() needs to happen if we want to do something
sleepable during exception handling. It is not the case here.

notify_die() will be called via die_addr()->__die_body()->notify_die().

> The new code now directly calls die_addr(). Are we changing the behavior
> for legacy kernel #SS? Also, why don't we need those calls for the new
> LASS #SS?

do_error_trap() provides catch-all handling for unallowed-thing-happened
exceptions in either the kernel or userspace.

We can take a simpler path for a fatal in-kernel exception. Following the
#GP logic matches what we need.

-- 
  Kiryl Shutsemau / Kirill A. Shutemov


* Re: [PATCHv8 02/17] x86/asm: Introduce inline memcpy and memset
  2025-07-03 10:39     ` Kirill A. Shutemov
@ 2025-07-03 12:15       ` David Laight
  2025-07-03 13:33         ` Vegard Nossum
  2025-07-03 14:10         ` Kirill A. Shutemov
  0 siblings, 2 replies; 60+ messages in thread
From: David Laight @ 2025-07-03 12:15 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra, Ard Biesheuvel,
	Paul E. McKenney, Josh Poimboeuf, Xiongwei Song, Xin Li,
	Mike Rapoport (IBM), Brijesh Singh, Michael Roth, Tony Luck,
	Alexey Kardashevskiy, Alexander Shishkin, Jonathan Corbet,
	Sohil Mehta, Ingo Molnar, Pawan Gupta, Daniel Sneddon, Kai Huang,
	Sandipan Das, Breno Leitao, Rick Edgecombe, Alexei Starovoitov,
	Hou Tao, Juergen Gross, Vegard Nossum, Kees Cook, Eric Biggers,
	Jason Gunthorpe, Masami Hiramatsu (Google), Andrew Morton,
	Luis Chamberlain, Yuntao Wang, Rasmus Villemoes, Christophe Leroy,
	Tejun Heo, Changbin Du, Huang Shijie, Geert Uytterhoeven,
	Namhyung Kim, Arnaldo Carvalho de Melo, linux-doc, linux-kernel,
	linux-efi, linux-mm

On Thu, 3 Jul 2025 13:39:57 +0300
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> wrote:

> On Thu, Jul 03, 2025 at 09:44:17AM +0100, David Laight wrote:
> > On Tue,  1 Jul 2025 12:58:31 +0300
> > "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> wrote:
> >   
> > > Extract memcpy and memset functions from copy_user_generic() and
> > > __clear_user().
> > > 
> > > They can be used as inline memcpy and memset instead of the GCC builtins
> > > whenever necessary. LASS requires them to handle text_poke.  
> > 
> > Except they contain the fault handlers so aren't generic calls.  
> 
> That's true. I will add a comment to clarify it.

They need renaming.

...
> > > diff --git a/arch/x86/lib/clear_page_64.S b/arch/x86/lib/clear_page_64.S
> > > index a508e4a8c66a..47b613690f84 100644
> > > --- a/arch/x86/lib/clear_page_64.S
> > > +++ b/arch/x86/lib/clear_page_64.S
> > > @@ -55,17 +55,26 @@ SYM_FUNC_END(clear_page_erms)
> > >  EXPORT_SYMBOL_GPL(clear_page_erms)
> > >  
> > >  /*
> > > - * Default clear user-space.
> > > + * Default memset.
> > >   * Input:
> > >   * rdi destination
> > > + * rsi scratch
> > >   * rcx count
> > > - * rax is zero
> > > + * al is value
> > >   *
> > >   * Output:
> > >   * rcx: uncleared bytes or 0 if successful.
> > > + * rdx: clobbered
> > >   */
> > >  SYM_FUNC_START(rep_stos_alternative)
> > >  	ANNOTATE_NOENDBR
> > > +
> > > +	movzbq %al, %rsi
> > > +	movabs $0x0101010101010101, %rax
> > > +
> > > +	/* RDX:RAX = RAX * RSI */
> > > +	mulq %rsi  
> > 
> > NAK - you can't do that here.
> > Neither %rsi nor %rdx can be trashed.
> > The function has a very explicit calling convention.  
> 
> What calling convention? We change the only caller to conform to this.

The one that is implicit in:

> > > +	asm volatile("1:\n\t"
> > > +		     ALT_64("rep stosb",
> > > +			    "call rep_stos_alternative", ALT_NOT(X86_FEATURE_FSRM))
> > > +		     "2:\n\t"
> > > +		     _ASM_EXTABLE_UA(1b, 2b)
> > > +		     : "+c" (len), "+D" (addr), ASM_CALL_CONSTRAINT
> > > +		     : "a" ((uint8_t)v)

The called function is only allowed to change the registers that
'rep stosb' uses - except it can access (but not change)
all of %rax - not just %al.

See: https://godbolt.org/z/3fnrT3x9r
In particular note that 'do_mset' must not change %rax.

This is very specific and is done so that the compiler can use
all the registers.
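
As a stripped-down illustration (hypothetical code, not the kernel's): the
compiler assumes that only the listed outputs and clobbers change, so whatever
the alternative "call" ends up invoking must preserve everything else,
including %rax, which is an input-only operand here.

	static inline void fill(void *dst, unsigned long val, size_t len)
	{
		asm volatile("rep stosb"	/* may be patched to a call */
			     : "+c" (len), "+D" (dst)	/* only rcx/rdi may change */
			     : "a" (val)		/* rax: input only, must be preserved */
			     : "memory");
	}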

> > It is also almost certainly a waste of time.
> > Pretty much all the calls will be for a constant 0x00.
> > Rename it all memzero() ...  
> 
> text_poke_memset() is not limited to zeroing.

But you don't want the overhead of extending the constant
on all the calls - never mind reserving %rdx to do it.
Maybe define a function that requires the caller to have
done the 'dirty work' - so any code that wants memzero()
just passes zero.
Or do the multiply in the C code where it will get optimised
away for constant zero.
You do get the multiply for the 'rep stosb' case - but that
is always going to be true unless you complicate things further.  

	David



^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCHv8 02/17] x86/asm: Introduce inline memcpy and memset
  2025-07-03 12:15       ` David Laight
@ 2025-07-03 13:33         ` Vegard Nossum
  2025-07-03 16:52           ` David Laight
  2025-07-03 14:10         ` Kirill A. Shutemov
  1 sibling, 1 reply; 60+ messages in thread
From: Vegard Nossum @ 2025-07-03 13:33 UTC (permalink / raw)
  To: David Laight, Kirill A. Shutemov
  Cc: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra, Ard Biesheuvel,
	Paul E. McKenney, Josh Poimboeuf, Xiongwei Song, Xin Li,
	Mike Rapoport (IBM), Brijesh Singh, Michael Roth, Tony Luck,
	Alexey Kardashevskiy, Alexander Shishkin, Jonathan Corbet,
	Sohil Mehta, Ingo Molnar, Pawan Gupta, Daniel Sneddon, Kai Huang,
	Sandipan Das, Breno Leitao, Rick Edgecombe, Alexei Starovoitov,
	Hou Tao, Juergen Gross, Kees Cook, Eric Biggers, Jason Gunthorpe,
	Masami Hiramatsu (Google), Andrew Morton, Luis Chamberlain,
	Yuntao Wang, Rasmus Villemoes, Christophe Leroy, Tejun Heo,
	Changbin Du, Huang Shijie, Geert Uytterhoeven, Namhyung Kim,
	Arnaldo Carvalho de Melo, linux-doc, linux-kernel, linux-efi,
	linux-mm


On 03/07/2025 14:15, David Laight wrote:
> On Thu, 3 Jul 2025 13:39:57 +0300
> "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> wrote:
>> On Thu, Jul 03, 2025 at 09:44:17AM +0100, David Laight wrote:
>>> On Tue,  1 Jul 2025 12:58:31 +0300
>>> "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> wrote:
>>>> diff --git a/arch/x86/lib/clear_page_64.S b/arch/x86/lib/clear_page_64.S
>>>> index a508e4a8c66a..47b613690f84 100644
>>>> --- a/arch/x86/lib/clear_page_64.S
>>>> +++ b/arch/x86/lib/clear_page_64.S
>>>> @@ -55,17 +55,26 @@ SYM_FUNC_END(clear_page_erms)
>>>>   EXPORT_SYMBOL_GPL(clear_page_erms)
>>>>   
>>>>   /*
>>>> - * Default clear user-space.
>>>> + * Default memset.
>>>>    * Input:
>>>>    * rdi destination
>>>> + * rsi scratch
>>>>    * rcx count
>>>> - * rax is zero
>>>> + * al is value
>>>>    *
>>>>    * Output:
>>>>    * rcx: uncleared bytes or 0 if successful.
>>>> + * rdx: clobbered
>>>>    */
>>>>   SYM_FUNC_START(rep_stos_alternative)
>>>>   	ANNOTATE_NOENDBR
>>>> +
>>>> +	movzbq %al, %rsi
>>>> +	movabs $0x0101010101010101, %rax
>>>> +
>>>> +	/* RDX:RAX = RAX * RSI */
>>>> +	mulq %rsi
>>>
>>> NAK - you can't do that here.
>>> Neither %rsi nor %rdx can be trashed.
>>> The function has a very explicit calling convention.

That's why we have the clobbers... see below

>> What calling convention? We change the only caller to conform to this.
> 
> The one that is implicit in:
> 
>>>> +	asm volatile("1:\n\t"
>>>> +		     ALT_64("rep stosb",
>>>> +			    "call rep_stos_alternative", ALT_NOT(X86_FEATURE_FSRM))
>>>> +		     "2:\n\t"
>>>> +		     _ASM_EXTABLE_UA(1b, 2b)
>>>> +		     : "+c" (len), "+D" (addr), ASM_CALL_CONSTRAINT
>>>> +		     : "a" ((uint8_t)v)
> 
> The called function is only allowed to change the registers that
> 'rep stosb' uses - except it can access (but not change)
> all of %rax - not just %al.
> 
> See: https://godbolt.org/z/3fnrT3x9r
> In particular note that 'do_mset' must not change %rax.
> 
> This is very specific and is done so that the compiler can use
> all the registers.

I feel like you trimmed off the clobbers from the asm() in the context
above. For reference, it is:

+		     : "memory", _ASM_SI, _ASM_DX);

I'm not saying this can't be optimized, but that doesn't seem to be your
complaint -- you say "the called function is only allowed to change
...", but this is not true when we have the clobbers, right?

This is exactly what I fixed with my v7 fixlet to this patch:

https://lore.kernel.org/all/1b96b0ca-5c14-4271-86c1-c305bf052b16@oracle.com/


Vegard

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCHv8 02/17] x86/asm: Introduce inline memcpy and memset
  2025-07-03 12:15       ` David Laight
  2025-07-03 13:33         ` Vegard Nossum
@ 2025-07-03 14:10         ` Kirill A. Shutemov
  2025-07-03 17:02           ` David Laight
  1 sibling, 1 reply; 60+ messages in thread
From: Kirill A. Shutemov @ 2025-07-03 14:10 UTC (permalink / raw)
  To: David Laight
  Cc: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra, Ard Biesheuvel,
	Paul E. McKenney, Josh Poimboeuf, Xiongwei Song, Xin Li,
	Mike Rapoport (IBM), Brijesh Singh, Michael Roth, Tony Luck,
	Alexey Kardashevskiy, Alexander Shishkin, Jonathan Corbet,
	Sohil Mehta, Ingo Molnar, Pawan Gupta, Daniel Sneddon, Kai Huang,
	Sandipan Das, Breno Leitao, Rick Edgecombe, Alexei Starovoitov,
	Hou Tao, Juergen Gross, Vegard Nossum, Kees Cook, Eric Biggers,
	Jason Gunthorpe, Masami Hiramatsu (Google), Andrew Morton,
	Luis Chamberlain, Yuntao Wang, Rasmus Villemoes, Christophe Leroy,
	Tejun Heo, Changbin Du, Huang Shijie, Geert Uytterhoeven,
	Namhyung Kim, Arnaldo Carvalho de Melo, linux-doc, linux-kernel,
	linux-efi, linux-mm

On Thu, Jul 03, 2025 at 01:15:52PM +0100, David Laight wrote:
> On Thu, 3 Jul 2025 13:39:57 +0300
> "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> wrote:
> 
> > On Thu, Jul 03, 2025 at 09:44:17AM +0100, David Laight wrote:
> > > On Tue,  1 Jul 2025 12:58:31 +0300
> > > "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> wrote:
> > >   
> > > > Extract memcpy and memset functions from copy_user_generic() and
> > > > __clear_user().
> > > > 
> > > > They can be used as inline memcpy and memset instead of the GCC builtins
> > > > whenever necessary. LASS requires them to handle text_poke.  
> > > 
> > > Except they contain the fault handlers so aren't generic calls.  
> > 
> > That's true. I will add a comment to clarify it.
> 
> They need renaming.

__inline_memcpy/memset_safe()?

> ...
> > > > diff --git a/arch/x86/lib/clear_page_64.S b/arch/x86/lib/clear_page_64.S
> > > > index a508e4a8c66a..47b613690f84 100644
> > > > --- a/arch/x86/lib/clear_page_64.S
> > > > +++ b/arch/x86/lib/clear_page_64.S
> > > > @@ -55,17 +55,26 @@ SYM_FUNC_END(clear_page_erms)
> > > >  EXPORT_SYMBOL_GPL(clear_page_erms)
> > > >  
> > > >  /*
> > > > - * Default clear user-space.
> > > > + * Default memset.
> > > >   * Input:
> > > >   * rdi destination
> > > > + * rsi scratch
> > > >   * rcx count
> > > > - * rax is zero
> > > > + * al is value
> > > >   *
> > > >   * Output:
> > > >   * rcx: uncleared bytes or 0 if successful.
> > > > + * rdx: clobbered
> > > >   */
> > > >  SYM_FUNC_START(rep_stos_alternative)
> > > >  	ANNOTATE_NOENDBR
> > > > +
> > > > +	movzbq %al, %rsi
> > > > +	movabs $0x0101010101010101, %rax
> > > > +
> > > > +	/* RDX:RAX = RAX * RSI */
> > > > +	mulq %rsi  
> > > 
> > > NAK - you can't do that here.
> > > Neither %rsi nor %rdx can be trashed.
> > > The function has a very explicit calling convention.  
> > 
> > What calling convention? We change the only caller to conform to this.
> 
> The one that is implicit in:
> 
> > > > +	asm volatile("1:\n\t"
> > > > +		     ALT_64("rep stosb",
> > > > +			    "call rep_stos_alternative", ALT_NOT(X86_FEATURE_FSRM))
> > > > +		     "2:\n\t"
> > > > +		     _ASM_EXTABLE_UA(1b, 2b)
> > > > +		     : "+c" (len), "+D" (addr), ASM_CALL_CONSTRAINT
> > > > +		     : "a" ((uint8_t)v)
> 
> The called function is only allowed to change the registers that
> 'rep stosb' uses - except it can access (but not change)
> all of %rax - not just %al.
> 
> See: https://godbolt.org/z/3fnrT3x9r
> In particular note that 'do_mset' must not change %rax.
> 
> This is very specific and is done so that the compiler can use
> all the registers.

Okay, I see what you are saying.

> > > It is also almost certainly a waste of time.
> > > Pretty much all the calls will be for a constant 0x00.
> > > Rename it all memzero() ...  
> > 
> > text_poke_memset() is not limited to zeroing.
> 
> But you don't want the overhead of extending the constant
> on all the calls - never mind reserving %rdx to do it.
> Maybe define a function that requires the caller to have
> done the 'dirty work' - so any code that wants memzero()
> just passes zero.
> Or do the multiply in the C code where it will get optimised
> away for constant zero.
> You do get the multiply for the 'rep stosb' case - but that
> is always going to be true unless you complicate things further.  

The patch below seems to do the trick: the compiler optimizes out the
multiplication for v == 0.

It would be nice to avoid it for X86_FEATURE_FSRM, but we cannot use
cpu_feature_enabled() here as <asm/cpufeature.h> depends on
<asm/string.h>.

I cannot say I like the result.

Any suggestions?

diff --git a/arch/x86/include/asm/string.h b/arch/x86/include/asm/string.h
index becb9ee3bc8a..c7644a6f426b 100644
--- a/arch/x86/include/asm/string.h
+++ b/arch/x86/include/asm/string.h
@@ -35,16 +35,27 @@ static __always_inline void *__inline_memcpy(void *to, const void *from, size_t
 
 static __always_inline void *__inline_memset(void *addr, int v, size_t len)
 {
+	unsigned long val = v;
 	void *ret = addr;
 
+	if (IS_ENABLED(CONFIG_X86_64)) {
+		/*
+		 * Fill all bytes with the value in byte 0.
+		 *
+		 * To be used in rep_stos_alternative().
+		 */
+		val &= 0xff;
+		val *= 0x0101010101010101;
+	}
+
 	asm volatile("1:\n\t"
 		     ALT_64("rep stosb",
 			    "call rep_stos_alternative", ALT_NOT(X86_FEATURE_FSRM))
 		     "2:\n\t"
 		     _ASM_EXTABLE_UA(1b, 2b)
 		     : "+c" (len), "+D" (addr), ASM_CALL_CONSTRAINT
-		     : "a" (v)
-		     : "memory", _ASM_SI, _ASM_DX);
+		     : "a" (val)
+		     : "memory");
 
 	return ret + len;
 }
diff --git a/arch/x86/lib/clear_page_64.S b/arch/x86/lib/clear_page_64.S
index 47b613690f84..3ef7d796deb3 100644
--- a/arch/x86/lib/clear_page_64.S
+++ b/arch/x86/lib/clear_page_64.S
@@ -58,23 +58,15 @@ EXPORT_SYMBOL_GPL(clear_page_erms)
  * Default memset.
  * Input:
  * rdi destination
- * rsi scratch
  * rcx count
  * al is value
  *
  * Output:
  * rcx: uncleared bytes or 0 if successful.
- * rdx: clobbered
  */
 SYM_FUNC_START(rep_stos_alternative)
 	ANNOTATE_NOENDBR
 
-	movzbq %al, %rsi
-	movabs $0x0101010101010101, %rax
-
-	/* RDX:RAX = RAX * RSI */
-	mulq %rsi
-
 	cmpq $64,%rcx
 	jae .Lunrolled
 
-- 
  Kiryl Shutsemau / Kirill A. Shutemov

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* Re: [PATCHv8 02/17] x86/asm: Introduce inline memcpy and memset
  2025-07-03 13:33         ` Vegard Nossum
@ 2025-07-03 16:52           ` David Laight
  0 siblings, 0 replies; 60+ messages in thread
From: David Laight @ 2025-07-03 16:52 UTC (permalink / raw)
  To: Vegard Nossum
  Cc: Kirill A. Shutemov, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra,
	Ard Biesheuvel, Paul E. McKenney, Josh Poimboeuf, Xiongwei Song,
	Xin Li, Mike Rapoport (IBM), Brijesh Singh, Michael Roth,
	Tony Luck, Alexey Kardashevskiy, Alexander Shishkin,
	Jonathan Corbet, Sohil Mehta, Ingo Molnar, Pawan Gupta,
	Daniel Sneddon, Kai Huang, Sandipan Das, Breno Leitao,
	Rick Edgecombe, Alexei Starovoitov, Hou Tao, Juergen Gross,
	Kees Cook, Eric Biggers, Jason Gunthorpe,
	Masami Hiramatsu (Google), Andrew Morton, Luis Chamberlain,
	Yuntao Wang, Rasmus Villemoes, Christophe Leroy, Tejun Heo,
	Changbin Du, Huang Shijie, Geert Uytterhoeven, Namhyung Kim,
	Arnaldo Carvalho de Melo, linux-doc, linux-kernel, linux-efi,
	linux-mm

On Thu, 3 Jul 2025 15:33:16 +0200
Vegard Nossum <vegard.nossum@oracle.com> wrote:

> On 03/07/2025 14:15, David Laight wrote:
> > On Thu, 3 Jul 2025 13:39:57 +0300
> > "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> wrote:  
> >> On Thu, Jul 03, 2025 at 09:44:17AM +0100, David Laight wrote:  
> >>> On Tue,  1 Jul 2025 12:58:31 +0300
> >>> "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> wrote:  
> >>>> diff --git a/arch/x86/lib/clear_page_64.S b/arch/x86/lib/clear_page_64.S
> >>>> index a508e4a8c66a..47b613690f84 100644
> >>>> --- a/arch/x86/lib/clear_page_64.S
> >>>> +++ b/arch/x86/lib/clear_page_64.S
> >>>> @@ -55,17 +55,26 @@ SYM_FUNC_END(clear_page_erms)
> >>>>   EXPORT_SYMBOL_GPL(clear_page_erms)
> >>>>   
> >>>>   /*
> >>>> - * Default clear user-space.
> >>>> + * Default memset.
> >>>>    * Input:
> >>>>    * rdi destination
> >>>> + * rsi scratch
> >>>>    * rcx count
> >>>> - * rax is zero
> >>>> + * al is value
> >>>>    *
> >>>>    * Output:
> >>>>    * rcx: uncleared bytes or 0 if successful.
> >>>> + * rdx: clobbered
> >>>>    */
> >>>>   SYM_FUNC_START(rep_stos_alternative)
> >>>>   	ANNOTATE_NOENDBR
> >>>> +
> >>>> +	movzbq %al, %rsi
> >>>> +	movabs $0x0101010101010101, %rax
> >>>> +
> >>>> +	/* RDX:RAX = RAX * RSI */
> >>>> +	mulq %rsi  
> >>>
> >>> NAK - you can't do that here.
> >>> Neither %rsi nor %rdx can be trashed.
> >>> The function has a very explicit calling convention.  
> 
> That's why we have the clobbers... see below
> 
> >> What calling convention? We change the only caller to conform to this.
> > 
> > The one that is implicit in:
> >   
> >>>> +	asm volatile("1:\n\t"
> >>>> +		     ALT_64("rep stosb",
> >>>> +			    "call rep_stos_alternative", ALT_NOT(X86_FEATURE_FSRM))
> >>>> +		     "2:\n\t"
> >>>> +		     _ASM_EXTABLE_UA(1b, 2b)
> >>>> +		     : "+c" (len), "+D" (addr), ASM_CALL_CONSTRAINT
> >>>> +		     : "a" ((uint8_t)v)  
> > 
> > The called function is only allowed to change the registers that
> > 'rep stosb' uses - except it can access (but not change)
> > all of %rax - not just %al.
> > 
> > See: https://godbolt.org/z/3fnrT3x9r
> > In particular note that 'do_mset' must not change %rax.
> > 
> > This is very specific and is done so that the compiler can use
> > all the registers.  
> 
> I feel like you trimmed off the clobbers from the asm() in the context
> above. For reference, it is:
> 
> +		     : "memory", _ASM_SI, _ASM_DX);

I'm sure they weren't there...

Enough clobbers will 'un-break' it - but that isn't the point.
Linus will reject the patch if he reads it.
The whole point about the function is that it is as direct a replacement
for 'rep stos/movsb' as possible.  

> 
> I'm not saying this can't be optimized, but that doesn't seem to be your
> complaint -- you say "the called function is only allowed to change
> ...", but this is not true when we have the clobbers, right?

You can't change %rax either - not without a clobber.

Oh, and even with your version you only have clobbers for %rax and %rdx.
There is no need to use both %rsi and %rdx.

The performance is a different problem.
And the extra clobbers are likely to matter.
x86 really doesn't have many registers.

	David

> 
> This is exactly what I fixed with my v7 fixlet to this patch:
> 
> https://lore.kernel.org/all/1b96b0ca-5c14-4271-86c1-c305bf052b16@oracle.com/
> 
> 
> Vegard


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCHv8 02/17] x86/asm: Introduce inline memcpy and memset
  2025-07-03 14:10         ` Kirill A. Shutemov
@ 2025-07-03 17:02           ` David Laight
  0 siblings, 0 replies; 60+ messages in thread
From: David Laight @ 2025-07-03 17:02 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra, Ard Biesheuvel,
	Paul E. McKenney, Josh Poimboeuf, Xiongwei Song, Xin Li,
	Mike Rapoport (IBM), Brijesh Singh, Michael Roth, Tony Luck,
	Alexey Kardashevskiy, Alexander Shishkin, Jonathan Corbet,
	Sohil Mehta, Ingo Molnar, Pawan Gupta, Daniel Sneddon, Kai Huang,
	Sandipan Das, Breno Leitao, Rick Edgecombe, Alexei Starovoitov,
	Hou Tao, Juergen Gross, Vegard Nossum, Kees Cook, Eric Biggers,
	Jason Gunthorpe, Masami Hiramatsu (Google), Andrew Morton,
	Luis Chamberlain, Yuntao Wang, Rasmus Villemoes, Christophe Leroy,
	Tejun Heo, Changbin Du, Huang Shijie, Geert Uytterhoeven,
	Namhyung Kim, Arnaldo Carvalho de Melo, linux-doc, linux-kernel,
	linux-efi, linux-mm

On Thu, 3 Jul 2025 17:10:34 +0300
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> wrote:

> On Thu, Jul 03, 2025 at 01:15:52PM +0100, David Laight wrote:
> > On Thu, 3 Jul 2025 13:39:57 +0300
> > "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> wrote:
> >   
> > > On Thu, Jul 03, 2025 at 09:44:17AM +0100, David Laight wrote:  
> > > > On Tue,  1 Jul 2025 12:58:31 +0300
> > > > "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> wrote:
> > > >     
> > > > > Extract memcpy and memset functions from copy_user_generic() and
> > > > > __clear_user().
> > > > > 
> > > > > They can be used as inline memcpy and memset instead of the GCC builtins
> > > > > whenever necessary. LASS requires them to handle text_poke.    
> > > > 
> > > > Except they contain the fault handlers so aren't generic calls.    
> > > 
> > > That's true. I will add a comment to clarify it.  
> > 
> > They need renaming.  
> 
> __inline_memcpy/memset_safe()?

'safe' against what :-)
They can't be used for user accesses without access_ok() and clac.
The get/put_user variants without access_ok() have _unsafe() suffix.

> 
> > ...  
> > > > > diff --git a/arch/x86/lib/clear_page_64.S b/arch/x86/lib/clear_page_64.S
> > > > > index a508e4a8c66a..47b613690f84 100644
> > > > > --- a/arch/x86/lib/clear_page_64.S
> > > > > +++ b/arch/x86/lib/clear_page_64.S
> > > > > @@ -55,17 +55,26 @@ SYM_FUNC_END(clear_page_erms)
> > > > >  EXPORT_SYMBOL_GPL(clear_page_erms)
> > > > >  
> > > > >  /*
> > > > > - * Default clear user-space.
> > > > > + * Default memset.
> > > > >   * Input:
> > > > >   * rdi destination
> > > > > + * rsi scratch
> > > > >   * rcx count
> > > > > - * rax is zero
> > > > > + * al is value
> > > > >   *
> > > > >   * Output:
> > > > >   * rcx: uncleared bytes or 0 if successful.
> > > > > + * rdx: clobbered
> > > > >   */
> > > > >  SYM_FUNC_START(rep_stos_alternative)
> > > > >  	ANNOTATE_NOENDBR
> > > > > +
> > > > > +	movzbq %al, %rsi
> > > > > +	movabs $0x0101010101010101, %rax
> > > > > +
> > > > > +	/* RDX:RAX = RAX * RSI */
> > > > > +	mulq %rsi    
> > > > 
> > > > NAK - you can't do that here.
> > > > Neither %rsi nor %rdx can be trashed.
> > > > The function has a very explicit calling convention.    
> > > 
> > > What calling convention? We change the only caller to conform to this.
> > 
> > The one that is implicit in:
> >   
> > > > > +	asm volatile("1:\n\t"
> > > > > +		     ALT_64("rep stosb",
> > > > > +			    "call rep_stos_alternative", ALT_NOT(X86_FEATURE_FSRM))
> > > > > +		     "2:\n\t"
> > > > > +		     _ASM_EXTABLE_UA(1b, 2b)
> > > > > +		     : "+c" (len), "+D" (addr), ASM_CALL_CONSTRAINT
> > > > > +		     : "a" ((uint8_t)v)  
> > 
> > The called function is only allowed to change the registers that
> > 'rep stosb' uses - except it can access (but not change)
> > all of %rax - not just %al.
> > 
> > See: https://godbolt.org/z/3fnrT3x9r
> > In particular note that 'do_mset' must not change %rax.
> > 
> > This is very specific and is done so that the compiler can use
> > all the registers.  
> 
> Okay, I see what you are saying.
> 
> > > > It is also almost certainly a waste of time.
> > > > Pretty much all the calls will be for a constant 0x00.
> > > > Rename it all memzero() ...    
> > > 
> > > text_poke_memset() is not limited to zeroing.  
> > 
> > But you don't want the overhead of extending the constant
> > on all the calls - never mind reserving %rdx to do it.
> > Maybe define a function that requires the caller to have
> > done the 'dirty work' - so any code that wants memzero()
> > just passes zero.
> > Or do the multiply in the C code where it will get optimised
> > away for constant zero.
> > You do get the multiply for the 'rep stosb' case - but that
> > is always going to be true unless you complicate things further.    
> 
> The patch below seems to do the trick: the compiler optimizes out the
> multiplication for v == 0.
> 
> It would be nice to avoid it for X86_FEATURE_FSRM, but we cannot use
> cpu_feature_enabled() here as <asm/cpufeature.h> depends on
> <asm/string.h>.
> 
> I cannot say I like the result.
> 
> Any suggestions?
> 
> diff --git a/arch/x86/include/asm/string.h b/arch/x86/include/asm/string.h
> index becb9ee3bc8a..c7644a6f426b 100644
> --- a/arch/x86/include/asm/string.h
> +++ b/arch/x86/include/asm/string.h
> @@ -35,16 +35,27 @@ static __always_inline void *__inline_memcpy(void *to, const void *from, size_t
>  
>  static __always_inline void *__inline_memset(void *addr, int v, size_t len)
>  {
> +	unsigned long val = v;
>  	void *ret = addr;
>  
> +	if (IS_ENABLED(CONFIG_X86_64)) {
> +		/*
> +		 * Fill all bytes with the value in byte 0.
> +		 *
> +		 * To be used in rep_stos_alternative().
> +		 */
> +		val &= 0xff;
> +		val *= 0x0101010101010101;
> +	}

That won't compile for 32bit, and it needs the same thing done.
	val *= (unsigned long)0x0101010101010101ull;
should work.
I don't think you need the 'val &= 0xff', just rely on the caller
passing a valid value - nothing will break badly if it doesn't.
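
A two-line sketch of what I mean (relying on unsigned long being 32-bit on
i386 and 64-bit on x86-64):

	unsigned long val = v;	/* caller passes a value that fits in a byte */
	val *= (unsigned long)0x0101010101010101ull; /* truncates to 0x01010101 on 32-bit */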

	David

> +
>  	asm volatile("1:\n\t"
>  		     ALT_64("rep stosb",
>  			    "call rep_stos_alternative", ALT_NOT(X86_FEATURE_FSRM))
>  		     "2:\n\t"
>  		     _ASM_EXTABLE_UA(1b, 2b)
>  		     : "+c" (len), "+D" (addr), ASM_CALL_CONSTRAINT
> -		     : "a" (v)
> -		     : "memory", _ASM_SI, _ASM_DX);
> +		     : "a" (val)
> +		     : "memory");
>  
>  	return ret + len;
>  }
> diff --git a/arch/x86/lib/clear_page_64.S b/arch/x86/lib/clear_page_64.S
> index 47b613690f84..3ef7d796deb3 100644
> --- a/arch/x86/lib/clear_page_64.S
> +++ b/arch/x86/lib/clear_page_64.S
> @@ -58,23 +58,15 @@ EXPORT_SYMBOL_GPL(clear_page_erms)
>   * Default memset.
>   * Input:
>   * rdi destination
> - * rsi scratch
>   * rcx count
>   * al is value
>   *
>   * Output:
>   * rcx: uncleared bytes or 0 if successful.
> - * rdx: clobbered
>   */
>  SYM_FUNC_START(rep_stos_alternative)
>  	ANNOTATE_NOENDBR
>  
> -	movzbq %al, %rsi
> -	movabs $0x0101010101010101, %rax
> -
> -	/* RDX:RAX = RAX * RSI */
> -	mulq %rsi
> -
>  	cmpq $64,%rcx
>  	jae .Lunrolled
>  


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCHv8 02/17] x86/asm: Introduce inline memcpy and memset
  2025-07-01  9:58 ` [PATCHv8 02/17] x86/asm: Introduce inline memcpy and memset Kirill A. Shutemov
  2025-07-03  8:44   ` David Laight
@ 2025-07-03 17:13   ` Dave Hansen
  2025-07-04  9:04     ` Kirill A. Shutemov
  2025-07-06  9:13     ` David Laight
  1 sibling, 2 replies; 60+ messages in thread
From: Dave Hansen @ 2025-07-03 17:13 UTC (permalink / raw)
  To: Kirill A. Shutemov, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra,
	Ard Biesheuvel, Paul E. McKenney, Josh Poimboeuf, Xiongwei Song,
	Xin Li, Mike Rapoport (IBM), Brijesh Singh, Michael Roth,
	Tony Luck, Alexey Kardashevskiy, Alexander Shishkin
  Cc: Jonathan Corbet, Sohil Mehta, Ingo Molnar, Pawan Gupta,
	Daniel Sneddon, Kai Huang, Sandipan Das, Breno Leitao,
	Rick Edgecombe, Alexei Starovoitov, Hou Tao, Juergen Gross,
	Vegard Nossum, Kees Cook, Eric Biggers, Jason Gunthorpe,
	Masami Hiramatsu (Google), Andrew Morton, Luis Chamberlain,
	Yuntao Wang, Rasmus Villemoes, Christophe Leroy, Tejun Heo,
	Changbin Du, Huang Shijie, Geert Uytterhoeven, Namhyung Kim,
	Arnaldo Carvalho de Melo, linux-doc, linux-kernel, linux-efi,
	linux-mm

On 7/1/25 02:58, Kirill A. Shutemov wrote:
> Extract memcpy and memset functions from copy_user_generic() and
> __clear_user().
> 
> They can be used as inline memcpy and memset instead of the GCC builtins
> whenever necessary. LASS requires them to handle text_poke.

Why are we messing with the normal user copy functions? Code reuse is
great, but as you're discovering, the user copy code is highly
specialized and not that easy to reuse for other things.

Don't we just need a dirt simple chunk of code that does (logically):

	stac();
	asm("rep stosq...");
	clac();

Performance doesn't matter for text poking, right? It could be stosq or
anything else that you can inline. It could be a for() loop for all I
care as long as the compiler doesn't transform it into some out-of-line
memset. Right?
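
Something along these lines, as a minimal sketch (the function name is made
up, and in practice the STAC/CLAC pair would have to be conditional on LASS
being enabled):

	static __always_inline void text_poke_memset_inline(void *dst, int v, size_t len)
	{
		volatile char *p = dst;

		stac();			/* open the window to the low-address mapping */
		while (len--)
			*p++ = v;	/* volatile stores can't become an out-of-line memset */
		clac();
	}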

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCHv8 14/17] x86/traps: Handle LASS thrown #SS
  2025-07-03 11:31         ` Kirill A. Shutemov
@ 2025-07-03 20:12           ` Sohil Mehta
  2025-07-04  9:23             ` Kirill A. Shutemov
  0 siblings, 1 reply; 60+ messages in thread
From: Sohil Mehta @ 2025-07-03 20:12 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra, Ard Biesheuvel,
	Paul E. McKenney, Josh Poimboeuf, Xiongwei Song, Xin Li,
	Mike Rapoport (IBM), Brijesh Singh, Michael Roth, Tony Luck,
	Alexey Kardashevskiy, Alexander Shishkin, Jonathan Corbet,
	Ingo Molnar, Pawan Gupta, Daniel Sneddon, Kai Huang, Sandipan Das,
	Breno Leitao, Rick Edgecombe, Alexei Starovoitov, Hou Tao,
	Juergen Gross, Vegard Nossum, Kees Cook, Eric Biggers,
	Jason Gunthorpe, Masami Hiramatsu (Google), Andrew Morton,
	Luis Chamberlain, Yuntao Wang, Rasmus Villemoes, Christophe Leroy,
	Tejun Heo, Changbin Du, Huang Shijie, Geert Uytterhoeven,
	Namhyung Kim, Arnaldo Carvalho de Melo, linux-doc, linux-kernel,
	linux-efi, linux-mm

On 7/3/2025 4:31 AM, Kirill A. Shutemov wrote:

> 
> cond_local_irq_enable() needs to happen if we want to do something
> sleepable during exception handling. That is not the case here.
> 

Makes sense.

> notify_die() will be called via die_addr()->__die_body()->notify_die().

The notify_die() within __die_body() is slightly different from the one
called from the exception handlers.

__die_body():
notify_die(DIE_OOPS, str, regs, err, current->thread.trap_nr, SIGSEGV)

exc_alignment_check():
notify_die(DIE_TRAP, str, regs, error_code, X86_TRAP_AC, SIGBUS)

I believe we should include a #SS-specific notify_die() before calling
die_addr(), similar to exc_alignment_check() which dies on kernel
exceptions.


	if (notify_die(DIE_TRAP, str, regs, error_code, X86_TRAP_SS,
		       SIGBUS) == NOTIFY_STOP)
		return;


This way the behavior remains consistent with other exception handlers
as well as with/without LASS.



^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCHv8 02/17] x86/asm: Introduce inline memcpy and memset
  2025-07-03 17:13   ` Dave Hansen
@ 2025-07-04  9:04     ` Kirill A. Shutemov
  2025-07-06  9:13     ` David Laight
  1 sibling, 0 replies; 60+ messages in thread
From: Kirill A. Shutemov @ 2025-07-04  9:04 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra, Ard Biesheuvel,
	Paul E. McKenney, Josh Poimboeuf, Xiongwei Song, Xin Li,
	Mike Rapoport (IBM), Brijesh Singh, Michael Roth, Tony Luck,
	Alexey Kardashevskiy, Alexander Shishkin, Jonathan Corbet,
	Sohil Mehta, Ingo Molnar, Pawan Gupta, Daniel Sneddon, Kai Huang,
	Sandipan Das, Breno Leitao, Rick Edgecombe, Alexei Starovoitov,
	Hou Tao, Juergen Gross, Vegard Nossum, Kees Cook, Eric Biggers,
	Jason Gunthorpe, Masami Hiramatsu (Google), Andrew Morton,
	Luis Chamberlain, Yuntao Wang, Rasmus Villemoes, Christophe Leroy,
	Tejun Heo, Changbin Du, Huang Shijie, Geert Uytterhoeven,
	Namhyung Kim, Arnaldo Carvalho de Melo, linux-doc, linux-kernel,
	linux-efi, linux-mm

On Thu, Jul 03, 2025 at 10:13:44AM -0700, Dave Hansen wrote:
> On 7/1/25 02:58, Kirill A. Shutemov wrote:
> > Extract memcpy and memset functions from copy_user_generic() and
> > __clear_user().
> > 
> > They can be used as inline memcpy and memset instead of the GCC builtins
> > whenever necessary. LASS requires them to handle text_poke.
> 
> Why are we messing with the normal user copy functions? Code reuse is
> great, but as you're discovering, the user copy code is highly
> specialized and not that easy to reuse for other things.
> 
> Don't we just need a dirt simple chunk of code that does (logically):
> 
> 	stac();
> 	asm("rep stosq...");
> 	clac();
> 
> Performance doesn't matter for text poking, right? It could be stosq or
> anything else that you can inline. It could be a for() loop for all I
> care as long as the compiler doesn't transform it into some out-of-line
> memset. Right?

Yeah, performance doesn't matter for text poking. And this approach
simplifies the code quite a bit. I will use direct asm() for text poke.

-- 
  Kiryl Shutsemau / Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCHv8 14/17] x86/traps: Handle LASS thrown #SS
  2025-07-03 20:12           ` Sohil Mehta
@ 2025-07-04  9:23             ` Kirill A. Shutemov
  0 siblings, 0 replies; 60+ messages in thread
From: Kirill A. Shutemov @ 2025-07-04  9:23 UTC (permalink / raw)
  To: Sohil Mehta
  Cc: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra, Ard Biesheuvel,
	Paul E. McKenney, Josh Poimboeuf, Xiongwei Song, Xin Li,
	Mike Rapoport (IBM), Brijesh Singh, Michael Roth, Tony Luck,
	Alexey Kardashevskiy, Alexander Shishkin, Jonathan Corbet,
	Ingo Molnar, Pawan Gupta, Daniel Sneddon, Kai Huang, Sandipan Das,
	Breno Leitao, Rick Edgecombe, Alexei Starovoitov, Hou Tao,
	Juergen Gross, Vegard Nossum, Kees Cook, Eric Biggers,
	Jason Gunthorpe, Masami Hiramatsu (Google), Andrew Morton,
	Luis Chamberlain, Yuntao Wang, Rasmus Villemoes, Christophe Leroy,
	Tejun Heo, Changbin Du, Huang Shijie, Geert Uytterhoeven,
	Namhyung Kim, Arnaldo Carvalho de Melo, linux-doc, linux-kernel,
	linux-efi, linux-mm

On Thu, Jul 03, 2025 at 01:12:11PM -0700, Sohil Mehta wrote:
> I believe we should include a #SS specific notify before calling
> die_addr(). Similar to exc_alignment_check() which dies on kernel
> exceptions.

You are right. Will update this.

-- 
  Kiryl Shutsemau / Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCHv8 04/17] x86/cpu: Defer CR pinning setup until after EFI initialization
  2025-07-02 10:05     ` Kirill A. Shutemov
@ 2025-07-04 12:23       ` Kirill A. Shutemov
  0 siblings, 0 replies; 60+ messages in thread
From: Kirill A. Shutemov @ 2025-07-04 12:23 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra, Ard Biesheuvel,
	Paul E. McKenney, Josh Poimboeuf, Xiongwei Song, Xin Li,
	Mike Rapoport (IBM), Brijesh Singh, Michael Roth, Tony Luck,
	Alexey Kardashevskiy, Alexander Shishkin, Jonathan Corbet,
	Sohil Mehta, Ingo Molnar, Pawan Gupta, Daniel Sneddon, Kai Huang,
	Sandipan Das, Breno Leitao, Rick Edgecombe, Alexei Starovoitov,
	Hou Tao, Juergen Gross, Vegard Nossum, Kees Cook, Eric Biggers,
	Jason Gunthorpe, Masami Hiramatsu (Google), Andrew Morton,
	Luis Chamberlain, Yuntao Wang, Rasmus Villemoes, Christophe Leroy,
	Tejun Heo, Changbin Du, Huang Shijie, Geert Uytterhoeven,
	Namhyung Kim, Arnaldo Carvalho de Melo, linux-doc, linux-kernel,
	linux-efi, linux-mm

On Wed, Jul 02, 2025 at 01:05:23PM +0300, Kirill A. Shutemov wrote:
> On Tue, Jul 01, 2025 at 04:10:19PM -0700, Dave Hansen wrote:
> > On 7/1/25 02:58, Kirill A. Shutemov wrote:
> > > Move CR pinning setup behind the EFI initialization.
> > 
> > I kinda grumble about these one-off solutions. Could we just do this
> > once and for all and defer CR pinning as long as possible? For instance,
> > could we do it in a late_initcall()?
> > 
> > Do we need pinning before userspace comes up?
> 
> Hm. I operated from an assumption that we want to pin control registers as
> early as possible to get the most benefit from it.
> 
> I guess we can defer it until later. But I am not sure late_initcall() is
> the right place. Do we want a random driver to twiddle control registers?

I will do it in core_initcall().
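
I.e. something like this (a sketch; it assumes the existing setup_cr_pinning()
helper can simply be called from an initcall once EFI's
set_virtual_address_map() call has been done):

	static int __init cr_pinning_init(void)
	{
		setup_cr_pinning();
		return 0;
	}
	core_initcall(cr_pinning_init);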

-- 
  Kiryl Shutsemau / Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCHv8 02/17] x86/asm: Introduce inline memcpy and memset
  2025-07-03 17:13   ` Dave Hansen
  2025-07-04  9:04     ` Kirill A. Shutemov
@ 2025-07-06  9:13     ` David Laight
  2025-07-07  8:02       ` Kirill A. Shutemov
  1 sibling, 1 reply; 60+ messages in thread
From: David Laight @ 2025-07-06  9:13 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Kirill A. Shutemov, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra,
	Ard Biesheuvel, Paul E. McKenney, Josh Poimboeuf, Xiongwei Song,
	Xin Li, Mike Rapoport (IBM), Brijesh Singh, Michael Roth,
	Tony Luck, Alexey Kardashevskiy, Alexander Shishkin,
	Jonathan Corbet, Sohil Mehta, Ingo Molnar, Pawan Gupta,
	Daniel Sneddon, Kai Huang, Sandipan Das, Breno Leitao,
	Rick Edgecombe, Alexei Starovoitov, Hou Tao, Juergen Gross,
	Vegard Nossum, Kees Cook, Eric Biggers, Jason Gunthorpe,
	Masami Hiramatsu (Google), Andrew Morton, Luis Chamberlain,
	Yuntao Wang, Rasmus Villemoes, Christophe Leroy, Tejun Heo,
	Changbin Du, Huang Shijie, Geert Uytterhoeven, Namhyung Kim,
	Arnaldo Carvalho de Melo, linux-doc, linux-kernel, linux-efi,
	linux-mm

On Thu, 3 Jul 2025 10:13:44 -0700
Dave Hansen <dave.hansen@intel.com> wrote:

> On 7/1/25 02:58, Kirill A. Shutemov wrote:
> > Extract memcpy and memset functions from copy_user_generic() and
> > __clear_user().
> > 
> > They can be used as inline memcpy and memset instead of the GCC builtins
> > whenever necessary. LASS requires them to handle text_poke.  
> 
> Why are we messing with the normal user copy functions? Code reuse is
> great, but as you're discovering, the user copy code is highly
> specialized and not that easy to reuse for other things.
> 
> Don't we just need a dirt simple chunk of code that does (logically):
> 
> 	stac();
> 	asm("rep stosq...");
> 	clac();
> 
> Performance doesn't matter for text poking, right? It could be stosq or
> anything else that you can inline. It could be a for() loop for all I
> care as long as the compiler doesn't transform it into some out-of-line
> memset. Right?
> 

It doesn't even really matter if there is an out-of-line memset.
All you need to do is 'teach' objtool it isn't a problem.

Is this for the boot-time asm-alternatives?
In that case I wonder why a 'low' address is being used?
With LASS enabled, using a low address on a live kernel would make it
harder for another cpu to leverage the writable code page, but
that isn't a requirement of LASS.

If it is being used for later instruction patching you need the
very careful instruction sequences and cpu synchronisation.
In that case I suspect you need to add conditional stac/clac
to the existing patching code (and teach objtool it is all ok).

	David

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCHv8 14/17] x86/traps: Handle LASS thrown #SS
  2025-07-02  2:06     ` H. Peter Anvin
  2025-07-02 10:17       ` Kirill A. Shutemov
  2025-07-02 23:42       ` Andrew Cooper
@ 2025-07-06  9:22       ` David Laight
  2025-07-06 15:07         ` H. Peter Anvin
  2 siblings, 1 reply; 60+ messages in thread
From: David Laight @ 2025-07-06  9:22 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Sohil Mehta, Kirill A. Shutemov, Andy Lutomirski, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, x86, Peter Zijlstra,
	Ard Biesheuvel, Paul E. McKenney, Josh Poimboeuf, Xiongwei Song,
	Xin Li, Mike Rapoport (IBM), Brijesh Singh, Michael Roth,
	Tony Luck, Alexey Kardashevskiy, Alexander Shishkin,
	Jonathan Corbet, Ingo Molnar, Pawan Gupta, Daniel Sneddon,
	Kai Huang, Sandipan Das, Breno Leitao, Rick Edgecombe,
	Alexei Starovoitov, Hou Tao, Juergen Gross, Vegard Nossum,
	Kees Cook, Eric Biggers, Jason Gunthorpe,
	Masami Hiramatsu (Google), Andrew Morton, Luis Chamberlain,
	Yuntao Wang, Rasmus Villemoes, Christophe Leroy, Tejun Heo,
	Changbin Du, Huang Shijie, Geert Uytterhoeven, Namhyung Kim,
	Arnaldo Carvalho de Melo, linux-doc, linux-kernel, linux-efi,
	linux-mm

On Tue, 01 Jul 2025 19:06:10 -0700
"H. Peter Anvin" <hpa@zytor.com> wrote:
...
> Note: for a FRED system, ERETU can generate #SS for a non-canonical user space
> RSP even in the absence of LASS, so if that is not currently handled that is an active bug.

Is that a fault in kernel space, or a fault in user space?

Some of the traps for 'iret' happen after the transition to user space,
so the kernel doesn't have to handle them as special cases.
(I simplified (and fixed) one version of that code.)

	David


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCHv8 14/17] x86/traps: Handle LASS thrown #SS
  2025-07-06  9:22       ` David Laight
@ 2025-07-06 15:07         ` H. Peter Anvin
  0 siblings, 0 replies; 60+ messages in thread
From: H. Peter Anvin @ 2025-07-06 15:07 UTC (permalink / raw)
  To: David Laight
  Cc: Sohil Mehta, Kirill A. Shutemov, Andy Lutomirski, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, x86, Peter Zijlstra,
	Ard Biesheuvel, Paul E. McKenney, Josh Poimboeuf, Xiongwei Song,
	Xin Li, Mike Rapoport (IBM), Brijesh Singh, Michael Roth,
	Tony Luck, Alexey Kardashevskiy, Alexander Shishkin,
	Jonathan Corbet, Ingo Molnar, Pawan Gupta, Daniel Sneddon,
	Kai Huang, Sandipan Das, Breno Leitao, Rick Edgecombe,
	Alexei Starovoitov, Hou Tao, Juergen Gross, Vegard Nossum,
	Kees Cook, Eric Biggers, Jason Gunthorpe,
	Masami Hiramatsu (Google), Andrew Morton, Luis Chamberlain,
	Yuntao Wang, Rasmus Villemoes, Christophe Leroy, Tejun Heo,
	Changbin Du, Huang Shijie, Geert Uytterhoeven, Namhyung Kim,
	Arnaldo Carvalho de Melo, linux-doc, linux-kernel, linux-efi,
	linux-mm

On July 6, 2025 2:22:13 AM PDT, David Laight <david.laight.linux@gmail.com> wrote:
>On Tue, 01 Jul 2025 19:06:10 -0700
>"H. Peter Anvin" <hpa@zytor.com> wrote:
>...
>> Note: for a FRED system, ERETU can generate #SS for a non-canonical user space
>> RSP even in the absence of LASS, so if that is not currently handled that is an active bug.
>
>Is that a fault in kernel space, or a fault in user space?
>
>Some of the traps for 'iret' happen after the transition to user space,
>so the kernel doesn't have to handle them as special cases.
>(I simplified (and fixed) one version of that code.)
>
>	David
>

It's a fault in user space. I had a brain fart and managed to forget that RSP is technically a GPR and as such is not limited to the VA width of the machine.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCHv8 02/17] x86/asm: Introduce inline memcpy and memset
  2025-07-06  9:13     ` David Laight
@ 2025-07-07  8:02       ` Kirill A. Shutemov
  2025-07-07  9:33         ` David Laight
  0 siblings, 1 reply; 60+ messages in thread
From: Kirill A. Shutemov @ 2025-07-07  8:02 UTC (permalink / raw)
  To: David Laight
  Cc: Dave Hansen, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra,
	Ard Biesheuvel, Paul E. McKenney, Josh Poimboeuf, Xiongwei Song,
	Xin Li, Mike Rapoport (IBM), Brijesh Singh, Michael Roth,
	Tony Luck, Alexey Kardashevskiy, Alexander Shishkin,
	Jonathan Corbet, Sohil Mehta, Ingo Molnar, Pawan Gupta,
	Daniel Sneddon, Kai Huang, Sandipan Das, Breno Leitao,
	Rick Edgecombe, Alexei Starovoitov, Hou Tao, Juergen Gross,
	Vegard Nossum, Kees Cook, Eric Biggers, Jason Gunthorpe,
	Masami Hiramatsu (Google), Andrew Morton, Luis Chamberlain,
	Yuntao Wang, Rasmus Villemoes, Christophe Leroy, Tejun Heo,
	Changbin Du, Huang Shijie, Geert Uytterhoeven, Namhyung Kim,
	Arnaldo Carvalho de Melo, linux-doc, linux-kernel, linux-efi,
	linux-mm

On Sun, Jul 06, 2025 at 10:13:42AM +0100, David Laight wrote:
> On Thu, 3 Jul 2025 10:13:44 -0700
> Dave Hansen <dave.hansen@intel.com> wrote:
> 
> > On 7/1/25 02:58, Kirill A. Shutemov wrote:
> > > Extract memcpy and memset functions from copy_user_generic() and
> > > __clear_user().
> > > 
> > > They can be used as inline memcpy and memset instead of the GCC builtins
> > > whenever necessary. LASS requires them to handle text_poke.  
> > 
> > Why are we messing with the normal user copy functions? Code reuse is
> > great, but as you're discovering, the user copy code is highly
> > specialized and not that easy to reuse for other things.
> > 
> > Don't we just need a dirt simple chunk of code that does (logically):
> > 
> > 	stac();
> > 	asm("rep stosq...");
> > 	clac();
> > 
> > Performance doesn't matter for text poking, right? It could be stosq or
> > anything else that you can inline. It could be a for() loop for all I
> > care as long as the compiler doesn't transform it into some out-of-line
> > memset. Right?
> > 
> 
> It doesn't even really matter if there is an out-of-line memset.
> All you need to do is 'teach' objtool it isn't a problem.

PeterZ was not a fan of the idea:

https://lore.kernel.org/all/20241029113611.GS14555@noisy.programming.kicks-ass.net/

> Is this for the boot-time asm-alternatives?

Not only boot-time. static_branches are switchable at runtime.

> In that case I wonder why a 'low' address is being used?
> With LASS enabled, using a low address on a live kernel would make it
> harder for another cpu to leverage the writable code page, but
> that isn't a requirement of LASS.

Because the kernel side of the address space is shared across all CPUs and
we don't want kernel code to be writable by all CPUs.

> If it is being used for later instruction patching you need the
> very careful instruction sequences and cpu synchronisation.
> In that case I suspect you need to add conditional stac/clac
> to the existing patching code (and teach objtool it is all ok).

STAC/CLAC is conditional in text poke on LASS presence on the machine.
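
I.e. roughly this (the helper names are illustrative, not necessarily what the
patch uses):

	static __always_inline void lass_stac(void)
	{
		if (cpu_feature_enabled(X86_FEATURE_LASS))
			stac();
	}

	static __always_inline void lass_clac(void)
	{
		if (cpu_feature_enabled(X86_FEATURE_LASS))
			clac();
	}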

-- 
  Kiryl Shutsemau / Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCHv8 02/17] x86/asm: Introduce inline memcpy and memset
  2025-07-07  8:02       ` Kirill A. Shutemov
@ 2025-07-07  9:33         ` David Laight
  0 siblings, 0 replies; 60+ messages in thread
From: David Laight @ 2025-07-07  9:33 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Dave Hansen, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra,
	Ard Biesheuvel, Paul E. McKenney, Josh Poimboeuf, Xiongwei Song,
	Xin Li, Mike Rapoport (IBM), Brijesh Singh, Michael Roth,
	Tony Luck, Alexey Kardashevskiy, Alexander Shishkin,
	Jonathan Corbet, Sohil Mehta, Ingo Molnar, Pawan Gupta,
	Daniel Sneddon, Kai Huang, Sandipan Das, Breno Leitao,
	Rick Edgecombe, Alexei Starovoitov, Hou Tao, Juergen Gross,
	Vegard Nossum, Kees Cook, Eric Biggers, Jason Gunthorpe,
	Masami Hiramatsu (Google), Andrew Morton, Luis Chamberlain,
	Yuntao Wang, Rasmus Villemoes, Christophe Leroy, Tejun Heo,
	Changbin Du, Huang Shijie, Geert Uytterhoeven, Namhyung Kim,
	Arnaldo Carvalho de Melo, linux-doc, linux-kernel, linux-efi,
	linux-mm

On Mon, 7 Jul 2025 11:02:06 +0300
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> wrote:

> On Sun, Jul 06, 2025 at 10:13:42AM +0100, David Laight wrote:
> > On Thu, 3 Jul 2025 10:13:44 -0700
> > Dave Hansen <dave.hansen@intel.com> wrote:
> >   
> > > On 7/1/25 02:58, Kirill A. Shutemov wrote:  
> > > > Extract memcpy and memset functions from copy_user_generic() and
> > > > __clear_user().
> > > > 
> > > > They can be used as inline memcpy and memset instead of the GCC builtins
> > > > whenever necessary. LASS requires them to handle text_poke.    
> > > 
> > > Why are we messing with the normal user copy functions? Code reuse is
> > > great, but as you're discovering, the user copy code is highly
> > > specialized and not that easy to reuse for other things.
> > > 
> > > Don't we just need a dirt simple chunk of code that does (logically):
> > > 
> > > 	stac();
> > > 	asm("rep stosq...");
> > > 	clac();
> > > 
> > > Performance doesn't matter for text poking, right? It could be stosq or
> > > anything else that you can inline. It could be a for() loop for all I
> > > care as long as the compiler doesn't transform it into some out-of-line
> > > memset. Right?
> > >   
> > 
> > It doesn't even really matter if there is an out-of-line memset.
> > All you need to do is 'teach' objtool it isn't a problem.  
> 
> PeterZ was not fan of the idead;
> 
> https://lore.kernel.org/all/20241029113611.GS14555@noisy.programming.kicks-ass.net/
> 
> > Is this for the boot-time asm-alternatives?  
> 
> Not only boot-time. static_branches are switchable at runtime.
> 
> > In that case I wonder why a 'low' address is being used?
> > With LASS enabled, using a low address on a live kernel would make it
> > harder for another cpu to leverage the writable code page, but
> > that isn't a requirement of LASS.  
> 
> Because the kernel side of the address space is shared across all CPUs and
> we don't want kernel code to be writable by all CPUs.

So, as I said, it isn't a requirement for LASS.
Just something that LASS lets you do.
Although I'm sure there will be some odd effect of putting a 'supervisor'
page in the middle of 'user' pages.

Isn't there also (something like) kmap_local_page() that updates the local
page tables but doesn't broadcast the change?

> 
> > If it is being used for later instruction patching you need the
> > very careful instruction sequences and cpu synchronisation.
> > In that case I suspect you need to add conditional stac/clac
> > to the existing patching code (and teach objtool it is all ok).  
> 
> STAC/CLAC is conditional in text poke on LASS presence on the machine.

So just change the code to use byte copy loops with a volatile
destination pointer and all will be fine.

	David
 


^ permalink raw reply	[flat|nested] 60+ messages in thread

end of thread, other threads:[~2025-07-07  9:33 UTC | newest]

Thread overview: 60+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-07-01  9:58 [PATCHv8 00/17] x86: Enable Linear Address Space Separation support Kirill A. Shutemov
2025-07-01  9:58 ` [PATCHv8 01/17] x86/cpu: Enumerate the LASS feature bits Kirill A. Shutemov
2025-07-01  9:58 ` [PATCHv8 02/17] x86/asm: Introduce inline memcpy and memset Kirill A. Shutemov
2025-07-03  8:44   ` David Laight
2025-07-03 10:39     ` Kirill A. Shutemov
2025-07-03 12:15       ` David Laight
2025-07-03 13:33         ` Vegard Nossum
2025-07-03 16:52           ` David Laight
2025-07-03 14:10         ` Kirill A. Shutemov
2025-07-03 17:02           ` David Laight
2025-07-03 17:13   ` Dave Hansen
2025-07-04  9:04     ` Kirill A. Shutemov
2025-07-06  9:13     ` David Laight
2025-07-07  8:02       ` Kirill A. Shutemov
2025-07-07  9:33         ` David Laight
2025-07-01  9:58 ` [PATCHv8 03/17] x86/alternatives: Disable LASS when patching kernel alternatives Kirill A. Shutemov
2025-07-01 18:44   ` Sohil Mehta
2025-07-01  9:58 ` [PATCHv8 04/17] x86/cpu: Defer CR pinning setup until after EFI initialization Kirill A. Shutemov
2025-07-01 19:03   ` Sohil Mehta
2025-07-02  9:47     ` Kirill A. Shutemov
2025-07-01 23:10   ` Dave Hansen
2025-07-02 10:05     ` Kirill A. Shutemov
2025-07-04 12:23       ` Kirill A. Shutemov
2025-07-01  9:58 ` [PATCHv8 05/17] efi: Disable LASS around set_virtual_address_map() EFI call Kirill A. Shutemov
2025-07-01  9:58 ` [PATCHv8 06/17] x86/vsyscall: Do not require X86_PF_INSTR to emulate vsyscall Kirill A. Shutemov
2025-07-01  9:58 ` [PATCHv8 07/17] x86/vsyscall: Reorganize the #PF emulation code Kirill A. Shutemov
2025-07-01  9:58 ` [PATCHv8 08/17] x86/traps: Consolidate user fixups in exc_general_protection() Kirill A. Shutemov
2025-07-01  9:58 ` [PATCHv8 09/17] x86/vsyscall: Add vsyscall emulation for #GP Kirill A. Shutemov
2025-07-01  9:58 ` [PATCHv8 10/17] x86/vsyscall: Disable LASS if vsyscall mode is set to EMULATE Kirill A. Shutemov
2025-07-01  9:58 ` [PATCHv8 11/17] x86/cpu: Set LASS CR4 bit as pinning sensitive Kirill A. Shutemov
2025-07-01 22:51   ` Sohil Mehta
2025-07-01  9:58 ` [PATCHv8 12/17] x86/traps: Communicate a LASS violation in #GP message Kirill A. Shutemov
2025-07-02  0:36   ` Sohil Mehta
2025-07-02 10:10     ` Kirill A. Shutemov
2025-07-01  9:58 ` [PATCHv8 13/17] x86/traps: Generalize #GP address decode and hint code Kirill A. Shutemov
2025-07-02  0:54   ` Sohil Mehta
2025-07-01  9:58 ` [PATCHv8 14/17] x86/traps: Handle LASS thrown #SS Kirill A. Shutemov
2025-07-02  1:35   ` Sohil Mehta
2025-07-02  2:00     ` H. Peter Anvin
2025-07-02  2:06     ` H. Peter Anvin
2025-07-02 10:17       ` Kirill A. Shutemov
2025-07-02 14:37         ` H. Peter Anvin
2025-07-02 14:47           ` Kirill A. Shutemov
2025-07-02 17:10             ` H. Peter Anvin
2025-07-02 23:42       ` Andrew Cooper
2025-07-03  0:44         ` H. Peter Anvin
2025-07-06  9:22       ` David Laight
2025-07-06 15:07         ` H. Peter Anvin
2025-07-02 13:27     ` Kirill A. Shutemov
2025-07-02 17:56       ` Sohil Mehta
2025-07-03 10:40         ` Kirill A. Shutemov
2025-07-02 20:05       ` Sohil Mehta
2025-07-03 11:31         ` Kirill A. Shutemov
2025-07-03 20:12           ` Sohil Mehta
2025-07-04  9:23             ` Kirill A. Shutemov
2025-07-01  9:58 ` [PATCHv8 15/17] x86/cpu: Make LAM depend on LASS Kirill A. Shutemov
2025-07-01 23:03   ` Sohil Mehta
2025-07-01  9:58 ` [PATCHv8 16/17] x86/cpu: Enable LASS during CPU initialization Kirill A. Shutemov
2025-07-01  9:58 ` [PATCHv8 17/17] x86: Re-enable Linear Address Masking Kirill A. Shutemov
2025-07-01 23:13   ` Sohil Mehta

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).