* [PATCH v10 00/15] x86: Enable Linear Address Space Separation support
@ 2025-10-07  6:51 Sohil Mehta
  2025-10-07  6:51 ` [PATCH v10 01/15] x86/cpu: Enumerate the LASS feature bits Sohil Mehta
                   ` (15 more replies)
  0 siblings, 16 replies; 74+ messages in thread
From: Sohil Mehta @ 2025-10-07  6:51 UTC (permalink / raw)
  To: x86, Dave Hansen, Thomas Gleixner, Ingo Molnar, Borislav Petkov
  Cc: Jonathan Corbet, H . Peter Anvin, Andy Lutomirski, Josh Poimboeuf,
	Peter Zijlstra, Ard Biesheuvel, Kirill A . Shutemov, Sohil Mehta,
	Xin Li, David Woodhouse, Sean Christopherson, Rick Edgecombe,
	Vegard Nossum, Andrew Cooper, David Laight, Randy Dunlap,
	Geert Uytterhoeven, Kees Cook, Tony Luck, Alexander Shishkin,
	linux-doc, linux-kernel, linux-efi

Linear Address Space Separation (LASS) is a security feature [1] that
works pre-paging to prevent a class of side-channel attacks that rely on
speculative access across the user/kernel boundary.

Change of personnel
-------------------
I am picking up this series from Kiryl. The patches have switched hands
multiple times over the last couple of years. I have refreshed the
commit tags since most of the patches have come full circle.

Many thanks to Kiryl and Alex for taking these patches forward. I would
highly appreciate your review tags on the updated series.

Changes in v10
--------------
- Use the simplified versions of inline memcpy/memset (patch 2)
- New patch to fix an issue during Kexec relocate kernel (patch 7)
- Dropped the LAM re-enabling patch (will post separately)
- Reworded some of the commit messages
- Minor updates to code formatting and code comments

v9: https://lore.kernel.org/lkml/20250707080317.3791624-1-kirill.shutemov@linux.intel.com/

Patch structure
---------------
Patch     1: Enumerate LASS
Patch   2-3: Update text poking
Patch   4-5: CR pinning changes
Patch   6-7: Update EFI and kexec flows
Patch  8-11: Vsyscall impact
Patch 12-14: LASS hints during #GP and #SS
Patch    15: Enable LASS

The series is maturing, as reflected by the limited incremental changes.
Please consider providing review tags/acks for patches that seem ready.

Background
----------
Privilege mode based access protection already exists today with paging
and features such as SMEP and SMAP. However, to enforce these
protections, the processor must traverse the paging structures in
memory.  An attacker can use timing information resulting from this
traversal to determine details about the paging structures, and to
determine the layout of the kernel memory.

The LASS mechanism provides the same mode-based protections as paging,
but without traversing the paging structures. Because the protections
enforced by LASS are applied before paging, an attacker will not be able
to derive timing information from the various caching structures such as
the TLBs, mid-level caches, page walkers, data caches, etc. LASS can
prevent probing using double page faults, TLB flush and reload, and
software prefetch instructions. See [2], [3], and [4] for research
on the related attack vectors.

Though LASS was developed in response to Meltdown, in hindsight, it
alone could have mitigated Meltdown had it been available. In addition,
LASS prevents an attack vector targeting Linear Address Masking (LAM)
described in the Spectre LAM (SLAM) whitepaper [5].

LASS enforcement relies on the typical kernel implementation dividing
the 64-bit virtual address space into two halves:
  Addr[63]=0 -> User address space
  Addr[63]=1 -> Kernel address space
Any data access or code execution across address spaces typically
results in a #GP, with an #SS generated in some rare cases.
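
Conceptually, the pre-paging check can be sketched as follows. This is
illustrative pseudo-code only (not kernel code); it glosses over the
CR4.SMAP dependency and implicit supervisor accesses:

  #include <stdbool.h>
  #include <stdint.h>

  /* Simplified model of the check LASS applies before any page walk */
  static bool lass_violation(uint64_t addr, bool user, bool fetch, bool ac)
  {
          bool kernel_half = addr >> 63;

          if (user)
                  return kernel_half;     /* user access to the kernel half */

          /* supervisor access to the user half; AC only relaxes data accesses */
          return !kernel_half && (fetch || !ac);
  }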

Kernel accesses
---------------
When there are valid reasons for the kernel to access memory in the user
half, it can temporarily suspend LASS enforcement by toggling the
RFLAGS.AC bit. Most of these cases are already covered today through the
stac()/clac() pairs, which avoid SMAP violations. However, there are
kernel usages, such as text poking, that access mappings (!_PAGE_USER)
in the lower half of the address space. LASS-specific AC bit toggling is
added for these cases.
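
As a rough sketch of the pattern (reusing the lass_stac()/lass_clac()
helpers and the inline memcpy introduced in patches 2-3; simplified, not
the exact text poking code):

  static void poke_copy_sketch(void *dst, const void *src, size_t len)
  {
          lass_stac();                    /* suspend LASS enforcement for data accesses */
          __inline_memcpy(dst, src, len); /* no function calls inside the AC=1 region */
          lass_clac();                    /* restore enforcement */
  }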

There are a couple of cases where instructions are fetched from a
lower-half address. Toggling the AC bit is not sufficient here because it
only affects data accesses. Therefore, CR4.LASS is cleared in the EFI
set_virtual_address_map() and kexec relocate_kernel() flows to avoid
LASS violations. To let EFI modify CR4 during boot, CR pinning
enforcement is deferred until late_initcall().
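
For example, the EFI path ends up doing roughly the following around the
firmware call (sketch only; the actual hunk is in patch 6):

          lass = cr4_read_shadow() & X86_CR4_LASS;
          cr4_clear_bits(lass);           /* also permits fetches from the lower half */

          status = efi_set_virtual_address_map(/* ... */);

          cr4_set_bits(lass);             /* restore LASS if it was enabled */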

Exception handling
------------------
With LASS enabled, NULL pointer dereferences generate a #GP instead of a
#PF. Due to the limited error information available during a #GP, some of
the helpful hints would no longer be printed. The patches enhance the
#GP address decoding logic to identify LASS violations and NULL pointer
dereferences.

For example, two invalid userspace accesses would generate:
#PF (without LASS):
  BUG: kernel NULL pointer dereference, address: 0000000000000000
  BUG: unable to handle page fault for address: 0000000000100000

#GP (with LASS):
  Oops: general protection fault, kernel NULL pointer dereference 0x0: 0000
  Oops: general protection fault, probably LASS violation for address 0x100000: 0000

Similar debug hints are added for the #SS handling as well.

Userspace accesses
------------------
Userspace attempts to access any kernel address generate a #GP when LASS
is enabled. Unfortunately, legacy vsyscall functions are located in the
address range 0xffffffffff600000 - 0xffffffffff601000. Prior to LASS,
default access (XONLY) to the vsyscall page would generate a page fault
and the access would be emulated in the kernel. To avoid breaking user
applications when LASS is enabled, the patches extend vsyscall emulation
in XONLY mode to the #GP handler.
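
In exc_general_protection() this amounts to one more user-mode fixup,
along these lines (sketch; the real hook is added in patch 10):

          if (user_mode(regs)) {
                  ...
                  if (emulate_vsyscall_gp(regs)) /* RIP points into the vsyscall page */
                          goto exit;
                  ...
          }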

In contrast, the vsyscall EMULATE mode is deprecated and not expected to
be used by anyone. Supporting EMULATE mode with LASS would require
complex instruction decoding in the #GP fault handler, which is probably
not worth the effort. For now, LASS is disabled in the rare case when
someone absolutely needs to enable vsyscall=emulate via the command
line.

Links
-----
[1]: "Linear-Address Pre-Processing", Intel SDM (June 2025), Vol 3, Chapter 4.
[2]: "Practical Timing Side Channel Attacks against Kernel Space ASLR", https://www.ieee-security.org/TC/SP2013/papers/4977a191.pdf
[3]: "Prefetch Side-Channel Attacks: Bypassing SMAP and Kernel ASLR", http://doi.acm.org/10.1145/2976749.2978356
[4]: "Harmful prefetch on Intel", https://ioactive.com/harmful-prefetch-on-intel/ (H/T Anders)
[5]: "Spectre LAM", https://download.vusec.net/papers/slam_sp24.pdf


Alexander Shishkin (2):
  x86/efi: Disable LASS while mapping the EFI runtime services
  x86/traps: Communicate a LASS violation in #GP message

Kirill A. Shutemov (2):
  x86/traps: Generalize #GP address decode and hint code
  x86/traps: Provide additional hints for a kernel stack segment fault

Peter Zijlstra (Intel) (1):
  x86/asm: Introduce inline memcpy and memset

Sohil Mehta (9):
  x86/cpu: Enumerate the LASS feature bits
  x86/alternatives: Disable LASS when patching kernel alternatives
  x86/cpu: Defer CR pinning enforcement until late_initcall()
  x86/kexec: Disable LASS during relocate kernel
  x86/vsyscall: Reorganize the page fault emulation code
  x86/traps: Consolidate user fixups in exc_general_protection()
  x86/vsyscall: Add vsyscall emulation for #GP
  x86/vsyscall: Disable LASS if vsyscall mode is set to EMULATE
  x86/cpu: Enable LASS by default during CPU initialization

Yian Chen (1):
  x86/cpu: Set LASS CR4 bit as pinning sensitive

 .../admin-guide/kernel-parameters.txt         |   4 +-
 arch/x86/Kconfig.cpufeatures                  |   4 +
 arch/x86/entry/vsyscall/vsyscall_64.c         |  83 +++++++-----
 arch/x86/include/asm/cpufeatures.h            |   1 +
 arch/x86/include/asm/smap.h                   |  35 ++++-
 arch/x86/include/asm/string.h                 |  26 ++++
 arch/x86/include/asm/vsyscall.h               |  13 +-
 arch/x86/include/uapi/asm/processor-flags.h   |   2 +
 arch/x86/kernel/alternative.c                 |  18 ++-
 arch/x86/kernel/cpu/common.c                  |  30 ++--
 arch/x86/kernel/cpu/cpuid-deps.c              |   1 +
 arch/x86/kernel/dumpstack.c                   |   6 +-
 arch/x86/kernel/relocate_kernel_64.S          |   7 +-
 arch/x86/kernel/traps.c                       | 128 +++++++++++++-----
 arch/x86/kernel/umip.c                        |   3 +
 arch/x86/mm/fault.c                           |   2 +-
 arch/x86/platform/efi/efi.c                   |  14 +-
 17 files changed, 284 insertions(+), 93 deletions(-)


base-commit: fd94619c43360eb44d28bd3ef326a4f85c600a07
-- 
2.43.0



* [PATCH v10 01/15] x86/cpu: Enumerate the LASS feature bits
  2025-10-07  6:51 [PATCH v10 00/15] x86: Enable Linear Address Space Separation support Sohil Mehta
@ 2025-10-07  6:51 ` Sohil Mehta
  2025-10-07 18:19   ` Edgecombe, Rick P
  2025-10-16 15:35   ` Borislav Petkov
  2025-10-07  6:51 ` [PATCH v10 02/15] x86/asm: Introduce inline memcpy and memset Sohil Mehta
                   ` (14 subsequent siblings)
  15 siblings, 2 replies; 74+ messages in thread
From: Sohil Mehta @ 2025-10-07  6:51 UTC (permalink / raw)
  To: x86, Dave Hansen, Thomas Gleixner, Ingo Molnar, Borislav Petkov
  Cc: Jonathan Corbet, H . Peter Anvin, Andy Lutomirski, Josh Poimboeuf,
	Peter Zijlstra, Ard Biesheuvel, Kirill A . Shutemov, Sohil Mehta,
	Xin Li, David Woodhouse, Sean Christopherson, Rick Edgecombe,
	Vegard Nossum, Andrew Cooper, David Laight, Randy Dunlap,
	Geert Uytterhoeven, Kees Cook, Tony Luck, Alexander Shishkin,
	linux-doc, linux-kernel, linux-efi

Linear Address Space Separation (LASS) is a security feature that
intends to prevent malicious virtual address space accesses across
user/kernel mode.

Such mode based access protection already exists today with paging and
features such as SMEP and SMAP. However, to enforce these protections,
the processor must traverse the paging structures in memory. An attacker
can use timing information resulting from this traversal to determine
details about the paging structures, and to determine the layout of the
kernel memory.

LASS provides the same mode-based protections as paging but without
traversing the paging structures. Because the protections are enforced
pre-paging, an attacker will not be able to derive paging-based timing
information from the various caching structures such as the TLBs,
mid-level caches, page walker, data caches, etc.

LASS enforcement relies on the kernel implementation to divide the
64-bit virtual address space into two halves:
  Addr[63]=0 -> User address space
  Addr[63]=1 -> Kernel address space

Any data access or code execution across address spaces typically
results in a #GP fault. The LASS enforcement for kernel data accesses is
dependent on CR4.SMAP being set. The enforcement can be disabled by
toggling the RFLAGS.AC bit similar to SMAP.

Define the CPU feature bits to enumerate LASS and add a dependency on
SMAP.

LASS mitigates a class of side-channel speculative attacks, such as
Spectre LAM [1]. Add the "lass" flag to /proc/cpuinfo to indicate that
the feature is supported by hardware and enabled by the kernel.  This
allows userspace to determine if the system is secure against such
attacks.
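
For illustration only (not part of this patch), userspace could do a
rough check for the flag along these lines:

  #include <stdbool.h>
  #include <stdio.h>
  #include <string.h>

  /* Rough check: true if "lass" is listed in the /proc/cpuinfo flags line */
  static bool cpu_has_lass(void)
  {
          char line[4096];
          bool found = false;
          FILE *f = fopen("/proc/cpuinfo", "r");

          if (!f)
                  return false;

          while (fgets(line, sizeof(line), f)) {
                  if (!strncmp(line, "flags", 5) &&
                      (strstr(line, " lass ") || strstr(line, " lass\n"))) {
                          found = true;
                          break;
                  }
          }
          fclose(f);

          return found;
  }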

Link: https://download.vusec.net/papers/slam_sp24.pdf [1]
Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Xin Li (Intel) <xin@zytor.com>
---
v10:
 - Do not modify tools/**/cpufeatures.h as those are synced separately.
---
 arch/x86/Kconfig.cpufeatures                | 4 ++++
 arch/x86/include/asm/cpufeatures.h          | 1 +
 arch/x86/include/uapi/asm/processor-flags.h | 2 ++
 arch/x86/kernel/cpu/cpuid-deps.c            | 1 +
 4 files changed, 8 insertions(+)

diff --git a/arch/x86/Kconfig.cpufeatures b/arch/x86/Kconfig.cpufeatures
index 250c10627ab3..733d5aff2456 100644
--- a/arch/x86/Kconfig.cpufeatures
+++ b/arch/x86/Kconfig.cpufeatures
@@ -124,6 +124,10 @@ config X86_DISABLED_FEATURE_PCID
 	def_bool y
 	depends on !X86_64
 
+config X86_DISABLED_FEATURE_LASS
+	def_bool y
+	depends on X86_32
+
 config X86_DISABLED_FEATURE_PKU
 	def_bool y
 	depends on !X86_INTEL_MEMORY_PROTECTION_KEYS
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index b2a562217d3f..1283f3bdda0d 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -314,6 +314,7 @@
 #define X86_FEATURE_SM4			(12*32+ 2) /* SM4 instructions */
 #define X86_FEATURE_AVX_VNNI		(12*32+ 4) /* "avx_vnni" AVX VNNI instructions */
 #define X86_FEATURE_AVX512_BF16		(12*32+ 5) /* "avx512_bf16" AVX512 BFLOAT16 instructions */
+#define X86_FEATURE_LASS		(12*32+ 6) /* "lass" Linear Address Space Separation */
 #define X86_FEATURE_CMPCCXADD           (12*32+ 7) /* CMPccXADD instructions */
 #define X86_FEATURE_ARCH_PERFMON_EXT	(12*32+ 8) /* Intel Architectural PerfMon Extension */
 #define X86_FEATURE_FZRM		(12*32+10) /* Fast zero-length REP MOVSB */
diff --git a/arch/x86/include/uapi/asm/processor-flags.h b/arch/x86/include/uapi/asm/processor-flags.h
index f1a4adc78272..81d0c8bf1137 100644
--- a/arch/x86/include/uapi/asm/processor-flags.h
+++ b/arch/x86/include/uapi/asm/processor-flags.h
@@ -136,6 +136,8 @@
 #define X86_CR4_PKE		_BITUL(X86_CR4_PKE_BIT)
 #define X86_CR4_CET_BIT		23 /* enable Control-flow Enforcement Technology */
 #define X86_CR4_CET		_BITUL(X86_CR4_CET_BIT)
+#define X86_CR4_LASS_BIT	27 /* enable Linear Address Space Separation support */
+#define X86_CR4_LASS		_BITUL(X86_CR4_LASS_BIT)
 #define X86_CR4_LAM_SUP_BIT	28 /* LAM for supervisor pointers */
 #define X86_CR4_LAM_SUP		_BITUL(X86_CR4_LAM_SUP_BIT)
 
diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
index 46efcbd6afa4..98d0cdd82574 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -89,6 +89,7 @@ static const struct cpuid_dep cpuid_deps[] = {
 	{ X86_FEATURE_SHSTK,			X86_FEATURE_XSAVES    },
 	{ X86_FEATURE_FRED,			X86_FEATURE_LKGS      },
 	{ X86_FEATURE_SPEC_CTRL_SSBD,		X86_FEATURE_SPEC_CTRL },
+	{ X86_FEATURE_LASS,			X86_FEATURE_SMAP      },
 	{}
 };
 
-- 
2.43.0



* [PATCH v10 02/15] x86/asm: Introduce inline memcpy and memset
  2025-10-07  6:51 [PATCH v10 00/15] x86: Enable Linear Address Space Separation support Sohil Mehta
  2025-10-07  6:51 ` [PATCH v10 01/15] x86/cpu: Enumerate the LASS feature bits Sohil Mehta
@ 2025-10-07  6:51 ` Sohil Mehta
  2025-10-21 12:47   ` Borislav Petkov
  2025-10-07  6:51 ` [PATCH v10 03/15] x86/alternatives: Disable LASS when patching kernel alternatives Sohil Mehta
                   ` (13 subsequent siblings)
  15 siblings, 1 reply; 74+ messages in thread
From: Sohil Mehta @ 2025-10-07  6:51 UTC (permalink / raw)
  To: x86, Dave Hansen, Thomas Gleixner, Ingo Molnar, Borislav Petkov
  Cc: Jonathan Corbet, H . Peter Anvin, Andy Lutomirski, Josh Poimboeuf,
	Peter Zijlstra, Ard Biesheuvel, Kirill A . Shutemov, Sohil Mehta,
	Xin Li, David Woodhouse, Sean Christopherson, Rick Edgecombe,
	Vegard Nossum, Andrew Cooper, David Laight, Randy Dunlap,
	Geert Uytterhoeven, Kees Cook, Tony Luck, Alexander Shishkin,
	linux-doc, linux-kernel, linux-efi

From: "Peter Zijlstra (Intel)" <peterz@infradead.org>

Provide inline memcpy and memset functions that can be used instead of
the GCC builtins when necessary. The immediate use case is for the text
poking functions to avoid the standard memcpy()/memset() calls within an
RFLAGS.AC=1 context.

Some user copy functions such as copy_user_generic() and __clear_user()
have similar rep_{movs,stos} usages. But, those are highly specialized
and hard to combine/reuse for other things. Define these new helpers for
all other usages that need a completely unoptimized, strictly inline
version of memcpy() or memset().

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
---
v10:
 - Reintroduce the simpler inline patch (dropped in v8).
---
 arch/x86/include/asm/string.h | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/arch/x86/include/asm/string.h b/arch/x86/include/asm/string.h
index c3c2c1914d65..9cb5aae7fba9 100644
--- a/arch/x86/include/asm/string.h
+++ b/arch/x86/include/asm/string.h
@@ -1,6 +1,32 @@
 /* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_STRING_H
+#define _ASM_X86_STRING_H
+
 #ifdef CONFIG_X86_32
 # include <asm/string_32.h>
 #else
 # include <asm/string_64.h>
 #endif
+
+static __always_inline void *__inline_memcpy(void *to, const void *from, size_t len)
+{
+	void *ret = to;
+
+	asm volatile("rep movsb"
+		     : "+D" (to), "+S" (from), "+c" (len)
+		     : : "memory");
+	return ret;
+}
+
+static __always_inline void *__inline_memset(void *s, int v, size_t n)
+{
+	void *ret = s;
+
+	asm volatile("rep stosb"
+		     : "+D" (s), "+c" (n)
+		     : "a" ((uint8_t)v)
+		     : "memory");
+	return ret;
+}
+
+#endif /* _ASM_X86_STRING_H */
-- 
2.43.0



* [PATCH v10 03/15] x86/alternatives: Disable LASS when patching kernel alternatives
  2025-10-07  6:51 [PATCH v10 00/15] x86: Enable Linear Address Space Separation support Sohil Mehta
  2025-10-07  6:51 ` [PATCH v10 01/15] x86/cpu: Enumerate the LASS feature bits Sohil Mehta
  2025-10-07  6:51 ` [PATCH v10 02/15] x86/asm: Introduce inline memcpy and memset Sohil Mehta
@ 2025-10-07  6:51 ` Sohil Mehta
  2025-10-07 16:55   ` Edgecombe, Rick P
  2025-10-21 20:03   ` Borislav Petkov
  2025-10-07  6:51 ` [PATCH v10 04/15] x86/cpu: Set LASS CR4 bit as pinning sensitive Sohil Mehta
                   ` (12 subsequent siblings)
  15 siblings, 2 replies; 74+ messages in thread
From: Sohil Mehta @ 2025-10-07  6:51 UTC (permalink / raw)
  To: x86, Dave Hansen, Thomas Gleixner, Ingo Molnar, Borislav Petkov
  Cc: Jonathan Corbet, H . Peter Anvin, Andy Lutomirski, Josh Poimboeuf,
	Peter Zijlstra, Ard Biesheuvel, Kirill A . Shutemov, Sohil Mehta,
	Xin Li, David Woodhouse, Sean Christopherson, Rick Edgecombe,
	Vegard Nossum, Andrew Cooper, David Laight, Randy Dunlap,
	Geert Uytterhoeven, Kees Cook, Tony Luck, Alexander Shishkin,
	linux-doc, linux-kernel, linux-efi

For patching, the kernel initializes a temporary mm area in the lower
half of the address range. Disable LASS enforcement by toggling the
RFLAGS.AC bit during patching to avoid triggering a #GP fault.

Introduce LASS-specific stac()/clac() helpers along with comments to
clarify their usage versus the existing stac()/clac() helpers for SMAP.

Text poking functions used while patching kernel alternatives use the
standard memcpy()/memset(). However, objtool complains about calling
dynamic functions within an AC=1 region.

One workaround is to add memcpy() and memset() to the list of functions
allowed by objtool. However, that would provide a blanket exemption for
all usages of memcpy() and memset().

Instead, replace the standard memcpy() and memset() calls in the text
poking functions with their unoptimized inline versions. Considering
that patching is usually small, there is no performance impact expected.

Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
---
v10:
 - Revert to the inline functions instead of open-coding in assembly.
 - Simplify code comments.
---
 arch/x86/include/asm/smap.h   | 35 +++++++++++++++++++++++++++++++++--
 arch/x86/kernel/alternative.c | 18 ++++++++++++++++--
 2 files changed, 49 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/smap.h b/arch/x86/include/asm/smap.h
index 4f84d421d1cf..3ecb4b0de1f9 100644
--- a/arch/x86/include/asm/smap.h
+++ b/arch/x86/include/asm/smap.h
@@ -23,18 +23,49 @@
 
 #else /* __ASSEMBLER__ */
 
+/*
+ * The CLAC/STAC instructions toggle the enforcement of X86_FEATURE_SMAP
+ * and X86_FEATURE_LASS.
+ *
+ * SMAP enforcement is based on the _PAGE_BIT_USER bit in the page
+ * tables. The kernel is not allowed to touch pages with the bit set
+ * unless the AC bit is set.
+ *
+ * Use stac()/clac() when accessing userspace (_PAGE_USER) mappings,
+ * regardless of location.
+ *
+ * Note: a barrier is implicit in alternative().
+ */
+
 static __always_inline void clac(void)
 {
-	/* Note: a barrier is implicit in alternative() */
 	alternative("", "clac", X86_FEATURE_SMAP);
 }
 
 static __always_inline void stac(void)
 {
-	/* Note: a barrier is implicit in alternative() */
 	alternative("", "stac", X86_FEATURE_SMAP);
 }
 
+/*
+ * LASS enforcement is based on bit 63 of the virtual address. The
+ * kernel is not allowed to touch memory in the lower half of the
+ * virtual address space unless the AC bit is set.
+ *
+ * Use lass_stac()/lass_clac() when accessing kernel mappings
+ * (!_PAGE_USER) in the lower half of the address space.
+ */
+
+static __always_inline void lass_clac(void)
+{
+	alternative("", "clac", X86_FEATURE_LASS);
+}
+
+static __always_inline void lass_stac(void)
+{
+	alternative("", "stac", X86_FEATURE_LASS);
+}
+
 static __always_inline unsigned long smap_save(void)
 {
 	unsigned long flags;
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 79ae9cb50019..dc90b421d760 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -2409,16 +2409,30 @@ void __init_or_module text_poke_early(void *addr, const void *opcode,
 __ro_after_init struct mm_struct *text_poke_mm;
 __ro_after_init unsigned long text_poke_mm_addr;
 
+/*
+ * Text poking creates and uses a mapping in the lower half of the
+ * address space. Relax LASS enforcement when accessing the poking
+ * address.
+ *
+ * Also, objtool enforces a strict policy of "no function calls within
+ * AC=1 regions". Adhere to the policy by using inline versions of
+ * memcpy()/memset() that will never result in a function call.
+ */
+
 static void text_poke_memcpy(void *dst, const void *src, size_t len)
 {
-	memcpy(dst, src, len);
+	lass_stac();
+	__inline_memcpy(dst, src, len);
+	lass_clac();
 }
 
 static void text_poke_memset(void *dst, const void *src, size_t len)
 {
 	int c = *(const int *)src;
 
-	memset(dst, c, len);
+	lass_stac();
+	__inline_memset(dst, c, len);
+	lass_clac();
 }
 
 typedef void text_poke_f(void *dst, const void *src, size_t len);
-- 
2.43.0



* [PATCH v10 04/15] x86/cpu: Set LASS CR4 bit as pinning sensitive
  2025-10-07  6:51 [PATCH v10 00/15] x86: Enable Linear Address Space Separation support Sohil Mehta
                   ` (2 preceding siblings ...)
  2025-10-07  6:51 ` [PATCH v10 03/15] x86/alternatives: Disable LASS when patching kernel alternatives Sohil Mehta
@ 2025-10-07  6:51 ` Sohil Mehta
  2025-10-07 18:24   ` Edgecombe, Rick P
  2025-10-07  6:51 ` [PATCH v10 05/15] x86/cpu: Defer CR pinning enforcement until late_initcall() Sohil Mehta
                   ` (11 subsequent siblings)
  15 siblings, 1 reply; 74+ messages in thread
From: Sohil Mehta @ 2025-10-07  6:51 UTC (permalink / raw)
  To: x86, Dave Hansen, Thomas Gleixner, Ingo Molnar, Borislav Petkov
  Cc: Jonathan Corbet, H . Peter Anvin, Andy Lutomirski, Josh Poimboeuf,
	Peter Zijlstra, Ard Biesheuvel, Kirill A . Shutemov, Sohil Mehta,
	Xin Li, David Woodhouse, Sean Christopherson, Rick Edgecombe,
	Vegard Nossum, Andrew Cooper, David Laight, Randy Dunlap,
	Geert Uytterhoeven, Kees Cook, Tony Luck, Alexander Shishkin,
	linux-doc, linux-kernel, linux-efi

From: Yian Chen <yian.chen@intel.com>

Security features such as LASS are not expected to be disabled once
initialized. Add LASS to the CR4 pinned mask.

Signed-off-by: Yian Chen <yian.chen@intel.com>
Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
---
v10:
 - No change.
---
 arch/x86/kernel/cpu/common.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index c7d3512914ca..61ab332eaf73 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -403,7 +403,8 @@ static __always_inline void setup_umip(struct cpuinfo_x86 *c)
 
 /* These bits should not change their value after CPU init is finished. */
 static const unsigned long cr4_pinned_mask = X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_UMIP |
-					     X86_CR4_FSGSBASE | X86_CR4_CET | X86_CR4_FRED;
+					     X86_CR4_FSGSBASE | X86_CR4_CET | X86_CR4_FRED |
+					     X86_CR4_LASS;
 static DEFINE_STATIC_KEY_FALSE_RO(cr_pinning);
 static unsigned long cr4_pinned_bits __ro_after_init;
 
-- 
2.43.0



* [PATCH v10 05/15] x86/cpu: Defer CR pinning enforcement until late_initcall()
  2025-10-07  6:51 [PATCH v10 00/15] x86: Enable Linear Address Space Separation support Sohil Mehta
                   ` (3 preceding siblings ...)
  2025-10-07  6:51 ` [PATCH v10 04/15] x86/cpu: Set LASS CR4 bit as pinning sensitive Sohil Mehta
@ 2025-10-07  6:51 ` Sohil Mehta
  2025-10-07 17:23   ` Edgecombe, Rick P
  2025-10-17 19:28   ` Sohil Mehta
  2025-10-07  6:51 ` [PATCH v10 06/15] x86/efi: Disable LASS while mapping the EFI runtime services Sohil Mehta
                   ` (10 subsequent siblings)
  15 siblings, 2 replies; 74+ messages in thread
From: Sohil Mehta @ 2025-10-07  6:51 UTC (permalink / raw)
  To: x86, Dave Hansen, Thomas Gleixner, Ingo Molnar, Borislav Petkov
  Cc: Jonathan Corbet, H . Peter Anvin, Andy Lutomirski, Josh Poimboeuf,
	Peter Zijlstra, Ard Biesheuvel, Kirill A . Shutemov, Sohil Mehta,
	Xin Li, David Woodhouse, Sean Christopherson, Rick Edgecombe,
	Vegard Nossum, Andrew Cooper, David Laight, Randy Dunlap,
	Geert Uytterhoeven, Kees Cook, Tony Luck, Alexander Shishkin,
	linux-doc, linux-kernel, linux-efi

Problem
-------
In order to map the EFI runtime services, set_virtual_address_map()
needs to be called, which resides in the lower half of the address
space. This means that LASS needs to be temporarily disabled around this
call.

Wrapping efi_enter_virtual_mode() with lass_stac()/clac() is not enough,
because the AC flag only gates data accesses, not instruction fetches.
Clearing the CR4.LASS bit is required to make this work.

However, pinned CR4 bits are not expected to be modified after
boot CPU init, resulting in a kernel warning.

Solution
--------
One option is to move the CR pinning setup immediately after the runtime
services have been mapped. However, that is a narrow fix that would
require revisiting if something else needs to modify a pinned CR bit.

CR pinning mainly prevents exploits from trivially modifying
security-sensitive CR bits. There is limited benefit to enabling CR
pinning before userspace comes up. Defer CR pinning enforcement until
late_initcall() to allow EFI and future users to modify the CR bits
without any concern for CR pinning.

Save the pinned bits while initializing the boot CPU because they are
needed later to program the value on APs when they come up.

Note
----
This introduces a small window between the boot CPU being initialized
and CR pinning being enforced, where any in-kernel clearing of the
pinned bits could go unnoticed. Once enforcement begins, such a
discrepancy would trigger a warning the next time any CR4 bit is
modified, for example X86_CR4_PGE during a TLB flush.

Currently, this is a purely theoretical concern. There are multiple ways
to resolve it [1] if it becomes a problem in practice.

Link: https://lore.kernel.org/lkml/c59aa7ac-62a6-45ec-b626-de518b25f7d9@intel.com/ [1]
Suggested-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
---
v10:
 - Split recording pinned bits and enabling pinning into two functions.
 - Defer pinning until userspace comes up.

This patch does not include any changes to harden the CR pinning
implementation, as that is beyond the scope of this series.
---
 arch/x86/kernel/cpu/common.c | 19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 61ab332eaf73..57d5824465b0 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -476,8 +476,8 @@ void cr4_init(void)
 
 	if (boot_cpu_has(X86_FEATURE_PCID))
 		cr4 |= X86_CR4_PCIDE;
-	if (static_branch_likely(&cr_pinning))
-		cr4 = (cr4 & ~cr4_pinned_mask) | cr4_pinned_bits;
+
+	cr4 = (cr4 & ~cr4_pinned_mask) | cr4_pinned_bits;
 
 	__write_cr4(cr4);
 
@@ -487,14 +487,21 @@ void cr4_init(void)
 
 /*
  * Once CPU feature detection is finished (and boot params have been
- * parsed), record any of the sensitive CR bits that are set, and
- * enable CR pinning.
+ * parsed), record any of the sensitive CR bits that are set.
  */
-static void __init setup_cr_pinning(void)
+static void __init record_cr_pinned_bits(void)
 {
 	cr4_pinned_bits = this_cpu_read(cpu_tlbstate.cr4) & cr4_pinned_mask;
+}
+
+/* Enables enforcement of the CR pinned bits */
+static int __init enable_cr_pinning(void)
+{
 	static_key_enable(&cr_pinning.key);
+
+	return 0;
 }
+late_initcall(enable_cr_pinning);
 
 static __init int x86_nofsgsbase_setup(char *arg)
 {
@@ -2119,7 +2126,7 @@ static __init void identify_boot_cpu(void)
 	enable_sep_cpu();
 #endif
 	cpu_detect_tlb(&boot_cpu_data);
-	setup_cr_pinning();
+	record_cr_pinned_bits();
 
 	tsx_init();
 	tdx_init();
-- 
2.43.0



* [PATCH v10 06/15] x86/efi: Disable LASS while mapping the EFI runtime services
  2025-10-07  6:51 [PATCH v10 00/15] x86: Enable Linear Address Space Separation support Sohil Mehta
                   ` (4 preceding siblings ...)
  2025-10-07  6:51 ` [PATCH v10 05/15] x86/cpu: Defer CR pinning enforcement until late_initcall() Sohil Mehta
@ 2025-10-07  6:51 ` Sohil Mehta
  2025-10-07  6:51 ` [PATCH v10 07/15] x86/kexec: Disable LASS during relocate kernel Sohil Mehta
                   ` (9 subsequent siblings)
  15 siblings, 0 replies; 74+ messages in thread
From: Sohil Mehta @ 2025-10-07  6:51 UTC (permalink / raw)
  To: x86, Dave Hansen, Thomas Gleixner, Ingo Molnar, Borislav Petkov
  Cc: Jonathan Corbet, H . Peter Anvin, Andy Lutomirski, Josh Poimboeuf,
	Peter Zijlstra, Ard Biesheuvel, Kirill A . Shutemov, Sohil Mehta,
	Xin Li, David Woodhouse, Sean Christopherson, Rick Edgecombe,
	Vegard Nossum, Andrew Cooper, David Laight, Randy Dunlap,
	Geert Uytterhoeven, Kees Cook, Tony Luck, Alexander Shishkin,
	linux-doc, linux-kernel, linux-efi

From: Alexander Shishkin <alexander.shishkin@linux.intel.com>

While mapping EFI runtime services, set_virtual_address_map() is called
at its lower mapping, which LASS prohibits. Wrapping the EFI call with
lass_stac()/clac() is not enough, because the AC flag only gates data
accesses, and not instruction fetches.

Use the big hammer and toggle the CR4.LASS bit to make this work.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
---
v10:
 - Reword code comments
---
 arch/x86/platform/efi/efi.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index 463b784499a8..cc00a7e6599e 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -786,8 +786,8 @@ static void __init __efi_enter_virtual_mode(void)
 {
 	int count = 0, pg_shift = 0;
 	void *new_memmap = NULL;
+	unsigned long pa, lass;
 	efi_status_t status;
-	unsigned long pa;
 
 	if (efi_alloc_page_tables()) {
 		pr_err("Failed to allocate EFI page tables\n");
@@ -825,11 +825,23 @@ static void __init __efi_enter_virtual_mode(void)
 
 	efi_sync_low_kernel_mappings();
 
+	/*
+	 * LASS complains because set_virtual_address_map() is located
+	 * at a lower address. To pause enforcement, flipping RFLAGS.AC
+	 * is not sufficient here, as it only permits data access and
+	 * not instruction fetch. Disable the entire LASS mechanism.
+	 */
+	lass = cr4_read_shadow() & X86_CR4_LASS;
+	cr4_clear_bits(lass);
+
 	status = efi_set_virtual_address_map(efi.memmap.desc_size * count,
 					     efi.memmap.desc_size,
 					     efi.memmap.desc_version,
 					     (efi_memory_desc_t *)pa,
 					     efi_systab_phys);
+
+	cr4_set_bits(lass);
+
 	if (status != EFI_SUCCESS) {
 		pr_err("Unable to switch EFI into virtual mode (status=%lx)!\n",
 		       status);
-- 
2.43.0



* [PATCH v10 07/15] x86/kexec: Disable LASS during relocate kernel
  2025-10-07  6:51 [PATCH v10 00/15] x86: Enable Linear Address Space Separation support Sohil Mehta
                   ` (5 preceding siblings ...)
  2025-10-07  6:51 ` [PATCH v10 06/15] x86/efi: Disable LASS while mapping the EFI runtime services Sohil Mehta
@ 2025-10-07  6:51 ` Sohil Mehta
  2025-10-07 17:43   ` Edgecombe, Rick P
  2025-10-07  6:51 ` [PATCH v10 08/15] x86/vsyscall: Reorganize the page fault emulation code Sohil Mehta
                   ` (8 subsequent siblings)
  15 siblings, 1 reply; 74+ messages in thread
From: Sohil Mehta @ 2025-10-07  6:51 UTC (permalink / raw)
  To: x86, Dave Hansen, Thomas Gleixner, Ingo Molnar, Borislav Petkov
  Cc: Jonathan Corbet, H . Peter Anvin, Andy Lutomirski, Josh Poimboeuf,
	Peter Zijlstra, Ard Biesheuvel, Kirill A . Shutemov, Sohil Mehta,
	Xin Li, David Woodhouse, Sean Christopherson, Rick Edgecombe,
	Vegard Nossum, Andrew Cooper, David Laight, Randy Dunlap,
	Geert Uytterhoeven, Kees Cook, Tony Luck, Alexander Shishkin,
	linux-doc, linux-kernel, linux-efi

The relocate kernel code uses an identity mapping to copy the new
kernel, which leads to a LASS violation. To avoid issues, disable LASS
after the original CR4 value has been saved but before jumping to the
identity mapped page.

Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
---
v10:
 - New patch to fix an issue detected during internal testing.
---
 arch/x86/kernel/relocate_kernel_64.S | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/relocate_kernel_64.S b/arch/x86/kernel/relocate_kernel_64.S
index 11e20bb13aca..4ffba68dc57b 100644
--- a/arch/x86/kernel/relocate_kernel_64.S
+++ b/arch/x86/kernel/relocate_kernel_64.S
@@ -95,9 +95,12 @@ SYM_CODE_START_NOALIGN(relocate_kernel)
 	/* Leave CR4 in %r13 to enable the right paging mode later. */
 	movq	%cr4, %r13
 
-	/* Disable global pages immediately to ensure this mapping is RWX */
+	/*
+	 * Disable global pages immediately to ensure this mapping is RWX.
+	 * Disable LASS before jumping to the identity mapped page.
+	 */
 	movq	%r13, %r12
-	andq	$~(X86_CR4_PGE), %r12
+	andq	$~(X86_CR4_PGE | X86_CR4_LASS), %r12
 	movq	%r12, %cr4
 
 	/* Save %rsp and CRs. */
-- 
2.43.0



* [PATCH v10 08/15] x86/vsyscall: Reorganize the page fault emulation code
  2025-10-07  6:51 [PATCH v10 00/15] x86: Enable Linear Address Space Separation support Sohil Mehta
                   ` (6 preceding siblings ...)
  2025-10-07  6:51 ` [PATCH v10 07/15] x86/kexec: Disable LASS during relocate kernel Sohil Mehta
@ 2025-10-07  6:51 ` Sohil Mehta
  2025-10-07 18:37   ` Edgecombe, Rick P
  2025-10-07  6:51 ` [PATCH v10 09/15] x86/traps: Consolidate user fixups in exc_general_protection() Sohil Mehta
                   ` (7 subsequent siblings)
  15 siblings, 1 reply; 74+ messages in thread
From: Sohil Mehta @ 2025-10-07  6:51 UTC (permalink / raw)
  To: x86, Dave Hansen, Thomas Gleixner, Ingo Molnar, Borislav Petkov
  Cc: Jonathan Corbet, H . Peter Anvin, Andy Lutomirski, Josh Poimboeuf,
	Peter Zijlstra, Ard Biesheuvel, Kirill A . Shutemov, Sohil Mehta,
	Xin Li, David Woodhouse, Sean Christopherson, Rick Edgecombe,
	Vegard Nossum, Andrew Cooper, David Laight, Randy Dunlap,
	Geert Uytterhoeven, Kees Cook, Tony Luck, Alexander Shishkin,
	linux-doc, linux-kernel, linux-efi

Separate out the actual vsyscall emulation from the #PF specific
handling in preparation for the upcoming #GP emulation.

No functional change intended.

Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
---
v10:
 - Modify the code flow slightly to make it easier to follow.
---
 arch/x86/entry/vsyscall/vsyscall_64.c | 63 ++++++++++++++-------------
 arch/x86/include/asm/vsyscall.h       |  7 ++-
 arch/x86/mm/fault.c                   |  2 +-
 3 files changed, 36 insertions(+), 36 deletions(-)

diff --git a/arch/x86/entry/vsyscall/vsyscall_64.c b/arch/x86/entry/vsyscall/vsyscall_64.c
index 6e6c0a740837..4c3f49bf39e6 100644
--- a/arch/x86/entry/vsyscall/vsyscall_64.c
+++ b/arch/x86/entry/vsyscall/vsyscall_64.c
@@ -112,43 +112,13 @@ static bool write_ok_or_segv(unsigned long ptr, size_t size)
 	}
 }
 
-bool emulate_vsyscall(unsigned long error_code,
-		      struct pt_regs *regs, unsigned long address)
+static bool __emulate_vsyscall(struct pt_regs *regs, unsigned long address)
 {
 	unsigned long caller;
 	int vsyscall_nr, syscall_nr, tmp;
 	long ret;
 	unsigned long orig_dx;
 
-	/* Write faults or kernel-privilege faults never get fixed up. */
-	if ((error_code & (X86_PF_WRITE | X86_PF_USER)) != X86_PF_USER)
-		return false;
-
-	/*
-	 * Assume that faults at regs->ip are because of an
-	 * instruction fetch. Return early and avoid
-	 * emulation for faults during data accesses:
-	 */
-	if (address != regs->ip) {
-		/* Failed vsyscall read */
-		if (vsyscall_mode == EMULATE)
-			return false;
-
-		/*
-		 * User code tried and failed to read the vsyscall page.
-		 */
-		warn_bad_vsyscall(KERN_INFO, regs, "vsyscall read attempt denied -- look up the vsyscall kernel parameter if you need a workaround");
-		return false;
-	}
-
-	/*
-	 * X86_PF_INSTR is only set when NX is supported.  When
-	 * available, use it to double-check that the emulation code
-	 * is only being used for instruction fetches:
-	 */
-	if (cpu_feature_enabled(X86_FEATURE_NX))
-		WARN_ON_ONCE(!(error_code & X86_PF_INSTR));
-
 	/*
 	 * No point in checking CS -- the only way to get here is a user mode
 	 * trap to a high address, which means that we're in 64-bit user code.
@@ -281,6 +251,37 @@ bool emulate_vsyscall(unsigned long error_code,
 	return true;
 }
 
+bool emulate_vsyscall_pf(unsigned long error_code, struct pt_regs *regs,
+			 unsigned long address)
+{
+	/* Write faults or kernel-privilege faults never get fixed up. */
+	if ((error_code & (X86_PF_WRITE | X86_PF_USER)) != X86_PF_USER)
+		return false;
+
+	/*
+	 * Assume that faults at regs->ip are because of an instruction
+	 * fetch. Return early and avoid emulation for faults during
+	 * data accesses:
+	 */
+	if (address != regs->ip) {
+		 /* User code tried and failed to read the vsyscall page. */
+		if (vsyscall_mode != EMULATE)
+			warn_bad_vsyscall(KERN_INFO, regs, "vsyscall read attempt denied -- look up the vsyscall kernel parameter if you need a workaround");
+
+		return false;
+	}
+
+	/*
+	 * X86_PF_INSTR is only set when NX is supported.  When
+	 * available, use it to double-check that the emulation code
+	 * is only being used for instruction fetches:
+	 */
+	if (cpu_feature_enabled(X86_FEATURE_NX))
+		WARN_ON_ONCE(!(error_code & X86_PF_INSTR));
+
+	return __emulate_vsyscall(regs, address);
+}
+
 /*
  * A pseudo VMA to allow ptrace access for the vsyscall page.  This only
  * covers the 64bit vsyscall page now. 32bit has a real VMA now and does
diff --git a/arch/x86/include/asm/vsyscall.h b/arch/x86/include/asm/vsyscall.h
index 472f0263dbc6..f34902364972 100644
--- a/arch/x86/include/asm/vsyscall.h
+++ b/arch/x86/include/asm/vsyscall.h
@@ -14,12 +14,11 @@ extern void set_vsyscall_pgtable_user_bits(pgd_t *root);
  * Called on instruction fetch fault in vsyscall page.
  * Returns true if handled.
  */
-extern bool emulate_vsyscall(unsigned long error_code,
-			     struct pt_regs *regs, unsigned long address);
+bool emulate_vsyscall_pf(unsigned long error_code, struct pt_regs *regs, unsigned long address);
 #else
 static inline void map_vsyscall(void) {}
-static inline bool emulate_vsyscall(unsigned long error_code,
-				    struct pt_regs *regs, unsigned long address)
+static inline bool emulate_vsyscall_pf(unsigned long error_code,
+				       struct pt_regs *regs, unsigned long address)
 {
 	return false;
 }
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 998bd807fc7b..fbcc2da75fd6 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1316,7 +1316,7 @@ void do_user_addr_fault(struct pt_regs *regs,
 	 * to consider the PF_PK bit.
 	 */
 	if (is_vsyscall_vaddr(address)) {
-		if (emulate_vsyscall(error_code, regs, address))
+		if (emulate_vsyscall_pf(error_code, regs, address))
 			return;
 	}
 #endif
-- 
2.43.0



* [PATCH v10 09/15] x86/traps: Consolidate user fixups in exc_general_protection()
  2025-10-07  6:51 [PATCH v10 00/15] x86: Enable Linear Address Space Separation support Sohil Mehta
                   ` (7 preceding siblings ...)
  2025-10-07  6:51 ` [PATCH v10 08/15] x86/vsyscall: Reorganize the page fault emulation code Sohil Mehta
@ 2025-10-07  6:51 ` Sohil Mehta
  2025-10-07 17:46   ` Edgecombe, Rick P
  2025-10-07  6:51 ` [PATCH v10 10/15] x86/vsyscall: Add vsyscall emulation for #GP Sohil Mehta
                   ` (6 subsequent siblings)
  15 siblings, 1 reply; 74+ messages in thread
From: Sohil Mehta @ 2025-10-07  6:51 UTC (permalink / raw)
  To: x86, Dave Hansen, Thomas Gleixner, Ingo Molnar, Borislav Petkov
  Cc: Jonathan Corbet, H . Peter Anvin, Andy Lutomirski, Josh Poimboeuf,
	Peter Zijlstra, Ard Biesheuvel, Kirill A . Shutemov, Sohil Mehta,
	Xin Li, David Woodhouse, Sean Christopherson, Rick Edgecombe,
	Vegard Nossum, Andrew Cooper, David Laight, Randy Dunlap,
	Geert Uytterhoeven, Kees Cook, Tony Luck, Alexander Shishkin,
	linux-doc, linux-kernel, linux-efi

Move the UMIP exception fixup along with the other user mode fixups,
that is, under the common "if (user_mode(regs))" condition where the
rest of the fixups reside.

No functional change intended.

Suggested-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
---
v10:
 - No change.
---
 arch/x86/kernel/traps.c | 8 +++-----
 arch/x86/kernel/umip.c  | 3 +++
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 36354b470590..25b45193eb19 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -800,11 +800,6 @@ DEFINE_IDTENTRY_ERRORCODE(exc_general_protection)
 
 	cond_local_irq_enable(regs);
 
-	if (static_cpu_has(X86_FEATURE_UMIP)) {
-		if (user_mode(regs) && fixup_umip_exception(regs))
-			goto exit;
-	}
-
 	if (v8086_mode(regs)) {
 		local_irq_enable();
 		handle_vm86_fault((struct kernel_vm86_regs *) regs, error_code);
@@ -819,6 +814,9 @@ DEFINE_IDTENTRY_ERRORCODE(exc_general_protection)
 		if (fixup_vdso_exception(regs, X86_TRAP_GP, error_code, 0))
 			goto exit;
 
+		if (fixup_umip_exception(regs))
+			goto exit;
+
 		gp_user_force_sig_segv(regs, X86_TRAP_GP, error_code, desc);
 		goto exit;
 	}
diff --git a/arch/x86/kernel/umip.c b/arch/x86/kernel/umip.c
index d432f3824f0c..3ce99cbcf187 100644
--- a/arch/x86/kernel/umip.c
+++ b/arch/x86/kernel/umip.c
@@ -354,6 +354,9 @@ bool fixup_umip_exception(struct pt_regs *regs)
 	void __user *uaddr;
 	struct insn insn;
 
+	if (!cpu_feature_enabled(X86_FEATURE_UMIP))
+		return false;
+
 	if (!regs)
 		return false;
 
-- 
2.43.0



* [PATCH v10 10/15] x86/vsyscall: Add vsyscall emulation for #GP
  2025-10-07  6:51 [PATCH v10 00/15] x86: Enable Linear Address Space Separation support Sohil Mehta
                   ` (8 preceding siblings ...)
  2025-10-07  6:51 ` [PATCH v10 09/15] x86/traps: Consolidate user fixups in exc_general_protection() Sohil Mehta
@ 2025-10-07  6:51 ` Sohil Mehta
  2025-10-07  6:51 ` [PATCH v10 11/15] x86/vsyscall: Disable LASS if vsyscall mode is set to EMULATE Sohil Mehta
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 74+ messages in thread
From: Sohil Mehta @ 2025-10-07  6:51 UTC (permalink / raw)
  To: x86, Dave Hansen, Thomas Gleixner, Ingo Molnar, Borislav Petkov
  Cc: Jonathan Corbet, H . Peter Anvin, Andy Lutomirski, Josh Poimboeuf,
	Peter Zijlstra, Ard Biesheuvel, Kirill A . Shutemov, Sohil Mehta,
	Xin Li, David Woodhouse, Sean Christopherson, Rick Edgecombe,
	Vegard Nossum, Andrew Cooper, David Laight, Randy Dunlap,
	Geert Uytterhoeven, Kees Cook, Tony Luck, Alexander Shishkin,
	linux-doc, linux-kernel, linux-efi

The legacy vsyscall page is mapped at a fixed address in the kernel
address range 0xffffffffff600000-0xffffffffff601000. Prior to LASS, a
vsyscall page access from userspace would always generate a #PF. The
kernel emulates the execute (XONLY) accesses in the #PF handler and
returns the appropriate values to userspace.

With LASS, these accesses are intercepted before the paging structures
are traversed, triggering a #GP instead of a #PF. However, the #GP error
code doesn't provide much useful information.

Emulate the vsyscall access without going through complex instruction
decoding. Use the faulting RIP which is preserved in the user registers
to determine if the #GP was triggered due to a vsyscall access.

Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
---
v10:
 - No change.
---
 arch/x86/entry/vsyscall/vsyscall_64.c | 14 +++++++++++++-
 arch/x86/include/asm/vsyscall.h       |  6 ++++++
 arch/x86/kernel/traps.c               |  4 ++++
 3 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/arch/x86/entry/vsyscall/vsyscall_64.c b/arch/x86/entry/vsyscall/vsyscall_64.c
index 4c3f49bf39e6..ff319d7e778c 100644
--- a/arch/x86/entry/vsyscall/vsyscall_64.c
+++ b/arch/x86/entry/vsyscall/vsyscall_64.c
@@ -23,7 +23,7 @@
  * soon be no new userspace code that will ever use a vsyscall.
  *
  * The code in this file emulates vsyscalls when notified of a page
- * fault to a vsyscall address.
+ * fault or a general protection fault to a vsyscall address.
  */
 
 #include <linux/kernel.h>
@@ -282,6 +282,18 @@ bool emulate_vsyscall_pf(unsigned long error_code, struct pt_regs *regs,
 	return __emulate_vsyscall(regs, address);
 }
 
+bool emulate_vsyscall_gp(struct pt_regs *regs)
+{
+	if (!cpu_feature_enabled(X86_FEATURE_LASS))
+		return false;
+
+	/* Emulate only if the RIP points to the vsyscall address */
+	if (!is_vsyscall_vaddr(regs->ip))
+		return false;
+
+	return __emulate_vsyscall(regs, regs->ip);
+}
+
 /*
  * A pseudo VMA to allow ptrace access for the vsyscall page.  This only
  * covers the 64bit vsyscall page now. 32bit has a real VMA now and does
diff --git a/arch/x86/include/asm/vsyscall.h b/arch/x86/include/asm/vsyscall.h
index f34902364972..538053b1656a 100644
--- a/arch/x86/include/asm/vsyscall.h
+++ b/arch/x86/include/asm/vsyscall.h
@@ -15,6 +15,7 @@ extern void set_vsyscall_pgtable_user_bits(pgd_t *root);
  * Returns true if handled.
  */
 bool emulate_vsyscall_pf(unsigned long error_code, struct pt_regs *regs, unsigned long address);
+bool emulate_vsyscall_gp(struct pt_regs *regs);
 #else
 static inline void map_vsyscall(void) {}
 static inline bool emulate_vsyscall_pf(unsigned long error_code,
@@ -22,6 +23,11 @@ static inline bool emulate_vsyscall_pf(unsigned long error_code,
 {
 	return false;
 }
+
+static inline bool emulate_vsyscall_gp(struct pt_regs *regs)
+{
+	return false;
+}
 #endif
 
 /*
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 25b45193eb19..59bfbdf0a1a0 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -69,6 +69,7 @@
 #include <asm/tdx.h>
 #include <asm/cfi.h>
 #include <asm/msr.h>
+#include <asm/vsyscall.h>
 
 #ifdef CONFIG_X86_64
 #include <asm/x86_init.h>
@@ -817,6 +818,9 @@ DEFINE_IDTENTRY_ERRORCODE(exc_general_protection)
 		if (fixup_umip_exception(regs))
 			goto exit;
 
+		if (emulate_vsyscall_gp(regs))
+			goto exit;
+
 		gp_user_force_sig_segv(regs, X86_TRAP_GP, error_code, desc);
 		goto exit;
 	}
-- 
2.43.0



* [PATCH v10 11/15] x86/vsyscall: Disable LASS if vsyscall mode is set to EMULATE
  2025-10-07  6:51 [PATCH v10 00/15] x86: Enable Linear Address Space Separation support Sohil Mehta
                   ` (9 preceding siblings ...)
  2025-10-07  6:51 ` [PATCH v10 10/15] x86/vsyscall: Add vsyscall emulation for #GP Sohil Mehta
@ 2025-10-07  6:51 ` Sohil Mehta
  2025-10-07 18:43   ` Edgecombe, Rick P
  2025-10-07  6:51 ` [PATCH v10 12/15] x86/traps: Communicate a LASS violation in #GP message Sohil Mehta
                   ` (4 subsequent siblings)
  15 siblings, 1 reply; 74+ messages in thread
From: Sohil Mehta @ 2025-10-07  6:51 UTC (permalink / raw)
  To: x86, Dave Hansen, Thomas Gleixner, Ingo Molnar, Borislav Petkov
  Cc: Jonathan Corbet, H . Peter Anvin, Andy Lutomirski, Josh Poimboeuf,
	Peter Zijlstra, Ard Biesheuvel, Kirill A . Shutemov, Sohil Mehta,
	Xin Li, David Woodhouse, Sean Christopherson, Rick Edgecombe,
	Vegard Nossum, Andrew Cooper, David Laight, Randy Dunlap,
	Geert Uytterhoeven, Kees Cook, Tony Luck, Alexander Shishkin,
	linux-doc, linux-kernel, linux-efi

The EMULATE mode of vsyscall maps the vsyscall page with a high kernel
address directly into user address space. Reading the vsyscall page in
EMULATE mode would cause LASS to trigger a #GP.

Fixing the LASS violation in EMULATE mode would require complex
instruction decoding because the resulting #GP does not include any
useful error information, and the vsyscall address is not readily
available in the RIP.

The EMULATE mode has been deprecated since 2022 and can only be enabled
using the command line parameter vsyscall=emulate. See commit
bf00745e7791 ("x86/vsyscall: Remove CONFIG_LEGACY_VSYSCALL_EMULATE") for
details. At this point, no one is expected to be using this insecure
mode. The rare usages that need it obviously do not care about security.

Disable LASS when EMULATE mode is requested to avoid breaking legacy
user software. Also, update the vsyscall documentation to reflect this.
LASS will only be supported if vsyscall mode is set to XONLY (default)
or NONE.

Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
---
v10:
 - No significant change. Minor changes to code formatting.

Eventually, we want to get rid of the EMULATE mode altogether. Linus and
AndyL seem to be okay with such a change. However, those changes are
beyond the scope of this series.
---
 Documentation/admin-guide/kernel-parameters.txt | 4 +++-
 arch/x86/entry/vsyscall/vsyscall_64.c           | 6 ++++++
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 3edc5ce0e2a3..29a2ee9e1001 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -8079,7 +8079,9 @@
 
 			emulate     Vsyscalls turn into traps and are emulated
 			            reasonably safely.  The vsyscall page is
-				    readable.
+				    readable.  This disables the Linear
+				    Address Space Separation (LASS) security
+				    feature and makes the system less secure.
 
 			xonly       [default] Vsyscalls turn into traps and are
 			            emulated reasonably safely.  The vsyscall
diff --git a/arch/x86/entry/vsyscall/vsyscall_64.c b/arch/x86/entry/vsyscall/vsyscall_64.c
index ff319d7e778c..57498609b1f0 100644
--- a/arch/x86/entry/vsyscall/vsyscall_64.c
+++ b/arch/x86/entry/vsyscall/vsyscall_64.c
@@ -63,6 +63,12 @@ static int __init vsyscall_setup(char *str)
 		else
 			return -EINVAL;
 
+		if (cpu_feature_enabled(X86_FEATURE_LASS) && vsyscall_mode == EMULATE) {
+			cr4_clear_bits(X86_CR4_LASS);
+			setup_clear_cpu_cap(X86_FEATURE_LASS);
+			pr_warn_once("x86/cpu: Disabling LASS due to vsyscall=emulate\n");
+		}
+
 		return 0;
 	}
 
-- 
2.43.0



* [PATCH v10 12/15] x86/traps: Communicate a LASS violation in #GP message
  2025-10-07  6:51 [PATCH v10 00/15] x86: Enable Linear Address Space Separation support Sohil Mehta
                   ` (10 preceding siblings ...)
  2025-10-07  6:51 ` [PATCH v10 11/15] x86/vsyscall: Disable LASS if vsyscall mode is set to EMULATE Sohil Mehta
@ 2025-10-07  6:51 ` Sohil Mehta
  2025-10-07 18:07   ` Edgecombe, Rick P
  2025-10-07  6:51 ` [PATCH v10 13/15] x86/traps: Generalize #GP address decode and hint code Sohil Mehta
                   ` (3 subsequent siblings)
  15 siblings, 1 reply; 74+ messages in thread
From: Sohil Mehta @ 2025-10-07  6:51 UTC (permalink / raw)
  To: x86, Dave Hansen, Thomas Gleixner, Ingo Molnar, Borislav Petkov
  Cc: Jonathan Corbet, H . Peter Anvin, Andy Lutomirski, Josh Poimboeuf,
	Peter Zijlstra, Ard Biesheuvel, Kirill A . Shutemov, Sohil Mehta,
	Xin Li, David Woodhouse, Sean Christopherson, Rick Edgecombe,
	Vegard Nossum, Andrew Cooper, David Laight, Randy Dunlap,
	Geert Uytterhoeven, Kees Cook, Tony Luck, Alexander Shishkin,
	linux-doc, linux-kernel, linux-efi

From: Alexander Shishkin <alexander.shishkin@linux.intel.com>

A LASS violation typically results in a #GP. Provide a more helpful
message for a #GP when a kernel-side LASS violation is detected.

Currently, a kernel NULL pointer dereference triggers a #PF, which
prints a helpful message. Because LASS enforcement is pre-paging,
accesses to the first page frame would now be reported as a #GP, with a
LASS violation hint. Add a special case to print a friendly message
specifically for kernel NULL pointer dereferences.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
---
v10:
 - Minor improvement to code comments and hints.
---
 arch/x86/kernel/traps.c | 45 ++++++++++++++++++++++++++++++-----------
 1 file changed, 33 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 59bfbdf0a1a0..a5d10f7ae038 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -636,13 +636,23 @@ DEFINE_IDTENTRY(exc_bounds)
 enum kernel_gp_hint {
 	GP_NO_HINT,
 	GP_NON_CANONICAL,
-	GP_CANONICAL
+	GP_CANONICAL,
+	GP_LASS_VIOLATION,
+	GP_NULL_POINTER,
+};
+
+static const char * const kernel_gp_hint_help[] = {
+	[GP_NON_CANONICAL]	= "probably for non-canonical address",
+	[GP_CANONICAL]		= "maybe for address",
+	[GP_LASS_VIOLATION]	= "probably LASS violation for address",
+	[GP_NULL_POINTER]	= "kernel NULL pointer dereference",
 };
 
 /*
  * When an uncaught #GP occurs, try to determine the memory address accessed by
  * the instruction and return that address to the caller. Also, try to figure
- * out whether any part of the access to that address was non-canonical.
+ * out whether any part of the access to that address was non-canonical or
+ * across privilege levels.
  */
 static enum kernel_gp_hint get_kernel_gp_address(struct pt_regs *regs,
 						 unsigned long *addr)
@@ -664,14 +674,27 @@ static enum kernel_gp_hint get_kernel_gp_address(struct pt_regs *regs,
 		return GP_NO_HINT;
 
 #ifdef CONFIG_X86_64
-	/*
-	 * Check that:
-	 *  - the operand is not in the kernel half
-	 *  - the last byte of the operand is not in the user canonical half
-	 */
-	if (*addr < ~__VIRTUAL_MASK &&
-	    *addr + insn.opnd_bytes - 1 > __VIRTUAL_MASK)
+	/* Operand is in the kernel half */
+	if (*addr >= ~__VIRTUAL_MASK)
+		return GP_CANONICAL;
+
+	/* The last byte of the operand is not in the user canonical half */
+	if (*addr + insn.opnd_bytes - 1 > __VIRTUAL_MASK)
 		return GP_NON_CANONICAL;
+
+	/*
+	 * If LASS is active, a NULL pointer dereference generates a #GP
+	 * instead of a #PF.
+	 */
+	if (*addr < PAGE_SIZE)
+		return GP_NULL_POINTER;
+
+	/*
+	 * Assume that LASS caused the exception, because the address is
+	 * canonical and in the user half.
+	 */
+	if (cpu_feature_enabled(X86_FEATURE_LASS))
+		return GP_LASS_VIOLATION;
 #endif
 
 	return GP_CANONICAL;
@@ -835,9 +858,7 @@ DEFINE_IDTENTRY_ERRORCODE(exc_general_protection)
 
 	if (hint != GP_NO_HINT)
 		snprintf(desc, sizeof(desc), GPFSTR ", %s 0x%lx",
-			 (hint == GP_NON_CANONICAL) ? "probably for non-canonical address"
-						    : "maybe for address",
-			 gp_addr);
+			 kernel_gp_hint_help[hint], gp_addr);
 
 	/*
 	 * KASAN is interested only in the non-canonical case, clear it
-- 
2.43.0



* [PATCH v10 13/15] x86/traps: Generalize #GP address decode and hint code
  2025-10-07  6:51 [PATCH v10 00/15] x86: Enable Linear Address Space Separation support Sohil Mehta
                   ` (11 preceding siblings ...)
  2025-10-07  6:51 ` [PATCH v10 12/15] x86/traps: Communicate a LASS violation in #GP message Sohil Mehta
@ 2025-10-07  6:51 ` Sohil Mehta
  2025-10-07 18:43   ` Edgecombe, Rick P
  2025-10-07  6:51 ` [PATCH v10 14/15] x86/traps: Provide additional hints for a kernel stack segment fault Sohil Mehta
                   ` (2 subsequent siblings)
  15 siblings, 1 reply; 74+ messages in thread
From: Sohil Mehta @ 2025-10-07  6:51 UTC (permalink / raw)
  To: x86, Dave Hansen, Thomas Gleixner, Ingo Molnar, Borislav Petkov
  Cc: Jonathan Corbet, H . Peter Anvin, Andy Lutomirski, Josh Poimboeuf,
	Peter Zijlstra, Ard Biesheuvel, Kirill A . Shutemov, Sohil Mehta,
	Xin Li, David Woodhouse, Sean Christopherson, Rick Edgecombe,
	Vegard Nossum, Andrew Cooper, David Laight, Randy Dunlap,
	Geert Uytterhoeven, Kees Cook, Tony Luck, Alexander Shishkin,
	linux-doc, linux-kernel, linux-efi

From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>

In most cases, an access causing a LASS violation results in a #GP.
However, for stack accesses (those due to stack-oriented instructions,
as well as accesses that implicitly or explicitly use the SS segment
register), a stack segment fault (#SS) is generated.

Handlers for #GP and #SS will soon share code to decode the exception
address and retrieve the exception hint string. Rename the helper
function as well as the enum and array names to reflect that they are no
longer specific to #GP.

No functional change intended.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
---
v10:
 - No change.
---
 arch/x86/kernel/dumpstack.c |  6 ++--
 arch/x86/kernel/traps.c     | 60 ++++++++++++++++++-------------------
 2 files changed, 33 insertions(+), 33 deletions(-)

diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
index 71ee20102a8a..e0f85214e92f 100644
--- a/arch/x86/kernel/dumpstack.c
+++ b/arch/x86/kernel/dumpstack.c
@@ -441,14 +441,14 @@ void die(const char *str, struct pt_regs *regs, long err)
 	oops_end(flags, regs, sig);
 }
 
-void die_addr(const char *str, struct pt_regs *regs, long err, long gp_addr)
+void die_addr(const char *str, struct pt_regs *regs, long err, long addr)
 {
 	unsigned long flags = oops_begin();
 	int sig = SIGSEGV;
 
 	__die_header(str, regs, err);
-	if (gp_addr)
-		kasan_non_canonical_hook(gp_addr);
+	if (addr)
+		kasan_non_canonical_hook(addr);
 	if (__die_body(str, regs, err))
 		sig = 0;
 	oops_end(flags, regs, sig);
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index a5d10f7ae038..3ee8a36a4e6a 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -633,29 +633,29 @@ DEFINE_IDTENTRY(exc_bounds)
 	cond_local_irq_disable(regs);
 }
 
-enum kernel_gp_hint {
-	GP_NO_HINT,
-	GP_NON_CANONICAL,
-	GP_CANONICAL,
-	GP_LASS_VIOLATION,
-	GP_NULL_POINTER,
+enum kernel_exc_hint {
+	EXC_NO_HINT,
+	EXC_NON_CANONICAL,
+	EXC_CANONICAL,
+	EXC_LASS_VIOLATION,
+	EXC_NULL_POINTER,
 };
 
-static const char * const kernel_gp_hint_help[] = {
-	[GP_NON_CANONICAL]	= "probably for non-canonical address",
-	[GP_CANONICAL]		= "maybe for address",
-	[GP_LASS_VIOLATION]	= "probably LASS violation for address",
-	[GP_NULL_POINTER]	= "kernel NULL pointer dereference",
+static const char * const kernel_exc_hint_help[] = {
+	[EXC_NON_CANONICAL]	= "probably for non-canonical address",
+	[EXC_CANONICAL]		= "maybe for address",
+	[EXC_LASS_VIOLATION]	= "probably LASS violation for address",
+	[EXC_NULL_POINTER]	= "kernel NULL pointer dereference",
 };
 
 /*
- * When an uncaught #GP occurs, try to determine the memory address accessed by
- * the instruction and return that address to the caller. Also, try to figure
- * out whether any part of the access to that address was non-canonical or
- * across privilege levels.
+ * When an uncaught #GP/#SS occurs, try to determine the memory address
+ * accessed by the instruction and return that address to the caller.
+ * Also, try to figure out whether any part of the access to that
+ * address was non-canonical or across privilege levels.
  */
-static enum kernel_gp_hint get_kernel_gp_address(struct pt_regs *regs,
-						 unsigned long *addr)
+static enum kernel_exc_hint get_kernel_exc_address(struct pt_regs *regs,
+						   unsigned long *addr)
 {
 	u8 insn_buf[MAX_INSN_SIZE];
 	struct insn insn;
@@ -663,41 +663,41 @@ static enum kernel_gp_hint get_kernel_gp_address(struct pt_regs *regs,
 
 	if (copy_from_kernel_nofault(insn_buf, (void *)regs->ip,
 			MAX_INSN_SIZE))
-		return GP_NO_HINT;
+		return EXC_NO_HINT;
 
 	ret = insn_decode_kernel(&insn, insn_buf);
 	if (ret < 0)
-		return GP_NO_HINT;
+		return EXC_NO_HINT;
 
 	*addr = (unsigned long)insn_get_addr_ref(&insn, regs);
 	if (*addr == -1UL)
-		return GP_NO_HINT;
+		return EXC_NO_HINT;
 
 #ifdef CONFIG_X86_64
 	/* Operand is in the kernel half */
 	if (*addr >= ~__VIRTUAL_MASK)
-		return GP_CANONICAL;
+		return EXC_CANONICAL;
 
 	/* The last byte of the operand is not in the user canonical half */
 	if (*addr + insn.opnd_bytes - 1 > __VIRTUAL_MASK)
-		return GP_NON_CANONICAL;
+		return EXC_NON_CANONICAL;
 
 	/*
 	 * If LASS is active, a NULL pointer dereference generates a #GP
 	 * instead of a #PF.
 	 */
 	if (*addr < PAGE_SIZE)
-		return GP_NULL_POINTER;
+		return EXC_NULL_POINTER;
 
 	/*
 	 * Assume that LASS caused the exception, because the address is
 	 * canonical and in the user half.
 	 */
 	if (cpu_feature_enabled(X86_FEATURE_LASS))
-		return GP_LASS_VIOLATION;
+		return EXC_LASS_VIOLATION;
 #endif
 
-	return GP_CANONICAL;
+	return EXC_CANONICAL;
 }
 
 #define GPFSTR "general protection fault"
@@ -816,7 +816,7 @@ static void gp_user_force_sig_segv(struct pt_regs *regs, int trapnr,
 DEFINE_IDTENTRY_ERRORCODE(exc_general_protection)
 {
 	char desc[sizeof(GPFSTR) + 50 + 2*sizeof(unsigned long) + 1] = GPFSTR;
-	enum kernel_gp_hint hint = GP_NO_HINT;
+	enum kernel_exc_hint hint = EXC_NO_HINT;
 	unsigned long gp_addr;
 
 	if (user_mode(regs) && try_fixup_enqcmd_gp())
@@ -854,17 +854,17 @@ DEFINE_IDTENTRY_ERRORCODE(exc_general_protection)
 	if (error_code)
 		snprintf(desc, sizeof(desc), "segment-related " GPFSTR);
 	else
-		hint = get_kernel_gp_address(regs, &gp_addr);
+		hint = get_kernel_exc_address(regs, &gp_addr);
 
-	if (hint != GP_NO_HINT)
+	if (hint != EXC_NO_HINT)
 		snprintf(desc, sizeof(desc), GPFSTR ", %s 0x%lx",
-			 kernel_gp_hint_help[hint], gp_addr);
+			 kernel_exc_hint_help[hint], gp_addr);
 
 	/*
 	 * KASAN is interested only in the non-canonical case, clear it
 	 * otherwise.
 	 */
-	if (hint != GP_NON_CANONICAL)
+	if (hint != EXC_NON_CANONICAL)
 		gp_addr = 0;
 
 	die_addr(desc, regs, error_code, gp_addr);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH v10 14/15] x86/traps: Provide additional hints for a kernel stack segment fault
  2025-10-07  6:51 [PATCH v10 00/15] x86: Enable Linear Address Space Separation support Sohil Mehta
                   ` (12 preceding siblings ...)
  2025-10-07  6:51 ` [PATCH v10 13/15] x86/traps: Generalize #GP address decode and hint code Sohil Mehta
@ 2025-10-07  6:51 ` Sohil Mehta
  2025-10-07  6:51 ` [PATCH v10 15/15] x86/cpu: Enable LASS by default during CPU initialization Sohil Mehta
  2025-10-07 16:23 ` [PATCH v10 00/15] x86: Enable Linear Address Space Separation support Edgecombe, Rick P
  15 siblings, 0 replies; 74+ messages in thread
From: Sohil Mehta @ 2025-10-07  6:51 UTC (permalink / raw)
  To: x86, Dave Hansen, Thomas Gleixner, Ingo Molnar, Borislav Petkov
  Cc: Jonathan Corbet, H . Peter Anvin, Andy Lutomirski, Josh Poimboeuf,
	Peter Zijlstra, Ard Biesheuvel, Kirill A . Shutemov, Sohil Mehta,
	Xin Li, David Woodhouse, Sean Christopherson, Rick Edgecombe,
	Vegard Nossum, Andrew Cooper, David Laight, Randy Dunlap,
	Geert Uytterhoeven, Kees Cook, Tony Luck, Alexander Shishkin,
	linux-doc, linux-kernel, linux-efi

From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>

Kernel triggered #SS exceptions are rare, and the faulting instruction
may not even have a memory operand. In cases where it does, hints about
the cause of the fault can be useful.

LASS throws a #GP for any violation except for stack accesses, which
instead trigger a #SS. Handle a kernel #SS similarly to a #GP and
reuse the address decode logic to provide additional hints, such as a
non-canonical address or a LASS violation.

In case of FRED, before handling #SS as a kernel violation, check if
there's a fixup for the exception. Redirect the #SS due to invalid user
context on ERETU to userspace. See commit 5105e7687ad3 ("x86/fred: Fixup
fault on ERETU by jumping to fred_entrypoint_user") for details.

Originally-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
---
v10:
 - Remove the LASS feature check to always provide hints independent of
   LASS being enabled.
 - Update printk to use KERN_DEFAULT (checkpatch warning).
 - Add code comments.
---
 arch/x86/kernel/traps.c | 43 +++++++++++++++++++++++++++++++++++------
 1 file changed, 37 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 3ee8a36a4e6a..864c614cddab 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -418,12 +418,6 @@ DEFINE_IDTENTRY_ERRORCODE(exc_segment_not_present)
 		      SIGBUS, 0, NULL);
 }
 
-DEFINE_IDTENTRY_ERRORCODE(exc_stack_segment)
-{
-	do_error_trap(regs, error_code, "stack segment", X86_TRAP_SS, SIGBUS,
-		      0, NULL);
-}
-
 DEFINE_IDTENTRY_ERRORCODE(exc_alignment_check)
 {
 	char *str = "alignment check";
@@ -873,6 +867,43 @@ DEFINE_IDTENTRY_ERRORCODE(exc_general_protection)
 	cond_local_irq_disable(regs);
 }
 
+#define SSFSTR "stack segment fault"
+
+DEFINE_IDTENTRY_ERRORCODE(exc_stack_segment)
+{
+	enum kernel_exc_hint hint;
+	unsigned long exc_addr;
+
+	if (user_mode(regs))
+		goto error_trap;
+
+	/*
+	 * With FRED enabled, an invalid user context can lead to an #SS
+	 * trap on ERETU. Fixup the exception and redirect the fault to
+	 * userspace in that case.
+	 */
+	if (cpu_feature_enabled(X86_FEATURE_FRED) &&
+	    fixup_exception(regs, X86_TRAP_SS, error_code, 0))
+		return;
+
+	if (notify_die(DIE_TRAP, SSFSTR, regs, error_code, X86_TRAP_SS, SIGBUS) == NOTIFY_STOP)
+		return;
+
+	hint = get_kernel_exc_address(regs, &exc_addr);
+	if (hint != EXC_NO_HINT)
+		printk(KERN_DEFAULT SSFSTR ", %s 0x%lx", kernel_exc_hint_help[hint], exc_addr);
+
+	/* KASAN only cares about the non-canonical case, clear it otherwise */
+	if (hint != EXC_NON_CANONICAL)
+		exc_addr = 0;
+
+	die_addr(SSFSTR, regs, error_code, exc_addr);
+	return;
+
+error_trap:
+	do_error_trap(regs, error_code, SSFSTR, X86_TRAP_SS, SIGBUS, 0, NULL);
+}
+
 static bool do_int3(struct pt_regs *regs)
 {
 	int res;
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH v10 15/15] x86/cpu: Enable LASS by default during CPU initialization
  2025-10-07  6:51 [PATCH v10 00/15] x86: Enable Linear Address Space Separation support Sohil Mehta
                   ` (13 preceding siblings ...)
  2025-10-07  6:51 ` [PATCH v10 14/15] x86/traps: Provide additional hints for a kernel stack segment fault Sohil Mehta
@ 2025-10-07  6:51 ` Sohil Mehta
  2025-10-07 18:42   ` Edgecombe, Rick P
  2025-10-07 16:23 ` [PATCH v10 00/15] x86: Enable Linear Address Space Separation support Edgecombe, Rick P
  15 siblings, 1 reply; 74+ messages in thread
From: Sohil Mehta @ 2025-10-07  6:51 UTC (permalink / raw)
  To: x86, Dave Hansen, Thomas Gleixner, Ingo Molnar, Borislav Petkov
  Cc: Jonathan Corbet, H . Peter Anvin, Andy Lutomirski, Josh Poimboeuf,
	Peter Zijlstra, Ard Biesheuvel, Kirill A . Shutemov, Sohil Mehta,
	Xin Li, David Woodhouse, Sean Christopherson, Rick Edgecombe,
	Vegard Nossum, Andrew Cooper, David Laight, Randy Dunlap,
	Geert Uytterhoeven, Kees Cook, Tony Luck, Alexander Shishkin,
	linux-doc, linux-kernel, linux-efi

Linear Address Space Separation (LASS) mitigates a class of side-channel
attacks that rely on speculative access across the user/kernel boundary.
Enable it by default if the platform supports it.

While at it, remove the comment above the SMAP/SMEP/UMIP/LASS setup
instead of updating it, as the whole sequence is quite self-explanatory.

Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
---
v10:
 - No change.
---
 arch/x86/kernel/cpu/common.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 57d5824465b0..7f0f1b56cbe7 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -401,6 +401,12 @@ static __always_inline void setup_umip(struct cpuinfo_x86 *c)
 	cr4_clear_bits(X86_CR4_UMIP);
 }
 
+static __always_inline void setup_lass(struct cpuinfo_x86 *c)
+{
+	if (cpu_feature_enabled(X86_FEATURE_LASS))
+		cr4_set_bits(X86_CR4_LASS);
+}
+
 /* These bits should not change their value after CPU init is finished. */
 static const unsigned long cr4_pinned_mask = X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_UMIP |
 					     X86_CR4_FSGSBASE | X86_CR4_CET | X86_CR4_FRED |
@@ -2019,10 +2025,10 @@ static void identify_cpu(struct cpuinfo_x86 *c)
 	/* Disable the PN if appropriate */
 	squash_the_stupid_serial_number(c);
 
-	/* Set up SMEP/SMAP/UMIP */
 	setup_smep(c);
 	setup_smap(c);
 	setup_umip(c);
+	setup_lass(c);
 
 	/* Enable FSGSBASE instructions if available. */
 	if (cpu_has(c, X86_FEATURE_FSGSBASE)) {
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* Re: [PATCH v10 00/15] x86: Enable Linear Address Space Separation support
  2025-10-07  6:51 [PATCH v10 00/15] x86: Enable Linear Address Space Separation support Sohil Mehta
                   ` (14 preceding siblings ...)
  2025-10-07  6:51 ` [PATCH v10 15/15] x86/cpu: Enable LASS by default during CPU initialization Sohil Mehta
@ 2025-10-07 16:23 ` Edgecombe, Rick P
  2025-10-17 19:52   ` Sohil Mehta
  15 siblings, 1 reply; 74+ messages in thread
From: Edgecombe, Rick P @ 2025-10-07 16:23 UTC (permalink / raw)
  To: Mehta, Sohil, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
	x86@kernel.org, dave.hansen@linux.intel.com
  Cc: corbet@lwn.net, ardb@kernel.org, david.laight.linux@gmail.com,
	luto@kernel.org, jpoimboe@kernel.org, andrew.cooper3@citrix.com,
	Luck, Tony, alexander.shishkin@linux.intel.com, kas@kernel.org,
	seanjc@google.com, rdunlap@infradead.org, dwmw@amazon.co.uk,
	vegard.nossum@oracle.com, xin@zytor.com,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
	kees@kernel.org, hpa@zytor.com, peterz@infradead.org,
	linux-efi@vger.kernel.org, geert@linux-m68k.org

On Mon, 2025-10-06 at 23:51 -0700, Sohil Mehta wrote:
> > Userspace accesses
> > ------------------
> > Userspace attempts to access any kernel address generate a #GP when LASS
> > is enabled. Unfortunately, legacy vsyscall functions are located in the
> > address range 0xffffffffff600000 - 0xffffffffff601000. Prior to LASS,
> > default access (XONLY) to the vsyscall page would generate a page fault
> > and the access would be emulated in the kernel. To avoid breaking user
> > applications when LASS is enabled, the patches extend vsyscall emulation
> > in XONLY mode to the #GP handler.
> > 
> > In contrast, the vsyscall EMULATE mode is deprecated and not expected to
> > be used by anyone. Supporting EMULATE mode with LASS would require
> > complex instruction decoding in the #GP fault handler, which is probably
> > not worth the effort. For now, LASS is disabled in the rare case when
> > someone absolutely needs to enable vsyscall=emulate via the command
> > line.

There is also an expected, harmless UABI change around SIGSEGV. For a user-mode
access to a kernel address, the kernel can deliver a signal that provides the
exception type and the address. Before it was a #PF, now it is a #GP with no
address.
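
A minimal userspace probe along these lines shows the difference (an
illustrative sketch, not an existing selftest; 0xffff800000000000 is just an
arbitrary canonical kernel-half address):

    #define _GNU_SOURCE
    #include <signal.h>
    #include <stdio.h>
    #include <stdlib.h>

    static void segv(int sig, siginfo_t *info, void *ctx)
    {
    	/* A #PF delivery reports the faulting address in si_addr;
    	 * a LASS-induced #GP has no address to report. */
    	printf("si_code=%d si_addr=%p\n", info->si_code, info->si_addr);
    	exit(0);
    }

    int main(void)
    {
    	struct sigaction sa = { .sa_sigaction = segv, .sa_flags = SA_SIGINFO };

    	sigaction(SIGSEGV, &sa, NULL);
    	/* user-mode write to a kernel-half address */
    	*(volatile char *)0xffff800000000000UL = 1;
    	return 1;
    }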


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v10 03/15] x86/alternatives: Disable LASS when patching kernel alternatives
  2025-10-07  6:51 ` [PATCH v10 03/15] x86/alternatives: Disable LASS when patching kernel alternatives Sohil Mehta
@ 2025-10-07 16:55   ` Edgecombe, Rick P
  2025-10-07 22:28     ` Sohil Mehta
  2025-10-21 20:03   ` Borislav Petkov
  1 sibling, 1 reply; 74+ messages in thread
From: Edgecombe, Rick P @ 2025-10-07 16:55 UTC (permalink / raw)
  To: Mehta, Sohil, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
	x86@kernel.org, dave.hansen@linux.intel.com
  Cc: corbet@lwn.net, ardb@kernel.org, david.laight.linux@gmail.com,
	luto@kernel.org, jpoimboe@kernel.org, andrew.cooper3@citrix.com,
	Luck, Tony, alexander.shishkin@linux.intel.com, kas@kernel.org,
	seanjc@google.com, rdunlap@infradead.org, dwmw@amazon.co.uk,
	vegard.nossum@oracle.com, xin@zytor.com,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
	kees@kernel.org, hpa@zytor.com, peterz@infradead.org,
	linux-efi@vger.kernel.org, geert@linux-m68k.org

It's not just used for alternatives anymore. bpf, kprobes, etc. use it too. Maybe
drop "alternatives" from the subject?

On Mon, 2025-10-06 at 23:51 -0700, Sohil Mehta wrote:
> For patching, the kernel initializes a temporary mm area in the lower
> half of the address range. Disable LASS enforcement by toggling the
> RFLAGS.AC bit during patching to avoid triggering a #GP fault.
> 
> Introduce LASS-specific stac()/clac() helpers along with comments to
> clarify their usage versus the existing stac()/clac() helpers for SMAP.
> 
> Text poking functions used while patching kernel alternatives use the
> standard memcpy()/memset(). However, objtool complains about calling
> dynamic functions within an AC=1 region.
> 
> One workaround is to add memcpy() and memset() to the list of functions
> allowed by objtool. However, that would provide a blanket exemption for
> all usages of memcpy() and memset().
> 
> Instead, replace the standard memcpy() and memset() calls in the text
> poking functions with their unoptimized inline versions. Considering
> that patching is usually small, there is no performance impact expected.
> 
> Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
> ---
> v10:
>  - Revert to the inline functions instead of open-coding in assembly.
>  - Simplify code comments.
> ---
>  arch/x86/include/asm/smap.h   | 35 +++++++++++++++++++++++++++++++++--
>  arch/x86/kernel/alternative.c | 18 ++++++++++++++++--
>  2 files changed, 49 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/include/asm/smap.h b/arch/x86/include/asm/smap.h
> index 4f84d421d1cf..3ecb4b0de1f9 100644
> --- a/arch/x86/include/asm/smap.h
> +++ b/arch/x86/include/asm/smap.h
> @@ -23,18 +23,49 @@
>  
>  #else /* __ASSEMBLER__ */
>  
> +/*
> + * The CLAC/STAC instructions toggle the enforcement of X86_FEATURE_SMAP
> + * and X86_FEATURE_LASS.
> + *
> + * SMAP enforcement is based on the _PAGE_BIT_USER bit in the page
> + * tables. The kernel is not allowed to touch pages with the bit set
> + * unless the AC bit is set.
> + *
> + * Use stac()/clac() when accessing userspace (_PAGE_USER) mappings,
> + * regardless of location.
> + *
> + * Note: a barrier is implicit in alternative().
> + */
> +
>  static __always_inline void clac(void)
>  {
> -	/* Note: a barrier is implicit in alternative() */
>  	alternative("", "clac", X86_FEATURE_SMAP);
>  }
>  
>  static __always_inline void stac(void)
>  {
> -	/* Note: a barrier is implicit in alternative() */
>  	alternative("", "stac", X86_FEATURE_SMAP);
>  }
>  
> +/*
> + * LASS enforcement is based on bit 63 of the virtual address. The
> + * kernel is not allowed to touch memory in the lower half of the
> + * virtual address space unless the AC bit is set.
> + *
> + * Use lass_stac()/lass_clac() when accessing kernel mappings
> + * (!_PAGE_USER) in the lower half of the address space.
> + */
> +

The above variant has the "a barrier is implicit in alternative()" comment; is
it not needed here too? Actually, I'm not sure what that comment is trying to
convey anyway.

> +static __always_inline void lass_clac(void)
> +{
> +	alternative("", "clac", X86_FEATURE_LASS);
> +}
> +
> +static __always_inline void lass_stac(void)
> +{
> +	alternative("", "stac", X86_FEATURE_LASS);
> +}
> +

Not a strong opinion, but the naming of stac()/clac() vs. lass_stac()/lass_clac()
is a bit confusing to me. The stac/clac instructions now have both LASS and SMAP
behavior. Why keep the SMAP behavior implicit and give LASS a special variant?

The other odd aspect is that calling stac()/clac() is needed for LASS in some
places too, right? But stac()/clac() depend on X86_FEATURE_SMAP, not
X86_FEATURE_LASS. A reader might wonder why we do not need the LASS variant
there too.

I'd expect that in the real world LASS won't be found without SMAP. Maybe it
could be worth just improving the comment around stac()/clac() to include some
nod that it is doing LASS stuff too, or that it relies on the fact that USER
mappings are only found in the lower half, while KERNEL mappings are not only
found in the upper half.
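
For example, something along these lines above stac()/clac() (rough sketch of
the wording only, stitched together from the comments in this patch):

    /*
     * Both SMAP and LASS enforcement are toggled by the STAC/CLAC
     * instructions.  SMAP guards _PAGE_USER mappings wherever they are;
     * LASS guards the whole lower half of the address space.  These
     * helpers key off X86_FEATURE_SMAP, but since user mappings only
     * live in the lower half, they effectively cover the LASS case for
     * userspace accesses as well.
     */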

>  static __always_inline unsigned long smap_save(void)
>  {
>  	unsigned long flags;
> diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
> index 79ae9cb50019..dc90b421d760 100644
> --- a/arch/x86/kernel/alternative.c
> +++ b/arch/x86/kernel/alternative.c
> @@ -2409,16 +2409,30 @@ void __init_or_module text_poke_early(void *addr, const void *opcode,
>  __ro_after_init struct mm_struct *text_poke_mm;
>  __ro_after_init unsigned long text_poke_mm_addr;
>  
> +/*
> + * Text poking creates and uses a mapping in the lower half of the
> + * address space. Relax LASS enforcement when accessing the poking
> + * address.
> + *
> + * Also, objtool enforces a strict policy of "no function calls within
> + * AC=1 regions". Adhere to the policy by using inline versions of
> + * memcpy()/memset() that will never result in a function call.

Is "Also, ..." here really a separate issue? What is the connection to
lass_stac/clac()?

> + */
> +
>  static void text_poke_memcpy(void *dst, const void *src, size_t len)
>  {
> -	memcpy(dst, src, len);
> +	lass_stac();
> +	__inline_memcpy(dst, src, len);
> +	lass_clac();
>  }
>  
>  static void text_poke_memset(void *dst, const void *src, size_t len)
>  {
>  	int c = *(const int *)src;
>  
> -	memset(dst, c, len);
> +	lass_stac();
> +	__inline_memset(dst, c, len);
> +	lass_clac();
>  }
>  
>  typedef void text_poke_f(void *dst, const void *src, size_t len);


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v10 05/15] x86/cpu: Defer CR pinning enforcement until late_initcall()
  2025-10-07  6:51 ` [PATCH v10 05/15] x86/cpu: Defer CR pinning enforcement until late_initcall() Sohil Mehta
@ 2025-10-07 17:23   ` Edgecombe, Rick P
  2025-10-07 23:05     ` Sohil Mehta
  2025-10-17 19:28   ` Sohil Mehta
  1 sibling, 1 reply; 74+ messages in thread
From: Edgecombe, Rick P @ 2025-10-07 17:23 UTC (permalink / raw)
  To: Mehta, Sohil, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
	x86@kernel.org, dave.hansen@linux.intel.com
  Cc: corbet@lwn.net, ardb@kernel.org, david.laight.linux@gmail.com,
	luto@kernel.org, jpoimboe@kernel.org, andrew.cooper3@citrix.com,
	Luck, Tony, alexander.shishkin@linux.intel.com, kas@kernel.org,
	seanjc@google.com, rdunlap@infradead.org, dwmw@amazon.co.uk,
	vegard.nossum@oracle.com, xin@zytor.com,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
	kees@kernel.org, hpa@zytor.com, peterz@infradead.org,
	linux-efi@vger.kernel.org, geert@linux-m68k.org

On Mon, 2025-10-06 at 23:51 -0700, Sohil Mehta wrote:
> Problem
> -------
> In order to map the EFI runtime services, set_virtual_address_map()
> needs to be called, which resides in the lower half of the address
> space. This means that LASS needs to be temporarily disabled around this
> call.

Possibly naive EFI question...

efi_runtime_services_32_t has: 
   typedef struct {
   	efi_table_hdr_t hdr;
   	u32 get_time;
   	u32 set_time;
   	u32 get_wakeup_time;
   	u32 set_wakeup_time;
   	u32 set_virtual_address_map;
   	u32 convert_pointer;
   	u32 get_variable;
   	u32 get_next_variable;
   	u32 set_variable;
   	u32 get_next_high_mono_count;
   	u32 reset_system;
   	u32 update_capsule;
   	u32 query_capsule_caps;
   	u32 query_variable_info;
   } efi_runtime_services_32_t;
   

Why is only set_virtual_address_map problematic? Some of the other ones get
called after boot by a bunch of modules by the looks of it.

> 
> Wrapping efi_enter_virtual_mode() with lass_stac()/clac() is not enough,
> because the AC flag only gates data accesses, not instruction fetches.
> Clearing the CR4.LASS bit is required to make this work.
> 
> However, pinned CR4 bits are not expected to be modified after
> boot CPU init, resulting in a kernel warning.
> 
> Solution
> --------
> One option is to move the CR pinning setup immediately after the runtime
> services have been mapped. However, that is a narrow fix that would
> require revisiting if something else needs to modify a pinned CR bit.
> 
> CR pinning mainly prevents exploits from trivially modifying
> security-sensitive CR bits. There is limited benefit to enabling CR
> pinning before userspace comes up. Defer CR pinning enforcement until
> late_initcall() to allow EFI and future users to modify the CR bits
> without any concern for CR pinning.
> 
> Save the pinned bits while initializing the boot CPU because they are
> needed later to program the value on APs when they come up.
> 
> Note
> ----
> This introduces a small window between the boot CPU being initialized
> and CR pinning being enforced, where any in-kernel clearing of the
> pinned bits could go unnoticed. Later, when enforcement begins, a
> warning is triggered as soon as any CR4 bit is modified, such as
> X86_CR4_PGE during a TLB flush.
> 
> Currently, this is a purely theoretical concern. There are multiple ways
> to resolve it [1] if it becomes a problem in practice.
> 
> Link: https://lore.kernel.org/lkml/c59aa7ac-62a6-45ec-b626-de518b25f7d9@intel.com/ [1]
> Suggested-by: Dave Hansen <dave.hansen@intel.com>
> Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
> ---
> v10:
>  - Split recording pinned bits and enabling pinning into two functions.
>  - Defer pinning until userspace comes up.
> 
> This patch does not include any changes to harden the CR pinning
> implementation, as that is beyond the scope of this series.
> ---
>  arch/x86/kernel/cpu/common.c | 19 +++++++++++++------
>  1 file changed, 13 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> index 61ab332eaf73..57d5824465b0 100644
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -476,8 +476,8 @@ void cr4_init(void)
>  
>  	if (boot_cpu_has(X86_FEATURE_PCID))
>  		cr4 |= X86_CR4_PCIDE;
> -	if (static_branch_likely(&cr_pinning))
> -		cr4 = (cr4 & ~cr4_pinned_mask) | cr4_pinned_bits;
> +
> +	cr4 = (cr4 & ~cr4_pinned_mask) | cr4_pinned_bits;


Can you explain why this change is needed? It relies on cr4_pinned_bits to be
already set, and kind of is "enforcement", but no longer depends on
enable_cr_pinning() being called.


>  
>  	__write_cr4(cr4);
>  
> @@ -487,14 +487,21 @@ void cr4_init(void)
>  
>  /*
>   * Once CPU feature detection is finished (and boot params have been
> - * parsed), record any of the sensitive CR bits that are set, and
> - * enable CR pinning.
> + * parsed), record any of the sensitive CR bits that are set.
>   */
> -static void __init setup_cr_pinning(void)
> +static void __init record_cr_pinned_bits(void)
>  {
>  	cr4_pinned_bits = this_cpu_read(cpu_tlbstate.cr4) & cr4_pinned_mask;
> +}
> +
> +/* Enables enforcement of the CR pinned bits */
> +static int __init enable_cr_pinning(void)
> +{
>  	static_key_enable(&cr_pinning.key);
> +
> +	return 0;
>  }
> +late_initcall(enable_cr_pinning);
>  
>  static __init int x86_nofsgsbase_setup(char *arg)
>  {
> @@ -2119,7 +2126,7 @@ static __init void identify_boot_cpu(void)
>  	enable_sep_cpu();
>  #endif
>  	cpu_detect_tlb(&boot_cpu_data);
> -	setup_cr_pinning();
> +	record_cr_pinned_bits();
>  
>  	tsx_init();
>  	tdx_init();


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v10 07/15] x86/kexec: Disable LASS during relocate kernel
  2025-10-07  6:51 ` [PATCH v10 07/15] x86/kexec: Disable LASS during relocate kernel Sohil Mehta
@ 2025-10-07 17:43   ` Edgecombe, Rick P
  2025-10-07 22:33     ` Sohil Mehta
  0 siblings, 1 reply; 74+ messages in thread
From: Edgecombe, Rick P @ 2025-10-07 17:43 UTC (permalink / raw)
  To: Mehta, Sohil, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
	x86@kernel.org, dave.hansen@linux.intel.com
  Cc: corbet@lwn.net, ardb@kernel.org, david.laight.linux@gmail.com,
	luto@kernel.org, jpoimboe@kernel.org, andrew.cooper3@citrix.com,
	Luck, Tony, alexander.shishkin@linux.intel.com, kas@kernel.org,
	seanjc@google.com, rdunlap@infradead.org, dwmw@amazon.co.uk,
	vegard.nossum@oracle.com, xin@zytor.com,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
	kees@kernel.org, hpa@zytor.com, peterz@infradead.org,
	linux-efi@vger.kernel.org, geert@linux-m68k.org

On Mon, 2025-10-06 at 23:51 -0700, Sohil Mehta wrote:
> Relocate kernel uses identity mapping to copy the new kernel, which
> leads to a LASS violation. To avoid issues, disable LASS after the
> original CR4 value has been saved but before jumping to the identity
> mapped page.

It could help to expand on this a bit. Something like... We need to disable LASS
before we jump to the identity map because otherwise it will immediately die
trying to execute a low address. But if the kexec flavor gets to virtual_mapped,
we want LASS restored, so we need to disable LASS after CR4 is saved. We also
can't disable it where CET gets disabled because that is too late. So disable
it along with PGE.


> 
> Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
> ---
> v10:
>  - New patch to fix an issue detected during internal testing.
> ---
>  arch/x86/kernel/relocate_kernel_64.S | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kernel/relocate_kernel_64.S b/arch/x86/kernel/relocate_kernel_64.S
> index 11e20bb13aca..4ffba68dc57b 100644
> --- a/arch/x86/kernel/relocate_kernel_64.S
> +++ b/arch/x86/kernel/relocate_kernel_64.S
> @@ -95,9 +95,12 @@ SYM_CODE_START_NOALIGN(relocate_kernel)
>  	/* Leave CR4 in %r13 to enable the right paging mode later. */
>  	movq	%cr4, %r13
>  
> -	/* Disable global pages immediately to ensure this mapping is RWX */
> +	/*
> +	 * Disable global pages immediately to ensure this mapping is RWX.
> +	 * Disable LASS before jumping to the identity mapped page.
> +	 */
>  	movq	%r13, %r12
> -	andq	$~(X86_CR4_PGE), %r12
> +	andq	$~(X86_CR4_PGE | X86_CR4_LASS), %r12
>  	movq	%r12, %cr4
>  
>  	/* Save %rsp and CRs. */


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v10 09/15] x86/traps: Consolidate user fixups in exc_general_protection()
  2025-10-07  6:51 ` [PATCH v10 09/15] x86/traps: Consolidate user fixups in exc_general_protection() Sohil Mehta
@ 2025-10-07 17:46   ` Edgecombe, Rick P
  2025-10-07 22:41     ` Sohil Mehta
  0 siblings, 1 reply; 74+ messages in thread
From: Edgecombe, Rick P @ 2025-10-07 17:46 UTC (permalink / raw)
  To: Mehta, Sohil, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
	x86@kernel.org, dave.hansen@linux.intel.com
  Cc: corbet@lwn.net, ardb@kernel.org, david.laight.linux@gmail.com,
	luto@kernel.org, jpoimboe@kernel.org, andrew.cooper3@citrix.com,
	Luck, Tony, alexander.shishkin@linux.intel.com, kas@kernel.org,
	seanjc@google.com, rdunlap@infradead.org, dwmw@amazon.co.uk,
	vegard.nossum@oracle.com, xin@zytor.com,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
	kees@kernel.org, hpa@zytor.com, peterz@infradead.org,
	linux-efi@vger.kernel.org, geert@linux-m68k.org

On Mon, 2025-10-06 at 23:51 -0700, Sohil Mehta wrote:
> Move the UMIP exception fixup along with the other user mode fixups,
> that is, under the common "if (user_mode(regs))" condition where the
> rest of the fixups reside.

Can you mention that it also drops the static_cpu_has(X86_FEATURE_UMIP) check
because fixup_umip_exception() already checks
cpu_feature_enabled(X86_FEATURE_UMIP)?

> 
> No functional change intended.


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v10 12/15] x86/traps: Communicate a LASS violation in #GP message
  2025-10-07  6:51 ` [PATCH v10 12/15] x86/traps: Communicate a LASS violation in #GP message Sohil Mehta
@ 2025-10-07 18:07   ` Edgecombe, Rick P
  0 siblings, 0 replies; 74+ messages in thread
From: Edgecombe, Rick P @ 2025-10-07 18:07 UTC (permalink / raw)
  To: Mehta, Sohil, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
	x86@kernel.org, dave.hansen@linux.intel.com
  Cc: corbet@lwn.net, ardb@kernel.org, david.laight.linux@gmail.com,
	luto@kernel.org, jpoimboe@kernel.org, andrew.cooper3@citrix.com,
	Luck, Tony, alexander.shishkin@linux.intel.com, kas@kernel.org,
	seanjc@google.com, rdunlap@infradead.org, dwmw@amazon.co.uk,
	vegard.nossum@oracle.com, xin@zytor.com,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
	kees@kernel.org, hpa@zytor.com, peterz@infradead.org,
	linux-efi@vger.kernel.org, geert@linux-m68k.org

On Mon, 2025-10-06 at 23:51 -0700, Sohil Mehta wrote:
> From: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> 
> A LASS violation typically results in a #GP. Provide a more helpful
> message for a #GP when a kernel-side LASS violation is detected.
> 
> Currently, a kernel NULL pointer dereference triggers a #PF, which
> prints a helpful message. Because LASS enforcement is pre-paging,
> accesses to the first page would now be reported as a #GP with a LASS
> violation hint. Add a special case to print a friendly message
> specifically for kernel NULL pointer dereferences.
> 
> Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
> ---
> v10:
>  - Minor improvement to code comments and hints.
> ---
>  arch/x86/kernel/traps.c | 45 ++++++++++++++++++++++++++++++-----------
>  1 file changed, 33 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
> index 59bfbdf0a1a0..a5d10f7ae038 100644
> --- a/arch/x86/kernel/traps.c
> +++ b/arch/x86/kernel/traps.c
> @@ -636,13 +636,23 @@ DEFINE_IDTENTRY(exc_bounds)
>  enum kernel_gp_hint {
>  	GP_NO_HINT,
>  	GP_NON_CANONICAL,
> -	GP_CANONICAL
> +	GP_CANONICAL,
> +	GP_LASS_VIOLATION,
> +	GP_NULL_POINTER,
> +};
> +
> +static const char * const kernel_gp_hint_help[] = {
> +	[GP_NON_CANONICAL]	= "probably for non-canonical address",
> +	[GP_CANONICAL]		= "maybe for address",
> +	[GP_LASS_VIOLATION]	= "probably LASS violation for address",
> +	[GP_NULL_POINTER]	= "kernel NULL pointer dereference",
>  };
>  
>  /*
>   * When an uncaught #GP occurs, try to determine the memory address accessed by
>   * the instruction and return that address to the caller. Also, try to figure
> - * out whether any part of the access to that address was non-canonical.
> + * out whether any part of the access to that address was non-canonical or
> + * across privilege levels.
>   */
>  static enum kernel_gp_hint get_kernel_gp_address(struct pt_regs *regs,
>  						 unsigned long *addr)
> @@ -664,14 +674,27 @@ static enum kernel_gp_hint get_kernel_gp_address(struct pt_regs *regs,
>  		return GP_NO_HINT;

The patch looks good. I'm not sure how fancy we want to get here, but an idea if
you want it...

Above this hunk is:

	if (copy_from_kernel_nofault(insn_buf, (void *)regs->ip,
			MAX_INSN_SIZE))
		return EXC_NO_HINT;

If cpu_feature_enabled(X86_FEATURE_LASS) and regs->ip is in the lower half, we
could make a pretty strong guess that it was a LASS violation. I guess the same
argument could be made for the canonical guesswork, but maybe calling a NULL
function pointer is more likely.
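
Rough sketch of that idea, reusing the names from the snippet above (untested,
not part of the posted patch):

	if (copy_from_kernel_nofault(insn_buf, (void *)regs->ip,
			MAX_INSN_SIZE)) {
		/*
		 * Fetching the faulting instruction itself failed. If the IP
		 * sits in the user half while LASS is on, the fetch was very
		 * likely the violation, e.g. a wild jump through a NULL or
		 * user-half function pointer.
		 */
		if (cpu_feature_enabled(X86_FEATURE_LASS) &&
		    regs->ip <= __VIRTUAL_MASK) {
			*addr = regs->ip;
			return EXC_LASS_VIOLATION;
		}
		return EXC_NO_HINT;
	}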

>  
>  #ifdef CONFIG_X86_64
> -	/*
> -	 * Check that:
> -	 *  - the operand is not in the kernel half
> -	 *  - the last byte of the operand is not in the user canonical half
> -	 */
> -	if (*addr < ~__VIRTUAL_MASK &&
> -	    *addr + insn.opnd_bytes - 1 > __VIRTUAL_MASK)
> +	/* Operand is in the kernel half */
> +	if (*addr >= ~__VIRTUAL_MASK)
> +		return GP_CANONICAL;
> +
> +	/* The last byte of the operand is not in the user canonical half */
> +	if (*addr + insn.opnd_bytes - 1 > __VIRTUAL_MASK)
>  		return GP_NON_CANONICAL;
> +
> +	/*
> +	 * If LASS is active, a NULL pointer dereference generates a #GP
> +	 * instead of a #PF.
> +	 */
> +	if (*addr < PAGE_SIZE)
> +		return GP_NULL_POINTER;
> +
> +	/*
> +	 * Assume that LASS caused the exception, because the address is
> +	 * canonical and in the user half.
> +	 */
> +	if (cpu_feature_enabled(X86_FEATURE_LASS))
> +		return GP_LASS_VIOLATION;
>  #endif
>  
>  	return GP_CANONICAL;
> @@ -835,9 +858,7 @@ DEFINE_IDTENTRY_ERRORCODE(exc_general_protection)
>  
>  	if (hint != GP_NO_HINT)
>  		snprintf(desc, sizeof(desc), GPFSTR ", %s 0x%lx",
> -			 (hint == GP_NON_CANONICAL) ? "probably for non-canonical address"
> -						    : "maybe for address",
> -			 gp_addr);
> +			 kernel_gp_hint_help[hint], gp_addr);
>  
>  	/*
>  	 * KASAN is interested only in the non-canonical case, clear it


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v10 01/15] x86/cpu: Enumerate the LASS feature bits
  2025-10-07  6:51 ` [PATCH v10 01/15] x86/cpu: Enumerate the LASS feature bits Sohil Mehta
@ 2025-10-07 18:19   ` Edgecombe, Rick P
  2025-10-07 18:28     ` Dave Hansen
  2025-10-07 20:49     ` Sohil Mehta
  2025-10-16 15:35   ` Borislav Petkov
  1 sibling, 2 replies; 74+ messages in thread
From: Edgecombe, Rick P @ 2025-10-07 18:19 UTC (permalink / raw)
  To: Mehta, Sohil, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
	x86@kernel.org, dave.hansen@linux.intel.com
  Cc: corbet@lwn.net, ardb@kernel.org, david.laight.linux@gmail.com,
	luto@kernel.org, jpoimboe@kernel.org, andrew.cooper3@citrix.com,
	Luck, Tony, alexander.shishkin@linux.intel.com, kas@kernel.org,
	seanjc@google.com, rdunlap@infradead.org, dwmw@amazon.co.uk,
	vegard.nossum@oracle.com, xin@zytor.com,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
	kees@kernel.org, hpa@zytor.com, peterz@infradead.org,
	linux-efi@vger.kernel.org, geert@linux-m68k.org

On Mon, 2025-10-06 at 23:51 -0700, Sohil Mehta wrote:
> Linear Address Space Separation (LASS) is a security feature that
> intends to prevent malicious virtual address space accesses across
> user/kernel mode.
> 
> Such mode based access protection already exists today with paging and
> features such as SMEP and SMAP. However, to enforce these protections,
> the processor must traverse the paging structures in memory. An attacker
> can use timing information resulting from this traversal to determine
> details about the paging structures, and to determine the layout of the
> kernel memory.
> 
> LASS provides the same mode-based protections as paging but without
> traversing the paging structures. Because the protections are enforced
> pre-paging, an attacker will not be able to derive paging-based timing

Nit: pre-page walk maybe? Otherwise it could sound like before paging is enabled
during boot.

> information from the various caching structures such as the TLBs,
> mid-level caches, page walker, data caches, etc.
> 
> LASS enforcement relies on the kernel implementation to divide the
> 64-bit virtual address space into two halves:
>   Addr[63]=0 -> User address space
>   Addr[63]=1 -> Kernel address space
> 
> Any data access or code execution across address spaces typically
> results in a #GP fault. The LASS enforcement for kernel data accesses is
> dependent on CR4.SMAP being set. The enforcement can be disabled by
> toggling the RFLAGS.AC bit similar to SMAP.
> 
> Define the CPU feature bits to enumerate LASS and add a dependency on
> SMAP.
> 
> LASS mitigates a class of side-channel speculative attacks, such as
> Spectre LAM [1]. Add the "lass" flag to /proc/cpuinfo to indicate that
> the feature is supported by hardware and enabled by the kernel.  This
> allows userspace to determine if the system is secure against such
> attacks.
> 
> Link: https://download.vusec.net/papers/slam_sp24.pdf [1]
> Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
> Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
> Reviewed-by: Xin Li (Intel) <xin@zytor.com>
> ---
> v10:
>  - Do not modify tools/**/cpufeatures.h as those are synced separately.
> ---
>  arch/x86/Kconfig.cpufeatures                | 4 ++++
>  arch/x86/include/asm/cpufeatures.h          | 1 +
>  arch/x86/include/uapi/asm/processor-flags.h | 2 ++
>  arch/x86/kernel/cpu/cpuid-deps.c            | 1 +
>  4 files changed, 8 insertions(+)
> 
> diff --git a/arch/x86/Kconfig.cpufeatures b/arch/x86/Kconfig.cpufeatures
> index 250c10627ab3..733d5aff2456 100644
> --- a/arch/x86/Kconfig.cpufeatures
> +++ b/arch/x86/Kconfig.cpufeatures
> @@ -124,6 +124,10 @@ config X86_DISABLED_FEATURE_PCID
>  	def_bool y
>  	depends on !X86_64
>  
> +config X86_DISABLED_FEATURE_LASS
> +	def_bool y
> +	depends on X86_32
> +

All the other ones in the file are !X86_64. Why do this one X86_32?

>  config X86_DISABLED_FEATURE_PKU
>  	def_bool y
>  	depends on !X86_INTEL_MEMORY_PROTECTION_KEYS
> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> index b2a562217d3f..1283f3bdda0d 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -314,6 +314,7 @@
>  #define X86_FEATURE_SM4			(12*32+ 2) /* SM4 instructions */
>  #define X86_FEATURE_AVX_VNNI		(12*32+ 4) /* "avx_vnni" AVX VNNI instructions */
>  #define X86_FEATURE_AVX512_BF16		(12*32+ 5) /* "avx512_bf16" AVX512 BFLOAT16 instructions */
> +#define X86_FEATURE_LASS		(12*32+ 6) /* "lass" Linear Address Space Separation */
>  #define X86_FEATURE_CMPCCXADD           (12*32+ 7) /* CMPccXADD instructions */
>  #define X86_FEATURE_ARCH_PERFMON_EXT	(12*32+ 8) /* Intel Architectural PerfMon Extension */
>  #define X86_FEATURE_FZRM		(12*32+10) /* Fast zero-length REP MOVSB */
> diff --git a/arch/x86/include/uapi/asm/processor-flags.h b/arch/x86/include/uapi/asm/processor-flags.h
> index f1a4adc78272..81d0c8bf1137 100644
> --- a/arch/x86/include/uapi/asm/processor-flags.h
> +++ b/arch/x86/include/uapi/asm/processor-flags.h
> @@ -136,6 +136,8 @@
>  #define X86_CR4_PKE		_BITUL(X86_CR4_PKE_BIT)
>  #define X86_CR4_CET_BIT		23 /* enable Control-flow Enforcement Technology */
>  #define X86_CR4_CET		_BITUL(X86_CR4_CET_BIT)
> +#define X86_CR4_LASS_BIT	27 /* enable Linear Address Space Separation support */
> +#define X86_CR4_LASS		_BITUL(X86_CR4_LASS_BIT)
>  #define X86_CR4_LAM_SUP_BIT	28 /* LAM for supervisor pointers */
>  #define X86_CR4_LAM_SUP		_BITUL(X86_CR4_LAM_SUP_BIT)
>  
> diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
> index 46efcbd6afa4..98d0cdd82574 100644
> --- a/arch/x86/kernel/cpu/cpuid-deps.c
> +++ b/arch/x86/kernel/cpu/cpuid-deps.c
> @@ -89,6 +89,7 @@ static const struct cpuid_dep cpuid_deps[] = {
>  	{ X86_FEATURE_SHSTK,			X86_FEATURE_XSAVES    },
>  	{ X86_FEATURE_FRED,			X86_FEATURE_LKGS      },
>  	{ X86_FEATURE_SPEC_CTRL_SSBD,		X86_FEATURE_SPEC_CTRL },
> +	{ X86_FEATURE_LASS,			X86_FEATURE_SMAP      },

Aha! So SMAP is required for LASS. This makes the stac/clac patch make more
sense. Please those comments less seriously. Although I think a comment is still
not unwarranted.

>  	{}
>  };
>  


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v10 04/15] x86/cpu: Set LASS CR4 bit as pinning sensitive
  2025-10-07  6:51 ` [PATCH v10 04/15] x86/cpu: Set LASS CR4 bit as pinning sensitive Sohil Mehta
@ 2025-10-07 18:24   ` Edgecombe, Rick P
  2025-10-07 23:11     ` Sohil Mehta
  0 siblings, 1 reply; 74+ messages in thread
From: Edgecombe, Rick P @ 2025-10-07 18:24 UTC (permalink / raw)
  To: Mehta, Sohil, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
	x86@kernel.org, dave.hansen@linux.intel.com
  Cc: corbet@lwn.net, ardb@kernel.org, david.laight.linux@gmail.com,
	luto@kernel.org, jpoimboe@kernel.org, andrew.cooper3@citrix.com,
	Luck, Tony, alexander.shishkin@linux.intel.com, kas@kernel.org,
	seanjc@google.com, rdunlap@infradead.org, dwmw@amazon.co.uk,
	vegard.nossum@oracle.com, xin@zytor.com,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
	kees@kernel.org, hpa@zytor.com, peterz@infradead.org,
	linux-efi@vger.kernel.org, geert@linux-m68k.org

On Mon, 2025-10-06 at 23:51 -0700, Sohil Mehta wrote:
> From: Yian Chen <yian.chen@intel.com>
> 
> Security features such as LASS are not expected to be disabled once
> initialized. Add LASS to the CR4 pinned mask.
> 
> Signed-off-by: Yian Chen <yian.chen@intel.com>
> Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>

I was debating whether we really need this, given the LASS and CR pinning threat
models. CR pinning seems to be about the case where an attacker has already
hijacked a control flow and is looking to escalate it into more control. We
could maybe get away with dropping this and the following patch. But it would
still be good to get a warning if it gets turned off inadvertently, I think. It
might be worth adding justification like that to the log.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v10 01/15] x86/cpu: Enumerate the LASS feature bits
  2025-10-07 18:19   ` Edgecombe, Rick P
@ 2025-10-07 18:28     ` Dave Hansen
  2025-10-07 20:20       ` Sohil Mehta
  2025-10-07 20:49     ` Sohil Mehta
  1 sibling, 1 reply; 74+ messages in thread
From: Dave Hansen @ 2025-10-07 18:28 UTC (permalink / raw)
  To: Edgecombe, Rick P, Mehta, Sohil, tglx@linutronix.de,
	mingo@redhat.com, bp@alien8.de, x86@kernel.org,
	dave.hansen@linux.intel.com
  Cc: corbet@lwn.net, ardb@kernel.org, david.laight.linux@gmail.com,
	luto@kernel.org, jpoimboe@kernel.org, andrew.cooper3@citrix.com,
	Luck, Tony, alexander.shishkin@linux.intel.com, kas@kernel.org,
	seanjc@google.com, rdunlap@infradead.org, dwmw@amazon.co.uk,
	vegard.nossum@oracle.com, xin@zytor.com,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
	kees@kernel.org, hpa@zytor.com, peterz@infradead.org,
	linux-efi@vger.kernel.org, geert@linux-m68k.org

On 10/7/25 11:19, Edgecombe, Rick P wrote:
>>  	{ X86_FEATURE_SPEC_CTRL_SSBD,		X86_FEATURE_SPEC_CTRL },
>> +	{ X86_FEATURE_LASS,			X86_FEATURE_SMAP      },
> Aha! So SMAP is required for LASS. This makes the stac/clac patch make more
> sense. Please those comments less seriously. Although I think a comment is still
> not unwarranted.

STAC/CLAC are also architected to:

	#UD - If CPUID.(EAX=07H, ECX=0H):EBX.SMAP[bit 20] = 0.

So, even though LASS _technically_ doesn't require SMAP, it would be a
real pain without SMAP and STAC/CLAC. Thus, this series relies on SMAP
being present.

Actually, it might be worth breaking this dependency hunk out into its
own patch, just so there's a nice clean place to discuss this.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v10 08/15] x86/vsyscall: Reorganize the page fault emulation code
  2025-10-07  6:51 ` [PATCH v10 08/15] x86/vsyscall: Reorganize the page fault emulation code Sohil Mehta
@ 2025-10-07 18:37   ` Edgecombe, Rick P
  2025-10-07 18:48     ` Dave Hansen
  0 siblings, 1 reply; 74+ messages in thread
From: Edgecombe, Rick P @ 2025-10-07 18:37 UTC (permalink / raw)
  To: Mehta, Sohil, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
	x86@kernel.org, dave.hansen@linux.intel.com
  Cc: corbet@lwn.net, ardb@kernel.org, david.laight.linux@gmail.com,
	luto@kernel.org, jpoimboe@kernel.org, andrew.cooper3@citrix.com,
	Luck, Tony, alexander.shishkin@linux.intel.com, kas@kernel.org,
	seanjc@google.com, rdunlap@infradead.org, dwmw@amazon.co.uk,
	vegard.nossum@oracle.com, xin@zytor.com,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
	kees@kernel.org, hpa@zytor.com, peterz@infradead.org,
	linux-efi@vger.kernel.org, geert@linux-m68k.org

On Mon, 2025-10-06 at 23:51 -0700, Sohil Mehta wrote:
> Separate out the actual vsyscall emulation from the #PF specific
> handling in preparation for the upcoming #GP emulation.
> 
> No functional change intended.
> 
> Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
> Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
> ---
> v10:
>  - Modify the code flow slightly to make it easier to follow.
> ---
>  arch/x86/entry/vsyscall/vsyscall_64.c | 63 ++++++++++++++-------------
>  arch/x86/include/asm/vsyscall.h       |  7 ++-
>  arch/x86/mm/fault.c                   |  2 +-
>  3 files changed, 36 insertions(+), 36 deletions(-)
> 
> diff --git a/arch/x86/entry/vsyscall/vsyscall_64.c b/arch/x86/entry/vsyscall/vsyscall_64.c
> index 6e6c0a740837..4c3f49bf39e6 100644
> --- a/arch/x86/entry/vsyscall/vsyscall_64.c
> +++ b/arch/x86/entry/vsyscall/vsyscall_64.c
> @@ -112,43 +112,13 @@ static bool write_ok_or_segv(unsigned long ptr, size_t size)
>  	}
>  }
>  
> -bool emulate_vsyscall(unsigned long error_code,
> -		      struct pt_regs *regs, unsigned long address)
> +static bool __emulate_vsyscall(struct pt_regs *regs, unsigned long address)
>  {
>  	unsigned long caller;
>  	int vsyscall_nr, syscall_nr, tmp;
>  	long ret;
>  	unsigned long orig_dx;
>  
> -	/* Write faults or kernel-privilege faults never get fixed up. */
> -	if ((error_code & (X86_PF_WRITE | X86_PF_USER)) != X86_PF_USER)
> -		return false;
> -
> -	/*
> -	 * Assume that faults at regs->ip are because of an
> -	 * instruction fetch. Return early and avoid
> -	 * emulation for faults during data accesses:
> -	 */
> -	if (address != regs->ip) {
> -		/* Failed vsyscall read */
> -		if (vsyscall_mode == EMULATE)
> -			return false;
> -
> -		/*
> -		 * User code tried and failed to read the vsyscall page.
> -		 */
> -		warn_bad_vsyscall(KERN_INFO, regs, "vsyscall read attempt denied -- look up the vsyscall kernel parameter if you need a workaround");
> -		return false;
> -	}
> -
> -	/*
> -	 * X86_PF_INSTR is only set when NX is supported.  When
> -	 * available, use it to double-check that the emulation code
> -	 * is only being used for instruction fetches:
> -	 */
> -	if (cpu_feature_enabled(X86_FEATURE_NX))
> -		WARN_ON_ONCE(!(error_code & X86_PF_INSTR));
> -
>  	/*
>  	 * No point in checking CS -- the only way to get here is a user mode
>  	 * trap to a high address, which means that we're in 64-bit user code.

I don't know. Is this still as true? We are now sometimes guessing based on the
regs->ip of a #GP. What if the kernel accidentally tries to jump to the vsyscall
address? Then we end up reading the kernel stack and doing strange things. Maybe
it's worth replacing the comment with a check? Feel free to call this paranoid.
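
Something like this, perhaps (sketch only):

	/* Emulation only makes sense for a fault raised from user mode. */
	if (WARN_ON_ONCE(!user_mode(regs)))
		return false;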


> @@ -281,6 +251,37 @@ bool emulate_vsyscall(unsigned long error_code,
>  	return true;
>  }
>  
> +bool emulate_vsyscall_pf(unsigned long error_code, struct pt_regs *regs,
> +			 unsigned long address)
> +{
> +	/* Write faults or kernel-privilege faults never get fixed up. */
> +	if ((error_code & (X86_PF_WRITE | X86_PF_USER)) != X86_PF_USER)
> +		return false;
> +
> +	/*
> +	 * Assume that faults at regs->ip are because of an instruction
> +	 * fetch. Return early and avoid emulation for faults during
> +	 * data accesses:
> +	 */
> +	if (address != regs->ip) {
> +		 /* User code tried and failed to read the vsyscall page. */
> +		if (vsyscall_mode != EMULATE)
> +			warn_bad_vsyscall(KERN_INFO, regs, "vsyscall read attempt denied -- look up the vsyscall kernel parameter if you need a workaround");
> +
> +		return false;
> +	}
> +
> +	/*
> +	 * X86_PF_INSTR is only set when NX is supported.  When
> +	 * available, use it to double-check that the emulation code
> +	 * is only being used for instruction fetches:
> +	 */
> +	if (cpu_feature_enabled(X86_FEATURE_NX))
> +		WARN_ON_ONCE(!(error_code & X86_PF_INSTR));
> +
> +	return __emulate_vsyscall(regs, address);
> +}
> +
>  /*
>   * A pseudo VMA to allow ptrace access for the vsyscall page.  This only
>   * covers the 64bit vsyscall page now. 32bit has a real VMA now and does
> diff --git a/arch/x86/include/asm/vsyscall.h b/arch/x86/include/asm/vsyscall.h
> index 472f0263dbc6..f34902364972 100644
> --- a/arch/x86/include/asm/vsyscall.h
> +++ b/arch/x86/include/asm/vsyscall.h
> @@ -14,12 +14,11 @@ extern void set_vsyscall_pgtable_user_bits(pgd_t *root);
>   * Called on instruction fetch fault in vsyscall page.
>   * Returns true if handled.
>   */
> -extern bool emulate_vsyscall(unsigned long error_code,
> -			     struct pt_regs *regs, unsigned long address);
> +bool emulate_vsyscall_pf(unsigned long error_code, struct pt_regs *regs, unsigned long address);
>  #else
>  static inline void map_vsyscall(void) {}
> -static inline bool emulate_vsyscall(unsigned long error_code,
> -				    struct pt_regs *regs, unsigned long address)
> +static inline bool emulate_vsyscall_pf(unsigned long error_code,
> +				       struct pt_regs *regs, unsigned long address)
>  {
>  	return false;
>  }
> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> index 998bd807fc7b..fbcc2da75fd6 100644
> --- a/arch/x86/mm/fault.c
> +++ b/arch/x86/mm/fault.c
> @@ -1316,7 +1316,7 @@ void do_user_addr_fault(struct pt_regs *regs,
>  	 * to consider the PF_PK bit.
>  	 */
>  	if (is_vsyscall_vaddr(address)) {
> -		if (emulate_vsyscall(error_code, regs, address))
> +		if (emulate_vsyscall_pf(error_code, regs, address))
>  			return;
>  	}
>  #endif


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v10 15/15] x86/cpu: Enable LASS by default during CPU initialization
  2025-10-07  6:51 ` [PATCH v10 15/15] x86/cpu: Enable LASS by default during CPU initialization Sohil Mehta
@ 2025-10-07 18:42   ` Edgecombe, Rick P
  0 siblings, 0 replies; 74+ messages in thread
From: Edgecombe, Rick P @ 2025-10-07 18:42 UTC (permalink / raw)
  To: Mehta, Sohil, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
	x86@kernel.org, dave.hansen@linux.intel.com
  Cc: corbet@lwn.net, ardb@kernel.org, david.laight.linux@gmail.com,
	luto@kernel.org, jpoimboe@kernel.org, andrew.cooper3@citrix.com,
	Luck, Tony, alexander.shishkin@linux.intel.com, kas@kernel.org,
	seanjc@google.com, rdunlap@infradead.org, dwmw@amazon.co.uk,
	vegard.nossum@oracle.com, xin@zytor.com,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
	kees@kernel.org, hpa@zytor.com, peterz@infradead.org,
	linux-efi@vger.kernel.org, geert@linux-m68k.org

On Mon, 2025-10-06 at 23:51 -0700, Sohil Mehta wrote:
> Linear Address Space Separation (LASS) mitigates a class of side-channel
> attacks that rely on speculative access across the user/kernel boundary.
> Enable it by default if the platform supports it.
> 
> While at it, remove the comment above the SMAP/SMEP/UMIP/LASS setup
> instead of updating it, as the whole sequence is quite self-explanatory.
> 
> Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>

Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v10 11/15] x86/vsyscall: Disable LASS if vsyscall mode is set to EMULATE
  2025-10-07  6:51 ` [PATCH v10 11/15] x86/vsyscall: Disable LASS if vsyscall mode is set to EMULATE Sohil Mehta
@ 2025-10-07 18:43   ` Edgecombe, Rick P
  0 siblings, 0 replies; 74+ messages in thread
From: Edgecombe, Rick P @ 2025-10-07 18:43 UTC (permalink / raw)
  To: Mehta, Sohil, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
	x86@kernel.org, dave.hansen@linux.intel.com
  Cc: corbet@lwn.net, ardb@kernel.org, david.laight.linux@gmail.com,
	luto@kernel.org, jpoimboe@kernel.org, andrew.cooper3@citrix.com,
	Luck, Tony, alexander.shishkin@linux.intel.com, kas@kernel.org,
	seanjc@google.com, rdunlap@infradead.org, dwmw@amazon.co.uk,
	vegard.nossum@oracle.com, xin@zytor.com,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
	kees@kernel.org, hpa@zytor.com, peterz@infradead.org,
	linux-efi@vger.kernel.org, geert@linux-m68k.org

On Mon, 2025-10-06 at 23:51 -0700, Sohil Mehta wrote:
> The EMULATE mode of vsyscall maps the vsyscall page with a high kernel
> address directly into user address space. Reading the vsyscall page in
> EMULATE mode would cause LASS to trigger a #GP.
> 
> Fixing the LASS violation in EMULATE mode would require complex
> instruction decoding because the resulting #GP does not include any
> useful error information, and the vsyscall address is not readily
> available in the RIP.
> 
> The EMULATE mode has been deprecated since 2022 and can only be enabled
> using the command line parameter vsyscall=emulate. See commit
> bf00745e7791 ("x86/vsyscall: Remove CONFIG_LEGACY_VSYSCALL_EMULATE") for
> details. At this point, no one is expected to be using this insecure
> mode. The rare usages that need it obviously do not care about security.
> 
> Disable LASS when EMULATE mode is requested to avoid breaking legacy
> user software. Also, update the vsyscall documentation to reflect this.
> LASS will only be supported if vsyscall mode is set to XONLY (default)
> or NONE.
> 
> Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>

Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>

> ---
> v10:
>  - No significant change. Minor changes to code formatting.
> 
> Eventually, we want to get rid of the EMULATE mode altogether. Linus and
> AndyL seem to be okay with such a change. However, those changes are
> beyond the scope of this series.


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v10 13/15] x86/traps: Generalize #GP address decode and hint code
  2025-10-07  6:51 ` [PATCH v10 13/15] x86/traps: Generalize #GP address decode and hint code Sohil Mehta
@ 2025-10-07 18:43   ` Edgecombe, Rick P
  0 siblings, 0 replies; 74+ messages in thread
From: Edgecombe, Rick P @ 2025-10-07 18:43 UTC (permalink / raw)
  To: Mehta, Sohil, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
	x86@kernel.org, dave.hansen@linux.intel.com
  Cc: corbet@lwn.net, ardb@kernel.org, david.laight.linux@gmail.com,
	luto@kernel.org, jpoimboe@kernel.org, andrew.cooper3@citrix.com,
	Luck, Tony, alexander.shishkin@linux.intel.com, kas@kernel.org,
	seanjc@google.com, rdunlap@infradead.org, dwmw@amazon.co.uk,
	vegard.nossum@oracle.com, xin@zytor.com,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
	kees@kernel.org, hpa@zytor.com, peterz@infradead.org,
	linux-efi@vger.kernel.org, geert@linux-m68k.org

On Mon, 2025-10-06 at 23:51 -0700, Sohil Mehta wrote:
> From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> 
> In most cases, an access causing a LASS violation results in a #GP. For
> stack accesses (those due to stack-oriented instructions, as well as
> accesses that implicitly or explicitly use the SS segment register), a
> stack segment fault (#SS) is generated instead.
> 
> Handlers for #GP and #SS will soon share code to decode the exception
> address and retrieve the exception hint string. Rename the helper
> function as well as the enum and array names to reflect that they are no
> longer specific to #GP.
> 
> No functional change intended.
> 
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
> ---
Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v10 08/15] x86/vsyscall: Reorganize the page fault emulation code
  2025-10-07 18:37   ` Edgecombe, Rick P
@ 2025-10-07 18:48     ` Dave Hansen
  2025-10-07 19:53       ` Edgecombe, Rick P
  2025-10-30 16:58       ` Andy Lutomirski
  0 siblings, 2 replies; 74+ messages in thread
From: Dave Hansen @ 2025-10-07 18:48 UTC (permalink / raw)
  To: Edgecombe, Rick P, Mehta, Sohil, tglx@linutronix.de,
	mingo@redhat.com, bp@alien8.de, x86@kernel.org,
	dave.hansen@linux.intel.com
  Cc: corbet@lwn.net, ardb@kernel.org, david.laight.linux@gmail.com,
	luto@kernel.org, jpoimboe@kernel.org, andrew.cooper3@citrix.com,
	Luck, Tony, alexander.shishkin@linux.intel.com, kas@kernel.org,
	seanjc@google.com, rdunlap@infradead.org, dwmw@amazon.co.uk,
	vegard.nossum@oracle.com, xin@zytor.com,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
	kees@kernel.org, hpa@zytor.com, peterz@infradead.org,
	linux-efi@vger.kernel.org, geert@linux-m68k.org

On 10/7/25 11:37, Edgecombe, Rick P wrote:
>>  	/*
>>  	 * No point in checking CS -- the only way to get here is a user mode
>>  	 * trap to a high address, which means that we're in 64-bit user code.
> I don't know. Is this as true any more? We are now sometimes guessing based on
> regs->ip of a #GP. What if the kernel accidentally tries to jump to the vsyscall
> address? Then we are reading the kernel stack and doing strange things. Maybe it's
> worth replacing the comment with a check? Feel free to call this paranoid.

The first check in emulate_vsyscall() is:

       /* Write faults or kernel-privilege faults never get fixed up. */
       if ((error_code & (X86_PF_WRITE | X86_PF_USER)) != X86_PF_USER)
               return false;

If the kernel jumped to the vsyscall page, it would end up there, return
false, and never reach the code near the "No point in checking CS" comment.

Right? Or am I misunderstanding the scenario you're calling out?

If I'm understanding it right, I'd be a bit reluctant to add a CS check
as well.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v10 08/15] x86/vsyscall: Reorganize the page fault emulation code
  2025-10-07 18:48     ` Dave Hansen
@ 2025-10-07 19:53       ` Edgecombe, Rick P
  2025-10-07 22:52         ` Sohil Mehta
  2025-10-30 16:58       ` Andy Lutomirski
  1 sibling, 1 reply; 74+ messages in thread
From: Edgecombe, Rick P @ 2025-10-07 19:53 UTC (permalink / raw)
  To: Mehta, Sohil, tglx@linutronix.de, mingo@redhat.com, Hansen, Dave,
	dave.hansen@linux.intel.com, bp@alien8.de, x86@kernel.org
  Cc: corbet@lwn.net, ardb@kernel.org, andrew.cooper3@citrix.com,
	alexander.shishkin@linux.intel.com, luto@kernel.org,
	david.laight.linux@gmail.com, jpoimboe@kernel.org, Luck, Tony,
	linux-efi@vger.kernel.org, kas@kernel.org, seanjc@google.com,
	dwmw@amazon.co.uk, rdunlap@infradead.org,
	vegard.nossum@oracle.com, xin@zytor.com,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
	kees@kernel.org, hpa@zytor.com, peterz@infradead.org,
	geert@linux-m68k.org

On Tue, 2025-10-07 at 11:48 -0700, Dave Hansen wrote:
> On 10/7/25 11:37, Edgecombe, Rick P wrote:
> > >   	/*
> > >   	 * No point in checking CS -- the only way to get here is a user mode
> > >   	 * trap to a high address, which means that we're in 64-bit user code.
> > I don't know. Is this as true any more? We are now sometimes guessing based on
> > regs->ip of a #GP. What if the kernel accidentally tries to jump to the vsyscall
> > address? Then we are reading the kernel stack and doing strange things. Maybe it's
> > worth replacing the comment with a check? Feel free to call this paranoid.
> 
> The first check in emulate_vsyscall() is:
> 
>        /* Write faults or kernel-privilege faults never get fixed up. */
>        if ((error_code & (X86_PF_WRITE | X86_PF_USER)) != X86_PF_USER)
>                return false;
> 
> If the kernel jumped to the vsyscall page, it would end up there, return
> false, and never reach the code near the "No point in checking CS" comment.
> 
> Right? Or am I misunderstanding the scenario you're calling out?
> 
> If I'm understanding it right, I'd be a bit reluctant to add a CS check
> as well.

Sorry, I could have been clearer. Yes, I assumed that the comment was talking
about that check you quote.

But I'm looking at this applied. The following patches (which don't include that
hunk), add another call site:

bool emulate_vsyscall_gp(struct pt_regs *regs)
{
	if (!cpu_feature_enabled(X86_FEATURE_LASS))
		return false;

	/* Emulate only if the RIP points to the vsyscall address */
	if (!is_vsyscall_vaddr(regs->ip))
		return false;

	return __emulate_vsyscall(regs, regs->ip);
}

If indeed we should add a check, it should probably go in one of the later
patches and not this one.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v10 01/15] x86/cpu: Enumerate the LASS feature bits
  2025-10-07 18:28     ` Dave Hansen
@ 2025-10-07 20:20       ` Sohil Mehta
  2025-10-07 20:38         ` Edgecombe, Rick P
  2025-10-16  3:10         ` H. Peter Anvin
  0 siblings, 2 replies; 74+ messages in thread
From: Sohil Mehta @ 2025-10-07 20:20 UTC (permalink / raw)
  To: Dave Hansen, Edgecombe, Rick P, dave.hansen@linux.intel.com
  Cc: corbet@lwn.net, ardb@kernel.org, david.laight.linux@gmail.com,
	luto@kernel.org, jpoimboe@kernel.org, andrew.cooper3@citrix.com,
	Luck, Tony, alexander.shishkin@linux.intel.com, kas@kernel.org,
	seanjc@google.com, rdunlap@infradead.org, dwmw@amazon.co.uk,
	vegard.nossum@oracle.com, xin@zytor.com,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
	kees@kernel.org, hpa@zytor.com, peterz@infradead.org,
	linux-efi@vger.kernel.org, geert@linux-m68k.org,
	tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
	x86@kernel.org

On 10/7/2025 11:28 AM, Dave Hansen wrote:

> STAC/CLAC are also architected to:
> 
> 	#UD - If CPUID.(EAX=07H, ECX=0H):EBX.SMAP[bit 20] = 0.
> 
> So, even though LASS _technically_ doesn't require SMAP, it would be a
> real pain without SMAP and STAC/CLAC. Thus, this series relies on SMAP
> being present.
> 

The spec says,
"A supervisor-mode data access causes a LASS violation if it would
access a linear address of which bit 63 is 0, supervisor-mode access
protection is enabled (by setting CR4.SMAP), and either RFLAGS.AC = 0 or
the access is an implicit supervisor-mode access."

One could argue that the LASS hardware enforcement of the kernel data
accesses *depends* on SMAP being enabled.

> Actually, it might be worth breaking this dependency hunk out into its
> own patch, just so there's a nice clean place to discuss this.

Sure, we can talk about the above wording in the spec, as well as the
STAC/CLAC dependency in a separate patch.

I included some information in the cover letter to explain that:

When there are valid reasons for the kernel to access memory in the user
half, it can temporarily suspend LASS enforcement by toggling the
RFLAGS.AC bit. Most of these cases are already covered today through the
stac()/clac() pairs, which avoid SMAP violations. However, there are
kernel usages, such as text poking, that access mappings (!_PAGE_USER)
in the lower half of the address space. LASS-specific AC bit toggling is
added for these cases.
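
For illustration, a minimal sketch of what such LASS-specific AC bit
toggling could look like, assuming helpers modeled on the existing
stac()/clac() pattern (the exact implementation in the series may differ):

static __always_inline void lass_stac(void)
{
	/* Note: a barrier is implicit in alternative(). */
	alternative("", "stac", X86_FEATURE_LASS);
}

static __always_inline void lass_clac(void)
{
	alternative("", "clac", X86_FEATURE_LASS);
}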

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v10 01/15] x86/cpu: Enumerate the LASS feature bits
  2025-10-07 20:20       ` Sohil Mehta
@ 2025-10-07 20:38         ` Edgecombe, Rick P
  2025-10-07 20:53           ` Sohil Mehta
  2025-10-16  3:10         ` H. Peter Anvin
  1 sibling, 1 reply; 74+ messages in thread
From: Edgecombe, Rick P @ 2025-10-07 20:38 UTC (permalink / raw)
  To: Mehta, Sohil, Hansen, Dave, dave.hansen@linux.intel.com
  Cc: corbet@lwn.net, ardb@kernel.org, andrew.cooper3@citrix.com,
	alexander.shishkin@linux.intel.com, luto@kernel.org,
	david.laight.linux@gmail.com, jpoimboe@kernel.org, Luck, Tony,
	linux-efi@vger.kernel.org, kas@kernel.org, seanjc@google.com,
	dwmw@amazon.co.uk, rdunlap@infradead.org,
	vegard.nossum@oracle.com, xin@zytor.com,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
	kees@kernel.org, hpa@zytor.com, peterz@infradead.org,
	tglx@linutronix.de, bp@alien8.de, mingo@redhat.com,
	geert@linux-m68k.org, x86@kernel.org

On Tue, 2025-10-07 at 13:20 -0700, Sohil Mehta wrote:
> > STAC/CLAC are also architected to:
> > 
> >  	#UD - If CPUID.(EAX=07H, ECX=0H):EBX.SMAP[bit 20] = 0.
> > 
> > So, even though LASS _technically_ doesn't require SMAP, it would be a
> > real pain without SMAP and STAC/CLAC. Thus, this series relies on SMAP
> > being present.

Ah, ok.

> 
> 
> The spec says,
> "A supervisor-mode data access causes a LASS violation if it would
> access a linear address of which bit 63 is 0, supervisor-mode access
> protection is enabled (by setting CR4.SMAP), and either RFLAGS.AC = 0 or
> the access is an implicit supervisor-mode access."
> 
> One could argue that the LASS hardware enforcement of the kernel data
> accesses *depends* on SMAP being enabled.

The fetch part doesn't though?

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v10 01/15] x86/cpu: Enumerate the LASS feature bits
  2025-10-07 18:19   ` Edgecombe, Rick P
  2025-10-07 18:28     ` Dave Hansen
@ 2025-10-07 20:49     ` Sohil Mehta
  2025-10-07 23:16       ` Xin Li
  1 sibling, 1 reply; 74+ messages in thread
From: Sohil Mehta @ 2025-10-07 20:49 UTC (permalink / raw)
  To: Edgecombe, Rick P, tglx@linutronix.de, mingo@redhat.com,
	bp@alien8.de, x86@kernel.org, dave.hansen@linux.intel.com
  Cc: corbet@lwn.net, ardb@kernel.org, david.laight.linux@gmail.com,
	luto@kernel.org, jpoimboe@kernel.org, andrew.cooper3@citrix.com,
	Luck, Tony, alexander.shishkin@linux.intel.com, kas@kernel.org,
	seanjc@google.com, rdunlap@infradead.org, dwmw@amazon.co.uk,
	vegard.nossum@oracle.com, xin@zytor.com,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
	kees@kernel.org, hpa@zytor.com, peterz@infradead.org,
	linux-efi@vger.kernel.org, geert@linux-m68k.org

On 10/7/2025 11:19 AM, Edgecombe, Rick P wrote:
>> LASS provides the same mode-based protections as paging but without
>> traversing the paging structures. Because the protections are enforced
>> pre-paging, an attacker will not be able to derive paging-based timing
> 
> Nit: pre-page walk maybe? Otherwise it could sound like before paging is enabled
> during boot.
> 

How about: ... protections are enforced prior to page walks, an ...

>>  
>> +config X86_DISABLED_FEATURE_LASS
>> +	def_bool y
>> +	depends on X86_32
>> +
> 
> All the other ones in the file are !X86_64. Why do this one X86_32?
> 

The double negation (DISABLED and !X86_64) was harder to follow when
this was initially posted.

https://lore.kernel.org/lkml/73796800-819b-4433-b0ef-db852336d7a4@zytor.com/
https://lore.kernel.org/lkml/756e93a2-7e42-4323-ae21-a5437e71148e@infradead.org/

I don't have a strong preference. I guess the inconsistency makes it
confusing as well. Will change it back to !X86_64 unless Xin objects.



^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v10 01/15] x86/cpu: Enumerate the LASS feature bits
  2025-10-07 20:38         ` Edgecombe, Rick P
@ 2025-10-07 20:53           ` Sohil Mehta
  0 siblings, 0 replies; 74+ messages in thread
From: Sohil Mehta @ 2025-10-07 20:53 UTC (permalink / raw)
  To: Edgecombe, Rick P, Hansen, Dave, dave.hansen@linux.intel.com
  Cc: corbet@lwn.net, ardb@kernel.org, andrew.cooper3@citrix.com,
	alexander.shishkin@linux.intel.com, luto@kernel.org,
	david.laight.linux@gmail.com, jpoimboe@kernel.org, Luck, Tony,
	linux-efi@vger.kernel.org, kas@kernel.org, seanjc@google.com,
	dwmw@amazon.co.uk, rdunlap@infradead.org,
	vegard.nossum@oracle.com, xin@zytor.com,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
	kees@kernel.org, hpa@zytor.com, peterz@infradead.org,
	tglx@linutronix.de, bp@alien8.de, mingo@redhat.com,
	geert@linux-m68k.org, x86@kernel.org

On 10/7/2025 1:38 PM, Edgecombe, Rick P wrote:
>> One could argue that the LASS hardware enforcement of the kernel data
>> accesses *depends* on SMAP being enabled.
> 
> The fetch part doesn't though?

That's right. The instruction fetch check could have depended on SMEP,
but the spec explicitly calls out that it does not.

"A supervisor-mode instruction fetch causes a LASS violation if it would
access a linear address of which bit 63 is 0. (Unlike paging, this
behavior of LASS applies regardless of the setting of CR4.SMEP.)"

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v10 03/15] x86/alternatives: Disable LASS when patching kernel alternatives
  2025-10-07 16:55   ` Edgecombe, Rick P
@ 2025-10-07 22:28     ` Sohil Mehta
  2025-10-08 16:22       ` Edgecombe, Rick P
  0 siblings, 1 reply; 74+ messages in thread
From: Sohil Mehta @ 2025-10-07 22:28 UTC (permalink / raw)
  To: Edgecombe, Rick P, tglx@linutronix.de, mingo@redhat.com,
	bp@alien8.de, x86@kernel.org, dave.hansen@linux.intel.com
  Cc: corbet@lwn.net, ardb@kernel.org, david.laight.linux@gmail.com,
	luto@kernel.org, jpoimboe@kernel.org, andrew.cooper3@citrix.com,
	Luck, Tony, alexander.shishkin@linux.intel.com, kas@kernel.org,
	seanjc@google.com, rdunlap@infradead.org, dwmw@amazon.co.uk,
	vegard.nossum@oracle.com, xin@zytor.com,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
	kees@kernel.org, hpa@zytor.com, peterz@infradead.org,
	linux-efi@vger.kernel.org, geert@linux-m68k.org

On 10/7/2025 9:55 AM, Edgecombe, Rick P wrote:
> It's not just used for alternatives anymore. bpf, kprobes, etc use it too. Maybe
> drop "alternatives" from the subject?
> 

Yeah, I was just being lazy. The file is still called alternatives.c and
that's probably what most folks are familiar with.

How about:
x86/text-patching: Disable LASS when patching kernel code

> 
> The above variant has "a barrier is implicit in alternative", is it not needed
> here too? Actually, not sure what that comment is trying to convey anyway.
> 

Yes, the same implication holds true for the LASS versions as well.
I assume it is to let users know that a separate memory barrier is not
needed to prevent the memory accesses following the STAC/CLAC
instructions from getting reordered.

I will add a similar note to the lass_stac()/lass_clac() comments as well.

> Not a strong opinion, but the naming of stac()/clac() vs. lass_stac()/lass_clac() is
> a bit confusing to me. The stac/clac instructions now have LASS and SMAP behavior.
> Why keep the SMAP behavior implicit and give LASS a special variant?
> 
> The other odd aspect is that calling stac()/clac() is needed for LASS in some
> places too, right? But stac()/clac() depend on X86_FEATURE_SMAP not
> X86_FEATURE_LASS. A reader might wonder, why do we not need the lass variant
> there too.
> 
> I'd expect in the real world LASS won't be found without SMAP. Maybe it could be
> worth just improving the comment around stac()/clac() to include some nod that
> it is doing LASS stuff too, or that it relies on the fact that USER mappings are
> only found in the lower half, while KERNEL mappings are not found only in the upper half.
> 

LASS (data access) depends on SMAP in the hardware as well as the
kernel. The STAC/CLAC instructions toggle LASS along with SMAP. One
option is to use the current stac()/clac() helpers for all cases.
However, that would mean unnecessary AC bit toggling during
text-patching on systems without LASS.

The code comments mainly describe how these helpers should be used,
rather than why they exist the way they do.

>> /* Use stac()/clac() when accessing userspace (_PAGE_USER)
>> mappings, regardless of location. */
>> 
>> /* Use lass_stac()/lass_clac() when accessing kernel mappings (!
>> _PAGE_USER) in the lower half of the address space. */
Does this look accurate? The difference is subtle. Also, there is some
potential for incorrect usage, but Dave would prefer to track them
separately.

I can add more explanation to the commit message. Any preferred wording?
Also, the separate patch that Dave recommended would help clarify things
as well.

>>  
>> +/*
>> + * Text poking creates and uses a mapping in the lower half of the
>> + * address space. Relax LASS enforcement when accessing the poking
>> + * address.
>> + *
>> + * Also, objtool enforces a strict policy of "no function calls within
>> + * AC=1 regions". Adhere to the policy by using inline versions of
>> + * memcpy()/memset() that will never result in a function call.
> 
> Is "Also, ..." here really a separate issue? What is the connection to
> lass_stac/clac()?
> 

The issues are interdependent. We need the STAC/CLAC because text poking
accesses special memory. We require the inline memcpy/memset because we
have now added the STAC/CLAC usage and objtool guards against the
potential misuse of STAC/CLAC.

Were you looking for any specific change to the wording?

>>  static void text_poke_memcpy(void *dst, const void *src, size_t len)
>>  {
>> -	memcpy(dst, src, len);
>> +	lass_stac();
>> +	__inline_memcpy(dst, src, len);
>> +	lass_clac();
>>  }
>>  



^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v10 07/15] x86/kexec: Disable LASS during relocate kernel
  2025-10-07 17:43   ` Edgecombe, Rick P
@ 2025-10-07 22:33     ` Sohil Mehta
  0 siblings, 0 replies; 74+ messages in thread
From: Sohil Mehta @ 2025-10-07 22:33 UTC (permalink / raw)
  To: Edgecombe, Rick P, tglx@linutronix.de, mingo@redhat.com,
	bp@alien8.de, x86@kernel.org, dave.hansen@linux.intel.com
  Cc: corbet@lwn.net, ardb@kernel.org, david.laight.linux@gmail.com,
	luto@kernel.org, jpoimboe@kernel.org, andrew.cooper3@citrix.com,
	Luck, Tony, alexander.shishkin@linux.intel.com, kas@kernel.org,
	seanjc@google.com, rdunlap@infradead.org, dwmw@amazon.co.uk,
	vegard.nossum@oracle.com, xin@zytor.com,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
	kees@kernel.org, hpa@zytor.com, peterz@infradead.org,
	linux-efi@vger.kernel.org, geert@linux-m68k.org

On 10/7/2025 10:43 AM, Edgecombe, Rick P wrote:
> On Mon, 2025-10-06 at 23:51 -0700, Sohil Mehta wrote:
>> The relocate kernel code uses an identity mapping to copy the new kernel,
>> which leads to a LASS violation. To avoid issues, disable LASS after the
>> original CR4 value has been saved but before jumping to the identity-mapped
>> page.
> 
> It could help to expand on this a bit. Something like... We need to disable LASS
> before we jump to the identity map because otherwise it will immediately die
> trying to execute a low address. But if the kexec flavor gets to virtual_mapped,
> we want LASS restored, so we need to disable LASS after CR4 is saved. We also
> can't disable it where CET gets disabled because that is too late. So disable
> it along with PGE.
> 

Sure, will add the detailed reasoning.
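
For illustration only, a conceptual C-level sketch of that ordering (the
actual change is in the relocate kernel assembly, and X86_CR4_LASS is the
bit definition introduced earlier in this series):

	/* Conceptual sketch, not the actual relocate_kernel code. */
	unsigned long saved_cr4 = native_read_cr4();

	/*
	 * Clear LASS only after the original CR4 value has been saved, so
	 * the virtual_mapped return path can restore it, but before jumping
	 * to the identity-mapped page, which executes from a low address.
	 */
	native_write_cr4(saved_cr4 & ~X86_CR4_LASS);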

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v10 09/15] x86/traps: Consolidate user fixups in exc_general_protection()
  2025-10-07 17:46   ` Edgecombe, Rick P
@ 2025-10-07 22:41     ` Sohil Mehta
  2025-10-08 17:43       ` Edgecombe, Rick P
  0 siblings, 1 reply; 74+ messages in thread
From: Sohil Mehta @ 2025-10-07 22:41 UTC (permalink / raw)
  To: Edgecombe, Rick P, tglx@linutronix.de, mingo@redhat.com,
	bp@alien8.de, x86@kernel.org, dave.hansen@linux.intel.com
  Cc: corbet@lwn.net, ardb@kernel.org, david.laight.linux@gmail.com,
	luto@kernel.org, jpoimboe@kernel.org, andrew.cooper3@citrix.com,
	Luck, Tony, alexander.shishkin@linux.intel.com, kas@kernel.org,
	seanjc@google.com, rdunlap@infradead.org, dwmw@amazon.co.uk,
	vegard.nossum@oracle.com, xin@zytor.com,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
	kees@kernel.org, hpa@zytor.com, peterz@infradead.org,
	linux-efi@vger.kernel.org, geert@linux-m68k.org

On 10/7/2025 10:46 AM, Edgecombe, Rick P wrote:
> On Mon, 2025-10-06 at 23:51 -0700, Sohil Mehta wrote:
>> Move the UMIP exception fixup along with the other user mode fixups,
>> that is, under the common "if (user_mode(regs))" condition where the
>> rest of the fixups reside.
> 
> Can you mention that it also drops static_cpu_has(X86_FEATURE_UMIP) check
> because fixup_umip_exception() already checks
> cpu_feature_enabled(X86_FEATURE_UMIP)?
> 

There is no existing check in fixup_umip_exception(). The current patch
moves the X86_FEATURE_UMIP check into fixup_umip_exception().

I can add a sentence to say that the current check is split into two
separate locations. But is it not obvious from the diff?



^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v10 08/15] x86/vsyscall: Reorganize the page fault emulation code
  2025-10-07 19:53       ` Edgecombe, Rick P
@ 2025-10-07 22:52         ` Sohil Mehta
  2025-10-08 17:42           ` Edgecombe, Rick P
  0 siblings, 1 reply; 74+ messages in thread
From: Sohil Mehta @ 2025-10-07 22:52 UTC (permalink / raw)
  To: Edgecombe, Rick P, tglx@linutronix.de, mingo@redhat.com,
	Hansen, Dave, dave.hansen@linux.intel.com, bp@alien8.de,
	x86@kernel.org
  Cc: corbet@lwn.net, ardb@kernel.org, andrew.cooper3@citrix.com,
	alexander.shishkin@linux.intel.com, luto@kernel.org,
	david.laight.linux@gmail.com, jpoimboe@kernel.org, Luck, Tony,
	linux-efi@vger.kernel.org, kas@kernel.org, seanjc@google.com,
	dwmw@amazon.co.uk, rdunlap@infradead.org,
	vegard.nossum@oracle.com, xin@zytor.com,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
	kees@kernel.org, hpa@zytor.com, peterz@infradead.org,
	geert@linux-m68k.org

On 10/7/2025 12:53 PM, Edgecombe, Rick P wrote:

> But I'm looking at this applied. The following patches (which don't include that
> hunk), add another call site:
> 
> bool emulate_vsyscall_gp(struct pt_regs *regs)
> {
> 	if (!cpu_feature_enabled(X86_FEATURE_LASS))
> 		return false;
> 
> 	/* Emulate only if the RIP points to the vsyscall address */
> 	if (!is_vsyscall_vaddr(regs->ip))
> 		return false;
> 
> 	return __emulate_vsyscall(regs, regs->ip);
> }
> 
> If indeed we should add a check, it should probably go in one of the later
> patches and not this one.

We already check CS before calling emulate_vsyscall_gp().

if (user_mode(regs)) {

...
	if (emulate_vsyscall_gp(regs))
		goto exit;

...
}

	

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v10 05/15] x86/cpu: Defer CR pinning enforcement until late_initcall()
  2025-10-07 17:23   ` Edgecombe, Rick P
@ 2025-10-07 23:05     ` Sohil Mehta
  2025-10-08 17:36       ` Edgecombe, Rick P
  0 siblings, 1 reply; 74+ messages in thread
From: Sohil Mehta @ 2025-10-07 23:05 UTC (permalink / raw)
  To: Edgecombe, Rick P, tglx@linutronix.de, mingo@redhat.com,
	bp@alien8.de, x86@kernel.org, dave.hansen@linux.intel.com
  Cc: corbet@lwn.net, ardb@kernel.org, david.laight.linux@gmail.com,
	luto@kernel.org, jpoimboe@kernel.org, andrew.cooper3@citrix.com,
	Luck, Tony, alexander.shishkin@linux.intel.com, kas@kernel.org,
	seanjc@google.com, rdunlap@infradead.org, dwmw@amazon.co.uk,
	vegard.nossum@oracle.com, xin@zytor.com,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
	kees@kernel.org, hpa@zytor.com, peterz@infradead.org,
	linux-efi@vger.kernel.org, geert@linux-m68k.org

On 10/7/2025 10:23 AM, Edgecombe, Rick P wrote:

> 
> Why is only set_virtual_address_map problematic? Some of the other ones get
> called after boot by a bunch of modules by the looks of it.
> 

AFAIU, efi_enter_virtual_mode()->set_virtual_address_map maps the
runtime services from physical mode into virtual mode.

After that, all the other runtime services can get called using virtual
addressing. I can find out more if you still have concerns.

>> @@ -476,8 +476,8 @@ void cr4_init(void)
>>  
>>  	if (boot_cpu_has(X86_FEATURE_PCID))
>>  		cr4 |= X86_CR4_PCIDE;
>> -	if (static_branch_likely(&cr_pinning))
>> -		cr4 = (cr4 & ~cr4_pinned_mask) | cr4_pinned_bits;
>> +
>> +	cr4 = (cr4 & ~cr4_pinned_mask) | cr4_pinned_bits;
> 
> 
> Can you explain why this change is needed? It relies on cr4_pinned_bits to be
> already set, and kind of is "enforcement", but no longer depends on
> enable_cr_pinning() being called.
> 

cr4_init() is only called from APs during bring up. The pinned bits are
saved on the BSP and then used to program the CR4 on the APs. It is
independent of pinning *enforcement* which warns when these bits get
modified.

> 
>>  
>>  	__write_cr4(cr4);
>>  

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v10 04/15] x86/cpu: Set LASS CR4 bit as pinning sensitive
  2025-10-07 18:24   ` Edgecombe, Rick P
@ 2025-10-07 23:11     ` Sohil Mehta
  2025-10-08 16:52       ` Edgecombe, Rick P
  0 siblings, 1 reply; 74+ messages in thread
From: Sohil Mehta @ 2025-10-07 23:11 UTC (permalink / raw)
  To: Edgecombe, Rick P, tglx@linutronix.de, mingo@redhat.com,
	bp@alien8.de, x86@kernel.org, dave.hansen@linux.intel.com
  Cc: corbet@lwn.net, ardb@kernel.org, david.laight.linux@gmail.com,
	luto@kernel.org, jpoimboe@kernel.org, andrew.cooper3@citrix.com,
	Luck, Tony, alexander.shishkin@linux.intel.com, kas@kernel.org,
	seanjc@google.com, rdunlap@infradead.org, dwmw@amazon.co.uk,
	vegard.nossum@oracle.com, xin@zytor.com,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
	kees@kernel.org, hpa@zytor.com, peterz@infradead.org,
	linux-efi@vger.kernel.org, geert@linux-m68k.org

On 10/7/2025 11:24 AM, Edgecombe, Rick P wrote:
>> Security features such as LASS are not expected to be disabled once
>> initialized. Add LASS to the CR4 pinned mask.
>>
> 
> I was debating whether we really need this, given the LASS and CR pinning threat
> models. CR pinning seems to be about after an attacker has already hijacked a
> control flow and is looking to escalate it into more control.

Can you please explain more? How is LASS different from SMAP and SMEP
for which the CR pinning code was initially added?

> We could maybe get
> away with dropping this and the following patch. But it would still be good to
> get a warning if it gets turned off inadvertently I think. It might be worth
> adding justification like that to the log.

My understanding from the previous discussions was that CR pinning
deferral might be beneficial independent of LASS.
https://lore.kernel.org/lkml/c59aa7ac-62a6-45ec-b626-de518b25f7d9@intel.com/

The pinning enforcement provides the warning and reprograms the bit.
Maybe I've misunderstood your comment.


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v10 01/15] x86/cpu: Enumerate the LASS feature bits
  2025-10-07 20:49     ` Sohil Mehta
@ 2025-10-07 23:16       ` Xin Li
  2025-10-08 16:00         ` Edgecombe, Rick P
  0 siblings, 1 reply; 74+ messages in thread
From: Xin Li @ 2025-10-07 23:16 UTC (permalink / raw)
  To: Sohil Mehta, Edgecombe, Rick P, tglx@linutronix.de,
	mingo@redhat.com, bp@alien8.de, x86@kernel.org,
	dave.hansen@linux.intel.com
  Cc: corbet@lwn.net, ardb@kernel.org, david.laight.linux@gmail.com,
	luto@kernel.org, jpoimboe@kernel.org, andrew.cooper3@citrix.com,
	Luck, Tony, alexander.shishkin@linux.intel.com, kas@kernel.org,
	seanjc@google.com, rdunlap@infradead.org, dwmw@amazon.co.uk,
	vegard.nossum@oracle.com, linux-kernel@vger.kernel.org,
	linux-doc@vger.kernel.org, kees@kernel.org, hpa@zytor.com,
	peterz@infradead.org, linux-efi@vger.kernel.org,
	geert@linux-m68k.org

On 10/8/2025 4:49 AM, Sohil Mehta wrote:
>>>   
>>> +config X86_DISABLED_FEATURE_LASS
>>> +	def_bool y
>>> +	depends on X86_32
>>> +
>> All the other ones in the file are !X86_64. Why do this one X86_32?
>>
> The double negation (DISABLED and !X86_64) was harder to follow when
> this was initially posted.
> 
> https://lore.kernel.org/lkml/73796800-819b-4433-b0ef-db852336d7a4@zytor.com/
> https://lore.kernel.org/lkml/756e93a2-7e42-4323-ae21- 
> a5437e71148e@infradead.org/
> 
> I don't have a strong preference. I guess the inconsistency makes it
> confusing as well. Will change it back to !X86_64 unless Xin objects.

I prefer to use X86_32, which is more direct.

Now the only disabled feature when !X86_64 is X86_DISABLED_FEATURE_PCID.
And I would expect the disabled features due to the lack of 32-bit enabling
will keep growing until we remove the 32-bit kernel code.  I was also thinking
of moving all such disabled features to a dedicated file when the total
reaches 3.  But hopefully removing 32-bit will happen first.






^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v10 01/15] x86/cpu: Enumerate the LASS feature bits
  2025-10-07 23:16       ` Xin Li
@ 2025-10-08 16:00         ` Edgecombe, Rick P
  0 siblings, 0 replies; 74+ messages in thread
From: Edgecombe, Rick P @ 2025-10-08 16:00 UTC (permalink / raw)
  To: Mehta, Sohil, tglx@linutronix.de, mingo@redhat.com,
	x86@kernel.org, dave.hansen@linux.intel.com, xin@zytor.com,
	bp@alien8.de
  Cc: corbet@lwn.net, ardb@kernel.org, andrew.cooper3@citrix.com,
	alexander.shishkin@linux.intel.com, luto@kernel.org,
	david.laight.linux@gmail.com, jpoimboe@kernel.org, Luck, Tony,
	linux-efi@vger.kernel.org, kas@kernel.org, seanjc@google.com,
	dwmw@amazon.co.uk, rdunlap@infradead.org,
	vegard.nossum@oracle.com, linux-kernel@vger.kernel.org,
	linux-doc@vger.kernel.org, kees@kernel.org, hpa@zytor.com,
	peterz@infradead.org, geert@linux-m68k.org

On Wed, 2025-10-08 at 07:16 +0800, Xin Li wrote:
> > 
> > I don't have a strong preference. I guess the inconsistency makes it
> > confusing as well. Will change it back to !X86_64 unless Xin objects.
> 
> I prefer to use X86_32, which is more direct.

Fine by me; I was just noticing the asymmetry. I do think that anything like
that, that sticks out, is good to mention in the log.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v10 03/15] x86/alternatives: Disable LASS when patching kernel alternatives
  2025-10-07 22:28     ` Sohil Mehta
@ 2025-10-08 16:22       ` Edgecombe, Rick P
  2025-10-10 17:10         ` Sohil Mehta
  0 siblings, 1 reply; 74+ messages in thread
From: Edgecombe, Rick P @ 2025-10-08 16:22 UTC (permalink / raw)
  To: Mehta, Sohil, tglx@linutronix.de, mingo@redhat.com,
	dave.hansen@linux.intel.com, bp@alien8.de, x86@kernel.org
  Cc: corbet@lwn.net, ardb@kernel.org, andrew.cooper3@citrix.com,
	alexander.shishkin@linux.intel.com, luto@kernel.org,
	david.laight.linux@gmail.com, jpoimboe@kernel.org, Luck, Tony,
	linux-efi@vger.kernel.org, kas@kernel.org, seanjc@google.com,
	dwmw@amazon.co.uk, rdunlap@infradead.org,
	vegard.nossum@oracle.com, xin@zytor.com,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
	kees@kernel.org, hpa@zytor.com, peterz@infradead.org,
	geert@linux-m68k.org

On Tue, 2025-10-07 at 15:28 -0700, Sohil Mehta wrote:
> On 10/7/2025 9:55 AM, Edgecombe, Rick P wrote:
> > It's not just used for alternatives anymore. bpf, kprobes, etc use it too. Maybe
> > drop "alternatives" from the subject?
> > 
> 
> Yeah, I was just being lazy. The file is still called alternatives.c and
> that's probably what most folks are familiar with.
> 
> How about:
> x86/text-patching: Disable LASS when patching kernel code

"x86/alternatives: Disable LASS when patching kernel alternatives"

I meant the last reference there ^. I think the "x86/alternatives:" part should
match the rest of the commits to the file.

"x86/alternatives: Disable LASS when patching kernel kernel code" sounds more
accurate to me.

> 
> > 
> > The above variant has "a barrier is implicit in alternative", is it not needed
> > here too? Actually, not sure what that comment is trying to convey anyway.
> > 
> 
> Yes, the same implication holds true for the LASS versions as well.
> I assume it is to let users know that a separate memory barrier is not
> needed to prevent the memory accesses following the STAC/CLAC
> instructions from getting reordered.
> 
> I will add a similar note to the lass_clac()/stac() comments as well.

Up to you.

> 
> > Not a strong opinion, but the naming of stac()/clac() vs. lass_stac()/lass_clac() is
> > a bit confusing to me. The stac/clac instructions now have LASS and SMAP behavior.
> > Why keep the SMAP behavior implicit and give LASS a special variant?
> > 
> > The other odd aspect is that calling stac()/clac() is needed for LASS in some
> > places too, right? But stac()/clac() depend on X86_FEATURE_SMAP not
> > X86_FEATURE_LASS. A reader might wonder, why do we not need the lass variant
> > there too.
> > 
> > I'd expect in the real world LASS won't be found without SMAP. Maybe it could be
> > worth just improving the comment around stac()/clac() to include some nod that
> > it is doing LASS stuff too, or that it relies on the fact that USER mappings are
> > only found in the lower half, while KERNEL mappings are not found only in the upper half.
> > 
> 
> LASS (data access) depends on SMAP in the hardware as well as the
> kernel. The STAC/CLAC instructions toggle LASS alongwith SMAP. One
> option is to use the current stac()/clac() instruction for all cases.
> However, that would mean unnecessary AC bit toggling during
> text-patching on systems without LASS.

Honestly, just unconditionally doing stac/clac doesn't sound that bad to me. We
already unconditionally enable SMAP, right? If there was some big slowdown for a
single copy, people would want an option to disable it. And with text patching,
it's part of a heavier operation already.

Was there previous feedback on that option?

> 
> The code comments mainly describe how these helpers should be used,
> rather than why they exist the way they do.
> 
> > > /* Use stac()/clac() when accessing userspace (_PAGE_USER)
> > > mappings, regardless of location. */
> > > 
> > > /* Use lass_stac()/lass_clac() when accessing kernel mappings (!
> > > _PAGE_USER) in the lower half of the address space. */
> Does this look accurate? The difference is subtle. Also, there is some
> potential for incorrect usage, but Dave would prefer to track them
> separately.
> 
> I can add more explanation to the commit message. Any preferred wording?
> Also, the separate patch that Dave recommended would help clarify things
> as well.

No preference. The main comment was just that, as someone looking at this part
fresh, it was a little unclear. Especially with the non-obvious SMAP-LASS link.

> 
> > >  
> > > +/*
> > > + * Text poking creates and uses a mapping in the lower half of the
> > > + * address space. Relax LASS enforcement when accessing the poking
> > > + * address.
> > > + *
> > > + * Also, objtool enforces a strict policy of "no function calls within
> > > + * AC=1 regions". Adhere to the policy by using inline versions of
> > > + * memcpy()/memset() that will never result in a function call.
> > 
> > Is "Also, ..." here really a separate issue? What is the connection to
> > lass_stac/clac()?
> > 
> 
> The issues are interdependent. We need the STAC/CLAC because text poking
> accesses special memory. We require the inline memcpy/memset because we
> have now added the STAC/CLAC usage and objtool guards against the
> potential misuse of STAC/CLAC.
> 
> Were you looking for any specific change to the wording?

Ah ok, but the compiler could have always uninlined the existing memcpy calls,
right? So there is an existing theoretical problem, I would think.

But that link sounds strong enough to do it in one patch. If it were me, I would
nod at the existing theoretical issue.

> 
> > >  static void text_poke_memcpy(void *dst, const void *src, size_t len)
> > >  {
> > > -	memcpy(dst, src, len);
> > > +	lass_stac();
> > > +	__inline_memcpy(dst, src, len);
> > > +	lass_clac();
> > >  }
> > >  
> 
> 


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v10 04/15] x86/cpu: Set LASS CR4 bit as pinning sensitive
  2025-10-07 23:11     ` Sohil Mehta
@ 2025-10-08 16:52       ` Edgecombe, Rick P
  2025-10-10 19:03         ` Sohil Mehta
  0 siblings, 1 reply; 74+ messages in thread
From: Edgecombe, Rick P @ 2025-10-08 16:52 UTC (permalink / raw)
  To: Mehta, Sohil, tglx@linutronix.de, mingo@redhat.com,
	dave.hansen@linux.intel.com, bp@alien8.de, x86@kernel.org
  Cc: corbet@lwn.net, ardb@kernel.org, andrew.cooper3@citrix.com,
	alexander.shishkin@linux.intel.com, luto@kernel.org,
	david.laight.linux@gmail.com, jpoimboe@kernel.org, Luck, Tony,
	linux-efi@vger.kernel.org, kas@kernel.org, seanjc@google.com,
	dwmw@amazon.co.uk, rdunlap@infradead.org,
	vegard.nossum@oracle.com, xin@zytor.com,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
	kees@kernel.org, hpa@zytor.com, peterz@infradead.org,
	geert@linux-m68k.org

On Tue, 2025-10-07 at 16:11 -0700, Sohil Mehta wrote:
> On 10/7/2025 11:24 AM, Edgecombe, Rick P wrote:
> > > Security features such as LASS are not expected to be disabled once
> > > initialized. Add LASS to the CR4 pinned mask.
> > > 
> > 
> > I was debating whether we really need this, given the LASS and CR pinning threat
> > models. CR pinning seems to be about after an attacker has already hijacked a
> > control flow and is looking to escalate it into more control.
> 
> Can you please explain more? How is LASS different from SMAP and SMEP
> for which the CR pinning code was initially added?

The next patch says "CR pinning mainly prevents exploits from trivially
modifying security-sensitive CR bits."

My understanding of that attack is that the attacker already has enough control
in the kernel to call CR4-writing functions, or otherwise control the CR4 write.
They disable SMAP or something to help do ROP for the next step of their attack.

I *think* the observation that led to CR pinning was that, before SMAP, attacks
would prep a ROP chain (stack) in userspace memory, then use a function pointer
hijack to call a function that switched the stack to userspace memory. After
SMAP this was blocked, and attacks had to do the longer step of forming and
finding the ROP stack in kernel memory. But then some observed attacks were just
first calling CR4-writing functions to disable SMAP and then continuing the
userspace-based attack like normal. So CR pinning could block this and force
them the other way. Then, as long as we had the infrastructure, any CR bits that
might help were added to the mask because why not. (I think this is the history,
but please don't take it as authoritative.)

Over SMAP, LASS has speculative benefits. Usually a speculative attack doesn't
involve a non-speculative control flow hijack. If you already have that, you
probably don't need to mess with a speculative attack. (Hand-waving a bit.)

So I was thinking that the CR pinning of LASS doesn't really help that reasoning
from the next patch. And unlike the other bits that just got added easily, this
one required infrastructure changes and an extra patch. So I wondered, hmm, is it
worth it to do the extra patches?

> 
> > We could maybe get
> > away with dropping this and the following patch. But it would still be good to
> > get a warning if it gets turned off inadvertently I think. It might be worth
> > adding justification like that to the log.
> 
> My understanding from the previous discussions was that CR pinning
> deferral might be beneficial independent of LASS.
> https://lore.kernel.org/lkml/c59aa7ac-62a6-45ec-b626-de518b25f7d9@intel.com/
> 
> The pinning enforcement provides the warning and reprograms the bit.
> Maybe, I've misunderstood your comment.
> 

Yea, I agree it would be good to get a warning. The write may be triggered
accidentally by a kernel bug. I agree with the patch, but just commenting my
reasoning for the sake of discussion. Maybe we can tighten the reasoning in the
log. I tend to think that if I have to go into a long chain of analysis to
decide I agree with the patch, the log should have helped me get there. Of
course this can also just be because it went over my head. Please take it as a
soft comment.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v10 05/15] x86/cpu: Defer CR pinning enforcement until late_initcall()
  2025-10-07 23:05     ` Sohil Mehta
@ 2025-10-08 17:36       ` Edgecombe, Rick P
  2025-10-10 20:45         ` Sohil Mehta
  0 siblings, 1 reply; 74+ messages in thread
From: Edgecombe, Rick P @ 2025-10-08 17:36 UTC (permalink / raw)
  To: Mehta, Sohil, tglx@linutronix.de, mingo@redhat.com,
	dave.hansen@linux.intel.com, bp@alien8.de, x86@kernel.org
  Cc: corbet@lwn.net, ardb@kernel.org, andrew.cooper3@citrix.com,
	alexander.shishkin@linux.intel.com, luto@kernel.org,
	david.laight.linux@gmail.com, jpoimboe@kernel.org, Luck, Tony,
	linux-efi@vger.kernel.org, kas@kernel.org, seanjc@google.com,
	dwmw@amazon.co.uk, rdunlap@infradead.org,
	vegard.nossum@oracle.com, xin@zytor.com,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
	kees@kernel.org, hpa@zytor.com, peterz@infradead.org,
	geert@linux-m68k.org

On Tue, 2025-10-07 at 16:05 -0700, Sohil Mehta wrote:
> On 10/7/2025 10:23 AM, Edgecombe, Rick P wrote:
> 
> > 
> > Why is only set_virtual_address_map problematic? Some of the other ones get
> > called after boot by a bunch of modules by the looks of it.
> > 
> 
> AFAIU, efi_enter_virtual_mode()->set_virtual_address_map maps the
> runtime services from physical mode into virtual mode.
> 
> After that, all the other runtime services can get called using virtual
> addressing. I can find out more if you still have concerns.

Ah, looking into this more I see the:
ffffffef00000000 |  -68    GB | fffffffeffffffff |   64 GB | EFI region mapping space

It looks like the services get mapped there in a special MM. The calls switch to
it, so should stay in high address space. But I also saw this snippet:
   /*
    * Certain firmware versions are way too sentimental and still believe
    * they are exclusive and unquestionable owners of the first physical page,
    * even though they explicitly mark it as EFI_CONVENTIONAL_MEMORY
    * (but then write-access it later during SetVirtualAddressMap()).
    *
    * Create a 1:1 mapping for this page, to avoid triple faults during early
    * boot with such firmware. We are free to hand this page to the BIOS,
    * as trim_bios_range() will reserve the first page and isolate it away
    * from memory allocators anyway.
    */
   if (kernel_map_pages_in_pgd(pgd, 0x0, 0x0, 1, pf)) {
   	pr_err("Failed to create 1:1 mapping for the first page!\n");
   	return 1;
   }

By leaving LASS enabled, it seems the kernel will be constraining BIOSes to not
do strange things, which seems reasonable.

> 
> > > @@ -476,8 +476,8 @@ void cr4_init(void)
> > >  
> > >  	if (boot_cpu_has(X86_FEATURE_PCID))
> > >  		cr4 |= X86_CR4_PCIDE;
> > > -	if (static_branch_likely(&cr_pinning))
> > > -		cr4 = (cr4 & ~cr4_pinned_mask) | cr4_pinned_bits;
> > > +
> > > +	cr4 = (cr4 & ~cr4_pinned_mask) | cr4_pinned_bits;
> > 
> > 
> > Can you explain why this change is needed? It relies on cr4_pinned_bits to be
> > already set, and kind of is "enforcement", but no longer depends on
> > enable_cr_pinning() being called.
> > 
> 
> cr4_init() is only called from APs during bring up. The pinned bits are
> saved on the BSP and then used to program the CR4 on the APs. It is
> independent of pinning *enforcement* which warns when these bits get
> modified.

Sorry, still not following. How is it independent of CR pinning enforcement if
the enforcement is still taking place in this function? And if we don't need to
enforce pinning, why drop the branch?

> 
> > 
> > >  
> > >  	__write_cr4(cr4);
> > >  


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v10 08/15] x86/vsyscall: Reorganize the page fault emulation code
  2025-10-07 22:52         ` Sohil Mehta
@ 2025-10-08 17:42           ` Edgecombe, Rick P
  0 siblings, 0 replies; 74+ messages in thread
From: Edgecombe, Rick P @ 2025-10-08 17:42 UTC (permalink / raw)
  To: Mehta, Sohil, tglx@linutronix.de, mingo@redhat.com, Hansen, Dave,
	x86@kernel.org, dave.hansen@linux.intel.com, bp@alien8.de
  Cc: corbet@lwn.net, ardb@kernel.org, david.laight.linux@gmail.com,
	luto@kernel.org, jpoimboe@kernel.org, Luck, Tony,
	alexander.shishkin@linux.intel.com, andrew.cooper3@citrix.com,
	linux-efi@vger.kernel.org, linux-kernel@vger.kernel.org,
	seanjc@google.com, xin@zytor.com, kas@kernel.org,
	vegard.nossum@oracle.com, dwmw@amazon.co.uk,
	linux-doc@vger.kernel.org, rdunlap@infradead.org, kees@kernel.org,
	hpa@zytor.com, peterz@infradead.org, geert@linux-m68k.org

On Tue, 2025-10-07 at 15:52 -0700, Sohil Mehta wrote:
> > 
> > If indeed we should add a check, it should probably go in one of the later
> > patches and not this one.
> 
> We already check CS before calling emulate_vsyscall_gp().
> 
> if (user_mode(regs)) {
> 
> ...
> 	if (emulate_vsyscall_gp(regs))
> 		goto exit;
> 
> ...
> }

Ah, right, I missed that. But in the new code, the way to get there is by taking
a GP with an RIP in the vsyscall range. Does this seem a bit stale though?

	/*
	 * No point in checking CS -- the only way to get here is a user mode
	 * trap to a high address, which means that we're in 64-bit user code.
	 */

For one, "No point in checking CS", while true kind of implies that CS wasn't
already checked. The second half I guess is still true if you call the fetch #GP
a trap, and actually maybe more accurate for LASS then it was for the older
paradigm with the "high address" verbiage.

I'm fine either way.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v10 09/15] x86/traps: Consolidate user fixups in exc_general_protection()
  2025-10-07 22:41     ` Sohil Mehta
@ 2025-10-08 17:43       ` Edgecombe, Rick P
  0 siblings, 0 replies; 74+ messages in thread
From: Edgecombe, Rick P @ 2025-10-08 17:43 UTC (permalink / raw)
  To: Mehta, Sohil, tglx@linutronix.de, mingo@redhat.com,
	dave.hansen@linux.intel.com, bp@alien8.de, x86@kernel.org
  Cc: corbet@lwn.net, ardb@kernel.org, andrew.cooper3@citrix.com,
	alexander.shishkin@linux.intel.com, luto@kernel.org,
	david.laight.linux@gmail.com, jpoimboe@kernel.org, Luck, Tony,
	linux-efi@vger.kernel.org, kas@kernel.org, seanjc@google.com,
	dwmw@amazon.co.uk, rdunlap@infradead.org,
	vegard.nossum@oracle.com, xin@zytor.com,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
	kees@kernel.org, hpa@zytor.com, peterz@infradead.org,
	geert@linux-m68k.org

On Tue, 2025-10-07 at 15:41 -0700, Sohil Mehta wrote:
> On 10/7/2025 10:46 AM, Edgecombe, Rick P wrote:
> > On Mon, 2025-10-06 at 23:51 -0700, Sohil Mehta wrote:
> > > Move the UMIP exception fixup along with the other user mode fixups,
> > > that is, under the common "if (user_mode(regs))" condition where the
> > > rest of the fixups reside.
> > 
> > Can you mention that it also drops static_cpu_has(X86_FEATURE_UMIP) check
> > because fixup_umip_exception() already checks
> > cpu_feature_enabled(X86_FEATURE_UMIP)?
> > 
> 
> There is no existing check. The current patch moves the X86_FEATURE_UMIP
> check to fixup_umip_exception().

Doh!

> 
> I can add a sentence to say that the current check is split into two
> separate locations. But, is it not obvious from the diff?

The log is pretty thin; I'd add it.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v10 03/15] x86/alternatives: Disable LASS when patching kernel alternatives
  2025-10-08 16:22       ` Edgecombe, Rick P
@ 2025-10-10 17:10         ` Sohil Mehta
  0 siblings, 0 replies; 74+ messages in thread
From: Sohil Mehta @ 2025-10-10 17:10 UTC (permalink / raw)
  To: Edgecombe, Rick P, tglx@linutronix.de, mingo@redhat.com,
	dave.hansen@linux.intel.com, bp@alien8.de, x86@kernel.org
  Cc: corbet@lwn.net, ardb@kernel.org, andrew.cooper3@citrix.com,
	alexander.shishkin@linux.intel.com, luto@kernel.org,
	david.laight.linux@gmail.com, jpoimboe@kernel.org, Luck, Tony,
	linux-efi@vger.kernel.org, kas@kernel.org, seanjc@google.com,
	dwmw@amazon.co.uk, rdunlap@infradead.org,
	vegard.nossum@oracle.com, xin@zytor.com,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
	kees@kernel.org, hpa@zytor.com, peterz@infradead.org,
	geert@linux-m68k.org

On 10/8/2025 9:22 AM, Edgecombe, Rick P wrote:
> Honestly, just unconditionally doing stac/clac doesn't sound that bad to me. We
> already unconditionally enable SMAP, right? If there was some big slowdown for a
> single copy, people would want an option to disable it. And with text patching,
> it's part of a heavier operation already.
> 
> Was there previous feedback on that option?
> 

Yes. Boris had expressed some concern about the extra toggles.

Dave and PeterZ mainly wanted to keep it separate for code isolation and
better understanding.

https://lore.kernel.org/lkml/7bbf9cae-6392-47a4-906c-7c27b1b1223d@intel.com/

I'll leave them as separate.
>> The issues are interdependent. We need the STAC/CLAC because text poking
>> accesses special memory. We require the inline memcpy/memset because we
>> have now added the STAC/CLAC usage and objtool guards against the
>> potential misuse of STAC/CLAC.
>>
>> Were you looking for any specific change to the wording?
> 
>> Ah ok, but the compiler could have always uninlined the existing memcpy calls,
>> right? So there is an existing theoretical problem, I would think.
> 

What theoretical problem?

The existing text_poke_memcpy() is a wrapper around the kernel's standard
memcpy(). That is an exported function call which shouldn't be inlined
(or uninlined), right?




^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v10 04/15] x86/cpu: Set LASS CR4 bit as pinning sensitive
  2025-10-08 16:52       ` Edgecombe, Rick P
@ 2025-10-10 19:03         ` Sohil Mehta
  0 siblings, 0 replies; 74+ messages in thread
From: Sohil Mehta @ 2025-10-10 19:03 UTC (permalink / raw)
  To: Edgecombe, Rick P, tglx@linutronix.de, mingo@redhat.com,
	dave.hansen@linux.intel.com, bp@alien8.de, x86@kernel.org
  Cc: corbet@lwn.net, ardb@kernel.org, andrew.cooper3@citrix.com,
	alexander.shishkin@linux.intel.com, luto@kernel.org,
	david.laight.linux@gmail.com, jpoimboe@kernel.org, Luck, Tony,
	linux-efi@vger.kernel.org, kas@kernel.org, seanjc@google.com,
	dwmw@amazon.co.uk, rdunlap@infradead.org,
	vegard.nossum@oracle.com, xin@zytor.com,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
	kees@kernel.org, hpa@zytor.com, peterz@infradead.org,
	geert@linux-m68k.org

On 10/8/2025 9:52 AM, Edgecombe, Rick P wrote:
> And unlike the other bits that just got added easily, this
> one required infrastructure changes and extra patch. So wondered, hmm, is it
> worth it to do the extra patches?
> 

Yeah, I reconsidered its usefulness as well.

Though, adding it to the CR4 pinned mask is a net positive, even with the
extra patch. As part of the pinning enforcement, we get a warning if
LASS is turned off by accident. The bit gets reprogrammed automatically,
which, even if not useful, wouldn't do any harm.

Also, the changes to defer CR pinning enforcement seem to be useful
independent of LASS. Plus, the patch turned out to be simpler than I had
imagined.

> Yea, I agree it would be good to get a warning. The write may be triggered
> accidentally by a kernel bug. I agree with the patch, but just commenting my
> reasoning for the sake of discussion. Maybe we can tighten the reasoning in the
> log.
Will do.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v10 05/15] x86/cpu: Defer CR pinning enforcement until late_initcall()
  2025-10-08 17:36       ` Edgecombe, Rick P
@ 2025-10-10 20:45         ` Sohil Mehta
  2025-10-15 21:17           ` Sohil Mehta
  0 siblings, 1 reply; 74+ messages in thread
From: Sohil Mehta @ 2025-10-10 20:45 UTC (permalink / raw)
  To: Edgecombe, Rick P, tglx@linutronix.de, mingo@redhat.com,
	dave.hansen@linux.intel.com, bp@alien8.de, x86@kernel.org
  Cc: corbet@lwn.net, ardb@kernel.org, andrew.cooper3@citrix.com,
	alexander.shishkin@linux.intel.com, luto@kernel.org,
	david.laight.linux@gmail.com, jpoimboe@kernel.org, Luck, Tony,
	linux-efi@vger.kernel.org, kas@kernel.org, seanjc@google.com,
	dwmw@amazon.co.uk, rdunlap@infradead.org,
	vegard.nossum@oracle.com, xin@zytor.com,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
	kees@kernel.org, hpa@zytor.com, peterz@infradead.org,
	geert@linux-m68k.org

On 10/8/2025 10:36 AM, Edgecombe, Rick P wrote:
>> cr4_init() is only called from APs during bring up. The pinned bits are
>> saved on the BSP and then used to program the CR4 on the APs. It is
>> independent of pinning *enforcement* which warns when these bits get
>> modified.
> 
> Sorry, still not following. How is it independent of CR pinning enforcement if
> the enforcement is still taking place in this function? And if we don't need to
> enforce pinning, why drop the branch?
> 

It depends on how we define "enforcement". The pinned bit verification
as well as the warning happen in native_write_cr4().

When an AP starts, we need to program *a* CR4 value for it. Currently, with
early CR pinning, we use the saved pinned bits on the BSP along with
X86_CR4_PCIDE. Because cr4_init() is only called during boot, the static
branch is always going to be false with deferred pinning. Your
suggestion implies that we only use X86_CR4_PCIDE as the initial CR4
value. It could work, but I have some doubts because CR4 initialization
has had multiple issues in the past.

Not directly related but see commits:
7652ac920185 ("x86/asm: Move native_write_cr0/4() out of line")
c7ad5ad297e6 ("x86/mm/64: Initialize CR4.PCIDE early")

As we discussed in another thread, CR pinning has expanded beyond the
original security-related bits. They have become bits that are never
expected to be modified once initialized. I wonder whether we could run
into issues if the initial CR4 value on the APs doesn't have one of the
pinned bits set. From a cursory look, everything should be fine (except
maybe FRED). I could give it a try.

But is there a preference here? There is no additional cost to setting
the pinned bits because we definitely need to program X86_CR4_PCIDE. Do
we set the pinned bits along with that, or wait for the AP to go through
the init flow and set them one by one?



^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v10 05/15] x86/cpu: Defer CR pinning enforcement until late_initcall()
  2025-10-10 20:45         ` Sohil Mehta
@ 2025-10-15 21:17           ` Sohil Mehta
  0 siblings, 0 replies; 74+ messages in thread
From: Sohil Mehta @ 2025-10-15 21:17 UTC (permalink / raw)
  To: Edgecombe, Rick P, tglx@linutronix.de, mingo@redhat.com,
	dave.hansen@linux.intel.com, bp@alien8.de, x86@kernel.org
  Cc: corbet@lwn.net, ardb@kernel.org, andrew.cooper3@citrix.com,
	alexander.shishkin@linux.intel.com, luto@kernel.org,
	david.laight.linux@gmail.com, jpoimboe@kernel.org, Luck, Tony,
	linux-efi@vger.kernel.org, kas@kernel.org, seanjc@google.com,
	dwmw@amazon.co.uk, rdunlap@infradead.org,
	vegard.nossum@oracle.com, xin@zytor.com,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
	kees@kernel.org, hpa@zytor.com, peterz@infradead.org,
	geert@linux-m68k.org

On 10/10/2025 1:45 PM, Sohil Mehta wrote:
> 
> As we discussed in another thread, CR pinning has expanded beyond the
> original security related bits. They have become bits that are never
> expected to be modified once initialized. I wonder whether we could run
> into issues if the initial CR4 value on the APs doesn't have one of the
> pinned bits set. From a cursory look, everything should be fine (except
> maybe FRED). I could give it a try.
> 

I tried this. Getting rid of the cr4_pinned_bits setting during
cr4_init() seems to be working fine.

Xin says that there may be an existing issue with FRED, as CR4.FRED is
set before programming the FRED config MSRs in
cpu_init_fred_exceptions(). Any exceptions during that brief window,
though unlikely, would cause a triple fault. I think not setting
CR4.FRED might help the issue, but I am not sure. I'll let Xin or Peter
evaluate this.


> But, is there a preference here? There is no additional cost of setting
> the pinned bits because we definitely need to program X86_CR4_PCIDE. Do
> we set the pinned bits along with that, or wait for the AP to go through
> the init flow and set them one by one?
> 

As we are planning to defer CR pinning enforcement, I am leaning
towards getting rid of the following check in cr4_init().

	if (static_branch_likely(&cr_pinning))
		cr4 = (cr4 & ~cr4_pinned_mask) | cr4_pinned_bits;

AFAIU, this change shouldn't harm FRED. Resolving the existing FRED
issue can be done in a separate patch.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v10 01/15] x86/cpu: Enumerate the LASS feature bits
  2025-10-07 20:20       ` Sohil Mehta
  2025-10-07 20:38         ` Edgecombe, Rick P
@ 2025-10-16  3:10         ` H. Peter Anvin
  1 sibling, 0 replies; 74+ messages in thread
From: H. Peter Anvin @ 2025-10-16  3:10 UTC (permalink / raw)
  To: Sohil Mehta, Dave Hansen, Edgecombe, Rick P,
	dave.hansen@linux.intel.com
  Cc: corbet@lwn.net, ardb@kernel.org, david.laight.linux@gmail.com,
	luto@kernel.org, jpoimboe@kernel.org, andrew.cooper3@citrix.com,
	Luck, Tony, alexander.shishkin@linux.intel.com, kas@kernel.org,
	seanjc@google.com, rdunlap@infradead.org, dwmw@amazon.co.uk,
	vegard.nossum@oracle.com, xin@zytor.com,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
	kees@kernel.org, peterz@infradead.org, linux-efi@vger.kernel.org,
	geert@linux-m68k.org, tglx@linutronix.de, mingo@redhat.com,
	bp@alien8.de, x86@kernel.org

On 2025-10-07 13:20, Sohil Mehta wrote:
> 
> The spec says,
> "A supervisor-mode data access causes a LASS violation if it would
> access a linear address of which bit 63 is 0, supervisor-mode access
> protection is enabled (by setting CR4.SMAP), and either RFLAGS.AC = 0 or
> the access is an implicit supervisor-mode access."
> 
> One could argue that the LASS hardware enforcement of the kernel data
> accesses *depends* on SMAP being enabled.
> 
>> Actually, it might be worth breaking this dependency hunk out into its
>> own patch, just so there's a nice clean place to discuss this.
> 
> Sure, we can talk about the above wording in the spec, as well as the
> STAC/CLAC dependency in a separate patch.
> 
> I included some information in the cover letter to explain that:
> 
> When there are valid reasons for the kernel to access memory in the user
> half, it can temporarily suspend LASS enforcement by toggling the
> RFLAGS.AC bit. Most of these cases are already covered today through the
> stac()/clac() pairs, which avoid SMAP violations. However, there are
> kernel usages, such as text poking, that access mappings (!_PAGE_USER)
> in the lower half of the address space. LASS-specific AC bit toggling is
> added for these cases.

Just to be clear: there is no reason to spend any time whatsoever on
supporting LASS without SMAP, because no such hardware is ever expected to
exist. The CPU feature dependencies are not all necessarily architectural; some
are also Linux implementation choices -- Linux is in no way required to be
optimized for every combination of features, and a lot of the time it makes
perfect sense to say "you don't have X, so I won't use Y either."

	-hpa


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v10 01/15] x86/cpu: Enumerate the LASS feature bits
  2025-10-07  6:51 ` [PATCH v10 01/15] x86/cpu: Enumerate the LASS feature bits Sohil Mehta
  2025-10-07 18:19   ` Edgecombe, Rick P
@ 2025-10-16 15:35   ` Borislav Petkov
  2025-10-21 18:03     ` Sohil Mehta
  1 sibling, 1 reply; 74+ messages in thread
From: Borislav Petkov @ 2025-10-16 15:35 UTC (permalink / raw)
  To: Sohil Mehta
  Cc: x86, Dave Hansen, Thomas Gleixner, Ingo Molnar, Jonathan Corbet,
	H . Peter Anvin, Andy Lutomirski, Josh Poimboeuf, Peter Zijlstra,
	Ard Biesheuvel, Kirill A . Shutemov, Xin Li, David Woodhouse,
	Sean Christopherson, Rick Edgecombe, Vegard Nossum, Andrew Cooper,
	David Laight, Randy Dunlap, Geert Uytterhoeven, Kees Cook,
	Tony Luck, Alexander Shishkin, linux-doc, linux-kernel, linux-efi

On Mon, Oct 06, 2025 at 11:51:05PM -0700, Sohil Mehta wrote:
> Link: https://download.vusec.net/papers/slam_sp24.pdf [1]

Just give the full paper name and people can search for it. Links tend to get
stale over time.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v10 05/15] x86/cpu: Defer CR pinning enforcement until late_initcall()
  2025-10-07  6:51 ` [PATCH v10 05/15] x86/cpu: Defer CR pinning enforcement until late_initcall() Sohil Mehta
  2025-10-07 17:23   ` Edgecombe, Rick P
@ 2025-10-17 19:28   ` Sohil Mehta
  1 sibling, 0 replies; 74+ messages in thread
From: Sohil Mehta @ 2025-10-17 19:28 UTC (permalink / raw)
  To: x86, Dave Hansen, Thomas Gleixner, Rick Edgecombe, Kees Cook
  Cc: Jonathan Corbet, H . Peter Anvin, Andy Lutomirski, Josh Poimboeuf,
	Peter Zijlstra, Ard Biesheuvel, Kirill A . Shutemov, Xin Li,
	David Woodhouse, Sean Christopherson, Vegard Nossum,
	Andrew Cooper, David Laight, Randy Dunlap, Geert Uytterhoeven,
	Tony Luck, Alexander Shishkin, linux-doc, linux-kernel, linux-efi,
	Ingo Molnar, Borislav Petkov

On 10/6/2025 11:51 PM, Sohil Mehta wrote:
> Save the pinned bits while initializing the boot CPU because they are
> needed later to program the value on APs when they come up.
> 

Because we are deferring CR pinning, there is no need to program the APs
with the pinned bits. The pinned bits would get enabled during AP bring
up like the rest of CR4 features that are not pinned. This patch can be
simplified to:

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 61ab332eaf73..d041f04c1969 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -476,8 +476,6 @@ void cr4_init(void)

        if (boot_cpu_has(X86_FEATURE_PCID))
                cr4 |= X86_CR4_PCIDE;
-       if (static_branch_likely(&cr_pinning))
-               cr4 = (cr4 & ~cr4_pinned_mask) | cr4_pinned_bits;

        __write_cr4(cr4);

@@ -486,15 +484,17 @@ void cr4_init(void)
 }

 /*
- * Once CPU feature detection is finished (and boot params have been
- * parsed), record any of the sensitive CR bits that are set, and
- * enable CR pinning.
+ * Before userspace starts, record any of the sensitive CR bits that
+ * are set, and enable CR pinning.
  */
-static void __init setup_cr_pinning(void)
+static int __init setup_cr_pinning(void)
 {
        cr4_pinned_bits = this_cpu_read(cpu_tlbstate.cr4) & cr4_pinned_mask;
        static_key_enable(&cr_pinning.key);
+
+       return 0;
 }
+late_initcall(setup_cr_pinning);

 static __init int x86_nofsgsbase_setup(char *arg)
 {
@@ -2119,7 +2119,6 @@ static __init void identify_boot_cpu(void)
        enable_sep_cpu();
 #endif
        cpu_detect_tlb(&boot_cpu_data);
-       setup_cr_pinning();

        tsx_init();
        tdx_init();


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* Re: [PATCH v10 00/15] x86: Enable Linear Address Space Separation support
  2025-10-07 16:23 ` [PATCH v10 00/15] x86: Enable Linear Address Space Separation support Edgecombe, Rick P
@ 2025-10-17 19:52   ` Sohil Mehta
  0 siblings, 0 replies; 74+ messages in thread
From: Sohil Mehta @ 2025-10-17 19:52 UTC (permalink / raw)
  To: Edgecombe, Rick P, tglx@linutronix.de, mingo@redhat.com,
	bp@alien8.de, x86@kernel.org, dave.hansen@linux.intel.com
  Cc: corbet@lwn.net, ardb@kernel.org, david.laight.linux@gmail.com,
	luto@kernel.org, jpoimboe@kernel.org, andrew.cooper3@citrix.com,
	Luck, Tony, alexander.shishkin@linux.intel.com, kas@kernel.org,
	seanjc@google.com, rdunlap@infradead.org, dwmw@amazon.co.uk,
	vegard.nossum@oracle.com, xin@zytor.com,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
	kees@kernel.org, hpa@zytor.com, peterz@infradead.org,
	linux-efi@vger.kernel.org, geert@linux-m68k.org

On 10/7/2025 9:23 AM, Edgecombe, Rick P wrote:
> There is also an expected harmless UABI change around SIG_SEGV. For a user mode
> kernel address range access, the kernel can deliver a signal that provides the
> exception type and the address. Before it was #PF, now a #GP with no address.
> 

That's right. With LASS, the SIGSEGV will be delivered without any
address information to userspace. But these are illegal accesses anyway,
so it probably doesn't matter much. I'll include a note in the patch
that updates the #GP handler.
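
Concretely, the difference in what userspace sees is roughly the
following (simplified; the exact call sites live in the #PF and #GP
handlers and may differ slightly per kernel version):

	/* #PF on a user-mode access: siginfo carries the faulting address */
	force_sig_fault(SIGSEGV, SEGV_MAPERR, (void __user *)address);

	/* #GP (e.g. a LASS violation): plain SIGSEGV, si_addr not populated */
	force_sig(SIGSEGV);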

Thanks Rick, for the review tags and general thoughts on the series.

Apart from the minor change in Patch 5 (regarding CR pinning), I have
mainly gathered improvements to the commit message and code comments.

Is there any other feedback?

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v10 02/15] x86/asm: Introduce inline memcpy and memset
  2025-10-07  6:51 ` [PATCH v10 02/15] x86/asm: Introduce inline memcpy and memset Sohil Mehta
@ 2025-10-21 12:47   ` Borislav Petkov
  2025-10-21 13:48     ` David Laight
  2025-10-21 18:06     ` Sohil Mehta
  0 siblings, 2 replies; 74+ messages in thread
From: Borislav Petkov @ 2025-10-21 12:47 UTC (permalink / raw)
  To: Sohil Mehta
  Cc: x86, Dave Hansen, Thomas Gleixner, Ingo Molnar, Jonathan Corbet,
	H . Peter Anvin, Andy Lutomirski, Josh Poimboeuf, Peter Zijlstra,
	Ard Biesheuvel, Kirill A . Shutemov, Xin Li, David Woodhouse,
	Sean Christopherson, Rick Edgecombe, Vegard Nossum, Andrew Cooper,
	David Laight, Randy Dunlap, Geert Uytterhoeven, Kees Cook,
	Tony Luck, Alexander Shishkin, linux-doc, linux-kernel, linux-efi

On Mon, Oct 06, 2025 at 11:51:06PM -0700, Sohil Mehta wrote:
> From: "Peter Zijlstra (Intel)" <peterz@infradead.org>
> 
> Provide inline memcpy and memset functions that can be used instead of
> the GCC builtins when necessary. The immediate use case is for the text
> poking functions to avoid the standard memcpy()/memset() calls within an
> RFLAGS.AC=1 context.

... because objtool does not allow function calls with AC=1 because... see
objtool/Documentation/objtool.txt, warning type 9, yadda yadda...

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v10 02/15] x86/asm: Introduce inline memcpy and memset
  2025-10-21 12:47   ` Borislav Petkov
@ 2025-10-21 13:48     ` David Laight
  2025-10-21 18:06     ` Sohil Mehta
  1 sibling, 0 replies; 74+ messages in thread
From: David Laight @ 2025-10-21 13:48 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Sohil Mehta, x86, Dave Hansen, Thomas Gleixner, Ingo Molnar,
	Jonathan Corbet, H . Peter Anvin, Andy Lutomirski, Josh Poimboeuf,
	Peter Zijlstra, Ard Biesheuvel, Kirill A . Shutemov, Xin Li,
	David Woodhouse, Sean Christopherson, Rick Edgecombe,
	Vegard Nossum, Andrew Cooper, Randy Dunlap, Geert Uytterhoeven,
	Kees Cook, Tony Luck, Alexander Shishkin, linux-doc, linux-kernel,
	linux-efi

On Tue, 21 Oct 2025 14:47:51 +0200
Borislav Petkov <bp@alien8.de> wrote:

> On Mon, Oct 06, 2025 at 11:51:06PM -0700, Sohil Mehta wrote:
> > From: "Peter Zijlstra (Intel)" <peterz@infradead.org>
> > 
> > Provide inline memcpy and memset functions that can be used instead of
> > the GCC builtins when necessary. The immediate use case is for the text
> > poking functions to avoid the standard memcpy()/memset() calls within an
> > RFLAGS.AC=1 context.  
> 
> ... because objtool does not allow function calls with AC=1 because... see
> objtool/Documentation/objtool.txt, warning type 9, yadda yadda...
> 

But for the purpose of code patching they don't need to be 'rep movsb'.
An inline function with a C byte copy loop is fine - provided you do
something to stop gcc pessimising it.

Obvious options are a volatile pointer (or READ_ONCE()) or a barrier().
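
A minimal sketch of such a loop (illustrative only, not part of the
series; the function name is made up, and barrier() is one way to keep
the compiler from turning the loop back into a memcpy() call):

static __always_inline void byte_copy(void *dst, const void *src, size_t len)
{
	const char *s = src;
	char *d = dst;

	while (len--) {
		*d++ = *s++;
		/* keep the compiler from recognizing the pattern */
		barrier();
	}
}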

	David

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v10 01/15] x86/cpu: Enumerate the LASS feature bits
  2025-10-16 15:35   ` Borislav Petkov
@ 2025-10-21 18:03     ` Sohil Mehta
  0 siblings, 0 replies; 74+ messages in thread
From: Sohil Mehta @ 2025-10-21 18:03 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: x86, Dave Hansen, Thomas Gleixner, Ingo Molnar, Jonathan Corbet,
	H . Peter Anvin, Andy Lutomirski, Josh Poimboeuf, Peter Zijlstra,
	Ard Biesheuvel, Kirill A . Shutemov, Xin Li, David Woodhouse,
	Sean Christopherson, Rick Edgecombe, Vegard Nossum, Andrew Cooper,
	David Laight, Randy Dunlap, Geert Uytterhoeven, Kees Cook,
	Tony Luck, Alexander Shishkin, linux-doc, linux-kernel, linux-efi

On 10/16/2025 8:35 AM, Borislav Petkov wrote:
> On Mon, Oct 06, 2025 at 11:51:05PM -0700, Sohil Mehta wrote:
>> Link: https://download.vusec.net/papers/slam_sp24.pdf [1]
> 
> Just give the full paper name and people can search for it. Links tend to get
> stale over time.
> 

Sure, will do.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v10 02/15] x86/asm: Introduce inline memcpy and memset
  2025-10-21 12:47   ` Borislav Petkov
  2025-10-21 13:48     ` David Laight
@ 2025-10-21 18:06     ` Sohil Mehta
  1 sibling, 0 replies; 74+ messages in thread
From: Sohil Mehta @ 2025-10-21 18:06 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: x86, Dave Hansen, Thomas Gleixner, Ingo Molnar, Jonathan Corbet,
	H . Peter Anvin, Andy Lutomirski, Josh Poimboeuf, Peter Zijlstra,
	Ard Biesheuvel, Kirill A . Shutemov, Xin Li, David Woodhouse,
	Sean Christopherson, Rick Edgecombe, Vegard Nossum, Andrew Cooper,
	David Laight, Randy Dunlap, Geert Uytterhoeven, Kees Cook,
	Tony Luck, Alexander Shishkin, linux-doc, linux-kernel, linux-efi

On 10/21/2025 5:47 AM, Borislav Petkov wrote:
> On Mon, Oct 06, 2025 at 11:51:06PM -0700, Sohil Mehta wrote:
>> From: "Peter Zijlstra (Intel)" <peterz@infradead.org>
>>
>> Provide inline memcpy and memset functions that can be used instead of
>> the GCC builtins when necessary. The immediate use case is for the text
>> poking functions to avoid the standard memcpy()/memset() calls within an
>> RFLAGS.AC=1 context.
> 
> ... because objtool does not allow function calls with AC=1 because... see
> objtool/Documentation/objtool.txt, warning type 9, yadda yadda...
> 

Sure, will add some notes here as well as in the next patch where it is
used.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v10 03/15] x86/alternatives: Disable LASS when patching kernel alternatives
  2025-10-07  6:51 ` [PATCH v10 03/15] x86/alternatives: Disable LASS when patching kernel alternatives Sohil Mehta
  2025-10-07 16:55   ` Edgecombe, Rick P
@ 2025-10-21 20:03   ` Borislav Petkov
  2025-10-21 20:55     ` Sohil Mehta
  2025-10-22  8:25     ` Peter Zijlstra
  1 sibling, 2 replies; 74+ messages in thread
From: Borislav Petkov @ 2025-10-21 20:03 UTC (permalink / raw)
  To: Sohil Mehta
  Cc: x86, Dave Hansen, Thomas Gleixner, Ingo Molnar, Jonathan Corbet,
	H . Peter Anvin, Andy Lutomirski, Josh Poimboeuf, Peter Zijlstra,
	Ard Biesheuvel, Kirill A . Shutemov, Xin Li, David Woodhouse,
	Sean Christopherson, Rick Edgecombe, Vegard Nossum, Andrew Cooper,
	David Laight, Randy Dunlap, Geert Uytterhoeven, Kees Cook,
	Tony Luck, Alexander Shishkin, linux-doc, linux-kernel, linux-efi

On Mon, Oct 06, 2025 at 11:51:07PM -0700, Sohil Mehta wrote:
> +static __always_inline void lass_clac(void)
> +{
> +	alternative("", "clac", X86_FEATURE_LASS);
> +}
> +
> +static __always_inline void lass_stac(void)
> +{
> +	alternative("", "stac", X86_FEATURE_LASS);
> +}

So I probably missed the whole discussion on how we arrived at
lass_{stac,clac}() but just in case, those names sound silly.

IOW, I'd do this ontop:

diff --git a/arch/x86/include/asm/smap.h b/arch/x86/include/asm/smap.h
index 3ecb4b0de1f9..066d83a6b1ff 100644
--- a/arch/x86/include/asm/smap.h
+++ b/arch/x86/include/asm/smap.h
@@ -55,16 +55,8 @@ static __always_inline void stac(void)
  * Use lass_stac()/lass_clac() when accessing kernel mappings
  * (!_PAGE_USER) in the lower half of the address space.
  */
-
-static __always_inline void lass_clac(void)
-{
-	alternative("", "clac", X86_FEATURE_LASS);
-}
-
-static __always_inline void lass_stac(void)
-{
-	alternative("", "stac", X86_FEATURE_LASS);
-}
+#define lass_disable()		stac()
+#define lass_enable()		clac()
 
 static __always_inline unsigned long smap_save(void)
 {
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 6a96dbc60bf1..6cdf5c226c51 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -2487,18 +2487,18 @@ __ro_after_init unsigned long text_poke_mm_addr;
 
 static void text_poke_memcpy(void *dst, const void *src, size_t len)
 {
-	lass_stac();
+	lass_disable();
 	__inline_memcpy(dst, src, len);
-	lass_clac();
+	lass_enable();
 }
 
 static void text_poke_memset(void *dst, const void *src, size_t len)
 {
 	int c = *(const int *)src;
 
-	lass_stac();
+	lass_disable();
 	__inline_memset(dst, c, len);
-	lass_clac();
+	lass_enable();
 }
 
 typedef void text_poke_f(void *dst, const void *src, size_t len);

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* Re: [PATCH v10 03/15] x86/alternatives: Disable LASS when patching kernel alternatives
  2025-10-21 20:03   ` Borislav Petkov
@ 2025-10-21 20:55     ` Sohil Mehta
  2025-10-22  9:56       ` Borislav Petkov
  2025-10-22  8:25     ` Peter Zijlstra
  1 sibling, 1 reply; 74+ messages in thread
From: Sohil Mehta @ 2025-10-21 20:55 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: x86, Dave Hansen, Thomas Gleixner, Ingo Molnar, Jonathan Corbet,
	H . Peter Anvin, Andy Lutomirski, Josh Poimboeuf, Peter Zijlstra,
	Ard Biesheuvel, Kirill A . Shutemov, Xin Li, David Woodhouse,
	Sean Christopherson, Rick Edgecombe, Vegard Nossum, Andrew Cooper,
	David Laight, Randy Dunlap, Geert Uytterhoeven, Kees Cook,
	Tony Luck, Alexander Shishkin, linux-doc, linux-kernel, linux-efi

On 10/21/2025 1:03 PM, Borislav Petkov wrote:
> On Mon, Oct 06, 2025 at 11:51:07PM -0700, Sohil Mehta wrote:
>> +static __always_inline void lass_clac(void)
>> +{
>> +	alternative("", "clac", X86_FEATURE_LASS);
>> +}
>> +
>> +static __always_inline void lass_stac(void)
>> +{
>> +	alternative("", "stac", X86_FEATURE_LASS);
>> +}
> 
> So I probably missed the whole discussion on how we arrived at
> lass_{stac,clac}() but just in case, those names sound silly.
> 

I am okay with lass_enable()/lass_disable() if we can all agree on it.

PeterZ didn't like lass_enable_enforcement()/lass_disable_enforcement()
when it was proposed. But your suggestion is much shorter, so maybe it
would work for him.
https://lore.kernel.org/lkml/20250626134921.GK1613200@noisy.programming.kicks-ass.net/

Though, there is a slight semantic difference we need to be careful
about. LASS covers two types of kernel accesses: data and instruction
fetches.

1) STAC/CLAC only control kernel *data* accesses into the lower
half.

2) CR4.LASS is what truly controls the entire mechanism. If an
instruction fetch needs to happen from a lower address, CR4.LASS must be
cleared to disable LASS completely. (See patches 6 and 7.)

In the series, we directly write to the CR4 bits, so they don't have any
wrappers. But in the future, lass_enable()/lass_disable() could be
confusing if wrappers were added for the CR4 toggling.
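
As a rough sketch of the two cases (the lass_*() helpers are from patch 3
of this series; the CR4 helpers and X86_CR4_LASS usage are only meant to
illustrate what patches 6 and 7 do, not their exact code):

	/* 1) Kernel *data* access to a !_PAGE_USER low-half mapping */
	lass_stac();
	__inline_memcpy(dst, src, len);
	lass_clac();

	/*
	 * 2) Instruction fetch from the low half (EFI runtime, kexec):
	 * the AC bit does not help here, CR4.LASS itself must be cleared
	 * around the call/jump.
	 */
	cr4_clear_bits(X86_CR4_LASS);
	/* ... execute from the low-half mapping ... */
	cr4_set_bits(X86_CR4_LASS);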

> IOW, I'd do this ontop:
> 

> +#define lass_disable()		stac()
> +#define lass_enable()		clac()
>  

There is an issue here which you had originally objected to.
https://lore.kernel.org/lkml/20240710171836.GGZo7CbFJeZwLCZUAt@fat_crate.local/
https://lore.kernel.org/lkml/20240711012333.GAZo80FU30_x77otP4@fat_crate.local/

These new versions of lass_disable()/lass_enable() will toggle the AC
flag on older platforms without X86_FEATURE_LASS. It definitely makes
the code easier to read and maintain if we are okay with the minor
performance penalty.
>  static void text_poke_memcpy(void *dst, const void *src, size_t len)
>  {
> -	lass_stac();
> +	lass_disable();
>  	__inline_memcpy(dst, src, len);
> -	lass_clac();
> +	lass_enable();
>  }
>  
>  static void text_poke_memset(void *dst, const void *src, size_t len)
>  {
>  	int c = *(const int *)src;
>  
> -	lass_stac();
> +	lass_disable();
>  	__inline_memset(dst, c, len);
> -	lass_clac();
> +	lass_enable();
>  }

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v10 03/15] x86/alternatives: Disable LASS when patching kernel alternatives
  2025-10-21 20:03   ` Borislav Petkov
  2025-10-21 20:55     ` Sohil Mehta
@ 2025-10-22  8:25     ` Peter Zijlstra
  2025-10-22  9:40       ` Borislav Petkov
  1 sibling, 1 reply; 74+ messages in thread
From: Peter Zijlstra @ 2025-10-22  8:25 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Sohil Mehta, x86, Dave Hansen, Thomas Gleixner, Ingo Molnar,
	Jonathan Corbet, H . Peter Anvin, Andy Lutomirski, Josh Poimboeuf,
	Ard Biesheuvel, Kirill A . Shutemov, Xin Li, David Woodhouse,
	Sean Christopherson, Rick Edgecombe, Vegard Nossum, Andrew Cooper,
	David Laight, Randy Dunlap, Geert Uytterhoeven, Kees Cook,
	Tony Luck, Alexander Shishkin, linux-doc, linux-kernel, linux-efi

On Tue, Oct 21, 2025 at 10:03:28PM +0200, Borislav Petkov wrote:
> On Mon, Oct 06, 2025 at 11:51:07PM -0700, Sohil Mehta wrote:
> > +static __always_inline void lass_clac(void)
> > +{
> > +	alternative("", "clac", X86_FEATURE_LASS);
> > +}
> > +
> > +static __always_inline void lass_stac(void)
> > +{
> > +	alternative("", "stac", X86_FEATURE_LASS);
> > +}
> 
> So I probably missed the whole discussion on how we arrived at
> lass_{stac,clac}() but just in case, those names sound silly.
> 

Initially the suggestion was to use stac/clac directly iirc; but that
loses the information that these are for LASS only. Hence the LASS-specific
ones.

(its an unfortunate arch detail that LASS and SMAP both use the AC flag
and all that)

> diff --git a/arch/x86/include/asm/smap.h b/arch/x86/include/asm/smap.h
> index 3ecb4b0de1f9..066d83a6b1ff 100644
> --- a/arch/x86/include/asm/smap.h
> +++ b/arch/x86/include/asm/smap.h
> @@ -55,16 +55,8 @@ static __always_inline void stac(void)
>   * Use lass_stac()/lass_clac() when accessing kernel mappings
>   * (!_PAGE_USER) in the lower half of the address space.
>   */
> -
> -static __always_inline void lass_clac(void)
> -{
> -	alternative("", "clac", X86_FEATURE_LASS);
> -}
> -
> -static __always_inline void lass_stac(void)
> -{
> -	alternative("", "stac", X86_FEATURE_LASS);
> -}
> +#define lass_disable()		stac()
> +#define lass_enable()		clac()

But that's not the same, stac() and clac() are FEATURE_SMAP, these are
FEATURE_LASS.

If you really want the _disable _enable naming that's fine with me, but
then perhaps we should also s/clac/smap_disable/ and s/stac/smap_enable/
for consistency.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v10 03/15] x86/alternatives: Disable LASS when patching kernel alternatives
  2025-10-22  8:25     ` Peter Zijlstra
@ 2025-10-22  9:40       ` Borislav Petkov
  2025-10-22 10:22         ` Peter Zijlstra
  0 siblings, 1 reply; 74+ messages in thread
From: Borislav Petkov @ 2025-10-22  9:40 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Sohil Mehta, x86, Dave Hansen, Thomas Gleixner, Ingo Molnar,
	Jonathan Corbet, H . Peter Anvin, Andy Lutomirski, Josh Poimboeuf,
	Ard Biesheuvel, Kirill A . Shutemov, Xin Li, David Woodhouse,
	Sean Christopherson, Rick Edgecombe, Vegard Nossum, Andrew Cooper,
	David Laight, Randy Dunlap, Geert Uytterhoeven, Kees Cook,
	Tony Luck, Alexander Shishkin, linux-doc, linux-kernel, linux-efi

On Wed, Oct 22, 2025 at 10:25:41AM +0200, Peter Zijlstra wrote:
> Initially the suggestion was to use stac/clac directly iirc; but that
> looses the information these are for LASS only. Hence the LASS specific
> ones.

Yap.

> (its an unfortunate arch detail that LASS and SMAP both use the AC flag
> and all that)

That is an implementation detail and users of the interface shouldn't care.

> But that's not the same, stac() and clac() are FEATURE_SMAP, these are
> FEATURE_LASS.

So?

Are you thinking of toggling features and then something else getting disabled
in the process?

> If you really want the _disable _enable naming that's fine with me, but
> then perhaps we should also s/clac/smap_disable/ and s/stac/smap_enable/
> for consistency.

So the enable/disable thing is I think what makes this a lot more
understandable when you read it this way: "disable linear address separation
around this code". And that is regardless of how the underlying machinery does
that toggling of LASS.

As to stac/clac - I wouldn't touch them. They've been there forever so it'll
only be unnecessary churn.

Btw, if you need an example which already does that:

arch/x86/include/asm/uaccess.h:37:#define __uaccess_begin() stac()
arch/x86/include/asm/uaccess.h:38:#define __uaccess_end()   clac()

So the lass_{enable,disable} will be yet another incarnation of this pattern.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v10 03/15] x86/alternatives: Disable LASS when patching kernel alternatives
  2025-10-21 20:55     ` Sohil Mehta
@ 2025-10-22  9:56       ` Borislav Petkov
  2025-10-22 19:49         ` Sohil Mehta
  0 siblings, 1 reply; 74+ messages in thread
From: Borislav Petkov @ 2025-10-22  9:56 UTC (permalink / raw)
  To: Sohil Mehta
  Cc: x86, Dave Hansen, Thomas Gleixner, Ingo Molnar, Jonathan Corbet,
	H . Peter Anvin, Andy Lutomirski, Josh Poimboeuf, Peter Zijlstra,
	Ard Biesheuvel, Kirill A . Shutemov, Xin Li, David Woodhouse,
	Sean Christopherson, Rick Edgecombe, Vegard Nossum, Andrew Cooper,
	David Laight, Randy Dunlap, Geert Uytterhoeven, Kees Cook,
	Tony Luck, Alexander Shishkin, linux-doc, linux-kernel, linux-efi

On Tue, Oct 21, 2025 at 01:55:51PM -0700, Sohil Mehta wrote:
> In the series, we directly write to the CR4 bits, so they don't have any
> wrappers. But in the future, lass_enable()/lass_disable() could be
> confusing if wrappers were added for the CR4 toggling.

Are you envisioning to export the CR4.LASS toggling to users like those two or
is former going to be done only at those two places?

Because CR4 toggling is expensive so you probably don't want to do that very
often.

> There is an issue here which you had originally objected to.
> https://lore.kernel.org/lkml/20240710171836.GGZo7CbFJeZwLCZUAt@fat_crate.local/
> https://lore.kernel.org/lkml/20240711012333.GAZo80FU30_x77otP4@fat_crate.local/
> 
> These new versions of lass_disable()/lass_enable() will toggle the AC
> flag on older platforms without X86_FEATURE_LASS. It definitely makes
> the code easier to read and maintain if we are okay with the minor
> performance penalty.

Hmm, we probably should measure that. Text poking should be a relatively
rare operation, but we should at least do a quick measurement to see whether
anything registers on the radar...

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v10 03/15] x86/alternatives: Disable LASS when patching kernel alternatives
  2025-10-22  9:40       ` Borislav Petkov
@ 2025-10-22 10:22         ` Peter Zijlstra
  2025-10-22 10:52           ` Borislav Petkov
  0 siblings, 1 reply; 74+ messages in thread
From: Peter Zijlstra @ 2025-10-22 10:22 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Sohil Mehta, x86, Dave Hansen, Thomas Gleixner, Ingo Molnar,
	Jonathan Corbet, H . Peter Anvin, Andy Lutomirski, Josh Poimboeuf,
	Ard Biesheuvel, Kirill A . Shutemov, Xin Li, David Woodhouse,
	Sean Christopherson, Rick Edgecombe, Vegard Nossum, Andrew Cooper,
	David Laight, Randy Dunlap, Geert Uytterhoeven, Kees Cook,
	Tony Luck, Alexander Shishkin, linux-doc, linux-kernel, linux-efi

On Wed, Oct 22, 2025 at 11:40:19AM +0200, Borislav Petkov wrote:

> > But that's not the same, stac() and clac() are FEATURE_SMAP, these are
> > FEATURE_LASS.
> 
> So?

That's confusing. Just keep them separate alternatives.

> Are you thinking of toggling features and then something else getting disabled
> in the process?

I'm thinking that machines without LASS don't need the clac/stac in
these places. And when reading the asm, the FEATURE_LASS is a clue.


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v10 03/15] x86/alternatives: Disable LASS when patching kernel alternatives
  2025-10-22 10:22         ` Peter Zijlstra
@ 2025-10-22 10:52           ` Borislav Petkov
  0 siblings, 0 replies; 74+ messages in thread
From: Borislav Petkov @ 2025-10-22 10:52 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Sohil Mehta, x86, Dave Hansen, Thomas Gleixner, Ingo Molnar,
	Jonathan Corbet, H . Peter Anvin, Andy Lutomirski, Josh Poimboeuf,
	Ard Biesheuvel, Kirill A . Shutemov, Xin Li, David Woodhouse,
	Sean Christopherson, Rick Edgecombe, Vegard Nossum, Andrew Cooper,
	David Laight, Randy Dunlap, Geert Uytterhoeven, Kees Cook,
	Tony Luck, Alexander Shishkin, linux-doc, linux-kernel, linux-efi

On Wed, Oct 22, 2025 at 12:22:53PM +0200, Peter Zijlstra wrote:
> I'm thinking that machines without LASS don't need the clac/stac in
> these places. And when reading the asm, the FEATURE_LASS is a clue.

Yeah, makes sense. The second alternative with LASS sounds like the optimal
thing.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v10 03/15] x86/alternatives: Disable LASS when patching kernel alternatives
  2025-10-22  9:56       ` Borislav Petkov
@ 2025-10-22 19:49         ` Sohil Mehta
  2025-10-22 20:03           ` Luck, Tony
  0 siblings, 1 reply; 74+ messages in thread
From: Sohil Mehta @ 2025-10-22 19:49 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: x86, Dave Hansen, Thomas Gleixner, Ingo Molnar, Jonathan Corbet,
	H . Peter Anvin, Andy Lutomirski, Josh Poimboeuf, Peter Zijlstra,
	Ard Biesheuvel, Kirill A . Shutemov, Xin Li, David Woodhouse,
	Sean Christopherson, Rick Edgecombe, Vegard Nossum, Andrew Cooper,
	David Laight, Randy Dunlap, Geert Uytterhoeven, Kees Cook,
	Tony Luck, Alexander Shishkin, linux-doc, linux-kernel, linux-efi

On 10/22/2025 2:56 AM, Borislav Petkov wrote:
> On Tue, Oct 21, 2025 at 01:55:51PM -0700, Sohil Mehta wrote:
>> In the series, we directly write to the CR4 bits, so they don't have any
>> wrappers. But in the future, lass_enable()/lass_disable() could be
>> confusing if wrappers were added for the CR4 toggling.
> 
> Are you envisioning to export the CR4.LASS toggling to users like those two or
> is former going to be done only at those two places?
> 
> Because CR4 toggling is expensive so you probably don't want to do that very
> often.
> 

I agree. My expectation is that those won't grow much beyond the
existing ones.

My understanding from your discussion with PeterZ is that we would use
lass_enable()/_disable() with the LASS alternatives but leave the
existing stac()/clac() as-is.

Below is the updated patch with the rename and the text to clarify usages.

diff --git a/arch/x86/include/asm/smap.h b/arch/x86/include/asm/smap.h
index 4f84d421d1cf..4f4a4e0efff5 100644
--- a/arch/x86/include/asm/smap.h
+++ b/arch/x86/include/asm/smap.h
@@ -23,18 +23,52 @@

 #else /* __ASSEMBLER__ */

+/*
+ * The CLAC/STAC instructions toggle the enforcement of
+ * X86_FEATURE_SMAP along with X86_FEATURE_LASS.
+ *
+ * SMAP enforcement is based on the _PAGE_BIT_USER bit in the page
+ * tables. The kernel is not allowed to touch pages with the bit set
+ * unless the AC bit is set.
+ *
+ * Use stac()/clac() when accessing userspace (_PAGE_USER) mappings,
+ * regardless of location.
+ *
+ * Note: a barrier is implicit in alternative().
+ */
+
 static __always_inline void clac(void)
 {
-	/* Note: a barrier is implicit in alternative() */
 	alternative("", "clac", X86_FEATURE_SMAP);
 }

 static __always_inline void stac(void)
 {
-	/* Note: a barrier is implicit in alternative() */
 	alternative("", "stac", X86_FEATURE_SMAP);
 }

+/*
+ * LASS enforcement is based on bit 63 of the virtual address. The
+ * kernel is not allowed to touch memory in the lower half of the
+ * virtual address space unless the AC bit is set.
+ *
+ * Use lass_disable()/lass_enable() when accessing kernel (!_PAGE_USER)
+ * mappings in the lower half of the address space that are blocked by
+ * LASS, but not by SMAP.
+ *
+ * Note: a barrier is implicit in alternative().
+ */
+
+static __always_inline void lass_enable(void)
+{
+	alternative("", "clac", X86_FEATURE_LASS);
+}
+
+static __always_inline void lass_disable(void)
+{
+	alternative("", "stac", X86_FEATURE_LASS);
+}
+
 static __always_inline unsigned long smap_save(void)
 {
 	unsigned long flags;
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 8ee5ff547357..b38dbf08d5cd 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -2469,16 +2469,30 @@ void __init_or_module text_poke_early(void *addr, const void *opcode,
 __ro_after_init struct mm_struct *text_poke_mm;
 __ro_after_init unsigned long text_poke_mm_addr;

+/*
+ * Text poking creates and uses a mapping in the lower half of the
+ * address space. Relax LASS enforcement when accessing the poking
+ * address.
+ *
+ * objtool enforces a strict policy of "no function calls within AC=1
+ * regions". Adhere to the policy by using inline versions of
+ * memcpy()/memset() that will never result in a function call.
+ */
+
 static void text_poke_memcpy(void *dst, const void *src, size_t len)
 {
-	memcpy(dst, src, len);
+	lass_disable();
+	__inline_memcpy(dst, src, len);
+	lass_enable();
 }

 static void text_poke_memset(void *dst, const void *src, size_t len)
 {
 	int c = *(const int *)src;

-	memset(dst, c, len);
+	lass_disable();
+	__inline_memset(dst, c, len);
+	lass_enable();
 }

 typedef void text_poke_f(void *dst, const void *src, size_t len);


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* RE: [PATCH v10 03/15] x86/alternatives: Disable LASS when patching kernel alternatives
  2025-10-22 19:49         ` Sohil Mehta
@ 2025-10-22 20:03           ` Luck, Tony
  0 siblings, 0 replies; 74+ messages in thread
From: Luck, Tony @ 2025-10-22 20:03 UTC (permalink / raw)
  To: Mehta, Sohil, Borislav Petkov
  Cc: x86@kernel.org, Dave Hansen, Thomas Gleixner, Ingo Molnar,
	Jonathan Corbet, H . Peter Anvin, Andy Lutomirski, Josh Poimboeuf,
	Peter Zijlstra, Ard Biesheuvel, Kirill A . Shutemov, Xin Li,
	David Woodhouse, Sean Christopherson, Edgecombe, Rick P,
	Vegard Nossum, andrew.cooper3@citrix.com, David Laight,
	Randy Dunlap, Geert Uytterhoeven, Kees Cook, Alexander Shishkin,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-efi@vger.kernel.org

> +/*
> + * LASS enforcement is based on bit 63 of the virtual address. The
> + * kernel is not allowed to touch memory in the lower half of the
> + * virtual address space unless the AC bit is set.
> + *
> + * Use lass_disable()/lass_enable() when accessing kernel (!_PAGE_USER)
> + * mappings in the lower half of the address space that are blocked by
> + * LASS, but not by SMAP.

Maybe "when accessing kernel data ..."

Also add that for instruction fetches, CR4.LASS must be cleared.

-Tony

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v10 08/15] x86/vsyscall: Reorganize the page fault emulation code
  2025-10-07 18:48     ` Dave Hansen
  2025-10-07 19:53       ` Edgecombe, Rick P
@ 2025-10-30 16:58       ` Andy Lutomirski
  2025-10-30 17:22         ` H. Peter Anvin
  2025-10-30 19:28         ` Sohil Mehta
  1 sibling, 2 replies; 74+ messages in thread
From: Andy Lutomirski @ 2025-10-30 16:58 UTC (permalink / raw)
  To: Dave Hansen, Rick P Edgecombe, Sohil Mehta, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, the arch/x86 maintainers,
	Dave Hansen
  Cc: Jonathan Corbet, Ard Biesheuvel, david.laight.linux@gmail.com,
	jpoimboe@kernel.org, Andrew Cooper, Tony Luck, Alexander Shishkin,
	Kirill A . Shutemov, Sean Christopherson, Randy Dunlap,
	David Woodhouse, Vegard Nossum, Xin Li, Linux Kernel Mailing List,
	linux-doc@vger.kernel.org, Kees Cook, H. Peter Anvin,
	Peter Zijlstra (Intel), linux-efi@vger.kernel.org,
	Geert Uytterhoeven



On Tue, Oct 7, 2025, at 11:48 AM, Dave Hansen wrote:
> On 10/7/25 11:37, Edgecombe, Rick P wrote:
>>>  	/*
>>>  	 * No point in checking CS -- the only way to get here is a user mode
>>>  	 * trap to a high address, which means that we're in 64-bit user code.
>> I don't know. Is this as true any more? We are now sometimes guessing based on
>> regs->ip of a #GP. What if the kernel accidentally tries to jump to the vsyscall
>> address? Then we are reading the kernel stack and strange things. Maybe it's
>> worth replacing the comment with a check? Feel free to call this paranoid.
>
> The first check in emulate_vsyscall() is:
>
>        /* Write faults or kernel-privilege faults never get fixed up. */
>        if ((error_code & (X86_PF_WRITE | X86_PF_USER)) != X86_PF_USER)
>                return false;
>
> If the kernel jumped to the vsyscall page, it would end up there, return
> false, and never reach the code near the "No point in checking CS" comment.
>
> Right? Or am I misunderstanding the scenario you're calling out?
>
> If I'm understanding it right, I'd be a bit reluctant to add a CS check
> as well.

IMO it should boil down to exactly the same thing as the current code for the #PF case and, for #GP, there are two logical conditions that we care about:

1. Are we in user mode?

2. Are we using a 64-bit CS such that vsyscall emulation makes sense?

Now I'd be a tiny bit surprised if a CPU allows you to lretq or similar to a 32-bit CS with >2^63 RIP, but what do I know?  One could test this on a variety of machines, both Intel and AMD, to see what actually happens.

But the kernel wraps all this up as user_64bit_mode(regs).  If user_64bit_mode(regs) is true and RIP points to a vsyscall, then ISTM there aren't a whole lot of options.  Somehow we're in user mode, either via an exit from kernel mode or via CALL/JMP/whatever from user mode, and RIP is pointing at the vsyscall page, and CS is such that, in the absence of LASS, we would execute the vsyscall.  I suppose the #GP could be from some other cause than a LASS violation, but that doesn't seem worth worrying about.

So I think all that's needed is to update "[PATCH v10 10/15] x86/vsyscall: Add vsyscall emulation for #GP" to check user_64bit_mode(regs) for the vsyscall case.  (As submitted, unless I missed something while composing the patches in my head, it's only checking user_mode(regs), and I think it's worth the single extra line of code to make the result a tiny bit more robust.)
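
A minimal sketch of the extra check in the #GP path (the surrounding
function and the exact address test are placeholders; patch 10 has the
real code):

	/* Only 64-bit user code can legitimately reach the vsyscall page */
	if (!user_64bit_mode(regs))
		return false;
	if ((regs->ip & PAGE_MASK) != VSYSCALL_ADDR)
		return false;
	/* ... proceed with vsyscall emulation ... */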

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v10 08/15] x86/vsyscall: Reorganize the page fault emulation code
  2025-10-30 16:58       ` Andy Lutomirski
@ 2025-10-30 17:22         ` H. Peter Anvin
  2025-10-30 17:35           ` Andy Lutomirski
  2025-10-30 19:28         ` Sohil Mehta
  1 sibling, 1 reply; 74+ messages in thread
From: H. Peter Anvin @ 2025-10-30 17:22 UTC (permalink / raw)
  To: Andy Lutomirski, Dave Hansen, Rick P Edgecombe, Sohil Mehta,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	the arch/x86 maintainers, Dave Hansen
  Cc: Jonathan Corbet, Ard Biesheuvel, david.laight.linux@gmail.com,
	jpoimboe@kernel.org, Andrew Cooper, Tony Luck, Alexander Shishkin,
	Kirill A . Shutemov, Sean Christopherson, Randy Dunlap,
	David Woodhouse, Vegard Nossum, Xin Li, Linux Kernel Mailing List,
	linux-doc@vger.kernel.org, Kees Cook, Peter Zijlstra (Intel),
	linux-efi@vger.kernel.org, Geert Uytterhoeven

On October 30, 2025 9:58:02 AM PDT, Andy Lutomirski <luto@kernel.org> wrote:
>
>
>On Tue, Oct 7, 2025, at 11:48 AM, Dave Hansen wrote:
>> On 10/7/25 11:37, Edgecombe, Rick P wrote:
>>>>  	/*
>>>>  	 * No point in checking CS -- the only way to get here is a user mode
>>>>  	 * trap to a high address, which means that we're in 64-bit user code.
>>> I don't know. Is this as true any more? We are now sometimes guessing based on
>>> regs->ip of a #GP. What if the kernel accidentally tries to jump to the vsyscall
>>> address? Then we are reading the kernel stack and strange things. Maybe it's
>>> worth replacing the comment with a check? Feel free to call this paranoid.
>>
>> The first check in emulate_vsyscall() is:
>>
>>        /* Write faults or kernel-privilege faults never get fixed up. */
>>        if ((error_code & (X86_PF_WRITE | X86_PF_USER)) != X86_PF_USER)
>>                return false;
>>
>> If the kernel jumped to the vsyscall page, it would end up there, return
>> false, and never reach the code near the "No point in checking CS" comment.
>>
>> Right? Or am I misunderstanding the scenario you're calling out?
>>
>> If I'm understanding it right, I'd be a bit reluctant to add a CS check
>> as well.
>
>IMO it should boil down to exactly the same thing as the current code for the #PF case and, for #GP, there are two logical conditions that we care about:
>
>1. Are we in user mode?
>
>2. Are we using a 64-bit CS such that vsyscall emulation makes sense.
>
>Now I'd be a tiny bit surprised if a CPU allows you to lretq or similar to a 32-bit CS with >2^63 RIP, but what do I know?  One could test this on a variety of machines, both Intel and AMD, to see what actually happens.
>
>But the kernel wraps all this up as user_64bit_mode(regs).  If user_64bit_mode(regs) is true and RIP points to a vsyscall, then ISTM there aren't a whole lot of options.  Somehow we're in user mode, either via an exit from kernel mode or via CALL/JMP/whatever from user mode, and RIP is pointing at the vsyscall page, and CS is such that, in the absence of LASS, we would execute the vsyscall.  I suppose the #GP could be from some other cause than a LASS violation, but that doesn't seem worth worrying about.
>
>So I think all that's needed is to update "[PATCH v10 10/15] x86/vsyscall: Add vsyscall emulation for #GP" to check user_64bit_mode(regs) for the vsyscall case.  (As submitted, unless I missed something while composing the patches in my head, it's only checking user_mode(regs), and I think it's worth the single extra line of code to make the result a tiny bit more robust.)

user_64bit_mode() is a CS check :)

There is that one extra check for PARAVIRT_XXL that *could* be gotten rid of by making the PV code report its 64-bit selector and patching it into the test, but it is on the error path anyway...


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v10 08/15] x86/vsyscall: Reorganize the page fault emulation code
  2025-10-30 17:22         ` H. Peter Anvin
@ 2025-10-30 17:35           ` Andy Lutomirski
  0 siblings, 0 replies; 74+ messages in thread
From: Andy Lutomirski @ 2025-10-30 17:35 UTC (permalink / raw)
  To: H. Peter Anvin, Dave Hansen, Rick P Edgecombe, Sohil Mehta,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	the arch/x86 maintainers, Dave Hansen
  Cc: Jonathan Corbet, Ard Biesheuvel, david.laight.linux@gmail.com,
	jpoimboe@kernel.org, Andrew Cooper, Tony Luck, Alexander Shishkin,
	Kirill A . Shutemov, Sean Christopherson, Randy Dunlap,
	David Woodhouse, Vegard Nossum, Xin Li, Linux Kernel Mailing List,
	linux-doc@vger.kernel.org, Kees Cook, Peter Zijlstra (Intel),
	linux-efi@vger.kernel.org, Geert Uytterhoeven



On Thu, Oct 30, 2025, at 10:22 AM, H. Peter Anvin wrote:
> On October 30, 2025 9:58:02 AM PDT, Andy Lutomirski <luto@kernel.org> wrote:
>>
>>
>>On Tue, Oct 7, 2025, at 11:48 AM, Dave Hansen wrote:
>>> On 10/7/25 11:37, Edgecombe, Rick P wrote:
>>>>>  	/*
>>>>>  	 * No point in checking CS -- the only way to get here is a user mode
>>>>>  	 * trap to a high address, which means that we're in 64-bit user code.
>>>> I don't know. Is this as true any more? We are now sometimes guessing based on
>>>> regs->ip of a #GP. What if the kernel accidentally tries to jump to the vsyscall
>>>> address? Then we are reading the kernel stack and strange things. Maybe it's
>>>> worth replacing the comment with a check? Feel free to call this paranoid.
>>>
>>> The first check in emulate_vsyscall() is:
>>>
>>>        /* Write faults or kernel-privilege faults never get fixed up. */
>>>        if ((error_code & (X86_PF_WRITE | X86_PF_USER)) != X86_PF_USER)
>>>                return false;
>>>
>>> If the kernel jumped to the vsyscall page, it would end up there, return
>>> false, and never reach the code near the "No point in checking CS" comment.
>>>
>>> Right? Or am I misunderstanding the scenario you're calling out?
>>>
>>> If I'm understanding it right, I'd be a bit reluctant to add a CS check
>>> as well.
>>
>>IMO it should boil down to exactly the same thing as the current code for the #PF case and, for #GP, there are two logical conditions that we care about:
>>
>>1. Are we in user mode?
>>
>>2. Are we using a 64-bit CS such that vsyscall emulation makes sense.
>>
>>Now I'd be a tiny bit surprised if a CPU allows you to lretq or similar to a 32-bit CS with >2^63 RIP, but what do I know?  One could test this on a variety of machines, both Intel and AMD, to see what actually happens.
>>
>>But the kernel wraps all this up as user_64bit_mode(regs).  If user_64bit_mode(regs) is true and RIP points to a vsyscall, then ISTM there aren't a whole lot of options.  Somehow we're in user mode, either via an exit from kernel mode or via CALL/JMP/whatever from user mode, and RIP is pointing at the vsyscall page, and CS is such that, in the absence of LASS, we would execute the vsyscall.  I suppose the #GP could be from some other cause than a LASS violation, but that doesn't seem worth worrying about.
>>
>>So I think all that's needed is to update "[PATCH v10 10/15] x86/vsyscall: Add vsyscall emulation for #GP" to check user_64bit_mode(regs) for the vsyscall case.  (As submitted, unless I missed something while composing the patches in my head, it's only checking user_mode(regs), and I think it's worth the single extra line of code to make the result a tiny bit more robust.)
>
> user_64bit_mode() is a CS check :)
>
> There is that one extra check for PARAVIRT_XXL that *could* be gotten 
> rid of by making the PV code report its 64-bit selector and patching it 
> into the test, but it is on the error path anyway...

In the hopefully unlikely event that anyone cares about #GP performance, they should probably care far, far more about the absurd PASID fix up than anything else :)

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v10 08/15] x86/vsyscall: Reorganize the page fault emulation code
  2025-10-30 16:58       ` Andy Lutomirski
  2025-10-30 17:22         ` H. Peter Anvin
@ 2025-10-30 19:28         ` Sohil Mehta
  2025-10-30 21:37           ` David Laight
  1 sibling, 1 reply; 74+ messages in thread
From: Sohil Mehta @ 2025-10-30 19:28 UTC (permalink / raw)
  To: Andy Lutomirski, Dave Hansen, Rick P Edgecombe, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, the arch/x86 maintainers,
	Dave Hansen
  Cc: Jonathan Corbet, Ard Biesheuvel, david.laight.linux@gmail.com,
	jpoimboe@kernel.org, Andrew Cooper, Tony Luck, Alexander Shishkin,
	Kirill A . Shutemov, Sean Christopherson, Randy Dunlap,
	David Woodhouse, Vegard Nossum, Xin Li, Linux Kernel Mailing List,
	linux-doc@vger.kernel.org, Kees Cook, H. Peter Anvin,
	Peter Zijlstra (Intel), linux-efi@vger.kernel.org,
	Geert Uytterhoeven

Thank you for taking a look at these patches.

On 10/30/2025 9:58 AM, Andy Lutomirski wrote:

> So I think all that's needed is to update "[PATCH v10 10/15] x86/vsyscall: Add vsyscall emulation for #GP" to check user_64bit_mode(regs) for the vsyscall case.  (As submitted, unless I missed something while composing the patches in my head, it's only checking user_mode(regs), and I think it's worth the single extra line of code to make the result a tiny bit more robust.)

I probably don't understand all the nuances here. But the goal of the
check seems to be to ensure that a 32-bit process running on a 64-bit
kernel never goes through this vsyscall emulation code, right?

I guess a user_64bit_mode(regs) check wouldn't harm. I'll add it when
the vsyscall series is posted.





^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH v10 08/15] x86/vsyscall: Reorganize the page fault emulation code
  2025-10-30 19:28         ` Sohil Mehta
@ 2025-10-30 21:37           ` David Laight
  0 siblings, 0 replies; 74+ messages in thread
From: David Laight @ 2025-10-30 21:37 UTC (permalink / raw)
  To: Sohil Mehta
  Cc: Andy Lutomirski, Dave Hansen, Rick P Edgecombe, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, the arch/x86 maintainers,
	Dave Hansen, Jonathan Corbet, Ard Biesheuvel, jpoimboe@kernel.org,
	Andrew Cooper, Tony Luck, Alexander Shishkin, Kirill A . Shutemov,
	Sean Christopherson, Randy Dunlap, David Woodhouse, Vegard Nossum,
	Xin Li, Linux Kernel Mailing List, linux-doc@vger.kernel.org,
	Kees Cook, H. Peter Anvin, Peter Zijlstra (Intel),
	linux-efi@vger.kernel.org, Geert Uytterhoeven

On Thu, 30 Oct 2025 12:28:52 -0700
Sohil Mehta <sohil.mehta@intel.com> wrote:

> Thank you for taking a look at these patches.
> 
> On 10/30/2025 9:58 AM, Andy Lutomirski wrote:
> 
> > So I think all that's needed is to update "[PATCH v10 10/15] x86/vsyscall: Add vsyscall emulation for #GP" to check user_64bit_mode(regs) for the vsyscall case.  (As submitted, unless I missed something while composing the patches in my head, it's only checking user_mode(regs), and I think it's worth the single extra line of code to make the result a tiny bit more robust.)  
> 
> I probably don't understand all the nuances here. But, the goal of the
> check seems to ensure a 32-bit process running on a 64-bit kernel
> doesn't ever go through this vsyscall emulation code, right?

Do remember that there is no such thing as a '32-bit process'.
Changing to/from 'long mode' isn't privileged.
OTOH in 32-bit mode you can't generate an address above 4G.
(But I've no idea if the high register bits get cleared before the register
is modified.)

	David

> 
> I guess a user_64bit_mode(regs) check wouldn't harm. I'll add it when
> the vsyscall series is posted.
> 
> 
> 
> 
> 


^ permalink raw reply	[flat|nested] 74+ messages in thread

end of thread, other threads:[~2025-10-30 21:37 UTC | newest]

Thread overview: 74+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-10-07  6:51 [PATCH v10 00/15] x86: Enable Linear Address Space Separation support Sohil Mehta
2025-10-07  6:51 ` [PATCH v10 01/15] x86/cpu: Enumerate the LASS feature bits Sohil Mehta
2025-10-07 18:19   ` Edgecombe, Rick P
2025-10-07 18:28     ` Dave Hansen
2025-10-07 20:20       ` Sohil Mehta
2025-10-07 20:38         ` Edgecombe, Rick P
2025-10-07 20:53           ` Sohil Mehta
2025-10-16  3:10         ` H. Peter Anvin
2025-10-07 20:49     ` Sohil Mehta
2025-10-07 23:16       ` Xin Li
2025-10-08 16:00         ` Edgecombe, Rick P
2025-10-16 15:35   ` Borislav Petkov
2025-10-21 18:03     ` Sohil Mehta
2025-10-07  6:51 ` [PATCH v10 02/15] x86/asm: Introduce inline memcpy and memset Sohil Mehta
2025-10-21 12:47   ` Borislav Petkov
2025-10-21 13:48     ` David Laight
2025-10-21 18:06     ` Sohil Mehta
2025-10-07  6:51 ` [PATCH v10 03/15] x86/alternatives: Disable LASS when patching kernel alternatives Sohil Mehta
2025-10-07 16:55   ` Edgecombe, Rick P
2025-10-07 22:28     ` Sohil Mehta
2025-10-08 16:22       ` Edgecombe, Rick P
2025-10-10 17:10         ` Sohil Mehta
2025-10-21 20:03   ` Borislav Petkov
2025-10-21 20:55     ` Sohil Mehta
2025-10-22  9:56       ` Borislav Petkov
2025-10-22 19:49         ` Sohil Mehta
2025-10-22 20:03           ` Luck, Tony
2025-10-22  8:25     ` Peter Zijlstra
2025-10-22  9:40       ` Borislav Petkov
2025-10-22 10:22         ` Peter Zijlstra
2025-10-22 10:52           ` Borislav Petkov
2025-10-07  6:51 ` [PATCH v10 04/15] x86/cpu: Set LASS CR4 bit as pinning sensitive Sohil Mehta
2025-10-07 18:24   ` Edgecombe, Rick P
2025-10-07 23:11     ` Sohil Mehta
2025-10-08 16:52       ` Edgecombe, Rick P
2025-10-10 19:03         ` Sohil Mehta
2025-10-07  6:51 ` [PATCH v10 05/15] x86/cpu: Defer CR pinning enforcement until late_initcall() Sohil Mehta
2025-10-07 17:23   ` Edgecombe, Rick P
2025-10-07 23:05     ` Sohil Mehta
2025-10-08 17:36       ` Edgecombe, Rick P
2025-10-10 20:45         ` Sohil Mehta
2025-10-15 21:17           ` Sohil Mehta
2025-10-17 19:28   ` Sohil Mehta
2025-10-07  6:51 ` [PATCH v10 06/15] x86/efi: Disable LASS while mapping the EFI runtime services Sohil Mehta
2025-10-07  6:51 ` [PATCH v10 07/15] x86/kexec: Disable LASS during relocate kernel Sohil Mehta
2025-10-07 17:43   ` Edgecombe, Rick P
2025-10-07 22:33     ` Sohil Mehta
2025-10-07  6:51 ` [PATCH v10 08/15] x86/vsyscall: Reorganize the page fault emulation code Sohil Mehta
2025-10-07 18:37   ` Edgecombe, Rick P
2025-10-07 18:48     ` Dave Hansen
2025-10-07 19:53       ` Edgecombe, Rick P
2025-10-07 22:52         ` Sohil Mehta
2025-10-08 17:42           ` Edgecombe, Rick P
2025-10-30 16:58       ` Andy Lutomirski
2025-10-30 17:22         ` H. Peter Anvin
2025-10-30 17:35           ` Andy Lutomirski
2025-10-30 19:28         ` Sohil Mehta
2025-10-30 21:37           ` David Laight
2025-10-07  6:51 ` [PATCH v10 09/15] x86/traps: Consolidate user fixups in exc_general_protection() Sohil Mehta
2025-10-07 17:46   ` Edgecombe, Rick P
2025-10-07 22:41     ` Sohil Mehta
2025-10-08 17:43       ` Edgecombe, Rick P
2025-10-07  6:51 ` [PATCH v10 10/15] x86/vsyscall: Add vsyscall emulation for #GP Sohil Mehta
2025-10-07  6:51 ` [PATCH v10 11/15] x86/vsyscall: Disable LASS if vsyscall mode is set to EMULATE Sohil Mehta
2025-10-07 18:43   ` Edgecombe, Rick P
2025-10-07  6:51 ` [PATCH v10 12/15] x86/traps: Communicate a LASS violation in #GP message Sohil Mehta
2025-10-07 18:07   ` Edgecombe, Rick P
2025-10-07  6:51 ` [PATCH v10 13/15] x86/traps: Generalize #GP address decode and hint code Sohil Mehta
2025-10-07 18:43   ` Edgecombe, Rick P
2025-10-07  6:51 ` [PATCH v10 14/15] x86/traps: Provide additional hints for a kernel stack segment fault Sohil Mehta
2025-10-07  6:51 ` [PATCH v10 15/15] x86/cpu: Enable LASS by default during CPU initialization Sohil Mehta
2025-10-07 18:42   ` Edgecombe, Rick P
2025-10-07 16:23 ` [PATCH v10 00/15] x86: Enable Linear Address Space Separation support Edgecombe, Rick P
2025-10-17 19:52   ` Sohil Mehta
