* [PATCH v5 00/16] Enable Linear Address Space Separation support
@ 2024-10-28 16:07 Alexander Shishkin
2024-10-28 16:07 ` [PATCH v5 01/16] x86/cpu: Enumerate the LASS feature bits Alexander Shishkin
` (16 more replies)
0 siblings, 17 replies; 42+ messages in thread
From: Alexander Shishkin @ 2024-10-28 16:07 UTC (permalink / raw)
To: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra, Ard Biesheuvel,
Paul E. McKenney, Josh Poimboeuf, Xiongwei Song, Xin Li,
Mike Rapoport (IBM), Brijesh Singh, Michael Roth, Tony Luck,
Kirill A. Shutemov, Alexey Kardashevskiy
Cc: Jonathan Corbet, Alexander Shishkin, Sohil Mehta, Ingo Molnar,
Pawan Gupta, Daniel Sneddon, Kai Huang, Sandipan Das,
Breno Leitao, Rick Edgecombe, Alexei Starovoitov, Hou Tao,
Juergen Gross, Vegard Nossum, Kees Cook, Eric Biggers,
Jason Gunthorpe, Masami Hiramatsu (Google), Andrew Morton,
Luis Chamberlain, Yuntao Wang, Rasmus Villemoes, Christophe Leroy,
Tejun Heo, Changbin Du, Huang Shijie, Geert Uytterhoeven,
Namhyung Kim, Arnaldo Carvalho de Melo, linux-doc, linux-kernel,
linux-efi
Changes from v4[8]:
- Added PeterZ's Originally-by and SoB to 2/16
- Added lass_clac()/lass_stac() to differentiate from the SMAP-necessitated
clac()/stac() and to be NOPs on CPUs that don't support LASS
- Moved the LASS enabling patch to the end of the series to avoid rendering
machines unbootable at intermediate bisection points before the patch that
disables LASS around EFI initialization
- Reverted Pawan's LAM disabling commit
Changes from v3[6]:
- Made LAM dependent on LASS
- Moved EFI runtime initialization to x86 side of things
- Suspended LASS validation around EFI set_virtual_address_map call
- Added a message for the case of kernel side LASS violation
- Moved inline memset/memcpy versions to the common string.h
Changes from v2[5]:
- Added myself to the SoB chain
Changes from v1[1]:
- Emulate vsyscall violations in execute mode in the #GP fault handler
- Use inline memcpy and memset while patching alternatives
- Remove CONFIG_X86_LASS
- Make LASS depend on SMAP
- Dropped the minimal KVM enabling patch
Linear Address Space Separation (LASS) is a security feature that intends to
prevent malicious virtual address space accesses across user/kernel mode.
Such mode-based access protection already exists today with paging and features
such as SMEP and SMAP. However, to enforce these protections, the processor
must traverse the paging structures in memory. Malicious software can use
timing information resulting from this traversal to determine details about the
paging structures, and these details may also be used to determine the layout
of the kernel memory.
The LASS mechanism provides the same mode-based protections as paging but
without traversing the paging structures. Because the protections enforced by
LASS are applied before paging, software will not be able to derive
paging-based timing information from the various caching structures such as the
TLBs, mid-level caches, page walker, data caches, etc. LASS thereby thwarts
probing techniques based on double page faults, TLB flush and reload, and
software prefetch instructions.
See [2], [3] and [4] for some research on the related attack vectors.
In addition, LASS prevents an attack vector described in the Spectre LAM
(SLAM) whitepaper [7].
LASS enforcement relies on the typical kernel implementation to divide the
64-bit virtual address space into two halves:
Addr[63]=0 -> User address space
Addr[63]=1 -> Kernel address space
Any data access or code execution across address spaces typically results in a
#GP fault.
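
For illustration only, the split can be expressed as a one-line check on
bit 63 (lass_is_kernel_addr() is a hypothetical helper, not part of this
series):

	/* Bit 63 of the linear address selects the half LASS checks against. */
	static inline bool lass_is_kernel_addr(unsigned long addr)
	{
		return addr & (1UL << 63);	/* Addr[63]=1 -> kernel half */
	}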
Kernel accesses usually only touch the kernel address space. However, there
are valid reasons for the kernel to access memory in the user half. For
these cases (such as text poking and EFI runtime accesses), the kernel can
temporarily suspend the enforcement of LASS by toggling SMAP (Supervisor
Mode Access Prevention) with the stac()/clac() instructions, and in one
instance it has to disable LASS outright for an EFI runtime call.
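
As a rough sketch (mirroring the text poking code later in this series),
the bracketing pattern looks like:

	lass_stac();			/* set RFLAGS.AC, relax LASS/SMAP */
	__inline_memcpy(dst, src, len);	/* legitimate user-half access */
	lass_clac();			/* restore enforcement */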
User space cannot access any kernel address while LASS is enabled.
Unfortunately, legacy vsyscall functions are located in the address range
0xffffffffff600000 - 0xffffffffff601000 and emulated in the kernel. To avoid
breaking user applications when LASS is enabled, extend the vsyscall emulation
in execute (XONLY) mode to the #GP fault handler.
In contrast, the vsyscall EMULATE mode is deprecated and not expected to be
used by anyone. Supporting EMULATE mode with LASS would need complex
instruction decoding in the #GP fault handler and is probably not worth the
hassle. Disable LASS instead in the rare case that someone absolutely needs
EMULATE mode and enables vsyscall=emulate via the command line.
[1] https://lore.kernel.org/lkml/20230110055204.3227669-1-yian.chen@intel.com/
[2] “Practical Timing Side Channel Attacks against Kernel Space ASLR”,
https://www.ieee-security.org/TC/SP2013/papers/4977a191.pdf
[3] “Prefetch Side-Channel Attacks: Bypassing SMAP and Kernel ASLR”, http://doi.acm.org/10.1145/2976749.2978356
[4] “Harmful prefetch on Intel”, https://ioactive.com/harmful-prefetch-on-intel/ (H/T Anders)
[5] https://lore.kernel.org/all/20230530114247.21821-1-alexander.shishkin@linux.intel.com/
[6] https://lore.kernel.org/all/20230609183632.48706-1-alexander.shishkin@linux.intel.com/
[7] https://download.vusec.net/papers/slam_sp24.pdf
[8] https://lore.kernel.org/all/20240710160655.3402786-1-alexander.shishkin@linux.intel.com/
Alexander Shishkin (7):
init/main.c: Move EFI runtime service initialization to x86/cpu
x86/cpu: Defer CR pinning setup until after EFI initialization
efi: Disable LASS around set_virtual_address_map call
x86/vsyscall: Document the fact that vsyscall=emulate disables LASS
x86/traps: Communicate a LASS violation in #GP message
x86/cpu: Make LAM depend on LASS
Revert "x86/lam: Disable ADDRESS_MASKING in most cases"
Peter Zijlstra (1):
x86/asm: Introduce inline memcpy and memset
Sohil Mehta (7):
x86/cpu: Enumerate the LASS feature bits
x86/alternatives: Disable LASS when patching kernel alternatives
x86/vsyscall: Reorganize the #PF emulation code
x86/traps: Consolidate user fixups in exc_general_protection()
x86/vsyscall: Add vsyscall emulation for #GP
x86/vsyscall: Disable LASS if vsyscall mode is set to EMULATE
x86/cpu: Enable LASS during CPU initialization
Yian Chen (1):
x86/cpu: Set LASS CR4 bit as pinning sensitive
.../admin-guide/kernel-parameters.txt | 4 +-
arch/x86/Kconfig | 1 -
arch/x86/entry/vsyscall/vsyscall_64.c | 61 +++++++++++++------
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/disabled-features.h | 4 +-
arch/x86/include/asm/smap.h | 18 ++++++
arch/x86/include/asm/string.h | 26 ++++++++
arch/x86/include/asm/vsyscall.h | 14 +++--
arch/x86/include/uapi/asm/processor-flags.h | 2 +
arch/x86/kernel/alternative.c | 12 +++-
arch/x86/kernel/cpu/common.c | 25 +++++++-
arch/x86/kernel/cpu/cpuid-deps.c | 2 +
arch/x86/kernel/traps.c | 26 +++++---
arch/x86/mm/fault.c | 2 +-
arch/x86/platform/efi/efi.c | 13 ++++
init/main.c | 5 --
tools/arch/x86/include/asm/cpufeatures.h | 1 +
17 files changed, 171 insertions(+), 46 deletions(-)
--
2.45.2
* [PATCH v5 01/16] x86/cpu: Enumerate the LASS feature bits
2024-10-28 16:07 [PATCH v5 00/16] Enable Linear Address Space Separation support Alexander Shishkin
@ 2024-10-28 16:07 ` Alexander Shishkin
2024-10-29 14:55 ` Kirill A. Shutemov
2024-10-28 16:07 ` [PATCH v5 02/16] x86/asm: Introduce inline memcpy and memset Alexander Shishkin
` (15 subsequent siblings)
16 siblings, 1 reply; 42+ messages in thread
From: Alexander Shishkin @ 2024-10-28 16:07 UTC (permalink / raw)
To: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra, Ard Biesheuvel,
Paul E. McKenney, Josh Poimboeuf, Xiongwei Song, Xin Li,
Mike Rapoport (IBM), Brijesh Singh, Michael Roth, Tony Luck,
Kirill A. Shutemov, Alexey Kardashevskiy
Cc: Jonathan Corbet, Alexander Shishkin, Sohil Mehta, Ingo Molnar,
Pawan Gupta, Daniel Sneddon, Kai Huang, Sandipan Das,
Breno Leitao, Rick Edgecombe, Alexei Starovoitov, Hou Tao,
Juergen Gross, Vegard Nossum, Kees Cook, Eric Biggers,
Jason Gunthorpe, Masami Hiramatsu (Google), Andrew Morton,
Luis Chamberlain, Yuntao Wang, Rasmus Villemoes, Christophe Leroy,
Tejun Heo, Changbin Du, Huang Shijie, Geert Uytterhoeven,
Namhyung Kim, Arnaldo Carvalho de Melo, linux-doc, linux-kernel,
linux-efi, Yian Chen
From: Sohil Mehta <sohil.mehta@intel.com>
Linear Address Space Separation (LASS) is a security feature that
intends to prevent malicious virtual address space accesses across
user/kernel mode.
Such mode-based access protection already exists today with paging and
features such as SMEP and SMAP. However, to enforce these protections,
the processor must traverse the paging structures in memory. Malicious
software can use timing information resulting from this traversal to
determine details about the paging structures, and these details may
also be used to determine the layout of the kernel memory.
The LASS mechanism provides the same mode-based protections as paging
but without traversing the paging structures. Because the protections
enforced by LASS are applied before paging, software will not be able to
derive paging-based timing information from the various caching
structures such as the TLBs, mid-level caches, page walker, data caches,
etc.
LASS enforcement relies on the typical kernel implementation to divide
the 64-bit virtual address space into two halves:
Addr[63]=0 -> User address space
Addr[63]=1 -> Kernel address space
Any data access or code execution across address spaces typically
results in a #GP fault.
The LASS enforcement for kernel data accesses depends on CR4.SMAP being
set. The enforcement can be disabled by toggling the RFLAGS.AC bit, just
like SMAP.
Define the CPU feature bits to enumerate this feature and add the
dependency on SMAP to reflect it.
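
For reference, later patches in this series key LASS-only code paths off
this feature bit, for example:

	if (cpu_feature_enabled(X86_FEATURE_LASS))
		cr4_set_bits(X86_CR4_LASS);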
Co-developed-by: Yian Chen <yian.chen@intel.com>
Signed-off-by: Yian Chen <yian.chen@intel.com>
Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
---
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/disabled-features.h | 4 +++-
arch/x86/include/asm/smap.h | 18 ++++++++++++++++++
arch/x86/include/uapi/asm/processor-flags.h | 2 ++
arch/x86/kernel/cpu/cpuid-deps.c | 1 +
tools/arch/x86/include/asm/cpufeatures.h | 1 +
6 files changed, 26 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index ea33439a5d00..acb3ccea2bd7 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -319,6 +319,7 @@
/* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
#define X86_FEATURE_AVX_VNNI (12*32+ 4) /* "avx_vnni" AVX VNNI instructions */
#define X86_FEATURE_AVX512_BF16 (12*32+ 5) /* "avx512_bf16" AVX512 BFLOAT16 instructions */
+#define X86_FEATURE_LASS (12*32+ 6) /* "lass" Linear Address Space Separation */
#define X86_FEATURE_CMPCCXADD (12*32+ 7) /* CMPccXADD instructions */
#define X86_FEATURE_ARCH_PERFMON_EXT (12*32+ 8) /* Intel Architectural PerfMon Extension */
#define X86_FEATURE_FZRM (12*32+10) /* Fast zero-length REP MOVSB */
diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h
index c492bdc97b05..76c7d362af94 100644
--- a/arch/x86/include/asm/disabled-features.h
+++ b/arch/x86/include/asm/disabled-features.h
@@ -22,12 +22,14 @@
# define DISABLE_CYRIX_ARR (1<<(X86_FEATURE_CYRIX_ARR & 31))
# define DISABLE_CENTAUR_MCR (1<<(X86_FEATURE_CENTAUR_MCR & 31))
# define DISABLE_PCID 0
+# define DISABLE_LASS 0
#else
# define DISABLE_VME 0
# define DISABLE_K6_MTRR 0
# define DISABLE_CYRIX_ARR 0
# define DISABLE_CENTAUR_MCR 0
# define DISABLE_PCID (1<<(X86_FEATURE_PCID & 31))
+# define DISABLE_LASS (1<<(X86_FEATURE_LASS & 31))
#endif /* CONFIG_X86_64 */
#ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
@@ -146,7 +148,7 @@
#define DISABLED_MASK11 (DISABLE_RETPOLINE|DISABLE_RETHUNK|DISABLE_UNRET| \
DISABLE_CALL_DEPTH_TRACKING|DISABLE_USER_SHSTK)
#define DISABLED_MASK12 (DISABLE_FRED|DISABLE_LAM)
-#define DISABLED_MASK13 0
+#define DISABLED_MASK13 (DISABLE_LASS)
#define DISABLED_MASK14 0
#define DISABLED_MASK15 0
#define DISABLED_MASK16 (DISABLE_PKU|DISABLE_OSPKE|DISABLE_LA57|DISABLE_UMIP| \
diff --git a/arch/x86/include/asm/smap.h b/arch/x86/include/asm/smap.h
index bab490379c65..8cb6f004800b 100644
--- a/arch/x86/include/asm/smap.h
+++ b/arch/x86/include/asm/smap.h
@@ -27,6 +27,12 @@
#else /* __ASSEMBLY__ */
+/*
+ * The CLAC/STAC instructions toggle enforcement of X86_FEATURE_SMAP.
+ * Add dedicated lass_*() variants for cases that are necessitated by
+ * LASS (X86_FEATURE_LASS) enforcement, which helps readability and
+ * avoids AC flag flipping on CPUs that don't support LASS.
+ */
static __always_inline void clac(void)
{
/* Note: a barrier is implicit in alternative() */
@@ -39,6 +45,18 @@ static __always_inline void stac(void)
alternative("", __ASM_STAC, X86_FEATURE_SMAP);
}
+static __always_inline void lass_clac(void)
+{
+ /* Note: a barrier is implicit in alternative() */
+ alternative("", __ASM_CLAC, X86_FEATURE_LASS);
+}
+
+static __always_inline void lass_stac(void)
+{
+ /* Note: a barrier is implicit in alternative() */
+ alternative("", __ASM_STAC, X86_FEATURE_LASS);
+}
+
static __always_inline unsigned long smap_save(void)
{
unsigned long flags;
diff --git a/arch/x86/include/uapi/asm/processor-flags.h b/arch/x86/include/uapi/asm/processor-flags.h
index f1a4adc78272..81d0c8bf1137 100644
--- a/arch/x86/include/uapi/asm/processor-flags.h
+++ b/arch/x86/include/uapi/asm/processor-flags.h
@@ -136,6 +136,8 @@
#define X86_CR4_PKE _BITUL(X86_CR4_PKE_BIT)
#define X86_CR4_CET_BIT 23 /* enable Control-flow Enforcement Technology */
#define X86_CR4_CET _BITUL(X86_CR4_CET_BIT)
+#define X86_CR4_LASS_BIT 27 /* enable Linear Address Space Separation support */
+#define X86_CR4_LASS _BITUL(X86_CR4_LASS_BIT)
#define X86_CR4_LAM_SUP_BIT 28 /* LAM for supervisor pointers */
#define X86_CR4_LAM_SUP _BITUL(X86_CR4_LAM_SUP_BIT)
diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
index 8bd84114c2d9..3f73c4b03348 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -83,6 +83,7 @@ static const struct cpuid_dep cpuid_deps[] = {
{ X86_FEATURE_AMX_TILE, X86_FEATURE_XFD },
{ X86_FEATURE_SHSTK, X86_FEATURE_XSAVES },
{ X86_FEATURE_FRED, X86_FEATURE_LKGS },
+ { X86_FEATURE_LASS, X86_FEATURE_SMAP },
{}
};
diff --git a/tools/arch/x86/include/asm/cpufeatures.h b/tools/arch/x86/include/asm/cpufeatures.h
index 23698d0f4bb4..538930159f9f 100644
--- a/tools/arch/x86/include/asm/cpufeatures.h
+++ b/tools/arch/x86/include/asm/cpufeatures.h
@@ -319,6 +319,7 @@
/* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
#define X86_FEATURE_AVX_VNNI (12*32+ 4) /* "avx_vnni" AVX VNNI instructions */
#define X86_FEATURE_AVX512_BF16 (12*32+ 5) /* "avx512_bf16" AVX512 BFLOAT16 instructions */
+#define X86_FEATURE_LASS (12*32+ 6) /* "lass" Linear Address Space Separation */
#define X86_FEATURE_CMPCCXADD (12*32+ 7) /* CMPccXADD instructions */
#define X86_FEATURE_ARCH_PERFMON_EXT (12*32+ 8) /* Intel Architectural PerfMon Extension */
#define X86_FEATURE_FZRM (12*32+10) /* Fast zero-length REP MOVSB */
--
2.45.2
* [PATCH v5 02/16] x86/asm: Introduce inline memcpy and memset
2024-10-28 16:07 [PATCH v5 00/16] Enable Linear Address Space Separation support Alexander Shishkin
2024-10-28 16:07 ` [PATCH v5 01/16] x86/cpu: Enumerate the LASS feature bits Alexander Shishkin
@ 2024-10-28 16:07 ` Alexander Shishkin
2024-10-28 16:07 ` [PATCH v5 03/16] x86/alternatives: Disable LASS when patching kernel alternatives Alexander Shishkin
` (14 subsequent siblings)
16 siblings, 0 replies; 42+ messages in thread
From: Alexander Shishkin @ 2024-10-28 16:07 UTC (permalink / raw)
To: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra, Ard Biesheuvel,
Paul E. McKenney, Josh Poimboeuf, Xiongwei Song, Xin Li,
Mike Rapoport (IBM), Brijesh Singh, Michael Roth, Tony Luck,
Kirill A. Shutemov, Alexey Kardashevskiy
Cc: Jonathan Corbet, Alexander Shishkin, Sohil Mehta, Ingo Molnar,
Pawan Gupta, Daniel Sneddon, Kai Huang, Sandipan Das,
Breno Leitao, Rick Edgecombe, Alexei Starovoitov, Hou Tao,
Juergen Gross, Vegard Nossum, Kees Cook, Eric Biggers,
Jason Gunthorpe, Masami Hiramatsu (Google), Andrew Morton,
Luis Chamberlain, Yuntao Wang, Rasmus Villemoes, Christophe Leroy,
Tejun Heo, Changbin Du, Huang Shijie, Geert Uytterhoeven,
Namhyung Kim, Arnaldo Carvalho de Melo, linux-doc, linux-kernel,
linux-efi
From: Peter Zijlstra <peterz@infradead.org>
Provide inline memcpy and memset functions that can be used instead of
the GCC builtins whenever necessary.
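
A minimal usage sketch (buf and src are placeholders, not from this patch):

	char buf[64];

	__inline_memset(buf, 0, sizeof(buf));	/* compiles to "rep stosb" */
	__inline_memcpy(buf, src, sizeof(buf));	/* compiles to "rep movsb" */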
Originally-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/lkml/Y759AJ%2F0N9fqwDED@hirez.programming.kicks-ass.net/
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
---
arch/x86/include/asm/string.h | 26 ++++++++++++++++++++++++++
1 file changed, 26 insertions(+)
diff --git a/arch/x86/include/asm/string.h b/arch/x86/include/asm/string.h
index c3c2c1914d65..9cb5aae7fba9 100644
--- a/arch/x86/include/asm/string.h
+++ b/arch/x86/include/asm/string.h
@@ -1,6 +1,32 @@
/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_STRING_H
+#define _ASM_X86_STRING_H
+
#ifdef CONFIG_X86_32
# include <asm/string_32.h>
#else
# include <asm/string_64.h>
#endif
+
+static __always_inline void *__inline_memcpy(void *to, const void *from, size_t len)
+{
+ void *ret = to;
+
+ asm volatile("rep movsb"
+ : "+D" (to), "+S" (from), "+c" (len)
+ : : "memory");
+ return ret;
+}
+
+static __always_inline void *__inline_memset(void *s, int v, size_t n)
+{
+ void *ret = s;
+
+ asm volatile("rep stosb"
+ : "+D" (s), "+c" (n)
+ : "a" ((uint8_t)v)
+ : "memory");
+ return ret;
+}
+
+#endif /* _ASM_X86_STRING_H */
--
2.45.2
* [PATCH v5 03/16] x86/alternatives: Disable LASS when patching kernel alternatives
2024-10-28 16:07 [PATCH v5 00/16] Enable Linear Address Space Separation support Alexander Shishkin
2024-10-28 16:07 ` [PATCH v5 01/16] x86/cpu: Enumerate the LASS feature bits Alexander Shishkin
2024-10-28 16:07 ` [PATCH v5 02/16] x86/asm: Introduce inline memcpy and memset Alexander Shishkin
@ 2024-10-28 16:07 ` Alexander Shishkin
2024-10-28 17:49 ` Dave Hansen
2024-10-28 16:07 ` [PATCH v5 04/16] init/main.c: Move EFI runtime service initialization to x86/cpu Alexander Shishkin
` (13 subsequent siblings)
16 siblings, 1 reply; 42+ messages in thread
From: Alexander Shishkin @ 2024-10-28 16:07 UTC (permalink / raw)
To: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra, Ard Biesheuvel,
Paul E. McKenney, Josh Poimboeuf, Xiongwei Song, Xin Li,
Mike Rapoport (IBM), Brijesh Singh, Michael Roth, Tony Luck,
Kirill A. Shutemov, Alexey Kardashevskiy
Cc: Jonathan Corbet, Alexander Shishkin, Sohil Mehta, Ingo Molnar,
Pawan Gupta, Daniel Sneddon, Kai Huang, Sandipan Das,
Breno Leitao, Rick Edgecombe, Alexei Starovoitov, Hou Tao,
Juergen Gross, Vegard Nossum, Kees Cook, Eric Biggers,
Jason Gunthorpe, Masami Hiramatsu (Google), Andrew Morton,
Luis Chamberlain, Yuntao Wang, Rasmus Villemoes, Christophe Leroy,
Tejun Heo, Changbin Du, Huang Shijie, Geert Uytterhoeven,
Namhyung Kim, Arnaldo Carvalho de Melo, linux-doc, linux-kernel,
linux-efi
From: Sohil Mehta <sohil.mehta@intel.com>
For patching, the kernel initializes a temporary mm area in the lower
half of the address range. See commit 4fc19708b165 ("x86/alternatives:
Initialize temporary mm for patching").
Disable LASS enforcement during patching using the lass_stac()/lass_clac()
helpers to avoid triggering a #GP fault.
Objtool warns about calls from within the stac/clac guard to non-allowed
functions that live outside it, and about references to any function
through a dynamic function pointer inside the guard. See the objtool
warnings section #9 in the document tools/objtool/Documentation/objtool.txt.
Considering that the patched text is usually small, replace the memcpy()
and memset() calls in the text poking functions with their inline
versions.
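
Illustratively (not from this patch), the difference between the pattern
objtool warns about and the one used below:

	/* Out-of-line call inside the AC=1 window: objtool complains. */
	lass_stac();
	memcpy(dst, src, len);
	lass_clac();

	/* The inline variant expands to "rep movsb", no function call. */
	lass_stac();
	__inline_memcpy(dst, src, len);
	lass_clac();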
Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
---
arch/x86/kernel/alternative.c | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index d17518ca19b8..2dc097014c2d 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -1841,16 +1841,24 @@ static inline void unuse_temporary_mm(temp_mm_state_t prev_state)
__ro_after_init struct mm_struct *poking_mm;
__ro_after_init unsigned long poking_addr;
+/*
+ * poking_init() initializes the text poking address from the lower half of the
+ * address space. Relax LASS enforcement when accessing the poking address.
+ */
static void text_poke_memcpy(void *dst, const void *src, size_t len)
{
- memcpy(dst, src, len);
+ lass_stac();
+ __inline_memcpy(dst, src, len);
+ lass_clac();
}
static void text_poke_memset(void *dst, const void *src, size_t len)
{
int c = *(const int *)src;
- memset(dst, c, len);
+ lass_stac();
+ __inline_memset(dst, c, len);
+ lass_clac();
}
typedef void text_poke_f(void *dst, const void *src, size_t len);
--
2.45.2
* [PATCH v5 04/16] init/main.c: Move EFI runtime service initialization to x86/cpu
2024-10-28 16:07 [PATCH v5 00/16] Enable Linear Address Space Separation support Alexander Shishkin
` (2 preceding siblings ...)
2024-10-28 16:07 ` [PATCH v5 03/16] x86/alternatives: Disable LASS when patching kernel alternatives Alexander Shishkin
@ 2024-10-28 16:07 ` Alexander Shishkin
2024-10-29 22:35 ` Sohil Mehta
2024-10-30 7:36 ` Ard Biesheuvel
2024-10-28 16:07 ` [PATCH v5 05/16] x86/cpu: Defer CR pinning setup until after EFI initialization Alexander Shishkin
` (12 subsequent siblings)
16 siblings, 2 replies; 42+ messages in thread
From: Alexander Shishkin @ 2024-10-28 16:07 UTC (permalink / raw)
To: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra, Ard Biesheuvel,
Paul E. McKenney, Josh Poimboeuf, Xiongwei Song, Xin Li,
Mike Rapoport (IBM), Brijesh Singh, Michael Roth, Tony Luck,
Kirill A. Shutemov, Alexey Kardashevskiy
Cc: Jonathan Corbet, Alexander Shishkin, Sohil Mehta, Ingo Molnar,
Pawan Gupta, Daniel Sneddon, Kai Huang, Sandipan Das,
Breno Leitao, Rick Edgecombe, Alexei Starovoitov, Hou Tao,
Juergen Gross, Vegard Nossum, Kees Cook, Eric Biggers,
Jason Gunthorpe, Masami Hiramatsu (Google), Andrew Morton,
Luis Chamberlain, Yuntao Wang, Rasmus Villemoes, Christophe Leroy,
Tejun Heo, Changbin Du, Huang Shijie, Geert Uytterhoeven,
Namhyung Kim, Arnaldo Carvalho de Melo, linux-doc, linux-kernel,
linux-efi
The EFI call in start_kernel() is guarded by #ifdef CONFIG_X86. Move
the thing to the arch_cpu_finalize_init() path on x86 and get rid of
the #ifdef in start_kernel().
No functional change intended.
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Suggested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
arch/x86/kernel/cpu/common.c | 7 +++++++
init/main.c | 5 -----
2 files changed, 7 insertions(+), 5 deletions(-)
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 8f41ab219cf1..b24ad418536e 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -26,6 +26,7 @@
#include <linux/pgtable.h>
#include <linux/stackprotector.h>
#include <linux/utsname.h>
+#include <linux/efi.h>
#include <asm/alternative.h>
#include <asm/cmdline.h>
@@ -2382,6 +2383,12 @@ void __init arch_cpu_finalize_init(void)
fpu__init_system();
fpu__init_cpu();
+ /*
+ * This needs to follow the FPU initialization, since EFI depends on it.
+ */
+ if (efi_enabled(EFI_RUNTIME_SERVICES))
+ efi_enter_virtual_mode();
+
/*
* Ensure that access to the per CPU representation has the initial
* boot CPU configuration.
diff --git a/init/main.c b/init/main.c
index c4778edae797..1d3a0a82d136 100644
--- a/init/main.c
+++ b/init/main.c
@@ -51,7 +51,6 @@
#include <linux/cpu.h>
#include <linux/cpuset.h>
#include <linux/cgroup.h>
-#include <linux/efi.h>
#include <linux/tick.h>
#include <linux/sched/isolation.h>
#include <linux/interrupt.h>
@@ -1072,10 +1071,6 @@ void start_kernel(void)
pid_idr_init();
anon_vma_init();
-#ifdef CONFIG_X86
- if (efi_enabled(EFI_RUNTIME_SERVICES))
- efi_enter_virtual_mode();
-#endif
thread_stack_cache_init();
cred_init();
fork_init();
--
2.45.2
* [PATCH v5 05/16] x86/cpu: Defer CR pinning setup until after EFI initialization
2024-10-28 16:07 [PATCH v5 00/16] Enable Linear Address Space Separation support Alexander Shishkin
` (3 preceding siblings ...)
2024-10-28 16:07 ` [PATCH v5 04/16] init/main.c: Move EFI runtime service initialization to x86/cpu Alexander Shishkin
@ 2024-10-28 16:07 ` Alexander Shishkin
2024-10-29 22:10 ` Sohil Mehta
2024-10-28 16:07 ` [PATCH v5 06/16] efi: Disable LASS around set_virtual_address_map call Alexander Shishkin
` (11 subsequent siblings)
16 siblings, 1 reply; 42+ messages in thread
From: Alexander Shishkin @ 2024-10-28 16:07 UTC (permalink / raw)
To: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra, Ard Biesheuvel,
Paul E. McKenney, Josh Poimboeuf, Xiongwei Song, Xin Li,
Mike Rapoport (IBM), Brijesh Singh, Michael Roth, Tony Luck,
Kirill A. Shutemov, Alexey Kardashevskiy
Cc: Jonathan Corbet, Alexander Shishkin, Sohil Mehta, Ingo Molnar,
Pawan Gupta, Daniel Sneddon, Kai Huang, Sandipan Das,
Breno Leitao, Rick Edgecombe, Alexei Starovoitov, Hou Tao,
Juergen Gross, Vegard Nossum, Kees Cook, Eric Biggers,
Jason Gunthorpe, Masami Hiramatsu (Google), Andrew Morton,
Luis Chamberlain, Yuntao Wang, Rasmus Villemoes, Christophe Leroy,
Tejun Heo, Changbin Du, Huang Shijie, Geert Uytterhoeven,
Namhyung Kim, Arnaldo Carvalho de Melo, linux-doc, linux-kernel,
linux-efi
In order to map the EFI runtime services, set_virtual_address_map needs
to be called. It resides in the lower half of the address space, which
means that LASS needs to be temporarily disabled around this call. This
can only be done before CR pinning is set up.
Move CR pinning setup behind the EFI initialization.
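
The resulting order in arch_cpu_finalize_init() (see the diff below):

	if (efi_enabled(EFI_RUNTIME_SERVICES))
		efi_enter_virtual_mode();	/* may clear CR4.LASS temporarily */

	setup_cr_pinning();			/* pin CR4 bits only afterwards */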
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Suggested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
arch/x86/kernel/cpu/common.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index b24ad418536e..c249fd0aa3fb 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1953,7 +1953,6 @@ static __init void identify_boot_cpu(void)
enable_sep_cpu();
#endif
cpu_detect_tlb(&boot_cpu_data);
- setup_cr_pinning();
tsx_init();
tdx_init();
@@ -2385,10 +2384,16 @@ void __init arch_cpu_finalize_init(void)
/*
* This needs to follow the FPU initialization, since EFI depends on it.
+ * It also needs to precede the CR pinning setup, because we need to be
+ * able to temporarily clear the CR4.LASS bit in order to execute the
+ * set_virtual_address_map call, which resides in lower addresses and
+ * would trip LASS if enabled.
*/
if (efi_enabled(EFI_RUNTIME_SERVICES))
efi_enter_virtual_mode();
+ setup_cr_pinning();
+
/*
* Ensure that access to the per CPU representation has the initial
* boot CPU configuration.
--
2.45.2
* [PATCH v5 06/16] efi: Disable LASS around set_virtual_address_map call
2024-10-28 16:07 [PATCH v5 00/16] Enable Linear Address Space Separation support Alexander Shishkin
` (4 preceding siblings ...)
2024-10-28 16:07 ` [PATCH v5 05/16] x86/cpu: Defer CR pinning setup until after EFI initialization Alexander Shishkin
@ 2024-10-28 16:07 ` Alexander Shishkin
2024-10-29 15:00 ` Kirill A. Shutemov
2024-10-28 16:07 ` [PATCH v5 07/16] x86/vsyscall: Reorganize the #PF emulation code Alexander Shishkin
` (10 subsequent siblings)
16 siblings, 1 reply; 42+ messages in thread
From: Alexander Shishkin @ 2024-10-28 16:07 UTC (permalink / raw)
To: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra, Ard Biesheuvel,
Paul E. McKenney, Josh Poimboeuf, Xiongwei Song, Xin Li,
Mike Rapoport (IBM), Brijesh Singh, Michael Roth, Tony Luck,
Kirill A. Shutemov, Alexey Kardashevskiy
Cc: Jonathan Corbet, Alexander Shishkin, Sohil Mehta, Ingo Molnar,
Pawan Gupta, Daniel Sneddon, Kai Huang, Sandipan Das,
Breno Leitao, Rick Edgecombe, Alexei Starovoitov, Hou Tao,
Juergen Gross, Vegard Nossum, Kees Cook, Eric Biggers,
Jason Gunthorpe, Masami Hiramatsu (Google), Andrew Morton,
Luis Chamberlain, Yuntao Wang, Rasmus Villemoes, Christophe Leroy,
Tejun Heo, Changbin Du, Huang Shijie, Geert Uytterhoeven,
Namhyung Kim, Arnaldo Carvalho de Melo, linux-doc, linux-kernel,
linux-efi
Of all the EFI runtime services, set_virtual_address_map is the only one
that is called at its lower mapping, which LASS prohibits regardless of the
EFLAGS.AC setting. The only way to allow this call is to temporarily
disable LASS in the CR4 register.
Disable LASS around this low address EFI call.
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
---
arch/x86/platform/efi/efi.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index 88a96816de9a..4a7033f6de1f 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -846,11 +846,24 @@ static void __init __efi_enter_virtual_mode(void)
efi_sync_low_kernel_mappings();
+ /*
+ * set_virtual_address_map is the only service located at lower
+ * addresses, so we have to temporarily disable LASS around it.
+ * Note that clearing EFLAGS.AC is not enough for this; LASS
+ * needs to be disabled entirely.
+ */
+ if (cpu_feature_enabled(X86_FEATURE_LASS))
+ cr4_clear_bits(X86_CR4_LASS);
+
status = efi_set_virtual_address_map(efi.memmap.desc_size * count,
efi.memmap.desc_size,
efi.memmap.desc_version,
(efi_memory_desc_t *)pa,
efi_systab_phys);
+
+ if (cpu_feature_enabled(X86_FEATURE_LASS))
+ cr4_set_bits(X86_CR4_LASS);
+
if (status != EFI_SUCCESS) {
pr_err("Unable to switch EFI into virtual mode (status=%lx)!\n",
status);
--
2.45.2
* [PATCH v5 07/16] x86/vsyscall: Reorganize the #PF emulation code
2024-10-28 16:07 [PATCH v5 00/16] Enable Linear Address Space Separation support Alexander Shishkin
` (5 preceding siblings ...)
2024-10-28 16:07 ` [PATCH v5 06/16] efi: Disable LASS around set_virtual_address_map call Alexander Shishkin
@ 2024-10-28 16:07 ` Alexander Shishkin
2024-10-28 16:07 ` [PATCH v5 08/16] x86/traps: Consolidate user fixups in exc_general_protection() Alexander Shishkin
` (9 subsequent siblings)
16 siblings, 0 replies; 42+ messages in thread
From: Alexander Shishkin @ 2024-10-28 16:07 UTC (permalink / raw)
To: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra, Ard Biesheuvel,
Paul E. McKenney, Josh Poimboeuf, Xiongwei Song, Xin Li,
Mike Rapoport (IBM), Brijesh Singh, Michael Roth, Tony Luck,
Kirill A. Shutemov, Alexey Kardashevskiy
Cc: Jonathan Corbet, Alexander Shishkin, Sohil Mehta, Ingo Molnar,
Pawan Gupta, Daniel Sneddon, Kai Huang, Sandipan Das,
Breno Leitao, Rick Edgecombe, Alexei Starovoitov, Hou Tao,
Juergen Gross, Vegard Nossum, Kees Cook, Eric Biggers,
Jason Gunthorpe, Masami Hiramatsu (Google), Andrew Morton,
Luis Chamberlain, Yuntao Wang, Rasmus Villemoes, Christophe Leroy,
Tejun Heo, Changbin Du, Huang Shijie, Geert Uytterhoeven,
Namhyung Kim, Arnaldo Carvalho de Melo, linux-doc, linux-kernel,
linux-efi
From: Sohil Mehta <sohil.mehta@intel.com>
Separate out the actual vsyscall emulation from the page fault specific
handling in preparation for the upcoming #GP fault emulation.
No functional change intended.
Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
---
arch/x86/entry/vsyscall/vsyscall_64.c | 42 +++++++++++++++------------
arch/x86/include/asm/vsyscall.h | 8 ++---
arch/x86/mm/fault.c | 2 +-
3 files changed, 29 insertions(+), 23 deletions(-)
diff --git a/arch/x86/entry/vsyscall/vsyscall_64.c b/arch/x86/entry/vsyscall/vsyscall_64.c
index 2fb7d53cf333..e89d7d83a594 100644
--- a/arch/x86/entry/vsyscall/vsyscall_64.c
+++ b/arch/x86/entry/vsyscall/vsyscall_64.c
@@ -112,30 +112,13 @@ static bool write_ok_or_segv(unsigned long ptr, size_t size)
}
}
-bool emulate_vsyscall(unsigned long error_code,
- struct pt_regs *regs, unsigned long address)
+static bool __emulate_vsyscall(struct pt_regs *regs, unsigned long address)
{
unsigned long caller;
int vsyscall_nr, syscall_nr, tmp;
long ret;
unsigned long orig_dx;
- /* Write faults or kernel-privilege faults never get fixed up. */
- if ((error_code & (X86_PF_WRITE | X86_PF_USER)) != X86_PF_USER)
- return false;
-
- if (!(error_code & X86_PF_INSTR)) {
- /* Failed vsyscall read */
- if (vsyscall_mode == EMULATE)
- return false;
-
- /*
- * User code tried and failed to read the vsyscall page.
- */
- warn_bad_vsyscall(KERN_INFO, regs, "vsyscall read attempt denied -- look up the vsyscall kernel parameter if you need a workaround");
- return false;
- }
-
/*
* No point in checking CS -- the only way to get here is a user mode
* trap to a high address, which means that we're in 64-bit user code.
@@ -270,6 +253,29 @@ bool emulate_vsyscall(unsigned long error_code,
return true;
}
+bool emulate_vsyscall_pf(unsigned long error_code, struct pt_regs *regs,
+ unsigned long address)
+{
+ /* Write faults or kernel-privilege faults never get fixed up. */
+ if ((error_code & (X86_PF_WRITE | X86_PF_USER)) != X86_PF_USER)
+ return false;
+
+ if (!(error_code & X86_PF_INSTR)) {
+ /* Failed vsyscall read */
+ if (vsyscall_mode == EMULATE)
+ return false;
+
+ /*
+ * User code tried and failed to read the vsyscall page.
+ */
+ warn_bad_vsyscall(KERN_INFO, regs,
+ "vsyscall read attempt denied -- look up the vsyscall kernel parameter if you need a workaround");
+ return false;
+ }
+
+ return __emulate_vsyscall(regs, address);
+}
+
/*
* A pseudo VMA to allow ptrace access for the vsyscall page. This only
* covers the 64bit vsyscall page now. 32bit has a real VMA now and does
diff --git a/arch/x86/include/asm/vsyscall.h b/arch/x86/include/asm/vsyscall.h
index 472f0263dbc6..214977f4fa11 100644
--- a/arch/x86/include/asm/vsyscall.h
+++ b/arch/x86/include/asm/vsyscall.h
@@ -14,12 +14,12 @@ extern void set_vsyscall_pgtable_user_bits(pgd_t *root);
* Called on instruction fetch fault in vsyscall page.
* Returns true if handled.
*/
-extern bool emulate_vsyscall(unsigned long error_code,
- struct pt_regs *regs, unsigned long address);
+extern bool emulate_vsyscall_pf(unsigned long error_code,
+ struct pt_regs *regs, unsigned long address);
#else
static inline void map_vsyscall(void) {}
-static inline bool emulate_vsyscall(unsigned long error_code,
- struct pt_regs *regs, unsigned long address)
+static inline bool emulate_vsyscall_pf(unsigned long error_code,
+ struct pt_regs *regs, unsigned long address)
{
return false;
}
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index e6c469b323cc..44e2d1ef4128 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1318,7 +1318,7 @@ void do_user_addr_fault(struct pt_regs *regs,
* to consider the PF_PK bit.
*/
if (is_vsyscall_vaddr(address)) {
- if (emulate_vsyscall(error_code, regs, address))
+ if (emulate_vsyscall_pf(error_code, regs, address))
return;
}
#endif
--
2.45.2
* [PATCH v5 08/16] x86/traps: Consolidate user fixups in exc_general_protection()
2024-10-28 16:07 [PATCH v5 00/16] Enable Linear Address Space Separation support Alexander Shishkin
` (6 preceding siblings ...)
2024-10-28 16:07 ` [PATCH v5 07/16] x86/vsyscall: Reorganize the #PF emulation code Alexander Shishkin
@ 2024-10-28 16:07 ` Alexander Shishkin
2024-10-28 16:07 ` [PATCH v5 09/16] x86/vsyscall: Add vsyscall emulation for #GP Alexander Shishkin
` (8 subsequent siblings)
16 siblings, 0 replies; 42+ messages in thread
From: Alexander Shishkin @ 2024-10-28 16:07 UTC (permalink / raw)
To: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra, Ard Biesheuvel,
Paul E. McKenney, Josh Poimboeuf, Xiongwei Song, Xin Li,
Mike Rapoport (IBM), Brijesh Singh, Michael Roth, Tony Luck,
Kirill A. Shutemov, Alexey Kardashevskiy
Cc: Jonathan Corbet, Alexander Shishkin, Sohil Mehta, Ingo Molnar,
Pawan Gupta, Daniel Sneddon, Kai Huang, Sandipan Das,
Breno Leitao, Rick Edgecombe, Alexei Starovoitov, Hou Tao,
Juergen Gross, Vegard Nossum, Kees Cook, Eric Biggers,
Jason Gunthorpe, Masami Hiramatsu (Google), Andrew Morton,
Luis Chamberlain, Yuntao Wang, Rasmus Villemoes, Christophe Leroy,
Tejun Heo, Changbin Du, Huang Shijie, Geert Uytterhoeven,
Namhyung Kim, Arnaldo Carvalho de Melo, linux-doc, linux-kernel,
linux-efi, Dave Hansen
From: Sohil Mehta <sohil.mehta@intel.com>
Move the UMIP exception fixup along with the other user mode fixups,
that is, under the common "if (user_mode(regs))" condition where the
rest of the fixups reside.
No functional change intended.
Suggested-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
---
arch/x86/kernel/traps.c | 8 +++-----
1 file changed, 3 insertions(+), 5 deletions(-)
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index d05392db5d0f..b26a7aba0b2d 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -701,11 +701,6 @@ DEFINE_IDTENTRY_ERRORCODE(exc_general_protection)
cond_local_irq_enable(regs);
- if (static_cpu_has(X86_FEATURE_UMIP)) {
- if (user_mode(regs) && fixup_umip_exception(regs))
- goto exit;
- }
-
if (v8086_mode(regs)) {
local_irq_enable();
handle_vm86_fault((struct kernel_vm86_regs *) regs, error_code);
@@ -720,6 +715,9 @@ DEFINE_IDTENTRY_ERRORCODE(exc_general_protection)
if (fixup_vdso_exception(regs, X86_TRAP_GP, error_code, 0))
goto exit;
+ if (cpu_feature_enabled(X86_FEATURE_UMIP) && fixup_umip_exception(regs))
+ goto exit;
+
gp_user_force_sig_segv(regs, X86_TRAP_GP, error_code, desc);
goto exit;
}
--
2.45.2
* [PATCH v5 09/16] x86/vsyscall: Add vsyscall emulation for #GP
2024-10-28 16:07 [PATCH v5 00/16] Enable Linear Address Space Separation support Alexander Shishkin
` (7 preceding siblings ...)
2024-10-28 16:07 ` [PATCH v5 08/16] x86/traps: Consolidate user fixups in exc_general_protection() Alexander Shishkin
@ 2024-10-28 16:07 ` Alexander Shishkin
2024-10-28 16:07 ` [PATCH v5 10/16] x86/vsyscall: Disable LASS if vsyscall mode is set to EMULATE Alexander Shishkin
` (7 subsequent siblings)
16 siblings, 0 replies; 42+ messages in thread
From: Alexander Shishkin @ 2024-10-28 16:07 UTC (permalink / raw)
To: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra, Ard Biesheuvel,
Paul E. McKenney, Josh Poimboeuf, Xiongwei Song, Xin Li,
Mike Rapoport (IBM), Brijesh Singh, Michael Roth, Tony Luck,
Kirill A. Shutemov, Alexey Kardashevskiy
Cc: Jonathan Corbet, Alexander Shishkin, Sohil Mehta, Ingo Molnar,
Pawan Gupta, Daniel Sneddon, Kai Huang, Sandipan Das,
Breno Leitao, Rick Edgecombe, Alexei Starovoitov, Hou Tao,
Juergen Gross, Vegard Nossum, Kees Cook, Eric Biggers,
Jason Gunthorpe, Masami Hiramatsu (Google), Andrew Morton,
Luis Chamberlain, Yuntao Wang, Rasmus Villemoes, Christophe Leroy,
Tejun Heo, Changbin Du, Huang Shijie, Geert Uytterhoeven,
Namhyung Kim, Arnaldo Carvalho de Melo, linux-doc, linux-kernel,
linux-efi
From: Sohil Mehta <sohil.mehta@intel.com>
The legacy vsyscall page is mapped at a fixed address in the kernel
address range 0xffffffffff600000-0xffffffffff601000. Prior to LASS being
introduced, a legacy vsyscall page access from userspace would always
generate a page fault. The kernel emulates the execute (XONLY) accesses
in the page fault handler and returns back to userspace with the
appropriate register values.
Since LASS intercepts these accesses before the paging structures are
traversed, it generates a general protection fault instead of a page
fault. The #GP fault doesn't provide much information in terms of the
error code. So, use the faulting RIP which is preserved in the user
registers to emulate the vsyscall access without going through complex
instruction decoding.
Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
---
arch/x86/entry/vsyscall/vsyscall_64.c | 11 ++++++++++-
arch/x86/include/asm/vsyscall.h | 6 ++++++
arch/x86/kernel/traps.c | 4 ++++
3 files changed, 20 insertions(+), 1 deletion(-)
diff --git a/arch/x86/entry/vsyscall/vsyscall_64.c b/arch/x86/entry/vsyscall/vsyscall_64.c
index e89d7d83a594..97608883b4b4 100644
--- a/arch/x86/entry/vsyscall/vsyscall_64.c
+++ b/arch/x86/entry/vsyscall/vsyscall_64.c
@@ -23,7 +23,7 @@
* soon be no new userspace code that will ever use a vsyscall.
*
* The code in this file emulates vsyscalls when notified of a page
- * fault to a vsyscall address.
+ * fault or a general protection fault to a vsyscall address.
*/
#include <linux/kernel.h>
@@ -276,6 +276,15 @@ bool emulate_vsyscall_pf(unsigned long error_code, struct pt_regs *regs,
return __emulate_vsyscall(regs, address);
}
+bool emulate_vsyscall_gp(struct pt_regs *regs)
+{
+ /* Emulate only if the RIP points to the vsyscall address */
+ if (!is_vsyscall_vaddr(regs->ip))
+ return false;
+
+ return __emulate_vsyscall(regs, regs->ip);
+}
+
/*
* A pseudo VMA to allow ptrace access for the vsyscall page. This only
* covers the 64bit vsyscall page now. 32bit has a real VMA now and does
diff --git a/arch/x86/include/asm/vsyscall.h b/arch/x86/include/asm/vsyscall.h
index 214977f4fa11..4eb8d3673223 100644
--- a/arch/x86/include/asm/vsyscall.h
+++ b/arch/x86/include/asm/vsyscall.h
@@ -16,6 +16,7 @@ extern void set_vsyscall_pgtable_user_bits(pgd_t *root);
*/
extern bool emulate_vsyscall_pf(unsigned long error_code,
struct pt_regs *regs, unsigned long address);
+extern bool emulate_vsyscall_gp(struct pt_regs *regs);
#else
static inline void map_vsyscall(void) {}
static inline bool emulate_vsyscall_pf(unsigned long error_code,
@@ -23,6 +24,11 @@ static inline bool emulate_vsyscall_pf(unsigned long error_code,
{
return false;
}
+
+static inline bool emulate_vsyscall_gp(struct pt_regs *regs)
+{
+ return false;
+}
#endif
/*
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index b26a7aba0b2d..bae635cc6971 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -68,6 +68,7 @@
#include <asm/vdso.h>
#include <asm/tdx.h>
#include <asm/cfi.h>
+#include <asm/vsyscall.h>
#ifdef CONFIG_X86_64
#include <asm/x86_init.h>
@@ -718,6 +719,9 @@ DEFINE_IDTENTRY_ERRORCODE(exc_general_protection)
if (cpu_feature_enabled(X86_FEATURE_UMIP) && fixup_umip_exception(regs))
goto exit;
+ if (cpu_feature_enabled(X86_FEATURE_LASS) && emulate_vsyscall_gp(regs))
+ goto exit;
+
gp_user_force_sig_segv(regs, X86_TRAP_GP, error_code, desc);
goto exit;
}
--
2.45.2
* [PATCH v5 10/16] x86/vsyscall: Disable LASS if vsyscall mode is set to EMULATE
2024-10-28 16:07 [PATCH v5 00/16] Enable Linear Address Space Separation support Alexander Shishkin
` (8 preceding siblings ...)
2024-10-28 16:07 ` [PATCH v5 09/16] x86/vsyscall: Add vsyscall emulation for #GP Alexander Shishkin
@ 2024-10-28 16:07 ` Alexander Shishkin
2024-10-28 16:07 ` [PATCH v5 11/16] x86/vsyscall: Document the fact that vsyscall=emulate disables LASS Alexander Shishkin
` (6 subsequent siblings)
16 siblings, 0 replies; 42+ messages in thread
From: Alexander Shishkin @ 2024-10-28 16:07 UTC (permalink / raw)
To: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra, Ard Biesheuvel,
Paul E. McKenney, Josh Poimboeuf, Xiongwei Song, Xin Li,
Mike Rapoport (IBM), Brijesh Singh, Michael Roth, Tony Luck,
Kirill A. Shutemov, Alexey Kardashevskiy
Cc: Jonathan Corbet, Alexander Shishkin, Sohil Mehta, Ingo Molnar,
Pawan Gupta, Daniel Sneddon, Kai Huang, Sandipan Das,
Breno Leitao, Rick Edgecombe, Alexei Starovoitov, Hou Tao,
Juergen Gross, Vegard Nossum, Kees Cook, Eric Biggers,
Jason Gunthorpe, Masami Hiramatsu (Google), Andrew Morton,
Luis Chamberlain, Yuntao Wang, Rasmus Villemoes, Christophe Leroy,
Tejun Heo, Changbin Du, Huang Shijie, Geert Uytterhoeven,
Namhyung Kim, Arnaldo Carvalho de Melo, linux-doc, linux-kernel,
linux-efi
From: Sohil Mehta <sohil.mehta@intel.com>
The EMULATE mode of vsyscall maps the vsyscall page into user address
space where it can be read directly by user applications. This mode was
recently deprecated and can only be enabled with the special command line
parameter vsyscall=emulate. See commit bf00745e7791 ("x86/vsyscall:
Remove CONFIG_LEGACY_VSYSCALL_EMULATE").
Fixing the LASS violations during the EMULATE mode would need complex
instruction decoding since the resulting #GP fault does not include any
useful error information and the vsyscall address is not readily
available in the RIP.
At this point, no one is expected to be using the insecure and
deprecated EMULATE mode. The rare usages that need support probably
don't care much about security anyway. Disable LASS when EMULATE mode is
requested during command line parsing to avoid breaking user software.
LASS will be supported if vsyscall mode is set to XONLY or NONE.
Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
---
arch/x86/entry/vsyscall/vsyscall_64.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/arch/x86/entry/vsyscall/vsyscall_64.c b/arch/x86/entry/vsyscall/vsyscall_64.c
index 97608883b4b4..7c845c1db3b4 100644
--- a/arch/x86/entry/vsyscall/vsyscall_64.c
+++ b/arch/x86/entry/vsyscall/vsyscall_64.c
@@ -36,6 +36,7 @@
#include <asm/vsyscall.h>
#include <asm/unistd.h>
#include <asm/fixmap.h>
+#include <asm/tlbflush.h>
#include <asm/traps.h>
#include <asm/paravirt.h>
@@ -63,6 +64,13 @@ static int __init vsyscall_setup(char *str)
else
return -EINVAL;
+ if (cpu_feature_enabled(X86_FEATURE_LASS) &&
+ vsyscall_mode == EMULATE) {
+ cr4_clear_bits(X86_CR4_LASS);
+ setup_clear_cpu_cap(X86_FEATURE_LASS);
+ pr_warn_once("x86/cpu: Disabling LASS support due to vsyscall=emulate\n");
+ }
+
return 0;
}
--
2.45.2
* [PATCH v5 11/16] x86/vsyscall: Document the fact that vsyscall=emulate disables LASS
2024-10-28 16:07 [PATCH v5 00/16] Enable Linear Address Space Separation support Alexander Shishkin
` (9 preceding siblings ...)
2024-10-28 16:07 ` [PATCH v5 10/16] x86/vsyscall: Disable LASS if vsyscall mode is set to EMULATE Alexander Shishkin
@ 2024-10-28 16:07 ` Alexander Shishkin
2024-10-29 23:41 ` Sohil Mehta
2024-10-28 16:08 ` [PATCH v5 12/16] x86/cpu: Set LASS CR4 bit as pinning sensitive Alexander Shishkin
` (5 subsequent siblings)
16 siblings, 1 reply; 42+ messages in thread
From: Alexander Shishkin @ 2024-10-28 16:07 UTC (permalink / raw)
To: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra, Ard Biesheuvel,
Paul E. McKenney, Josh Poimboeuf, Xiongwei Song, Xin Li,
Mike Rapoport (IBM), Brijesh Singh, Michael Roth, Tony Luck,
Kirill A. Shutemov, Alexey Kardashevskiy
Cc: Jonathan Corbet, Alexander Shishkin, Sohil Mehta, Ingo Molnar,
Pawan Gupta, Daniel Sneddon, Kai Huang, Sandipan Das,
Breno Leitao, Rick Edgecombe, Alexei Starovoitov, Hou Tao,
Juergen Gross, Vegard Nossum, Kees Cook, Eric Biggers,
Jason Gunthorpe, Masami Hiramatsu (Google), Andrew Morton,
Luis Chamberlain, Yuntao Wang, Rasmus Villemoes, Christophe Leroy,
Tejun Heo, Changbin Du, Huang Shijie, Geert Uytterhoeven,
Namhyung Kim, Arnaldo Carvalho de Melo, linux-doc, linux-kernel,
linux-efi, Dave Hansen
The EMULATE mode of vsyscall disables LASS, because fixing the LASS
violations in EMULATE mode would require complex instruction decoding.
Document this fact in kernel-parameters.txt.
Cc: Andy Lutomirski <luto@kernel.org>
Suggested-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
---
Documentation/admin-guide/kernel-parameters.txt | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 1518343bbe22..4091dc48670a 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -7391,7 +7391,9 @@
emulate Vsyscalls turn into traps and are emulated
reasonably safely. The vsyscall page is
- readable.
+ readable. This also disables the LASS
+ feature to allow userspace to poke around
+ the vsyscall page.
xonly [default] Vsyscalls turn into traps and are
emulated reasonably safely. The vsyscall
--
2.45.2
* [PATCH v5 12/16] x86/cpu: Set LASS CR4 bit as pinning sensitive
2024-10-28 16:07 [PATCH v5 00/16] Enable Linear Address Space Separation support Alexander Shishkin
` (10 preceding siblings ...)
2024-10-28 16:07 ` [PATCH v5 11/16] x86/vsyscall: Document the fact that vsyscall=emulate disables LASS Alexander Shishkin
@ 2024-10-28 16:08 ` Alexander Shishkin
2024-10-28 16:08 ` [PATCH v5 13/16] x86/traps: Communicate a LASS violation in #GP message Alexander Shishkin
` (4 subsequent siblings)
16 siblings, 0 replies; 42+ messages in thread
From: Alexander Shishkin @ 2024-10-28 16:08 UTC (permalink / raw)
To: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra, Ard Biesheuvel,
Paul E. McKenney, Josh Poimboeuf, Xiongwei Song, Xin Li,
Mike Rapoport (IBM), Brijesh Singh, Michael Roth, Tony Luck,
Kirill A. Shutemov, Alexey Kardashevskiy
Cc: Jonathan Corbet, Alexander Shishkin, Sohil Mehta, Ingo Molnar,
Pawan Gupta, Daniel Sneddon, Kai Huang, Sandipan Das,
Breno Leitao, Rick Edgecombe, Alexei Starovoitov, Hou Tao,
Juergen Gross, Vegard Nossum, Kees Cook, Eric Biggers,
Jason Gunthorpe, Masami Hiramatsu (Google), Andrew Morton,
Luis Chamberlain, Yuntao Wang, Rasmus Villemoes, Christophe Leroy,
Tejun Heo, Changbin Du, Huang Shijie, Geert Uytterhoeven,
Namhyung Kim, Arnaldo Carvalho de Melo, linux-doc, linux-kernel,
linux-efi, Yian Chen
From: Yian Chen <yian.chen@intel.com>
Security features such as LASS are not expected to be disabled once
initialized. Add LASS to the CR4 pinned mask.
Signed-off-by: Yian Chen <yian.chen@intel.com>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
---
arch/x86/kernel/cpu/common.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index c249fd0aa3fb..f8eed9548ea1 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -402,7 +402,8 @@ static __always_inline void setup_umip(struct cpuinfo_x86 *c)
/* These bits should not change their value after CPU init is finished. */
static const unsigned long cr4_pinned_mask = X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_UMIP |
- X86_CR4_FSGSBASE | X86_CR4_CET | X86_CR4_FRED;
+ X86_CR4_FSGSBASE | X86_CR4_CET | X86_CR4_FRED |
+ X86_CR4_LASS;
static DEFINE_STATIC_KEY_FALSE_RO(cr_pinning);
static unsigned long cr4_pinned_bits __ro_after_init;
--
2.45.2
* [PATCH v5 13/16] x86/traps: Communicate a LASS violation in #GP message
2024-10-28 16:07 [PATCH v5 00/16] Enable Linear Address Space Separation support Alexander Shishkin
` (11 preceding siblings ...)
2024-10-28 16:08 ` [PATCH v5 12/16] x86/cpu: Set LASS CR4 bit as pinning sensitive Alexander Shishkin
@ 2024-10-28 16:08 ` Alexander Shishkin
2024-10-28 16:08 ` [PATCH v5 14/16] x86/cpu: Make LAM depend on LASS Alexander Shishkin
` (3 subsequent siblings)
16 siblings, 0 replies; 42+ messages in thread
From: Alexander Shishkin @ 2024-10-28 16:08 UTC (permalink / raw)
To: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra, Ard Biesheuvel,
Paul E. McKenney, Josh Poimboeuf, Xiongwei Song, Xin Li,
Mike Rapoport (IBM), Brijesh Singh, Michael Roth, Tony Luck,
Kirill A. Shutemov, Alexey Kardashevskiy
Cc: Jonathan Corbet, Alexander Shishkin, Sohil Mehta, Ingo Molnar,
Pawan Gupta, Daniel Sneddon, Kai Huang, Sandipan Das,
Breno Leitao, Rick Edgecombe, Alexei Starovoitov, Hou Tao,
Juergen Gross, Vegard Nossum, Kees Cook, Eric Biggers,
Jason Gunthorpe, Masami Hiramatsu (Google), Andrew Morton,
Luis Chamberlain, Yuntao Wang, Rasmus Villemoes, Christophe Leroy,
Tejun Heo, Changbin Du, Huang Shijie, Geert Uytterhoeven,
Namhyung Kim, Arnaldo Carvalho de Melo, linux-doc, linux-kernel,
linux-efi
Provide a more helpful message on #GP when a kernel-side LASS violation
is detected.
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
---
arch/x86/kernel/traps.c | 14 ++++++++++----
1 file changed, 10 insertions(+), 4 deletions(-)
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index bae635cc6971..89e35ab8dbd9 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -537,7 +537,8 @@ DEFINE_IDTENTRY(exc_bounds)
enum kernel_gp_hint {
GP_NO_HINT,
GP_NON_CANONICAL,
- GP_CANONICAL
+ GP_CANONICAL,
+ GP_LASS_VIOLATION
};
/*
@@ -573,6 +574,8 @@ static enum kernel_gp_hint get_kernel_gp_address(struct pt_regs *regs,
if (*addr < ~__VIRTUAL_MASK &&
*addr + insn.opnd_bytes - 1 > __VIRTUAL_MASK)
return GP_NON_CANONICAL;
+ else if (*addr < ~__VIRTUAL_MASK && cpu_feature_enabled(X86_FEATURE_LASS))
+ return GP_LASS_VIOLATION;
#endif
return GP_CANONICAL;
@@ -696,6 +699,11 @@ DEFINE_IDTENTRY_ERRORCODE(exc_general_protection)
char desc[sizeof(GPFSTR) + 50 + 2*sizeof(unsigned long) + 1] = GPFSTR;
enum kernel_gp_hint hint = GP_NO_HINT;
unsigned long gp_addr;
+ static char *help[] = {
+ [GP_NON_CANONICAL] = "probably for non-canonical address",
+ [GP_CANONICAL] = "maybe for address",
+ [GP_LASS_VIOLATION] = "LASS prevented access to address"
+ };
if (user_mode(regs) && try_fixup_enqcmd_gp())
return;
@@ -735,9 +743,7 @@ DEFINE_IDTENTRY_ERRORCODE(exc_general_protection)
hint = get_kernel_gp_address(regs, &gp_addr);
if (hint != GP_NO_HINT)
- snprintf(desc, sizeof(desc), GPFSTR ", %s 0x%lx",
- (hint == GP_NON_CANONICAL) ? "probably for non-canonical address"
- : "maybe for address",
+ snprintf(desc, sizeof(desc), GPFSTR ", %s 0x%lx", help[hint],
gp_addr);
/*
--
2.45.2
* [PATCH v5 14/16] x86/cpu: Make LAM depend on LASS
2024-10-28 16:07 [PATCH v5 00/16] Enable Linear Address Space Separation support Alexander Shishkin
` (12 preceding siblings ...)
2024-10-28 16:08 ` [PATCH v5 13/16] x86/traps: Communicate a LASS violation in #GP message Alexander Shishkin
@ 2024-10-28 16:08 ` Alexander Shishkin
2024-10-31 0:06 ` Sohil Mehta
2024-10-28 16:08 ` [PATCH v5 15/16] x86/cpu: Enable LASS during CPU initialization Alexander Shishkin
` (2 subsequent siblings)
16 siblings, 1 reply; 42+ messages in thread
From: Alexander Shishkin @ 2024-10-28 16:08 UTC (permalink / raw)
To: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra, Ard Biesheuvel,
Paul E. McKenney, Josh Poimboeuf, Xiongwei Song, Xin Li,
Mike Rapoport (IBM), Brijesh Singh, Michael Roth, Tony Luck,
Kirill A. Shutemov, Alexey Kardashevskiy
Cc: Jonathan Corbet, Alexander Shishkin, Sohil Mehta, Ingo Molnar,
Pawan Gupta, Daniel Sneddon, Kai Huang, Sandipan Das,
Breno Leitao, Rick Edgecombe, Alexei Starovoitov, Hou Tao,
Juergen Gross, Vegard Nossum, Kees Cook, Eric Biggers,
Jason Gunthorpe, Masami Hiramatsu (Google), Andrew Morton,
Luis Chamberlain, Yuntao Wang, Rasmus Villemoes, Christophe Leroy,
Tejun Heo, Changbin Du, Huang Shijie, Geert Uytterhoeven,
Namhyung Kim, Arnaldo Carvalho de Melo, linux-doc, linux-kernel,
linux-efi
To prevent Spectre exploits based on LAM, as demonstrated by the SLAM
whitepaper [1], make LAM depend on LASS, which closes off this type of
vulnerability.
[1] https://download.vusec.net/papers/slam_sp24.pdf
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
---
arch/x86/kernel/cpu/cpuid-deps.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
index 3f73c4b03348..d9fb2423605e 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -84,6 +84,7 @@ static const struct cpuid_dep cpuid_deps[] = {
{ X86_FEATURE_SHSTK, X86_FEATURE_XSAVES },
{ X86_FEATURE_FRED, X86_FEATURE_LKGS },
{ X86_FEATURE_LASS, X86_FEATURE_SMAP },
+ { X86_FEATURE_LAM, X86_FEATURE_LASS },
{}
};
--
2.45.2
^ permalink raw reply related [flat|nested] 42+ messages in thread
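The cpuid_deps[] table is applied transitively: clearing a feature also clears
everything that directly or indirectly depends on it, so with this entry LAM
goes away whenever LASS or SMAP does. A minimal self-contained sketch of that
recursive clearing, with made-up constants standing in for the X86_FEATURE_*
bits:

	#include <stdio.h>

	struct cpuid_dep { int feature, depends; };

	/* made-up stand-ins for the kernel's X86_FEATURE_* bits */
	enum { FEAT_SMAP, FEAT_LASS, FEAT_LAM, NR_FEATS };

	static const struct cpuid_dep deps[] = {
		{ FEAT_LASS, FEAT_SMAP },
		{ FEAT_LAM,  FEAT_LASS },
		{ -1, -1 }
	};

	static int caps[NR_FEATS] = { 1, 1, 1 };

	/* clear a feature and, transitively, everything depending on it */
	static void clear_feature(int feature)
	{
		const struct cpuid_dep *d;

		caps[feature] = 0;
		for (d = deps; d->feature != -1; d++)
			if (d->depends == feature && caps[d->feature])
				clear_feature(d->feature);
	}

	int main(void)
	{
		clear_feature(FEAT_SMAP);	/* SMAP gone -> LASS gone -> LAM gone */
		printf("SMAP=%d LASS=%d LAM=%d\n",
		       caps[FEAT_SMAP], caps[FEAT_LASS], caps[FEAT_LAM]);
		return 0;
	}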
* [PATCH v5 15/16] x86/cpu: Enable LASS during CPU initialization
2024-10-28 16:07 [PATCH v5 00/16] Enable Linear Address Space Separation support Alexander Shishkin
` (13 preceding siblings ...)
2024-10-28 16:08 ` [PATCH v5 14/16] x86/cpu: Make LAM depend on LASS Alexander Shishkin
@ 2024-10-28 16:08 ` Alexander Shishkin
2024-10-28 16:08 ` [PATCH v5 16/16] Revert "x86/lam: Disable ADDRESS_MASKING in most cases" Alexander Shishkin
2024-10-29 17:14 ` [PATCH v5 00/16] Enable Linear Address Space Separation support Matthew Wilcox
16 siblings, 0 replies; 42+ messages in thread
From: Alexander Shishkin @ 2024-10-28 16:08 UTC (permalink / raw)
To: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra, Ard Biesheuvel,
Paul E. McKenney, Josh Poimboeuf, Xiongwei Song, Xin Li,
Mike Rapoport (IBM), Brijesh Singh, Michael Roth, Tony Luck,
Kirill A. Shutemov, Alexey Kardashevskiy
Cc: Jonathan Corbet, Alexander Shishkin, Sohil Mehta, Ingo Molnar,
Pawan Gupta, Daniel Sneddon, Kai Huang, Sandipan Das,
Breno Leitao, Rick Edgecombe, Alexei Starovoitov, Hou Tao,
Juergen Gross, Vegard Nossum, Kees Cook, Eric Biggers,
Jason Gunthorpe, Masami Hiramatsu (Google), Andrew Morton,
Luis Chamberlain, Yuntao Wang, Rasmus Villemoes, Christophe Leroy,
Tejun Heo, Changbin Du, Huang Shijie, Geert Uytterhoeven,
Namhyung Kim, Arnaldo Carvalho de Melo, linux-doc, linux-kernel,
linux-efi
From: Sohil Mehta <sohil.mehta@intel.com>
LASS is a security feature, so enable it by default if the platform
supports it.
While at it, get rid of the comment above the SMAP/SMEP/UMIP/LASS setup
instead of updating it to mention LASS as well, as the whole sequence is
quite self-explanatory.
Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
---
arch/x86/kernel/cpu/common.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index f8eed9548ea1..2f5faa5979a9 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -400,6 +400,12 @@ static __always_inline void setup_umip(struct cpuinfo_x86 *c)
cr4_clear_bits(X86_CR4_UMIP);
}
+static __always_inline void setup_lass(struct cpuinfo_x86 *c)
+{
+ if (cpu_feature_enabled(X86_FEATURE_LASS))
+ cr4_set_bits(X86_CR4_LASS);
+}
+
/* These bits should not change their value after CPU init is finished. */
static const unsigned long cr4_pinned_mask = X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_UMIP |
X86_CR4_FSGSBASE | X86_CR4_CET | X86_CR4_FRED |
@@ -1848,10 +1854,10 @@ static void identify_cpu(struct cpuinfo_x86 *c)
/* Disable the PN if appropriate */
squash_the_stupid_serial_number(c);
- /* Set up SMEP/SMAP/UMIP */
setup_smep(c);
setup_smap(c);
setup_umip(c);
+ setup_lass(c);
/* Enable FSGSBASE instructions if available. */
if (cpu_has(c, X86_FEATURE_FSGSBASE)) {
--
2.45.2
^ permalink raw reply related [flat|nested] 42+ messages in thread
* [PATCH v5 16/16] Revert "x86/lam: Disable ADDRESS_MASKING in most cases"
2024-10-28 16:07 [PATCH v5 00/16] Enable Linear Address Space Separation support Alexander Shishkin
` (14 preceding siblings ...)
2024-10-28 16:08 ` [PATCH v5 15/16] x86/cpu: Enable LASS during CPU initialization Alexander Shishkin
@ 2024-10-28 16:08 ` Alexander Shishkin
2024-10-28 20:41 ` Kirill A. Shutemov
2024-10-29 17:14 ` [PATCH v5 00/16] Enable Linear Address Space Separation support Matthew Wilcox
16 siblings, 1 reply; 42+ messages in thread
From: Alexander Shishkin @ 2024-10-28 16:08 UTC (permalink / raw)
To: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra, Ard Biesheuvel,
Paul E. McKenney, Josh Poimboeuf, Xiongwei Song, Xin Li,
Mike Rapoport (IBM), Brijesh Singh, Michael Roth, Tony Luck,
Kirill A. Shutemov, Alexey Kardashevskiy
Cc: Jonathan Corbet, Alexander Shishkin, Sohil Mehta, Ingo Molnar,
Pawan Gupta, Daniel Sneddon, Kai Huang, Sandipan Das,
Breno Leitao, Rick Edgecombe, Alexei Starovoitov, Hou Tao,
Juergen Gross, Vegard Nossum, Kees Cook, Eric Biggers,
Jason Gunthorpe, Masami Hiramatsu (Google), Andrew Morton,
Luis Chamberlain, Yuntao Wang, Rasmus Villemoes, Christophe Leroy,
Tejun Heo, Changbin Du, Huang Shijie, Geert Uytterhoeven,
Namhyung Kim, Arnaldo Carvalho de Melo, linux-doc, linux-kernel,
linux-efi
This reverts commit 3267cb6d3a174ff83d6287dcd5b0047bbd912452.
LASS mitigates Spectre based on LAM (SLAM) [1], and an earlier commit
in this series made LAM depend on LASS. LAM therefore no longer needs
to be disabled at compile time, so revert the commit that disables it.
[1] https://download.vusec.net/papers/slam_sp24.pdf
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
CC: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
---
arch/x86/Kconfig | 1 -
1 file changed, 1 deletion(-)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 0bdb7a394f59..192d5145f54e 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2257,7 +2257,6 @@ config RANDOMIZE_MEMORY_PHYSICAL_PADDING
config ADDRESS_MASKING
bool "Linear Address Masking support"
depends on X86_64
- depends on COMPILE_TEST || !CPU_MITIGATIONS # wait for LASS
help
Linear Address Masking (LAM) modifies the checking that is applied
to 64-bit linear addresses, allowing software to use of the
--
2.45.2
^ permalink raw reply related [flat|nested] 42+ messages in thread
* Re: [PATCH v5 03/16] x86/alternatives: Disable LASS when patching kernel alternatives
2024-10-28 16:07 ` [PATCH v5 03/16] x86/alternatives: Disable LASS when patching kernel alternatives Alexander Shishkin
@ 2024-10-28 17:49 ` Dave Hansen
2024-10-29 11:36 ` Peter Zijlstra
0 siblings, 1 reply; 42+ messages in thread
From: Dave Hansen @ 2024-10-28 17:49 UTC (permalink / raw)
To: Alexander Shishkin, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra,
Ard Biesheuvel, Paul E. McKenney, Josh Poimboeuf, Xiongwei Song,
Xin Li, Mike Rapoport (IBM), Brijesh Singh, Michael Roth,
Tony Luck, Kirill A. Shutemov, Alexey Kardashevskiy
Cc: Jonathan Corbet, Sohil Mehta, Ingo Molnar, Pawan Gupta,
Daniel Sneddon, Kai Huang, Sandipan Das, Breno Leitao,
Rick Edgecombe, Alexei Starovoitov, Hou Tao, Juergen Gross,
Vegard Nossum, Kees Cook, Eric Biggers, Jason Gunthorpe,
Masami Hiramatsu (Google), Andrew Morton, Luis Chamberlain,
Yuntao Wang, Rasmus Villemoes, Christophe Leroy, Tejun Heo,
Changbin Du, Huang Shijie, Geert Uytterhoeven, Namhyung Kim,
Arnaldo Carvalho de Melo, linux-doc, linux-kernel, linux-efi
On 10/28/24 09:07, Alexander Shishkin wrote:
> static void text_poke_memcpy(void *dst, const void *src, size_t len)
> {
> - memcpy(dst, src, len);
> + lass_stac();
> + __inline_memcpy(dst, src, len);
> + lass_clac();
> }
>
> static void text_poke_memset(void *dst, const void *src, size_t len)
> {
> int c = *(const int *)src;
>
> - memset(dst, c, len);
> + lass_stac();
> + __inline_memset(dst, c, len);
> + lass_clac();
> }
These are the _only_ users of lass_stac/clac() or the new inlines.
First of all, I totally agree that the _existing_ strict objtool
behavior around STAC/CLAC is a good idea.
But text poking really is special and the context is highly unlikely to
result in bugs or exploits. My first instinct here would have been to
tell objtool that the text poking code is OK and to relax objtool's
STAC/CLAC paranoia here.
Looking at objtool, I can see how important it is to keep the STAC/CLAC
code as dirt simple and foolproof as possible. I don't see an obvious
way to exempt the text poking code without adding at least some complexity.
Basically, what I'm asking is: if the goal is to keep objtool simple,
please *SAY* that, because on the surface this doesn't look like a good
idea.
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v5 16/16] Revert "x86/lam: Disable ADDRESS_MASKING in most cases"
2024-10-28 16:08 ` [PATCH v5 16/16] Revert "x86/lam: Disable ADDRESS_MASKING in most cases" Alexander Shishkin
@ 2024-10-28 20:41 ` Kirill A. Shutemov
2024-10-28 22:00 ` Alexander Shishkin
0 siblings, 1 reply; 42+ messages in thread
From: Kirill A. Shutemov @ 2024-10-28 20:41 UTC (permalink / raw)
To: Alexander Shishkin
Cc: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra, Ard Biesheuvel,
Paul E. McKenney, Josh Poimboeuf, Xiongwei Song, Xin Li,
Mike Rapoport (IBM), Brijesh Singh, Michael Roth, Tony Luck,
Kirill A. Shutemov, Alexey Kardashevskiy, Jonathan Corbet,
Sohil Mehta, Ingo Molnar, Pawan Gupta, Daniel Sneddon, Kai Huang,
Sandipan Das, Breno Leitao, Rick Edgecombe, Alexei Starovoitov,
Hou Tao, Juergen Gross, Vegard Nossum, Kees Cook, Eric Biggers,
Jason Gunthorpe, Masami Hiramatsu (Google), Andrew Morton,
Luis Chamberlain, Yuntao Wang, Rasmus Villemoes, Christophe Leroy,
Tejun Heo, Changbin Du, Huang Shijie, Geert Uytterhoeven,
Namhyung Kim, Arnaldo Carvalho de Melo, linux-doc, linux-kernel,
linux-efi
On Mon, Oct 28, 2024 at 06:08:04PM +0200, Alexander Shishkin wrote:
> This reverts commit 3267cb6d3a174ff83d6287dcd5b0047bbd912452.
>
> LASS mitigates Spectre based on LAM (SLAM) [1], and an earlier commit
> in this series made LAM depend on LASS. LAM therefore no longer needs
> to be disabled at compile time, so revert the commit that disables it.
>
> [1] https://download.vusec.net/papers/slam_sp24.pdf
>
> Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> CC: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Before re-enabling LAM, you need to uncomment the X86_FEATURE_LAM check
in arch/x86/kernel/cpu/common.c introduced in the recent 86e6b1547b3d
("x86: fix user address masking non-canonical speculation issue").
--
Kiryl Shutsemau / Kirill A. Shutemov
^ permalink raw reply [flat|nested] 42+ messages in thread
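For context, the check in question is a commented-out block in
arch/x86/kernel/cpu/common.c, approximately of the following shape
(illustrative; see commit 86e6b1547b3d for the authoritative version):

	/*
	 * Enable this when LAM is gated on LASS support
	if (cpu_feature_enabled(X86_FEATURE_LAM))
		USER_PTR_MAX = (1ul << 63) - PAGE_SIZE;
	 */
	runtime_const_init(ptr, USER_PTR_MAX);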
* Re: [PATCH v5 16/16] Revert "x86/lam: Disable ADDRESS_MASKING in most cases"
2024-10-28 20:41 ` Kirill A. Shutemov
@ 2024-10-28 22:00 ` Alexander Shishkin
0 siblings, 0 replies; 42+ messages in thread
From: Alexander Shishkin @ 2024-10-28 22:00 UTC (permalink / raw)
To: Kirill A. Shutemov
Cc: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra, Ard Biesheuvel,
Paul E. McKenney, Josh Poimboeuf, Xiongwei Song, Xin Li,
Mike Rapoport (IBM), Brijesh Singh, Michael Roth, Tony Luck,
Kirill A. Shutemov, Alexey Kardashevskiy, Jonathan Corbet,
Sohil Mehta, Ingo Molnar, Pawan Gupta, Daniel Sneddon, Kai Huang,
Sandipan Das, Breno Leitao, Rick Edgecombe, Alexei Starovoitov,
Hou Tao, Juergen Gross, Vegard Nossum, Kees Cook, Eric Biggers,
Jason Gunthorpe, Masami Hiramatsu (Google), Andrew Morton,
Luis Chamberlain, Yuntao Wang, Rasmus Villemoes, Christophe Leroy,
Tejun Heo, Changbin Du, Huang Shijie, Geert Uytterhoeven,
Namhyung Kim, Arnaldo Carvalho de Melo, linux-doc, linux-kernel,
linux-efi, alexander.shishkin
"Kirill A. Shutemov" <kirill@shutemov.name> writes:
> On Mon, Oct 28, 2024 at 06:08:04PM +0200, Alexander Shishkin wrote:
>> This reverts commit 3267cb6d3a174ff83d6287dcd5b0047bbd912452.
>>
>> LASS mitigates Spectre based on LAM (SLAM) [1], and an earlier commit
>> in this series made LAM depend on LASS. LAM therefore no longer needs
>> to be disabled at compile time, so revert the commit that disables it.
>>
>> [1] https://download.vusec.net/papers/slam_sp24.pdf
>>
>> Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
>> CC: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
>
> Before re-enabling LAM, you need to uncomment the X86_FEATURE_LAM check
> in arch/x86/kernel/cpu/common.c introduced in the recent 86e6b1547b3d
> ("x86: fix user address masking non-canonical speculation issue").
Forgot about that one. Thanks!
Regards,
--
Alex
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v5 03/16] x86/alternatives: Disable LASS when patching kernel alternatives
2024-10-28 17:49 ` Dave Hansen
@ 2024-10-29 11:36 ` Peter Zijlstra
2024-10-29 18:48 ` Peter Zijlstra
0 siblings, 1 reply; 42+ messages in thread
From: Peter Zijlstra @ 2024-10-29 11:36 UTC (permalink / raw)
To: Dave Hansen
Cc: Alexander Shishkin, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Ard Biesheuvel,
Paul E. McKenney, Josh Poimboeuf, Xiongwei Song, Xin Li,
Mike Rapoport (IBM), Brijesh Singh, Michael Roth, Tony Luck,
Kirill A. Shutemov, Alexey Kardashevskiy, Jonathan Corbet,
Sohil Mehta, Ingo Molnar, Pawan Gupta, Daniel Sneddon, Kai Huang,
Sandipan Das, Breno Leitao, Rick Edgecombe, Alexei Starovoitov,
Hou Tao, Juergen Gross, Vegard Nossum, Kees Cook, Eric Biggers,
Jason Gunthorpe, Masami Hiramatsu (Google), Andrew Morton,
Luis Chamberlain, Yuntao Wang, Rasmus Villemoes, Christophe Leroy,
Tejun Heo, Changbin Du, Huang Shijie, Geert Uytterhoeven,
Namhyung Kim, Arnaldo Carvalho de Melo, linux-doc, linux-kernel,
linux-efi
On Mon, Oct 28, 2024 at 10:49:07AM -0700, Dave Hansen wrote:
> On 10/28/24 09:07, Alexander Shishkin wrote:
> > static void text_poke_memcpy(void *dst, const void *src, size_t len)
> > {
> > - memcpy(dst, src, len);
> > + lass_stac();
> > + __inline_memcpy(dst, src, len);
> > + lass_clac();
> > }
> >
> > static void text_poke_memset(void *dst, const void *src, size_t len)
> > {
> > int c = *(const int *)src;
> >
> > - memset(dst, c, len);
> > + lass_stac();
> > + __inline_memset(dst, c, len);
> > + lass_clac();
> > }
>
> These are the _only_ users of lass_stac/clac() or the new inlines.
For now; I have vague memories of running into trouble with compilers
doing random things with memcpy before, and having these inline versions
gives us more control.
One of the cases I remember running into was KASAN, where a compiler is
SUPPOSED to issue __asan_memcpy calls instead of the regular memcpy
calls, except they weren't all doing that, with the end result that our
regular memcpy implementation grew instrumentation to deal with that.
That got sorted -- by deprecating / breaking all those non-conformant
compilers. But still, I think it would be good to have the option to
force a simple inline memcpy when needed.
> First of all, I totally agree that the _existing_ strict objtool
> behavior around STAC/CLAC is a good idea.
>
> But text poking really is special and the context is highly unlikely to
> result in bugs or exploits. My first instinct here would have been to
> tell objtool that the text poking code is OK and to relax objtool's
> STAC/CLAC paranoia here.
>
> Looking at objtool, I can see how important it is to keep the STAC/CLAC
> code as dirt simple and foolproof as possible. I don't see an obvious
> way to except the text poking code without adding at least some complexity.
>
> Basically what I'm asking for is if the goal is to keep objtool simple,
> please *SAY* that. Because on the surface this doesn't look like a good
> idea.
There is, you can add it to uaccess_safe_builtin[], but I'm not sure we
want to blanket-accept memcpy() -- or perhaps that is what you're
saying.
Anyway, looking at this, I see we grew rep_{movs,stos}_alternative, as
used in copy_user_generic() and __clear_user(). Which are all somewhat
similar.
^ permalink raw reply [flat|nested] 42+ messages in thread
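For reference, the whitelist route mentioned above would be an addition to
objtool's table, something like the following (sketch; the entry names are
hypothetical and assume the poking helpers stay out of line):

	/* tools/objtool/check.c */
	static const char *uaccess_safe_builtin[] = {
		...
		"text_poke_memcpy",	/* allow calls inside STAC/CLAC */
		"text_poke_memset",
	};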
* Re: [PATCH v5 01/16] x86/cpu: Enumerate the LASS feature bits
2024-10-28 16:07 ` [PATCH v5 01/16] x86/cpu: Enumerate the LASS feature bits Alexander Shishkin
@ 2024-10-29 14:55 ` Kirill A. Shutemov
2024-10-29 21:46 ` Sohil Mehta
0 siblings, 1 reply; 42+ messages in thread
From: Kirill A. Shutemov @ 2024-10-29 14:55 UTC (permalink / raw)
To: Alexander Shishkin
Cc: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra, Ard Biesheuvel,
Paul E. McKenney, Josh Poimboeuf, Xiongwei Song, Xin Li,
Mike Rapoport (IBM), Brijesh Singh, Michael Roth, Tony Luck,
Alexey Kardashevskiy, Jonathan Corbet, Sohil Mehta, Ingo Molnar,
Pawan Gupta, Daniel Sneddon, Kai Huang, Sandipan Das,
Breno Leitao, Rick Edgecombe, Alexei Starovoitov, Hou Tao,
Juergen Gross, Vegard Nossum, Kees Cook, Eric Biggers,
Jason Gunthorpe, Masami Hiramatsu (Google), Andrew Morton,
Luis Chamberlain, Yuntao Wang, Rasmus Villemoes, Christophe Leroy,
Tejun Heo, Changbin Du, Huang Shijie, Geert Uytterhoeven,
Namhyung Kim, Arnaldo Carvalho de Melo, linux-doc, linux-kernel,
linux-efi, Yian Chen
On Mon, Oct 28, 2024 at 06:07:49PM +0200, Alexander Shishkin wrote:
> From: Sohil Mehta <sohil.mehta@intel.com>
>
> Linear Address Space Separation (LASS) is a security feature that
> intends to prevent malicious virtual address space accesses across
> user/kernel mode.
>
> Such mode based access protection already exists today with paging and
> features such as SMEP and SMAP. However, to enforce these protections,
> the processor must traverse the paging structures in memory. Malicious
> software can use timing information resulting from this traversal to
> determine details about the paging structures, and these details may
> also be used to determine the layout of the kernel memory.
>
> The LASS mechanism provides the same mode-based protections as paging
> but without traversing the paging structures. Because the protections
> enforced by LASS are applied before paging, software will not be able to
> derive paging-based timing information from the various caching
> structures such as the TLBs, mid-level caches, page walker, data caches,
> etc.
>
> LASS enforcement relies on the typical kernel implementation to divide
> the 64-bit virtual address space into two halves:
> Addr[63]=0 -> User address space
> Addr[63]=1 -> Kernel address space
>
> Any data access or code execution across address spaces typically
> results in a #GP fault.
SDM mentions #SS for LASS violations on stack instructions. Do we care to
provide a sensible error message on #SS as we do for #GP?
> The LASS enforcement for kernel data access is dependent on CR4.SMAP
> being set. The enforcement can be disabled by toggling the RFLAGS.AC bit
> similar to SMAP.
>
> Define the CPU feature bits to enumerate this feature and include
> feature dependencies to reflect the same.
>
> Co-developed-by: Yian Chen <yian.chen@intel.com>
> Signed-off-by: Yian Chen <yian.chen@intel.com>
> Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
> Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> ---
> arch/x86/include/asm/cpufeatures.h | 1 +
> arch/x86/include/asm/disabled-features.h | 4 +++-
> arch/x86/include/asm/smap.h | 18 ++++++++++++++++++
> arch/x86/include/uapi/asm/processor-flags.h | 2 ++
> arch/x86/kernel/cpu/cpuid-deps.c | 1 +
> tools/arch/x86/include/asm/cpufeatures.h | 1 +
> 6 files changed, 26 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> index ea33439a5d00..acb3ccea2bd7 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -319,6 +319,7 @@
> /* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
> #define X86_FEATURE_AVX_VNNI (12*32+ 4) /* "avx_vnni" AVX VNNI instructions */
> #define X86_FEATURE_AVX512_BF16 (12*32+ 5) /* "avx512_bf16" AVX512 BFLOAT16 instructions */
> +#define X86_FEATURE_LASS (12*32+ 6) /* "lass" Linear Address Space Separation */
> #define X86_FEATURE_CMPCCXADD (12*32+ 7) /* CMPccXADD instructions */
> #define X86_FEATURE_ARCH_PERFMON_EXT (12*32+ 8) /* Intel Architectural PerfMon Extension */
> #define X86_FEATURE_FZRM (12*32+10) /* Fast zero-length REP MOVSB */
> diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h
> index c492bdc97b05..76c7d362af94 100644
> --- a/arch/x86/include/asm/disabled-features.h
> +++ b/arch/x86/include/asm/disabled-features.h
> @@ -22,12 +22,14 @@
> # define DISABLE_CYRIX_ARR (1<<(X86_FEATURE_CYRIX_ARR & 31))
> # define DISABLE_CENTAUR_MCR (1<<(X86_FEATURE_CENTAUR_MCR & 31))
> # define DISABLE_PCID 0
> +# define DISABLE_LASS 0
> #else
> # define DISABLE_VME 0
> # define DISABLE_K6_MTRR 0
> # define DISABLE_CYRIX_ARR 0
> # define DISABLE_CENTAUR_MCR 0
> # define DISABLE_PCID (1<<(X86_FEATURE_PCID & 31))
> +# define DISABLE_LASS (1<<(X86_FEATURE_LASS & 31))
> #endif /* CONFIG_X86_64 */
>
> #ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
> @@ -146,7 +148,7 @@
> #define DISABLED_MASK11 (DISABLE_RETPOLINE|DISABLE_RETHUNK|DISABLE_UNRET| \
> DISABLE_CALL_DEPTH_TRACKING|DISABLE_USER_SHSTK)
> #define DISABLED_MASK12 (DISABLE_FRED|DISABLE_LAM)
> -#define DISABLED_MASK13 0
> +#define DISABLED_MASK13 (DISABLE_LASS)
> #define DISABLED_MASK14 0
> #define DISABLED_MASK15 0
> #define DISABLED_MASK16 (DISABLE_PKU|DISABLE_OSPKE|DISABLE_LA57|DISABLE_UMIP| \
> diff --git a/arch/x86/include/asm/smap.h b/arch/x86/include/asm/smap.h
> index bab490379c65..8cb6f004800b 100644
> --- a/arch/x86/include/asm/smap.h
> +++ b/arch/x86/include/asm/smap.h
> @@ -27,6 +27,12 @@
>
> #else /* __ASSEMBLY__ */
>
> +/*
> + * The CLAC/STAC instructions toggle enforcement of X86_FEATURE_SMAP.
> + * Add dedicated lass_*() variants for cases that are necessitated by
> + * LASS (X86_FEATURE_LASS) enforcement, which helps readability and
> + * avoids AC flag flipping on CPUs that don't support LASS.
> + */
Maybe add a new line here? The comment is for the group of helpers, not
for clac() specifically.
> static __always_inline void clac(void)
> {
> /* Note: a barrier is implicit in alternative() */
> @@ -39,6 +45,18 @@ static __always_inline void stac(void)
> alternative("", __ASM_STAC, X86_FEATURE_SMAP);
> }
>
> +static __always_inline void lass_clac(void)
> +{
> + /* Note: a barrier is implicit in alternative() */
> + alternative("", __ASM_CLAC, X86_FEATURE_LASS);
> +}
> +
> +static __always_inline void lass_stac(void)
> +{
> + /* Note: a barrier is implicit in alternative() */
> + alternative("", __ASM_STAC, X86_FEATURE_LASS);
> +}
> +
> static __always_inline unsigned long smap_save(void)
> {
> unsigned long flags;
> diff --git a/arch/x86/include/uapi/asm/processor-flags.h b/arch/x86/include/uapi/asm/processor-flags.h
> index f1a4adc78272..81d0c8bf1137 100644
> --- a/arch/x86/include/uapi/asm/processor-flags.h
> +++ b/arch/x86/include/uapi/asm/processor-flags.h
> @@ -136,6 +136,8 @@
> #define X86_CR4_PKE _BITUL(X86_CR4_PKE_BIT)
> #define X86_CR4_CET_BIT 23 /* enable Control-flow Enforcement Technology */
> #define X86_CR4_CET _BITUL(X86_CR4_CET_BIT)
> +#define X86_CR4_LASS_BIT 27 /* enable Linear Address Space Separation support */
> +#define X86_CR4_LASS _BITUL(X86_CR4_LASS_BIT)
> #define X86_CR4_LAM_SUP_BIT 28 /* LAM for supervisor pointers */
> #define X86_CR4_LAM_SUP _BITUL(X86_CR4_LAM_SUP_BIT)
>
> diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
> index 8bd84114c2d9..3f73c4b03348 100644
> --- a/arch/x86/kernel/cpu/cpuid-deps.c
> +++ b/arch/x86/kernel/cpu/cpuid-deps.c
> @@ -83,6 +83,7 @@ static const struct cpuid_dep cpuid_deps[] = {
> { X86_FEATURE_AMX_TILE, X86_FEATURE_XFD },
> { X86_FEATURE_SHSTK, X86_FEATURE_XSAVES },
> { X86_FEATURE_FRED, X86_FEATURE_LKGS },
> + { X86_FEATURE_LASS, X86_FEATURE_SMAP },
> {}
> };
>
> diff --git a/tools/arch/x86/include/asm/cpufeatures.h b/tools/arch/x86/include/asm/cpufeatures.h
> index 23698d0f4bb4..538930159f9f 100644
> --- a/tools/arch/x86/include/asm/cpufeatures.h
> +++ b/tools/arch/x86/include/asm/cpufeatures.h
> @@ -319,6 +319,7 @@
> /* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
> #define X86_FEATURE_AVX_VNNI (12*32+ 4) /* "avx_vnni" AVX VNNI instructions */
> #define X86_FEATURE_AVX512_BF16 (12*32+ 5) /* "avx512_bf16" AVX512 BFLOAT16 instructions */
> +#define X86_FEATURE_LASS (12*32+ 6) /* "lass" Linear Address Space Separation */
> #define X86_FEATURE_CMPCCXADD (12*32+ 7) /* CMPccXADD instructions */
> #define X86_FEATURE_ARCH_PERFMON_EXT (12*32+ 8) /* Intel Architectural PerfMon Extension */
> #define X86_FEATURE_FZRM (12*32+10) /* Fast zero-length REP MOVSB */
> --
> 2.45.2
>
--
Kiryl Shutsemau / Kirill A. Shutemov
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v5 06/16] efi: Disable LASS around set_virtual_address_map call
2024-10-28 16:07 ` [PATCH v5 06/16] efi: Disable LASS around set_virtual_address_map call Alexander Shishkin
@ 2024-10-29 15:00 ` Kirill A. Shutemov
0 siblings, 0 replies; 42+ messages in thread
From: Kirill A. Shutemov @ 2024-10-29 15:00 UTC (permalink / raw)
To: Alexander Shishkin
Cc: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra, Ard Biesheuvel,
Paul E. McKenney, Josh Poimboeuf, Xiongwei Song, Xin Li,
Mike Rapoport (IBM), Brijesh Singh, Michael Roth, Tony Luck,
Alexey Kardashevskiy, Jonathan Corbet, Sohil Mehta, Ingo Molnar,
Pawan Gupta, Daniel Sneddon, Kai Huang, Sandipan Das,
Breno Leitao, Rick Edgecombe, Alexei Starovoitov, Hou Tao,
Juergen Gross, Vegard Nossum, Kees Cook, Eric Biggers,
Jason Gunthorpe, Masami Hiramatsu (Google), Andrew Morton,
Luis Chamberlain, Yuntao Wang, Rasmus Villemoes, Christophe Leroy,
Tejun Heo, Changbin Du, Huang Shijie, Geert Uytterhoeven,
Namhyung Kim, Arnaldo Carvalho de Melo, linux-doc, linux-kernel,
linux-efi
On Mon, Oct 28, 2024 at 06:07:54PM +0200, Alexander Shishkin wrote:
> Of all the EFI runtime services, set_virtual_address_map is the only one
set_virtual_address_map()
> that is called at its lower mapping, which LASS prohibits regardless of
> EFLAGS.AC setting. The only way to allow this to happen is to disable
> LASS in the CR4 register.
How does it interact with cr_pinning? IIUC, this can happen well after
boot? Like on efivar fs mount.
--
Kiryl Shutsemau / Kirill A. Shutemov
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v5 00/16] Enable Linear Address Space Separation support
2024-10-28 16:07 [PATCH v5 00/16] Enable Linear Address Space Separation support Alexander Shishkin
` (15 preceding siblings ...)
2024-10-28 16:08 ` [PATCH v5 16/16] Revert "x86/lam: Disable ADDRESS_MASKING in most cases" Alexander Shishkin
@ 2024-10-29 17:14 ` Matthew Wilcox
2024-10-30 7:16 ` Alexander Shishkin
16 siblings, 1 reply; 42+ messages in thread
From: Matthew Wilcox @ 2024-10-29 17:14 UTC (permalink / raw)
To: Alexander Shishkin
Cc: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra, Ard Biesheuvel,
Paul E. McKenney, Josh Poimboeuf, Xiongwei Song, Xin Li,
Mike Rapoport (IBM), Brijesh Singh, Michael Roth, Tony Luck,
Kirill A. Shutemov, Alexey Kardashevskiy, Jonathan Corbet,
Sohil Mehta, Ingo Molnar, Pawan Gupta, Daniel Sneddon, Kai Huang,
Sandipan Das, Breno Leitao, Rick Edgecombe, Alexei Starovoitov,
Hou Tao, Juergen Gross, Vegard Nossum, Kees Cook, Eric Biggers,
Jason Gunthorpe, Masami Hiramatsu (Google), Andrew Morton,
Luis Chamberlain, Yuntao Wang, Rasmus Villemoes, Christophe Leroy,
Tejun Heo, Changbin Du, Huang Shijie, Geert Uytterhoeven,
Namhyung Kim, Arnaldo Carvalho de Melo, linux-doc, linux-kernel,
linux-efi
On Mon, Oct 28, 2024 at 06:07:48PM +0200, Alexander Shishkin wrote:
> Linear Address Space Separation (LASS) is a security feature that intends to
> prevent malicious virtual address space accesses across user/kernel mode.
>
> Such mode based access protection already exists today with paging and features
> such as SMEP and SMAP. However, to enforce these protections, the processor
> must traverse the paging structures in memory. Malicious software can use
> timing information resulting from this traversal to determine details about the
> paging structures, and these details may also be used to determine the layout
> of the kernel memory.
>
> The LASS mechanism provides the same mode-based protections as paging but
> without traversing the paging structures. Because the protections enforced by
> LASS are applied before paging, software will not be able to derive
> paging-based timing information from the various caching structures such as the
> TLBs, mid-level caches, page walker, data caches, etc. LASS can avoid probing
> using double page faults, TLB flush and reload, and SW prefetch instructions.
> See [2], [3] and [4] for some research on the related attack vectors.
>
> In addition, LASS prevents an attack vector described in a Spectre LAM (SLAM)
> whitepaper [7].
>
> LASS enforcement relies on the typical kernel implementation to divide the
> 64-bit virtual address space into two halves:
> Addr[63]=0 -> User address space
> Addr[63]=1 -> Kernel address space
> Any data access or code execution across address spaces typically results in a
> #GP fault.
>
> Kernel accesses usually only happen to the kernel address space. However, there
> are valid reasons for kernel to access memory in the user half. For these cases
> (such as text poking and EFI runtime accesses), the kernel can temporarily
> suspend the enforcement of LASS by toggling SMAP (Supervisor Mode Access
> Prevention) using the stac()/clac() instructions, and in one instance by
> outright disabling LASS for an EFI runtime call.
>
> User space cannot access any kernel address while LASS is enabled.
> Unfortunately, legacy vsyscall functions are located in the address range
> 0xffffffffff600000 - 0xffffffffff601000 and emulated in kernel. To avoid
> breaking user applications when LASS is enabled, extend the vsyscall emulation
> in execute (XONLY) mode to the #GP fault handler.
>
> In contrast, the vsyscall EMULATE mode is deprecated and not expected to be
> used by anyone. Supporting EMULATE mode with LASS would need complex
> instruction decoding in the #GP fault handler and is probably not worth the
> hassle. Disable LASS in this rare case when someone absolutely needs and
> enables vsyscall=emulate via the command line.
I lack the wit to read & understand these patches to answer this
question, so I'll just ask it:
What happens when the kernel does a NULL pointer dereference (due to a
bug)? It's not an attempt to access userspace, but it should result in
a good bug report. Normally this would be outside a STAC/CLAC region,
but I suppose technically it could be within one.
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v5 03/16] x86/alternatives: Disable LASS when patching kernel alternatives
2024-10-29 11:36 ` Peter Zijlstra
@ 2024-10-29 18:48 ` Peter Zijlstra
2024-10-30 7:40 ` Alexander Shishkin
0 siblings, 1 reply; 42+ messages in thread
From: Peter Zijlstra @ 2024-10-29 18:48 UTC (permalink / raw)
To: Dave Hansen
Cc: Alexander Shishkin, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Ard Biesheuvel,
Paul E. McKenney, Josh Poimboeuf, Xiongwei Song, Xin Li,
Mike Rapoport (IBM), Brijesh Singh, Michael Roth, Tony Luck,
Kirill A. Shutemov, Alexey Kardashevskiy, Jonathan Corbet,
Sohil Mehta, Ingo Molnar, Pawan Gupta, Daniel Sneddon, Kai Huang,
Sandipan Das, Breno Leitao, Rick Edgecombe, Alexei Starovoitov,
Hou Tao, Juergen Gross, Vegard Nossum, Kees Cook, Eric Biggers,
Jason Gunthorpe, Masami Hiramatsu (Google), Andrew Morton,
Luis Chamberlain, Yuntao Wang, Rasmus Villemoes, Christophe Leroy,
Tejun Heo, Changbin Du, Huang Shijie, Geert Uytterhoeven,
Namhyung Kim, Arnaldo Carvalho de Melo, linux-doc, linux-kernel,
linux-efi
On Tue, Oct 29, 2024 at 12:36:11PM +0100, Peter Zijlstra wrote:
> Anyway, looking at this, I see we grew rep_{movs,stos}_alternative, as
> used in copy_user_generic() and __clear_user(). Which are all somewhat
> similar.
That is, we could consider something like the completely untested,
probably broken, will-light-your-granny-on-fire-and-maul-your-pets
sketch below...
---
diff --git a/arch/x86/include/asm/string.h b/arch/x86/include/asm/string.h
index 9cb5aae7fba9..e25a988360a1 100644
--- a/arch/x86/include/asm/string.h
+++ b/arch/x86/include/asm/string.h
@@ -2,31 +2,50 @@
#ifndef _ASM_X86_STRING_H
#define _ASM_X86_STRING_H
+#include <asm/asm.h>
+#include <asm/alternative.h>
+
#ifdef CONFIG_X86_32
# include <asm/string_32.h>
#else
# include <asm/string_64.h>
#endif
+#ifdef CONFIG_X86_64
+#define ALT_64(orig, alt, feat) ALTERNATIVE(orig, alt, feat)
+#else
+#define ALT_64(orig, alt, feat) orig
+#endif
+
static __always_inline void *__inline_memcpy(void *to, const void *from, size_t len)
{
void *ret = to;
- asm volatile("rep movsb"
- : "+D" (to), "+S" (from), "+c" (len)
- : : "memory");
- return ret;
+ asm volatile("1:\n\t"
+ ALT_64("rep movsb",
+ "call rep_movs_alternative", ALT_NOT(X86_FEATURE_FSRM))
+ "2:\n\t"
+ _ASM_EXTABLE_UA(1b, 2b)
+ : "+D" (to), "+S" (from), "+c" (len), ASM_CALL_CONSTRAINT
+ : : "memory", _ASM_AX);
+
+ return ret + len;
}
static __always_inline void *__inline_memset(void *s, int v, size_t n)
{
void *ret = s;
- asm volatile("rep stosb"
- : "+D" (s), "+c" (n)
+ asm volatile("1:\n\t"
+ ALT_64("rep stosb",
+ "call rep_stos_alternative", ALT_NOT(X86_FEATURE_FSRM))
+ "2:\n\t"
+ _ASM_EXTABLE_UA(1b, 2b)
+ : "+D" (s), "+c" (n), ASM_CALL_CONSTRAINT
: "a" ((uint8_t)v)
- : "memory");
- return ret;
+ : "memory", _ASM_SI);
+
+ return ret + n;
}
#endif /* _ASM_X86_STRING_H */
diff --git a/arch/x86/include/asm/uaccess_64.h b/arch/x86/include/asm/uaccess_64.h
index b0a887209400..9f2d2c2ca731 100644
--- a/arch/x86/include/asm/uaccess_64.h
+++ b/arch/x86/include/asm/uaccess_64.h
@@ -13,6 +13,7 @@
#include <asm/page.h>
#include <asm/percpu.h>
#include <asm/runtime-const.h>
+#include <asm/string.h>
/*
* Virtual variable: there's no actual backing store for this,
@@ -118,21 +119,12 @@ rep_movs_alternative(void *to, const void *from, unsigned len);
static __always_inline __must_check unsigned long
copy_user_generic(void *to, const void *from, unsigned long len)
{
+ void *ret;
+
stac();
- /*
- * If CPU has FSRM feature, use 'rep movs'.
- * Otherwise, use rep_movs_alternative.
- */
- asm volatile(
- "1:\n\t"
- ALTERNATIVE("rep movsb",
- "call rep_movs_alternative", ALT_NOT(X86_FEATURE_FSRM))
- "2:\n"
- _ASM_EXTABLE_UA(1b, 2b)
- :"+c" (len), "+D" (to), "+S" (from), ASM_CALL_CONSTRAINT
- : : "memory", "rax");
+ ret = __inline_memcpy(to, from, len);
clac();
- return len;
+ return ret - to;
}
static __always_inline __must_check unsigned long
@@ -178,25 +170,15 @@ rep_stos_alternative(void __user *addr, unsigned long len);
static __always_inline __must_check unsigned long __clear_user(void __user *addr, unsigned long size)
{
- might_fault();
- stac();
+ void *ret;
- /*
- * No memory constraint because it doesn't change any memory gcc
- * knows about.
- */
- asm volatile(
- "1:\n\t"
- ALTERNATIVE("rep stosb",
- "call rep_stos_alternative", ALT_NOT(X86_FEATURE_FSRS))
- "2:\n"
- _ASM_EXTABLE_UA(1b, 2b)
- : "+c" (size), "+D" (addr), ASM_CALL_CONSTRAINT
- : "a" (0));
+ might_fault();
+ stac();
+ ret = __inline_memset(addr, 0, size);
clac();
- return size;
+ return ret - addr;
}
static __always_inline unsigned long clear_user(void __user *to, unsigned long n)
diff --git a/arch/x86/lib/clear_page_64.S b/arch/x86/lib/clear_page_64.S
index 2760a15fbc00..17d4bf6f50e5 100644
--- a/arch/x86/lib/clear_page_64.S
+++ b/arch/x86/lib/clear_page_64.S
@@ -53,16 +53,22 @@ SYM_FUNC_END(clear_page_erms)
EXPORT_SYMBOL_GPL(clear_page_erms)
/*
- * Default clear user-space.
+ * Default memset
* Input:
* rdi destination
+ * rsi scratch
* rcx count
- * rax is zero
+ * al is value
*
* Output:
- * rcx: uncleared bytes or 0 if successful.
+ * rcx: unset bytes or 0 if successful.
*/
SYM_FUNC_START(rep_stos_alternative)
+
+ movzbl %al, %esi
+ movabs $0x0101010101010101, %rax
+ mulq %rsi
+
cmpq $64,%rcx
jae .Lunrolled
^ permalink raw reply related [flat|nested] 42+ messages in thread
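Assuming the sketch above, callers would keep the existing "bytes not done"
contract; illustratively:

	/* sketch: 'left' is the number of bytes NOT copied, 0 on full success */
	unsigned long left = copy_user_generic(dst, usrc, len);
	if (left)
		memset(dst + (len - left), 0, left);	/* e.g. zero the uncopied tail */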
* Re: [PATCH v5 01/16] x86/cpu: Enumerate the LASS feature bits
2024-10-29 14:55 ` Kirill A. Shutemov
@ 2024-10-29 21:46 ` Sohil Mehta
0 siblings, 0 replies; 42+ messages in thread
From: Sohil Mehta @ 2024-10-29 21:46 UTC (permalink / raw)
To: Kirill A. Shutemov, Alexander Shishkin
Cc: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra, Ard Biesheuvel,
Paul E. McKenney, Josh Poimboeuf, Xiongwei Song, Xin Li,
Mike Rapoport (IBM), Brijesh Singh, Michael Roth, Tony Luck,
Alexey Kardashevskiy, Jonathan Corbet, Ingo Molnar, Pawan Gupta,
Daniel Sneddon, Kai Huang, Sandipan Das, Breno Leitao,
Rick Edgecombe, Alexei Starovoitov, Hou Tao, Juergen Gross,
Vegard Nossum, Kees Cook, Eric Biggers, Jason Gunthorpe,
Masami Hiramatsu (Google), Andrew Morton, Luis Chamberlain,
Yuntao Wang, Rasmus Villemoes, Christophe Leroy, Tejun Heo,
Changbin Du, Huang Shijie, Geert Uytterhoeven, Namhyung Kim,
Arnaldo Carvalho de Melo, linux-doc, linux-kernel, linux-efi,
Yian Chen
>> +/*
>> + * The CLAC/STAC instructions toggle enforcement of X86_FEATURE_SMAP.
>> + * Add dedicated lass_*() variants for cases that are necessitated by
It would be useful to know when such a situation arises. For example,
text_poke_mem* doesn't get flagged by SMAP but only by LASS. I guess the
answer is related to paging, but it would help to describe it in a
commit message or a comment.
I am imagining a scenario where someone needs to use one of these
stac()/clac() pairs but isn't sure which one to use. Both of them would
seem to work but one is better suited than the other.
>> + * LASS (X86_FEATURE_LASS) enforcement, which helps readability and
>> + * avoids AC flag flipping on CPUs that don't support LASS.
>> + */
>
> Maybe add a new line here? The comment is for the group of helpers, not
> for clac() specifically.
>
Also, it might be better to move the common text "/* Note: a barrier is
implicit in alternative() */" to the above comment as well.
Repeating it 4 times makes it unnecessarily distracting to read the code.
>> static __always_inline void clac(void)
>> {
>> /* Note: a barrier is implicit in alternative() */
>> @@ -39,6 +45,18 @@ static __always_inline void stac(void)
>> alternative("", __ASM_STAC, X86_FEATURE_SMAP);
>> }
>>
>> +static __always_inline void lass_clac(void)
>> +{
>> + /* Note: a barrier is implicit in alternative() */
>> + alternative("", __ASM_CLAC, X86_FEATURE_LASS);
>> +}
>> +
>> +static __always_inline void lass_stac(void)
>> +{
>> + /* Note: a barrier is implicit in alternative() */
>> + alternative("", __ASM_STAC, X86_FEATURE_LASS);
>> +}
>> +
^ permalink raw reply [flat|nested] 42+ messages in thread
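Folding both suggestions together would give something like this (sketch):

/*
 * The CLAC/STAC instructions toggle enforcement of X86_FEATURE_SMAP.
 * The lass_*() variants below are for accesses that are flagged only
 * by LASS (X86_FEATURE_LASS) and not by SMAP; using them helps
 * readability and avoids AC flag flipping on CPUs without LASS.
 *
 * Note: a barrier is implicit in alternative().
 */

static __always_inline void clac(void)
{
	alternative("", __ASM_CLAC, X86_FEATURE_SMAP);
}

static __always_inline void lass_clac(void)
{
	alternative("", __ASM_CLAC, X86_FEATURE_LASS);
}

with the stac()/lass_stac() pair following the same pattern.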
* Re: [PATCH v5 05/16] x86/cpu: Defer CR pinning setup until after EFI initialization
2024-10-28 16:07 ` [PATCH v5 05/16] x86/cpu: Defer CR pinning setup until after EFI initialization Alexander Shishkin
@ 2024-10-29 22:10 ` Sohil Mehta
2024-10-29 22:26 ` Luck, Tony
0 siblings, 1 reply; 42+ messages in thread
From: Sohil Mehta @ 2024-10-29 22:10 UTC (permalink / raw)
To: Alexander Shishkin, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra,
Ard Biesheuvel, Paul E. McKenney, Josh Poimboeuf, Xiongwei Song,
Xin Li, Mike Rapoport (IBM), Brijesh Singh, Michael Roth,
Tony Luck, Kirill A. Shutemov, Alexey Kardashevskiy
Cc: Jonathan Corbet, Ingo Molnar, Pawan Gupta, Daniel Sneddon,
Kai Huang, Sandipan Das, Breno Leitao, Rick Edgecombe,
Alexei Starovoitov, Hou Tao, Juergen Gross, Vegard Nossum,
Kees Cook, Eric Biggers, Jason Gunthorpe,
Masami Hiramatsu (Google), Andrew Morton, Luis Chamberlain,
Yuntao Wang, Rasmus Villemoes, Christophe Leroy, Tejun Heo,
Changbin Du, Huang Shijie, Geert Uytterhoeven, Namhyung Kim,
Arnaldo Carvalho de Melo, linux-doc, linux-kernel, linux-efi
On 10/28/2024 9:07 AM, Alexander Shishkin wrote:
> In order to map the EFI runtime services, set_virtual_address_map
> needs to be called, which resides in the lower half of the address
> space. This means that LASS needs to be temporarily disabled around
> this call. This can only be done before the CR pinning is set up.
>
...
>
> /*
> * This needs to follow the FPU initialization, since EFI depends on it.
> + * It also needs to precede the CR pinning setup, because we need to be
> + * able to temporarily clear the CR4.LASS bit in order to execute the
> + * set_virtual_address_map call, which resides in lower addresses and
> + * would trip LASS if enabled.
> */
It would be helpful to describe why lass_stac()/lass_clac() don't work
here and why the heavy-handed CR4 toggling is needed instead.
> if (efi_enabled(EFI_RUNTIME_SERVICES))
> efi_enter_virtual_mode();
>
> + setup_cr_pinning();
> +
^ permalink raw reply [flat|nested] 42+ messages in thread
* RE: [PATCH v5 05/16] x86/cpu: Defer CR pinning setup until after EFI initialization
2024-10-29 22:10 ` Sohil Mehta
@ 2024-10-29 22:26 ` Luck, Tony
2024-10-29 22:52 ` Dave Hansen
0 siblings, 1 reply; 42+ messages in thread
From: Luck, Tony @ 2024-10-29 22:26 UTC (permalink / raw)
To: Mehta, Sohil, Alexander Shishkin, Andy Lutomirski,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
x86@kernel.org, H. Peter Anvin, Peter Zijlstra, Ard Biesheuvel,
Paul E. McKenney, Josh Poimboeuf, Xiongwei Song, Li, Xin3,
Mike Rapoport (IBM), Brijesh Singh, Michael Roth,
Kirill A. Shutemov, Alexey Kardashevskiy
Cc: Jonathan Corbet, Ingo Molnar, Pawan Gupta, Daniel Sneddon,
Huang, Kai, Sandipan Das, Breno Leitao, Edgecombe, Rick P,
Alexei Starovoitov, Hou Tao, Juergen Gross, Vegard Nossum,
Kees Cook, Eric Biggers, Jason Gunthorpe,
Masami Hiramatsu (Google), Andrew Morton, Luis Chamberlain,
Yuntao Wang, Rasmus Villemoes, Christophe Leroy, Tejun Heo,
Changbin Du, Huang Shijie, Geert Uytterhoeven, Namhyung Kim,
Arnaldo Carvalho de Melo, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-efi@vger.kernel.org
> /*
> * This needs to follow the FPU initialization, since EFI depends on it.
> + * It also needs to precede the CR pinning setup, because we need to be
> + * able to temporarily clear the CR4.LASS bit in order to execute the
> + * set_virtual_address_map call, which resides in lower addresses and
> + * would trip LASS if enabled.
> */
Why are the temporary mappings used to patch kernel code in the lower half
of the virtual address space? The comments in front of use_temporary_mm()
say:
* Using a temporary mm allows to set temporary mappings that are not accessible
* by other CPUs. Such mappings are needed to perform sensitive memory writes
* that override the kernel memory protections (e.g., W^X), without exposing the
* temporary page-table mappings that are required for these write operations to
* other CPUs. Using a temporary mm also allows to avoid TLB shootdowns when the
* mapping is torn down.
But couldn't we map into upper half and do some/all of:
1) Trust that there aren't stupid bugs that dereference random pointers into the
temporary mapping?
2) Make a "this CPU only" mapping
3) Avoid preemption while patching so there is no need for TLB shootdown
by other CPUs when the temporary mapping is torn down, just flush local TLB.
-Tony
^ permalink raw reply [flat|nested] 42+ messages in thread
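For reference, the flow under discussion is roughly the following (sketch of
the __text_poke() core with the helpers from this series applied; see
arch/x86/kernel/alternative.c for the real thing):

	/* map the page(s) writable at a lower-half address in poking_mm */
	prev = use_temporary_mm(poking_mm);
	lass_stac();				/* open the LASS/SMAP window */
	__inline_memcpy(poking_addr, src, len);	/* write via the temporary alias */
	lass_clac();				/* close the window again */
	unuse_temporary_mm(prev);		/* tear down, local TLB flush only */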
* Re: [PATCH v5 04/16] init/main.c: Move EFI runtime service initialization to x86/cpu
2024-10-28 16:07 ` [PATCH v5 04/16] init/main.c: Move EFI runtime service initialization to x86/cpu Alexander Shishkin
@ 2024-10-29 22:35 ` Sohil Mehta
2024-10-30 7:36 ` Ard Biesheuvel
1 sibling, 0 replies; 42+ messages in thread
From: Sohil Mehta @ 2024-10-29 22:35 UTC (permalink / raw)
To: Alexander Shishkin, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra,
Ard Biesheuvel, Paul E. McKenney, Josh Poimboeuf, Xiongwei Song,
Xin Li, Mike Rapoport (IBM), Brijesh Singh, Michael Roth,
Tony Luck, Kirill A. Shutemov, Alexey Kardashevskiy
Cc: Jonathan Corbet, Ingo Molnar, Pawan Gupta, Daniel Sneddon,
Kai Huang, Sandipan Das, Breno Leitao, Rick Edgecombe,
Alexei Starovoitov, Hou Tao, Juergen Gross, Vegard Nossum,
Kees Cook, Eric Biggers, Jason Gunthorpe,
Masami Hiramatsu (Google), Andrew Morton, Luis Chamberlain,
Yuntao Wang, Rasmus Villemoes, Christophe Leroy, Tejun Heo,
Changbin Du, Huang Shijie, Geert Uytterhoeven, Namhyung Kim,
Arnaldo Carvalho de Melo, linux-doc, linux-kernel, linux-efi
Would a better title be?
x86/efi: Move runtime service initialization to arch/x86
On 10/28/2024 9:07 AM, Alexander Shishkin wrote:
> The EFI call in start_kernel() is guarded by #ifdef CONFIG_X86. Move
> the thing to the arch_cpu_finalize_init() path on x86 and get rid of
> the #ifdef in start_kernel().
>
> No functional change intended.
>
> Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> Suggested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---
> arch/x86/kernel/cpu/common.c | 7 +++++++
> init/main.c | 5 -----
> 2 files changed, 7 insertions(+), 5 deletions(-)
Other than that,
Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v5 05/16] x86/cpu: Defer CR pinning setup until after EFI initialization
2024-10-29 22:26 ` Luck, Tony
@ 2024-10-29 22:52 ` Dave Hansen
2024-10-29 22:59 ` Luck, Tony
0 siblings, 1 reply; 42+ messages in thread
From: Dave Hansen @ 2024-10-29 22:52 UTC (permalink / raw)
To: Luck, Tony, Mehta, Sohil, Alexander Shishkin, Andy Lutomirski,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
x86@kernel.org, H. Peter Anvin, Peter Zijlstra, Ard Biesheuvel,
Paul E. McKenney, Josh Poimboeuf, Xiongwei Song, Li, Xin3,
Mike Rapoport (IBM), Brijesh Singh, Michael Roth,
Kirill A. Shutemov, Alexey Kardashevskiy
Cc: Jonathan Corbet, Ingo Molnar, Pawan Gupta, Daniel Sneddon,
Huang, Kai, Sandipan Das, Breno Leitao, Edgecombe, Rick P,
Alexei Starovoitov, Hou Tao, Juergen Gross, Vegard Nossum,
Kees Cook, Eric Biggers, Jason Gunthorpe,
Masami Hiramatsu (Google), Andrew Morton, Luis Chamberlain,
Yuntao Wang, Rasmus Villemoes, Christophe Leroy, Tejun Heo,
Changbin Du, Huang Shijie, Geert Uytterhoeven, Namhyung Kim,
Arnaldo Carvalho de Melo, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-efi@vger.kernel.org
On 10/29/24 15:26, Luck, Tony wrote:
>> /*
>> * This needs to follow the FPU initialization, since EFI depends on it.
>> + * It also needs to precede the CR pinning setup, because we need to be
>> + * able to temporarily clear the CR4.LASS bit in order to execute the
>> + * set_virtual_address_map call, which resides in lower addresses and
>> + * would trip LASS if enabled.
>> */
>
> Why are the temporary mappings used to patch kernel code in the lower half
> of the virtual address space?
I was just asking myself the same thing. The upper half is always
mapped uniformly. When you create an MM you copy the 256->511th pgd
entries verbatim from the init_mm's pgd.
If you map something in the <=255th pgd entry, it isn't (by default)
visible to other mm's. That's why a new mm also tends to get you a new
process.
> But couldn't we map into upper half and do some/all of:
>
> 1) Trust that there aren't stupid bugs that dereference random pointers into the
> temporary mapping?
> 2) Make a "this CPU only" mapping
> 3) Avoid preemption while patching so there is no need for TLB shootdown
> by other CPUs when the temporary mapping is torn down, just flush local TLB.
It's about enforcing W^X semantics. We should limit the time and scope
where mappings have some data both writeable and executable.
If we poke text in the upper half of the address space, any kernel
thread might be exploited to write to what will soon be executable.
If we do it in the lower half in its own mm, you have to compromise the
thread doing the text poking after the mapping is created but before it
is invalidated. With LASS you *ALSO* need to do it in the STAC/CLAC
window which is smaller than the window when the TLB is valid.
*IF* we switched things to do text poking in the upper half of the
address space, we'd probably want to find a completely unused PGD entry.
I'm not sure off the top of my head if we have a good one for that or
if it's worth the trouble.
^ permalink raw reply [flat|nested] 42+ messages in thread
* RE: [PATCH v5 05/16] x86/cpu: Defer CR pinning setup until after EFI initialization
2024-10-29 22:52 ` Dave Hansen
@ 2024-10-29 22:59 ` Luck, Tony
2024-10-29 23:02 ` H. Peter Anvin
2024-10-29 23:03 ` Dave Hansen
0 siblings, 2 replies; 42+ messages in thread
From: Luck, Tony @ 2024-10-29 22:59 UTC (permalink / raw)
To: Hansen, Dave, Mehta, Sohil, Alexander Shishkin, Andy Lutomirski,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
x86@kernel.org, H. Peter Anvin, Peter Zijlstra, Ard Biesheuvel,
Paul E. McKenney, Josh Poimboeuf, Xiongwei Song, Li, Xin3,
Mike Rapoport (IBM), Brijesh Singh, Michael Roth,
Kirill A. Shutemov, Alexey Kardashevskiy
Cc: Jonathan Corbet, Ingo Molnar, Pawan Gupta, Daniel Sneddon,
Huang, Kai, Sandipan Das, Breno Leitao, Edgecombe, Rick P,
Alexei Starovoitov, Hou Tao, Juergen Gross, Vegard Nossum,
Kees Cook, Eric Biggers, Jason Gunthorpe,
Masami Hiramatsu (Google), Andrew Morton, Luis Chamberlain,
Yuntao Wang, Rasmus Villemoes, Christophe Leroy, Tejun Heo,
Changbin Du, Huang Shijie, Geert Uytterhoeven, Namhyung Kim,
Arnaldo Carvalho de Melo, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-efi@vger.kernel.org
> *IF* we switched things to do text poking in the upper half of the
> address space, we'd probably want to find a completely unused PGD entry.
> I'm not sure off the top of my head if we have a good one for that or
> if it's worth the trouble.
I expect that would be easy on 64-bit (no way the kernel needs all the
PGD entries from 256..511) but hard for 32-bit (where kernel address
space is in critically short supply on any machine with >1GB RAM).
-Tony
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v5 05/16] x86/cpu: Defer CR pinning setup until after EFI initialization
2024-10-29 22:59 ` Luck, Tony
@ 2024-10-29 23:02 ` H. Peter Anvin
2024-10-29 23:03 ` Dave Hansen
1 sibling, 0 replies; 42+ messages in thread
From: H. Peter Anvin @ 2024-10-29 23:02 UTC (permalink / raw)
To: Luck, Tony, Hansen, Dave, Mehta, Sohil, Alexander Shishkin,
Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86@kernel.org, Peter Zijlstra, Ard Biesheuvel,
Paul E. McKenney, Josh Poimboeuf, Xiongwei Song, Li, Xin3,
Mike Rapoport (IBM), Brijesh Singh, Michael Roth,
Kirill A. Shutemov, Alexey Kardashevskiy
Cc: Jonathan Corbet, Ingo Molnar, Pawan Gupta, Daniel Sneddon,
Huang, Kai, Sandipan Das, Breno Leitao, Edgecombe, Rick P,
Alexei Starovoitov, Hou Tao, Juergen Gross, Vegard Nossum,
Kees Cook, Eric Biggers, Jason Gunthorpe,
Masami Hiramatsu (Google), Andrew Morton, Luis Chamberlain,
Yuntao Wang, Rasmus Villemoes, Christophe Leroy, Tejun Heo,
Changbin Du, Huang Shijie, Geert Uytterhoeven, Namhyung Kim,
Arnaldo Carvalho de Melo, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-efi@vger.kernel.org
On 10/29/24 15:59, Luck, Tony wrote:
>> *IF* we switched things to do text poking in the upper half of the
>> address space, we'd probably want to find a completely unused PGD entry.
>> I'm not sure off the top of my head if we have a good one for that or
>> if it's worth the trouble.
>
> I expect that would be easy on 64-bit (no way the kernel needs all the
> PGD entries from 256..511) but hard for 32-bit (where kernel address
> space is in critically short supply on any machine with >1GB RAM).
No LASS on 32 bits...
-hpa
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v5 05/16] x86/cpu: Defer CR pinning setup until after EFI initialization
2024-10-29 22:59 ` Luck, Tony
2024-10-29 23:02 ` H. Peter Anvin
@ 2024-10-29 23:03 ` Dave Hansen
2024-10-29 23:05 ` H. Peter Anvin
1 sibling, 1 reply; 42+ messages in thread
From: Dave Hansen @ 2024-10-29 23:03 UTC (permalink / raw)
To: Luck, Tony, Mehta, Sohil, Alexander Shishkin, Andy Lutomirski,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
x86@kernel.org, H. Peter Anvin, Peter Zijlstra, Ard Biesheuvel,
Paul E. McKenney, Josh Poimboeuf, Xiongwei Song, Li, Xin3,
Mike Rapoport (IBM), Brijesh Singh, Michael Roth,
Kirill A. Shutemov, Alexey Kardashevskiy
Cc: Jonathan Corbet, Ingo Molnar, Pawan Gupta, Daniel Sneddon,
Huang, Kai, Sandipan Das, Breno Leitao, Edgecombe, Rick P,
Alexei Starovoitov, Hou Tao, Juergen Gross, Vegard Nossum,
Kees Cook, Eric Biggers, Jason Gunthorpe,
Masami Hiramatsu (Google), Andrew Morton, Luis Chamberlain,
Yuntao Wang, Rasmus Villemoes, Christophe Leroy, Tejun Heo,
Changbin Du, Huang Shijie, Geert Uytterhoeven, Namhyung Kim,
Arnaldo Carvalho de Melo, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-efi@vger.kernel.org
On 10/29/24 15:59, Luck, Tony wrote:
>> *IF* we switched things to do text poking in the upper half of the
>> address space, we'd probably want to find a completely unused PGD entry.
>> I'm not sure off the top of my head if we have a good one for that or
>> if it's worth the trouble.
> I expect that would be easy on 64-bit (no way the kernel needs all the
> PGD entries from 256..511) but hard for 32-bit (where kernel address
> space is in critically short supply on any machine with >1GB RAM).
Yeah, I was talking about 64-bit only. On 32-bit PAE a PGD maps 1/4 of
the address space which is totally unworkable for stealing.
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v5 05/16] x86/cpu: Defer CR pinning setup until after EFI initialization
2024-10-29 23:03 ` Dave Hansen
@ 2024-10-29 23:05 ` H. Peter Anvin
2024-10-29 23:18 ` Luck, Tony
0 siblings, 1 reply; 42+ messages in thread
From: H. Peter Anvin @ 2024-10-29 23:05 UTC (permalink / raw)
To: Dave Hansen, Luck, Tony, Mehta, Sohil, Alexander Shishkin,
Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86@kernel.org, Peter Zijlstra, Ard Biesheuvel,
Paul E. McKenney, Josh Poimboeuf, Xiongwei Song, Li, Xin3,
Mike Rapoport (IBM), Brijesh Singh, Michael Roth,
Kirill A. Shutemov, Alexey Kardashevskiy
Cc: Jonathan Corbet, Ingo Molnar, Pawan Gupta, Daniel Sneddon,
Huang, Kai, Sandipan Das, Breno Leitao, Edgecombe, Rick P,
Alexei Starovoitov, Hou Tao, Juergen Gross, Vegard Nossum,
Kees Cook, Eric Biggers, Jason Gunthorpe,
Masami Hiramatsu (Google), Andrew Morton, Luis Chamberlain,
Yuntao Wang, Rasmus Villemoes, Christophe Leroy, Tejun Heo,
Changbin Du, Huang Shijie, Geert Uytterhoeven, Namhyung Kim,
Arnaldo Carvalho de Melo, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-efi@vger.kernel.org
On 10/29/24 16:03, Dave Hansen wrote:
> On 10/29/24 15:59, Luck, Tony wrote:
>>> *IF* we switched things to do text poking in the upper half of the
>>> address space, we'd probably want to find a completely unused PGD entry.
>>> I'm not sure off the top of my head if we have a good one for that or
>>> if it's worth the trouble.
>> I expect that would be easy on 64-bit (no way the kernel needs all the
>> PGD entries from 256..511) but hard for 32-bit (where kernel address
>> space is in critically short supply on any machine with >1GB RAM).
>
> Yeah, I was talking about 64-bit only. On 32-bit PAE a PGD maps 1/4 of
> the address space which is totally unworkable for stealing.
But it is also not necessary.
-hpa
^ permalink raw reply [flat|nested] 42+ messages in thread
* RE: [PATCH v5 05/16] x86/cpu: Defer CR pinning setup until after EFI initialization
2024-10-29 23:05 ` H. Peter Anvin
@ 2024-10-29 23:18 ` Luck, Tony
2024-10-29 23:41 ` H. Peter Anvin
2024-10-30 11:43 ` Kirill A. Shutemov
0 siblings, 2 replies; 42+ messages in thread
From: Luck, Tony @ 2024-10-29 23:18 UTC (permalink / raw)
To: H. Peter Anvin, Hansen, Dave, Mehta, Sohil, Alexander Shishkin,
Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86@kernel.org, Peter Zijlstra, Ard Biesheuvel,
Paul E. McKenney, Josh Poimboeuf, Xiongwei Song, Li, Xin3,
Mike Rapoport (IBM), Brijesh Singh, Michael Roth,
Kirill A. Shutemov, Alexey Kardashevskiy
Cc: Jonathan Corbet, Ingo Molnar, Pawan Gupta, Daniel Sneddon,
Huang, Kai, Sandipan Das, Breno Leitao, Edgecombe, Rick P,
Alexei Starovoitov, Hou Tao, Juergen Gross, Vegard Nossum,
Kees Cook, Eric Biggers, Jason Gunthorpe,
Masami Hiramatsu (Google), Andrew Morton, Luis Chamberlain,
Yuntao Wang, Rasmus Villemoes, Christophe Leroy, Tejun Heo,
Changbin Du, Huang Shijie, Geert Uytterhoeven, Namhyung Kim,
Arnaldo Carvalho de Melo, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-efi@vger.kernel.org
>> Yeah, I was talking about 64-bit only. On 32-bit PAE a PGD maps 1/4 of
>> the address space which is totally unworkable for stealing.
>
> But it is also not necessary.
So maybe we could make the 64-bit version of use_temporary_mm()
use some reserved address mapping to a reserved PGD in the upper
half of address space, and the 32-bit version continue to use "user"
addresses. It's unclear to me whether adding complexity here would be
worth it to remove the 64-bit STAC/CLAC text patching issues.
-Tony
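For illustration, a rough sketch of the shape the 64-bit path could take.
The helpers poking_vaddr() and poking_ptep() below are hypothetical, not
the kernel's actual text-poking API, and only the single-page case is
shown:

	/*
	 * Sketch only: poke kernel text through an alias installed under a
	 * reserved kernel-half PGD entry. Because the alias is a supervisor
	 * address, LASS permits the write directly and no lass_stac()/
	 * lass_clac() pair is needed around the memcpy.
	 */
	static void *poke_text_upper_half(void *dst, const void *src, size_t len)
	{
		unsigned long vaddr = poking_vaddr();	/* hypothetical reserved-PGD address */
		pte_t pte = mk_pte(virt_to_page(dst), PAGE_KERNEL);

		set_pte_at(poking_mm, vaddr, poking_ptep(vaddr), pte);
		memcpy((void *)(vaddr + offset_in_page(dst)), src, len);
		pte_clear(poking_mm, vaddr, poking_ptep(vaddr));
		flush_tlb_one_kernel(vaddr);

		return dst;
	}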
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v5 05/16] x86/cpu: Defer CR pinning setup until after EFI initialization
2024-10-29 23:18 ` Luck, Tony
@ 2024-10-29 23:41 ` H. Peter Anvin
2024-10-30 11:43 ` Kirill A. Shutemov
1 sibling, 0 replies; 42+ messages in thread
From: H. Peter Anvin @ 2024-10-29 23:41 UTC (permalink / raw)
To: Luck, Tony, Hansen, Dave, Mehta, Sohil, Alexander Shishkin,
Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86@kernel.org, Peter Zijlstra, Ard Biesheuvel,
Paul E. McKenney, Josh Poimboeuf, Xiongwei Song, Li, Xin3,
Mike Rapoport (IBM), Brijesh Singh, Michael Roth,
Kirill A. Shutemov, Alexey Kardashevskiy
Cc: Jonathan Corbet, Ingo Molnar, Pawan Gupta, Daniel Sneddon,
Huang, Kai, Sandipan Das, Breno Leitao, Edgecombe, Rick P,
Alexei Starovoitov, Hou Tao, Juergen Gross, Vegard Nossum,
Kees Cook, Eric Biggers, Jason Gunthorpe,
Masami Hiramatsu (Google), Andrew Morton, Luis Chamberlain,
Yuntao Wang, Rasmus Villemoes, Christophe Leroy, Tejun Heo,
Changbin Du, Huang Shijie, Geert Uytterhoeven, Namhyung Kim,
Arnaldo Carvalho de Melo, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-efi@vger.kernel.org
On 10/29/24 16:18, Luck, Tony wrote:
>>> Yeah, I was talking about 64-bit only. On 32-bit PAE a PGD maps 1/4 of
>>> the address space which is totally unworkable for stealing.
>>
>> But it is also not necessary.
>
> So maybe we could make the 64-bit version of use_temporary_mm()
> use some reserved address mapping to a reserved PGD in the upper
> half of address space, and the 32-bit version continue to use "user"
> addresses. It's unclear to me whether adding complexity here would be
> worth it to remove the 64-bit STAC/CLAC text patching issues.
>
For 32 bits we can also simply use something further down in the
hierarchy. It's not like we can afford to have the PGD be anything other
than RWX on 32 bits.
-hpa
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v5 11/16] x86/vsyscall: Document the fact that vsyscall=emulate disables LASS
2024-10-28 16:07 ` [PATCH v5 11/16] x86/vsyscall: Document the fact that vsyscall=emulate disables LASS Alexander Shishkin
@ 2024-10-29 23:41 ` Sohil Mehta
0 siblings, 0 replies; 42+ messages in thread
From: Sohil Mehta @ 2024-10-29 23:41 UTC (permalink / raw)
To: Alexander Shishkin, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra,
Ard Biesheuvel, Paul E. McKenney, Josh Poimboeuf, Xiongwei Song,
Xin Li, Mike Rapoport (IBM), Brijesh Singh, Michael Roth,
Tony Luck, Kirill A. Shutemov, Alexey Kardashevskiy
Cc: Jonathan Corbet, Ingo Molnar, Pawan Gupta, Daniel Sneddon,
Kai Huang, Sandipan Das, Breno Leitao, Rick Edgecombe,
Alexei Starovoitov, Hou Tao, Juergen Gross, Vegard Nossum,
Kees Cook, Eric Biggers, Jason Gunthorpe,
Masami Hiramatsu (Google), Andrew Morton, Luis Chamberlain,
Yuntao Wang, Rasmus Villemoes, Christophe Leroy, Tejun Heo,
Changbin Du, Huang Shijie, Geert Uytterhoeven, Namhyung Kim,
Arnaldo Carvalho de Melo, linux-doc, linux-kernel, linux-efi,
Dave Hansen
On 10/28/2024 9:07 AM, Alexander Shishkin wrote:
> Since the EMULATE mode of vsyscall disables LASS (fixing the LASS
> violations in EMULATE mode would require complex instruction
> decoding), document this fact in kernel-parameters.txt.
>
> Cc: Andy Lutomirski <luto@kernel.org>
> Suggested-by: Dave Hansen <dave.hansen@intel.com>
> Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> ---
> Documentation/admin-guide/kernel-parameters.txt | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
It might be better to combine this patch with the previous one. Both
patches are small and relate to the same thing.
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 1518343bbe22..4091dc48670a 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -7391,7 +7391,9 @@
>
> emulate Vsyscalls turn into traps and are emulated
> reasonably safely. The vsyscall page is
> - readable.
> + readable. This also disables the LASS
> + feature to allow userspace to poke around
> + the vsyscall page.
>
I am not sure the person reading this guide would be aware of LASS.
Also, the way the sentence is structured makes it easy to misinterpret
its meaning.
How about something like:
emulate Vsyscalls turn into traps and are emulated
reasonably safely. The vsyscall page is
readable. This disables the linear
address space separation (LASS) security
feature and makes the system less secure.
> xonly [default] Vsyscalls turn into traps and are
> emulated reasonably safely. The vsyscall
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v5 00/16] Enable Linear Address Space Separation support
2024-10-29 17:14 ` [PATCH v5 00/16] Enable Linear Address Space Separation support Matthew Wilcox
@ 2024-10-30 7:16 ` Alexander Shishkin
0 siblings, 0 replies; 42+ messages in thread
From: Alexander Shishkin @ 2024-10-30 7:16 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra, Ard Biesheuvel,
Paul E. McKenney, Josh Poimboeuf, Xiongwei Song, Xin Li,
Mike Rapoport (IBM), Brijesh Singh, Michael Roth, Tony Luck,
Kirill A. Shutemov, Alexey Kardashevskiy, Jonathan Corbet,
Sohil Mehta, Ingo Molnar, Pawan Gupta, Daniel Sneddon, Kai Huang,
Sandipan Das, Breno Leitao, Rick Edgecombe, Alexei Starovoitov,
Hou Tao, Juergen Gross, Vegard Nossum, Kees Cook, Eric Biggers,
Jason Gunthorpe, Masami Hiramatsu (Google), Andrew Morton,
Luis Chamberlain, Yuntao Wang, Rasmus Villemoes, Christophe Leroy,
Tejun Heo, Changbin Du, Huang Shijie, Geert Uytterhoeven,
Namhyung Kim, Arnaldo Carvalho de Melo, linux-doc, linux-kernel,
linux-efi, alexander.shishkin
Matthew Wilcox <willy@infradead.org> writes:
> On Mon, Oct 28, 2024 at 06:07:48PM +0200, Alexander Shishkin wrote:
> I lack the wit to read & understand these patches to answer this
> question, so I'll just ask it:
I was hoping they were readable and straightforward. Please do point out
anything that is not, or is not explained well enough, and I'll fix it.
> What happens when the kernel does a NULL pointer dereference (due to a
> bug)? It's not an attempt to access userspace, but it should result in
> a good bug report. Normally this would be outside a STAC/CLAC region,
> but I suppose technically it could be within one.
Outside of STAC/CLAC there will be a message, see 13/16 or [0]. It
doesn't have helpful things like 'if (address < PAGE_SIZE)
printk("NULL ptr deref\n");', but since it prints the address, I
assumed it was sufficient. Does this sound reasonable? Or is it
preferable to make it look exactly like the !LASS NULL dereference?
Inside STAC/CLAC it should trigger a regular page fault and all the
error messages that result from it.
[0] https://lore.kernel.org/all/20241028160917.1380714-14-alexander.shishkin@linux.intel.com/
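As a sketch of the difference being discussed: making the #GP path call
out NULL dereferences explicitly would only take a check along these
lines (hypothetical, not part of the posted series, which prints just
the faulting address):

	/*
	 * Hypothetical addition to the #GP handler's LASS reporting:
	 * special-case the first page, mirroring the page-fault path's
	 * "kernel NULL pointer dereference" message.
	 */
	if (cpu_feature_enabled(X86_FEATURE_LASS) && addr < PAGE_SIZE)
		pr_alert("kernel NULL pointer dereference, address: %px\n",
			 (void *)addr);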
Thanks,
--
Alex
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v5 04/16] init/main.c: Move EFI runtime service initialization to x86/cpu
2024-10-28 16:07 ` [PATCH v5 04/16] init/main.c: Move EFI runtime service initialization to x86/cpu Alexander Shishkin
2024-10-29 22:35 ` Sohil Mehta
@ 2024-10-30 7:36 ` Ard Biesheuvel
1 sibling, 0 replies; 42+ messages in thread
From: Ard Biesheuvel @ 2024-10-30 7:36 UTC (permalink / raw)
To: Alexander Shishkin
Cc: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra,
Paul E. McKenney, Josh Poimboeuf, Xiongwei Song, Xin Li,
Mike Rapoport (IBM), Brijesh Singh, Michael Roth, Tony Luck,
Kirill A. Shutemov, Alexey Kardashevskiy, Jonathan Corbet,
Sohil Mehta, Ingo Molnar, Pawan Gupta, Daniel Sneddon, Kai Huang,
Sandipan Das, Breno Leitao, Rick Edgecombe, Alexei Starovoitov,
Hou Tao, Juergen Gross, Vegard Nossum, Kees Cook, Eric Biggers,
Jason Gunthorpe, Masami Hiramatsu (Google), Andrew Morton,
Luis Chamberlain, Yuntao Wang, Rasmus Villemoes, Christophe Leroy,
Tejun Heo, Changbin Du, Huang Shijie, Geert Uytterhoeven,
Namhyung Kim, Arnaldo Carvalho de Melo, linux-doc, linux-kernel,
linux-efi
On Mon, 28 Oct 2024 at 17:10, Alexander Shishkin
<alexander.shishkin@linux.intel.com> wrote:
>
> The EFI call in start_kernel() is guarded by #ifdef CONFIG_X86. Move
> the thing to the arch_cpu_finalize_init() path on x86 and get rid of
> the #ifdef in start_kernel().
>
> No functional change intended.
>
> Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> Suggested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---
> arch/x86/kernel/cpu/common.c | 7 +++++++
> init/main.c | 5 -----
> 2 files changed, 7 insertions(+), 5 deletions(-)
>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> index 8f41ab219cf1..b24ad418536e 100644
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -26,6 +26,7 @@
> #include <linux/pgtable.h>
> #include <linux/stackprotector.h>
> #include <linux/utsname.h>
> +#include <linux/efi.h>
>
> #include <asm/alternative.h>
> #include <asm/cmdline.h>
> @@ -2382,6 +2383,12 @@ void __init arch_cpu_finalize_init(void)
> fpu__init_system();
> fpu__init_cpu();
>
> + /*
> + * This needs to follow the FPU initialization, since EFI depends on it.
> + */
> + if (efi_enabled(EFI_RUNTIME_SERVICES))
> + efi_enter_virtual_mode();
> +
> /*
> * Ensure that access to the per CPU representation has the initial
> * boot CPU configuration.
> diff --git a/init/main.c b/init/main.c
> index c4778edae797..1d3a0a82d136 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -51,7 +51,6 @@
> #include <linux/cpu.h>
> #include <linux/cpuset.h>
> #include <linux/cgroup.h>
> -#include <linux/efi.h>
> #include <linux/tick.h>
> #include <linux/sched/isolation.h>
> #include <linux/interrupt.h>
> @@ -1072,10 +1071,6 @@ void start_kernel(void)
>
> pid_idr_init();
> anon_vma_init();
> -#ifdef CONFIG_X86
> - if (efi_enabled(EFI_RUNTIME_SERVICES))
> - efi_enter_virtual_mode();
> -#endif
> thread_stack_cache_init();
> cred_init();
> fork_init();
> --
> 2.45.2
>
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v5 03/16] x86/alternatives: Disable LASS when patching kernel alternatives
2024-10-29 18:48 ` Peter Zijlstra
@ 2024-10-30 7:40 ` Alexander Shishkin
0 siblings, 0 replies; 42+ messages in thread
From: Alexander Shishkin @ 2024-10-30 7:40 UTC (permalink / raw)
To: Peter Zijlstra, Dave Hansen
Cc: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, H. Peter Anvin, Ard Biesheuvel,
Paul E. McKenney, Josh Poimboeuf, Xiongwei Song, Xin Li,
Mike Rapoport (IBM), Brijesh Singh, Michael Roth, Tony Luck,
Kirill A. Shutemov, Alexey Kardashevskiy, Jonathan Corbet,
Sohil Mehta, Ingo Molnar, Pawan Gupta, Daniel Sneddon, Kai Huang,
Sandipan Das, Breno Leitao, Rick Edgecombe, Alexei Starovoitov,
Hou Tao, Juergen Gross, Vegard Nossum, Kees Cook, Eric Biggers,
Jason Gunthorpe, Masami Hiramatsu (Google), Andrew Morton,
Luis Chamberlain, Yuntao Wang, Rasmus Villemoes, Christophe Leroy,
Tejun Heo, Changbin Du, Huang Shijie, Geert Uytterhoeven,
Namhyung Kim, Arnaldo Carvalho de Melo, linux-doc, linux-kernel,
linux-efi, alexander.shishkin
Peter Zijlstra <peterz@infradead.org> writes:
> On Tue, Oct 29, 2024 at 12:36:11PM +0100, Peter Zijlstra wrote:
> static __always_inline void *__inline_memcpy(void *to, const void *from, size_t len)
> {
> void *ret = to;
>
> - asm volatile("rep movsb"
> - : "+D" (to), "+S" (from), "+c" (len)
> - : : "memory");
> - return ret;
> + asm volatile("1:\n\t"
> + ALT_64("rep movsb",
> + "call rep_movs_alternative", ALT_NOT(X86_FEATURE_FSRM))
I don't know if it matters, but this basically brings a whole memcpy
into a text_poke situation, which should only ever be a handful of
bytes, and creates a new stack frame in the !FSRM case, which the
__always_inline was meant to avoid. But given what text_poke() is,
micro-optimizations probably don't matter much, and having fewer
memcpy() implementations seems like a good idea.
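For readers following along, here is a simplified sketch of the pattern
in the quoted diff, written with the plain ALTERNATIVE() macro instead
of the exact ALT_64() form; exception handling and other details of the
final patch are omitted:

	/*
	 * Sketch: on CPUs with FSRM ("fast short REP MOVSB") the inline
	 * "rep movsb" is kept; on older CPUs the alternatives patching
	 * rewrites the site at boot into a call to the out-of-line
	 * rep_movs_alternative helper.
	 */
	static __always_inline void *sketch_inline_memcpy(void *to, const void *from, size_t len)
	{
		void *ret = to;

		asm volatile(ALTERNATIVE("rep movsb",
					 "call rep_movs_alternative",
					 ALT_NOT(X86_FEATURE_FSRM))
			     : "+D" (to), "+S" (from), "+c" (len), ASM_CALL_CONSTRAINT
			     : : "memory", "rax");
		return ret;
	}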
Thanks,
--
Alex
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v5 05/16] x86/cpu: Defer CR pinning setup until after EFI initialization
2024-10-29 23:18 ` Luck, Tony
2024-10-29 23:41 ` H. Peter Anvin
@ 2024-10-30 11:43 ` Kirill A. Shutemov
1 sibling, 0 replies; 42+ messages in thread
From: Kirill A. Shutemov @ 2024-10-30 11:43 UTC (permalink / raw)
To: Luck, Tony
Cc: H. Peter Anvin, Hansen, Dave, Mehta, Sohil, Alexander Shishkin,
Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86@kernel.org, Peter Zijlstra, Ard Biesheuvel,
Paul E. McKenney, Josh Poimboeuf, Xiongwei Song, Li, Xin3,
Mike Rapoport (IBM), Brijesh Singh, Michael Roth,
Alexey Kardashevskiy, Jonathan Corbet, Ingo Molnar, Pawan Gupta,
Daniel Sneddon, Huang, Kai, Sandipan Das, Breno Leitao,
Edgecombe, Rick P, Alexei Starovoitov, Hou Tao, Juergen Gross,
Vegard Nossum, Kees Cook, Eric Biggers, Jason Gunthorpe,
Masami Hiramatsu (Google), Andrew Morton, Luis Chamberlain,
Yuntao Wang, Rasmus Villemoes, Christophe Leroy, Tejun Heo,
Changbin Du, Huang Shijie, Geert Uytterhoeven, Namhyung Kim,
Arnaldo Carvalho de Melo, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-efi@vger.kernel.org
On Tue, Oct 29, 2024 at 11:18:29PM +0000, Luck, Tony wrote:
> >> Yeah, I was talking about 64-bit only. On 32-bit PAE a PGD maps 1/4 of
> >> the address space which is totally unworkable for stealing.
> >
> > But it is also not necessary.
>
> So maybe we could make the 64-bit version of use_temporary_mm()
> use some reserved address mapping to a reserved PGD in the upper
> half of address space, and the 32-bit version continue to use "user"
> addresses. It's unclear to me whether adding complexity here would be
> worth it to remove the 64-bit STAC/CLAC text patching issues.
Redesigning use_temporary_mm() is an interesting experiment, but it is
out of scope for this series.
LASS blocks LAM enabling, and it would be nice to get LAM re-enabled
soonish. Maybe we can look at use_temporary_mm() again after LASS is
upstream?
--
Kiryl Shutsemau / Kirill A. Shutemov
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v5 14/16] x86/cpu: Make LAM depend on LASS
2024-10-28 16:08 ` [PATCH v5 14/16] x86/cpu: Make LAM depend on LASS Alexander Shishkin
@ 2024-10-31 0:06 ` Sohil Mehta
0 siblings, 0 replies; 42+ messages in thread
From: Sohil Mehta @ 2024-10-31 0:06 UTC (permalink / raw)
To: Alexander Shishkin, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra,
Ard Biesheuvel, Paul E. McKenney, Josh Poimboeuf, Xiongwei Song,
Xin Li, Mike Rapoport (IBM), Brijesh Singh, Michael Roth,
Tony Luck, Kirill A. Shutemov, Alexey Kardashevskiy
Cc: Jonathan Corbet, Ingo Molnar, Pawan Gupta, Daniel Sneddon,
Kai Huang, Sandipan Das, Breno Leitao, Rick Edgecombe,
Alexei Starovoitov, Hou Tao, Juergen Gross, Vegard Nossum,
Kees Cook, Eric Biggers, Jason Gunthorpe,
Masami Hiramatsu (Google), Andrew Morton, Luis Chamberlain,
Yuntao Wang, Rasmus Villemoes, Christophe Leroy, Tejun Heo,
Changbin Du, Huang Shijie, Geert Uytterhoeven, Namhyung Kim,
Arnaldo Carvalho de Melo, linux-doc, linux-kernel, linux-efi
On 10/28/2024 9:08 AM, Alexander Shishkin wrote:
> To prevent exploits for Spectre based on LAM as demonstrated by the
> whitepaper [1], make LAM depend on LASS, which avoids this type of
> vulnerability.
>
> [1] https://download.vusec.net/papers/slam_sp24.pdf
>
> Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> ---
> arch/x86/kernel/cpu/cpuid-deps.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
> index 3f73c4b03348..d9fb2423605e 100644
> --- a/arch/x86/kernel/cpu/cpuid-deps.c
> +++ b/arch/x86/kernel/cpu/cpuid-deps.c
> @@ -84,6 +84,7 @@ static const struct cpuid_dep cpuid_deps[] = {
> { X86_FEATURE_SHSTK, X86_FEATURE_XSAVES },
> { X86_FEATURE_FRED, X86_FEATURE_LKGS },
> { X86_FEATURE_LASS, X86_FEATURE_SMAP },
> + { X86_FEATURE_LAM, X86_FEATURE_LASS },
The dependencies listed in cpuid_deps[] are only enforced when a feature
such as LASS is explicitly disabled. If the system is missing LASS at
boot then LAM would still be enabled.
We would need this patch to enforce it:
https://lore.kernel.org/lkml/20241030233118.615493-1-sohil.mehta@intel.com/
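To illustrate the gap: the dependency walk only runs when something
actively clears a feature, so a dependent bit survives if its
prerequisite was simply never set. A boot-time pass of roughly this
shape would close it (a sketch, not Sohil's actual patch; see the link
above for the real fix):

	/*
	 * Sketch: scan the dependency table once at boot and clear any
	 * feature whose prerequisite the CPU lacks, so that e.g. LAM is
	 * dropped on a machine without LASS.
	 */
	static void __init enforce_cpuid_deps(struct cpuinfo_x86 *c)
	{
		const struct cpuid_dep *d;

		for (d = cpuid_deps; d->feature; d++) {
			if (cpu_has(c, d->feature) && !cpu_has(c, d->depends))
				clear_cpu_cap(c, d->feature);
		}
	}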
Sohil
^ permalink raw reply [flat|nested] 42+ messages in thread
Thread overview: 42+ messages
2024-10-28 16:07 [PATCH v5 00/16] Enable Linear Address Space Separation support Alexander Shishkin
2024-10-28 16:07 ` [PATCH v5 01/16] x86/cpu: Enumerate the LASS feature bits Alexander Shishkin
2024-10-29 14:55 ` Kirill A. Shutemov
2024-10-29 21:46 ` Sohil Mehta
2024-10-28 16:07 ` [PATCH v5 02/16] x86/asm: Introduce inline memcpy and memset Alexander Shishkin
2024-10-28 16:07 ` [PATCH v5 03/16] x86/alternatives: Disable LASS when patching kernel alternatives Alexander Shishkin
2024-10-28 17:49 ` Dave Hansen
2024-10-29 11:36 ` Peter Zijlstra
2024-10-29 18:48 ` Peter Zijlstra
2024-10-30 7:40 ` Alexander Shishkin
2024-10-28 16:07 ` [PATCH v5 04/16] init/main.c: Move EFI runtime service initialization to x86/cpu Alexander Shishkin
2024-10-29 22:35 ` Sohil Mehta
2024-10-30 7:36 ` Ard Biesheuvel
2024-10-28 16:07 ` [PATCH v5 05/16] x86/cpu: Defer CR pinning setup until after EFI initialization Alexander Shishkin
2024-10-29 22:10 ` Sohil Mehta
2024-10-29 22:26 ` Luck, Tony
2024-10-29 22:52 ` Dave Hansen
2024-10-29 22:59 ` Luck, Tony
2024-10-29 23:02 ` H. Peter Anvin
2024-10-29 23:03 ` Dave Hansen
2024-10-29 23:05 ` H. Peter Anvin
2024-10-29 23:18 ` Luck, Tony
2024-10-29 23:41 ` H. Peter Anvin
2024-10-30 11:43 ` Kirill A. Shutemov
2024-10-28 16:07 ` [PATCH v5 06/16] efi: Disable LASS around set_virtual_address_map call Alexander Shishkin
2024-10-29 15:00 ` Kirill A. Shutemov
2024-10-28 16:07 ` [PATCH v5 07/16] x86/vsyscall: Reorganize the #PF emulation code Alexander Shishkin
2024-10-28 16:07 ` [PATCH v5 08/16] x86/traps: Consolidate user fixups in exc_general_protection() Alexander Shishkin
2024-10-28 16:07 ` [PATCH v5 09/16] x86/vsyscall: Add vsyscall emulation for #GP Alexander Shishkin
2024-10-28 16:07 ` [PATCH v5 10/16] x86/vsyscall: Disable LASS if vsyscall mode is set to EMULATE Alexander Shishkin
2024-10-28 16:07 ` [PATCH v5 11/16] x86/vsyscall: Document the fact that vsyscall=emulate disables LASS Alexander Shishkin
2024-10-29 23:41 ` Sohil Mehta
2024-10-28 16:08 ` [PATCH v5 12/16] x86/cpu: Set LASS CR4 bit as pinning sensitive Alexander Shishkin
2024-10-28 16:08 ` [PATCH v5 13/16] x86/traps: Communicate a LASS violation in #GP message Alexander Shishkin
2024-10-28 16:08 ` [PATCH v5 14/16] x86/cpu: Make LAM depend on LASS Alexander Shishkin
2024-10-31 0:06 ` Sohil Mehta
2024-10-28 16:08 ` [PATCH v5 15/16] x86/cpu: Enable LASS during CPU initialization Alexander Shishkin
2024-10-28 16:08 ` [PATCH v5 16/16] Revert "x86/lam: Disable ADDRESS_MASKING in most cases" Alexander Shishkin
2024-10-28 20:41 ` Kirill A. Shutemov
2024-10-28 22:00 ` Alexander Shishkin
2024-10-29 17:14 ` [PATCH v5 00/16] Enable Linear Address Space Separation support Matthew Wilcox
2024-10-30 7:16 ` Alexander Shishkin