linux-mm.kvack.org archive mirror
* Re: [PATCHv9 02/16] x86/alternatives: Disable LASS when patching kernel alternatives
       [not found] ` <20250707080317.3791624-3-kirill.shutemov@linux.intel.com>
@ 2025-07-09  1:08   ` Sohil Mehta
  2025-07-09  9:35     ` Kirill A. Shutemov
  2025-07-09 16:58   ` Dave Hansen
  2025-07-28 19:11   ` David Laight
  2 siblings, 1 reply; 30+ messages in thread
From: Sohil Mehta @ 2025-07-09  1:08 UTC (permalink / raw)
  To: Kirill A. Shutemov, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra,
	Ard Biesheuvel, Paul E. McKenney, Josh Poimboeuf, Xiongwei Song,
	Xin Li, Mike Rapoport (IBM), Brijesh Singh, Michael Roth,
	Tony Luck, Alexey Kardashevskiy, Alexander Shishkin
  Cc: Jonathan Corbet, Ingo Molnar, Pawan Gupta, Daniel Sneddon,
	Kai Huang, Sandipan Das, Breno Leitao, Rick Edgecombe,
	Alexei Starovoitov, Hou Tao, Juergen Gross, Vegard Nossum,
	Kees Cook, Eric Biggers, Jason Gunthorpe,
	Masami Hiramatsu (Google), Andrew Morton, Luis Chamberlain,
	Yuntao Wang, Rasmus Villemoes, Christophe Leroy, Tejun Heo,
	Changbin Du, Huang Shijie, Geert Uytterhoeven, Namhyung Kim,
	Arnaldo Carvalho de Melo, linux-doc, linux-kernel, linux-efi,
	linux-mm

On 7/7/2025 1:03 AM, Kirill A. Shutemov wrote:
> From: Sohil Mehta <sohil.mehta@intel.com>
> 
> For patching, the kernel initializes a temporary mm area in the lower
> half of the address range. See commit 4fc19708b165 ("x86/alternatives:
> Initialize temporary mm for patching").
> 
> Disable LASS enforcement during patching to avoid triggering a #GP
> fault.
> 
> Objtool warns about calls to non-allowed functions outside of the
> stac/clac guard, and about references to functions through dynamic
> function pointers inside the guard. See the Objtool warnings section #9
> in the document tools/objtool/Documentation/objtool.txt.
> 
> Considering that patching is usually small, replace the memcpy() and
> memset() functions in the text poking functions with their open coded
> versions.
> 
> Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
> Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

Kirill, it might be worth adding your Co-developed-by tag. The patch has
more changes than I can claim credit for.

> ---
>  arch/x86/include/asm/smap.h   | 33 +++++++++++++++++++++++++++++++--
>  arch/x86/kernel/alternative.c | 28 ++++++++++++++++++++++++++--
>  2 files changed, 57 insertions(+), 4 deletions(-)
> 


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCHv9 04/16] x86/cpu: Defer CR pinning setup until core initcall
       [not found] ` <20250707080317.3791624-5-kirill.shutemov@linux.intel.com>
@ 2025-07-09  1:19   ` Sohil Mehta
  2025-07-09  9:38     ` Kirill A. Shutemov
  2025-07-09 17:00   ` Dave Hansen
  1 sibling, 1 reply; 30+ messages in thread
From: Sohil Mehta @ 2025-07-09  1:19 UTC (permalink / raw)
  To: Kirill A. Shutemov, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra,
	Ard Biesheuvel, Paul E. McKenney, Josh Poimboeuf, Xiongwei Song,
	Xin Li, Mike Rapoport (IBM), Brijesh Singh, Michael Roth,
	Tony Luck, Alexey Kardashevskiy, Alexander Shishkin
  Cc: Jonathan Corbet, Ingo Molnar, Pawan Gupta, Daniel Sneddon,
	Kai Huang, Sandipan Das, Breno Leitao, Rick Edgecombe,
	Alexei Starovoitov, Hou Tao, Juergen Gross, Vegard Nossum,
	Kees Cook, Eric Biggers, Jason Gunthorpe,
	Masami Hiramatsu (Google), Andrew Morton, Luis Chamberlain,
	Yuntao Wang, Rasmus Villemoes, Christophe Leroy, Tejun Heo,
	Changbin Du, Huang Shijie, Geert Uytterhoeven, Namhyung Kim,
	Arnaldo Carvalho de Melo, linux-doc, linux-kernel, linux-efi,
	linux-mm

On 7/7/2025 1:03 AM, Kirill A. Shutemov wrote:
> From: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> 
> In order to map the EFI runtime services, set_virtual_address_map()
> needs to be called, which resides in the lower half of the address
> space. This means that LASS needs to be temporarily disabled around
> this call. This can only be done before the CR pinning is set up.
> 
> Instead of moving setup_cr_pinning() below efi_enter_virtual_mode() in
> arch_cpu_finalize_init(), defer it until core initcall.
> 
> Wrapping efi_enter_virtual_mode() into lass_stac()/clac() is not enough
> because AC flag gates data accesses, but not instruction fetch. Clearing
> the CR4 bit is required.
> 

I think the wording might need to be reordered. How about?

In order to map the EFI runtime services, set_virtual_address_map()
needs to be called, which resides in the lower half of the address
space. This means that LASS needs to be temporarily disabled around
this call.

Wrapping efi_enter_virtual_mode() into lass_stac()/clac() is not enough
because the AC flag gates data accesses, but not instruction fetches. Clearing
the CR4 bit is required.

However, this must be done before the CR pinning is set up. Instead of
arbitrarily moving setup_cr_pinning() after efi_enter_virtual_mode() in
arch_cpu_finalize_init(), defer it until core initcall.

Other than that,
Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>

> Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> Suggested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---
>  arch/x86/kernel/cpu/common.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> index ec62e2f9ea16..f10f9f618805 100644
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -490,11 +490,14 @@ void cr4_init(void)
>   * parsed), record any of the sensitive CR bits that are set, and
>   * enable CR pinning.
>   */
> -static void __init setup_cr_pinning(void)
> +static int __init setup_cr_pinning(void)
>  {
>  	cr4_pinned_bits = this_cpu_read(cpu_tlbstate.cr4) & cr4_pinned_mask;
>  	static_key_enable(&cr_pinning.key);
> +
> +	return 0;
>  }
> +core_initcall(setup_cr_pinning);
>  
>  static __init int x86_nofsgsbase_setup(char *arg)
>  {
> @@ -2082,7 +2085,6 @@ static __init void identify_boot_cpu(void)
>  	enable_sep_cpu();
>  #endif
>  	cpu_detect_tlb(&boot_cpu_data);
> -	setup_cr_pinning();
>  
>  	tsx_init();
>  	tdx_init();




* Re: [PATCHv9 05/16] efi: Disable LASS around set_virtual_address_map() EFI call
       [not found] ` <20250707080317.3791624-6-kirill.shutemov@linux.intel.com>
@ 2025-07-09  1:27   ` Sohil Mehta
  0 siblings, 0 replies; 30+ messages in thread
From: Sohil Mehta @ 2025-07-09  1:27 UTC (permalink / raw)
  To: Kirill A. Shutemov, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra,
	Ard Biesheuvel, Paul E. McKenney, Josh Poimboeuf, Xiongwei Song,
	Xin Li, Mike Rapoport (IBM), Brijesh Singh, Michael Roth,
	Tony Luck, Alexey Kardashevskiy, Alexander Shishkin
  Cc: Jonathan Corbet, Ingo Molnar, Pawan Gupta, Daniel Sneddon,
	Kai Huang, Sandipan Das, Breno Leitao, Rick Edgecombe,
	Alexei Starovoitov, Hou Tao, Juergen Gross, Vegard Nossum,
	Kees Cook, Eric Biggers, Jason Gunthorpe,
	Masami Hiramatsu (Google), Andrew Morton, Luis Chamberlain,
	Yuntao Wang, Rasmus Villemoes, Christophe Leroy, Tejun Heo,
	Changbin Du, Huang Shijie, Geert Uytterhoeven, Namhyung Kim,
	Arnaldo Carvalho de Melo, linux-doc, linux-kernel, linux-efi,
	linux-mm

On 7/7/2025 1:03 AM, Kirill A. Shutemov wrote:
> From: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> 
> Of all the EFI runtime services, set_virtual_address_map() is the only
> one that is called at its lower mapping, which LASS prohibits regardless
> of EFLAGS.AC setting. The only way to allow this to happen is to disable
> LASS in the CR4 register.
> 
> Disable LASS around this low address EFI call.
> 
> Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---
>  arch/x86/platform/efi/efi.c | 15 +++++++++++++++
>  1 file changed, 15 insertions(+)
> 

Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>

A minor nit below.

> diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
> index 463b784499a8..5b23c0daedef 100644
> --- a/arch/x86/platform/efi/efi.c
> +++ b/arch/x86/platform/efi/efi.c
> @@ -787,6 +787,7 @@ static void __init __efi_enter_virtual_mode(void)
>  	int count = 0, pg_shift = 0;
>  	void *new_memmap = NULL;
>  	efi_status_t status;
> +	unsigned long lass;
>  	unsigned long pa;
>  

The two unsigned longs can be on the same line.

>  	if (efi_alloc_page_tables()) {



* Re: [PATCHv9 11/16] x86/traps: Communicate a LASS violation in #GP message
       [not found] ` <20250707080317.3791624-12-kirill.shutemov@linux.intel.com>
@ 2025-07-09  2:40   ` Sohil Mehta
  2025-07-09  9:31     ` Kirill A. Shutemov
  0 siblings, 1 reply; 30+ messages in thread
From: Sohil Mehta @ 2025-07-09  2:40 UTC (permalink / raw)
  To: Kirill A. Shutemov, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra,
	Ard Biesheuvel, Paul E. McKenney, Josh Poimboeuf, Xiongwei Song,
	Xin Li, Mike Rapoport (IBM), Brijesh Singh, Michael Roth,
	Tony Luck, Alexey Kardashevskiy, Alexander Shishkin
  Cc: Jonathan Corbet, Ingo Molnar, Pawan Gupta, Daniel Sneddon,
	Kai Huang, Sandipan Das, Breno Leitao, Rick Edgecombe,
	Alexei Starovoitov, Hou Tao, Juergen Gross, Vegard Nossum,
	Kees Cook, Eric Biggers, Jason Gunthorpe,
	Masami Hiramatsu (Google), Andrew Morton, Luis Chamberlain,
	Yuntao Wang, Rasmus Villemoes, Christophe Leroy, Tejun Heo,
	Changbin Du, Huang Shijie, Geert Uytterhoeven, Namhyung Kim,
	Arnaldo Carvalho de Melo, linux-doc, linux-kernel, linux-efi,
	linux-mm

On 7/7/2025 1:03 AM, Kirill A. Shutemov wrote:
> From: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> 
> Provide a more helpful message on #GP when a kernel side LASS violation
> is detected.
> 
> A NULL pointer dereference is reported if a LASS violation occurs due to
> accessing the first page frame.
> 
> Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---
>  arch/x86/kernel/traps.c | 41 +++++++++++++++++++++++++++++------------
>  1 file changed, 29 insertions(+), 12 deletions(-)
> 

Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>

A nit below.

> diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
> index 59bfbdf0a1a0..4a4194e1d119 100644
> --- a/arch/x86/kernel/traps.c
> +++ b/arch/x86/kernel/traps.c
> @@ -636,7 +636,16 @@ DEFINE_IDTENTRY(exc_bounds)
>  enum kernel_gp_hint {
>  	GP_NO_HINT,
>  	GP_NON_CANONICAL,
> -	GP_CANONICAL
> +	GP_CANONICAL,
> +	GP_LASS_VIOLATION,
> +	GP_NULL_POINTER,
> +};
> +
> +static const char * const kernel_gp_hint_help[] = {
> +	[GP_NON_CANONICAL]	= "probably for non-canonical address",
> +	[GP_CANONICAL]		= "maybe for address",
> +	[GP_LASS_VIOLATION]	= "LASS prevented access to address",
> +	[GP_NULL_POINTER]	= "kernel NULL pointer dereference",
>  };
>  
>  /*
> @@ -664,14 +673,23 @@ static enum kernel_gp_hint get_kernel_gp_address(struct pt_regs *regs,
>  		return GP_NO_HINT;
>  
>  #ifdef CONFIG_X86_64

Might as well get rid of the #ifdef in C code, if possible.

if (!IS_ENABLED(CONFIG_X86_64))
	return GP_CANONICAL;

or combine it with the next check.

> -	/*
> -	 * Check that:
> -	 *  - the operand is not in the kernel half
> -	 *  - the last byte of the operand is not in the user canonical half
> -	 */
> -	if (*addr < ~__VIRTUAL_MASK &&
> -	    *addr + insn.opnd_bytes - 1 > __VIRTUAL_MASK)
> +	/* Operand is in the kernel half */
> +	if (*addr >= ~__VIRTUAL_MASK)
> +		return GP_CANONICAL;
> +
> +	/* The last byte of the operand is not in the user canonical half */
> +	if (*addr + insn.opnd_bytes - 1 > __VIRTUAL_MASK)
>  		return GP_NON_CANONICAL;
> +
> +	/*
> +	 * If LASS is enabled, NULL pointer dereference generates
> +	 * #GP instead of #PF.
> +	 */
> +	if (*addr < PAGE_SIZE)
> +		return GP_NULL_POINTER;
> +
> +	if (cpu_feature_enabled(X86_FEATURE_LASS))
> +		return GP_LASS_VIOLATION;
>  #endif
>  
>  	return GP_CANONICAL;



* Re: [PATCHv9 12/16] x86/traps: Generalize #GP address decode and hint code
       [not found] ` <20250707080317.3791624-13-kirill.shutemov@linux.intel.com>
@ 2025-07-09  4:59   ` Sohil Mehta
  0 siblings, 0 replies; 30+ messages in thread
From: Sohil Mehta @ 2025-07-09  4:59 UTC (permalink / raw)
  To: Kirill A. Shutemov, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra,
	Ard Biesheuvel, Paul E. McKenney, Josh Poimboeuf, Xiongwei Song,
	Xin Li, Mike Rapoport (IBM), Brijesh Singh, Michael Roth,
	Tony Luck, Alexey Kardashevskiy, Alexander Shishkin
  Cc: Jonathan Corbet, Ingo Molnar, Pawan Gupta, Daniel Sneddon,
	Kai Huang, Sandipan Das, Breno Leitao, Rick Edgecombe,
	Alexei Starovoitov, Hou Tao, Juergen Gross, Vegard Nossum,
	Kees Cook, Eric Biggers, Jason Gunthorpe,
	Masami Hiramatsu (Google), Andrew Morton, Luis Chamberlain,
	Yuntao Wang, Rasmus Villemoes, Christophe Leroy, Tejun Heo,
	Changbin Du, Huang Shijie, Geert Uytterhoeven, Namhyung Kim,
	Arnaldo Carvalho de Melo, linux-doc, linux-kernel, linux-efi,
	linux-mm

On 7/7/2025 1:03 AM, Kirill A. Shutemov wrote:
> In most cases, an access causing a LASS violation results in a general
> protection exception (#GP); for stack accesses (those due to
> stack-oriented instructions, as well as accesses that implicitly or
> explicitly use the SS segment register), a stack fault (#SS) is
> generated.
> 
> Handlers for #GP and #SS will now share code to decode the exception
> address and retrieve the exception hint string.
> 
> The helper, enum, and array should be renamed as they are no longer
> specific to #GP.
> 
> No functional change intended.
> 
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---
>  arch/x86/kernel/dumpstack.c |  6 ++--
>  arch/x86/kernel/traps.c     | 58 ++++++++++++++++++-------------------
>  2 files changed, 32 insertions(+), 32 deletions(-)
> 

Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>





* Re: [PATCHv9 13/16] x86/traps: Handle LASS thrown #SS
       [not found] ` <20250707080317.3791624-14-kirill.shutemov@linux.intel.com>
@ 2025-07-09  5:12   ` Sohil Mehta
  2025-07-09 10:38     ` Kirill A. Shutemov
  2025-07-11  1:23   ` Sohil Mehta
  1 sibling, 1 reply; 30+ messages in thread
From: Sohil Mehta @ 2025-07-09  5:12 UTC (permalink / raw)
  To: Kirill A. Shutemov, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra,
	Ard Biesheuvel, Paul E. McKenney, Josh Poimboeuf, Xiongwei Song,
	Xin Li, Mike Rapoport (IBM), Brijesh Singh, Michael Roth,
	Tony Luck, Alexey Kardashevskiy, Alexander Shishkin
  Cc: Jonathan Corbet, Ingo Molnar, Pawan Gupta, Daniel Sneddon,
	Kai Huang, Sandipan Das, Breno Leitao, Rick Edgecombe,
	Alexei Starovoitov, Hou Tao, Juergen Gross, Vegard Nossum,
	Kees Cook, Eric Biggers, Jason Gunthorpe,
	Masami Hiramatsu (Google), Andrew Morton, Luis Chamberlain,
	Yuntao Wang, Rasmus Villemoes, Christophe Leroy, Tejun Heo,
	Changbin Du, Huang Shijie, Geert Uytterhoeven, Namhyung Kim,
	Arnaldo Carvalho de Melo, linux-doc, linux-kernel, linux-efi,
	linux-mm

On 7/7/2025 1:03 AM, Kirill A. Shutemov wrote:

> +	hint = get_kernel_exc_address(regs, &exc_addr);
> +	if (hint != EXC_NO_HINT)
> +		printk(SSFSTR ", %s 0x%lx", kernel_exc_hint_help[hint], exc_addr);
> +
> +	if (hint != EXC_NON_CANONICAL)
> +		exc_addr = 0;
> +
> +	die_addr(SSFSTR, regs, error_code, exc_addr);

I see a slight difference between the #GP handling and the #SS handling
here. For the #GP case, we seem to pass the hint string to die_addr().

However, for the #SS, the hint is printed above and only SSFSTR gets
passed onto die_addr(). I am curious about the reasoning.

> +	return;
> +
> +error_trap:
> +	do_error_trap(regs, error_code, SSFSTR, X86_TRAP_SS, SIGBUS, 0, NULL);
> +}
> +



* Re: [PATCHv9 16/16] x86: Re-enable Linear Address Masking
       [not found] ` <20250707080317.3791624-17-kirill.shutemov@linux.intel.com>
@ 2025-07-09  5:31   ` Sohil Mehta
  2025-07-09 11:00     ` Kirill A. Shutemov
  0 siblings, 1 reply; 30+ messages in thread
From: Sohil Mehta @ 2025-07-09  5:31 UTC (permalink / raw)
  To: Kirill A. Shutemov, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra,
	Ard Biesheuvel, Paul E. McKenney, Josh Poimboeuf, Xiongwei Song,
	Xin Li, Mike Rapoport (IBM), Brijesh Singh, Michael Roth,
	Tony Luck, Alexey Kardashevskiy, Alexander Shishkin
  Cc: Jonathan Corbet, Ingo Molnar, Pawan Gupta, Daniel Sneddon,
	Kai Huang, Sandipan Das, Breno Leitao, Rick Edgecombe,
	Alexei Starovoitov, Hou Tao, Juergen Gross, Vegard Nossum,
	Kees Cook, Eric Biggers, Jason Gunthorpe,
	Masami Hiramatsu (Google), Andrew Morton, Luis Chamberlain,
	Yuntao Wang, Rasmus Villemoes, Christophe Leroy, Tejun Heo,
	Changbin Du, Huang Shijie, Geert Uytterhoeven, Namhyung Kim,
	Arnaldo Carvalho de Melo, linux-doc, linux-kernel, linux-efi,
	linux-mm

On 7/7/2025 1:03 AM, Kirill A. Shutemov wrote:
> This reverts commit 3267cb6d3a174ff83d6287dcd5b0047bbd912452.
> 
> LASS mitigates Spectre attacks based on LAM (SLAM) [1], and the previous
> commit made LAM depend on LASS, so LAM no longer needs to be disabled at
> compile time. Revert the commit that disabled it.
> 

Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>

You may have missed my comments in the previous revision.
https://lore.kernel.org/all/af709ffa-eb11-4de5-9ae8-a179cb99750c@intel.com/

Mainly, x86 maintainers prefer imperative tone and references such as
"previous commit" can be confusing sometimes.


> Adjust USER_PTR_MAX if LAM enabled, allowing tag bits to be set for
> userspace pointers. The value for the constant is defined in a way to
> avoid overflow compiler warning on 32-bit config.
> 
> [1] https://download.vusec.net/papers/slam_sp24.pdf
> 
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
> ---
>  arch/x86/Kconfig             | 1 -
>  arch/x86/kernel/cpu/common.c | 5 +----
>  2 files changed, 1 insertion(+), 5 deletions(-)
> 



* Re: [PATCHv9 11/16] x86/traps: Communicate a LASS violation in #GP message
  2025-07-09  2:40   ` [PATCHv9 11/16] x86/traps: Communicate a LASS violation in #GP message Sohil Mehta
@ 2025-07-09  9:31     ` Kirill A. Shutemov
  2025-07-09  9:36       ` Geert Uytterhoeven
  0 siblings, 1 reply; 30+ messages in thread
From: Kirill A. Shutemov @ 2025-07-09  9:31 UTC (permalink / raw)
  To: Sohil Mehta
  Cc: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra, Ard Biesheuvel,
	Paul E. McKenney, Josh Poimboeuf, Xiongwei Song, Xin Li,
	Mike Rapoport (IBM), Brijesh Singh, Michael Roth, Tony Luck,
	Alexey Kardashevskiy, Alexander Shishkin, Jonathan Corbet,
	Ingo Molnar, Pawan Gupta, Daniel Sneddon, Kai Huang, Sandipan Das,
	Breno Leitao, Rick Edgecombe, Alexei Starovoitov, Hou Tao,
	Juergen Gross, Vegard Nossum, Kees Cook, Eric Biggers,
	Jason Gunthorpe, Masami Hiramatsu (Google), Andrew Morton,
	Luis Chamberlain, Yuntao Wang, Rasmus Villemoes, Christophe Leroy,
	Tejun Heo, Changbin Du, Huang Shijie, Geert Uytterhoeven,
	Namhyung Kim, Arnaldo Carvalho de Melo, linux-doc, linux-kernel,
	linux-efi, linux-mm

On Tue, Jul 08, 2025 at 07:40:35PM -0700, Sohil Mehta wrote:
> > @@ -664,14 +673,23 @@ static enum kernel_gp_hint get_kernel_gp_address(struct pt_regs *regs,
> >  		return GP_NO_HINT;
> >  
> >  #ifdef CONFIG_X86_64
> 
> Might as well get rid of the #ifdef in C code, if possible.
> 
> > if (!IS_ENABLED(CONFIG_X86_64))
> 	return GP_CANONICAL;
> 
> or combine it with the next check.

I tried this before. It triggers a compiler error on 32-bit:

arch/x86/kernel/traps.c:673:16: error: shift count >= width of type [-Werror,-Wshift-count-overflow]
  673 |         if (*addr >= ~__VIRTUAL_MASK)
      |                       ^~~~~~~~~~~~~~

__VIRTUAL_MASK is not usable on 32-bit configs.

-- 
  Kiryl Shutsemau / Kirill A. Shutemov



* Re: [PATCHv9 02/16] x86/alternatives: Disable LASS when patching kernel alternatives
  2025-07-09  1:08   ` [PATCHv9 02/16] x86/alternatives: Disable LASS when patching kernel alternatives Sohil Mehta
@ 2025-07-09  9:35     ` Kirill A. Shutemov
  0 siblings, 0 replies; 30+ messages in thread
From: Kirill A. Shutemov @ 2025-07-09  9:35 UTC (permalink / raw)
  To: Sohil Mehta
  Cc: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra, Ard Biesheuvel,
	Paul E. McKenney, Josh Poimboeuf, Xiongwei Song, Xin Li,
	Mike Rapoport (IBM), Brijesh Singh, Michael Roth, Tony Luck,
	Alexey Kardashevskiy, Alexander Shishkin, Jonathan Corbet,
	Ingo Molnar, Pawan Gupta, Daniel Sneddon, Kai Huang, Sandipan Das,
	Breno Leitao, Rick Edgecombe, Alexei Starovoitov, Hou Tao,
	Juergen Gross, Vegard Nossum, Kees Cook, Eric Biggers,
	Jason Gunthorpe, Masami Hiramatsu (Google), Andrew Morton,
	Luis Chamberlain, Yuntao Wang, Rasmus Villemoes, Christophe Leroy,
	Tejun Heo, Changbin Du, Huang Shijie, Geert Uytterhoeven,
	Namhyung Kim, Arnaldo Carvalho de Melo, linux-doc, linux-kernel,
	linux-efi, linux-mm

On Tue, Jul 08, 2025 at 06:08:18PM -0700, Sohil Mehta wrote:
> On 7/7/2025 1:03 AM, Kirill A. Shutemov wrote:
> > From: Sohil Mehta <sohil.mehta@intel.com>
> > 
> > For patching, the kernel initializes a temporary mm area in the lower
> > half of the address range. See commit 4fc19708b165 ("x86/alternatives:
> > Initialize temporary mm for patching").
> > 
> > Disable LASS enforcement during patching to avoid triggering a #GP
> > fault.
> > 
> > Objtool warns about calls to non-allowed functions outside of the
> > stac/clac guard, and about references to functions through dynamic
> > function pointers inside the guard. See the Objtool warnings section #9
> > in the document tools/objtool/Documentation/objtool.txt.
> > 
> > Considering that patching is usually small, replace the memcpy() and
> > memset() functions in the text poking functions with their open coded
> > versions.
> > 
> > Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
> > Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> 
> Kirill, it might be worth adding your co-developed-by tag. The patch has
> more changes than I can claim credit for.

Okay. Will do.

-- 
  Kiryl Shutsemau / Kirill A. Shutemov



* Re: [PATCHv9 11/16] x86/traps: Communicate a LASS violation in #GP message
  2025-07-09  9:31     ` Kirill A. Shutemov
@ 2025-07-09  9:36       ` Geert Uytterhoeven
  2025-07-09  9:51         ` Kirill A. Shutemov
  0 siblings, 1 reply; 30+ messages in thread
From: Geert Uytterhoeven @ 2025-07-09  9:36 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Sohil Mehta, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra,
	Ard Biesheuvel, Paul E. McKenney, Josh Poimboeuf, Xiongwei Song,
	Xin Li, Mike Rapoport (IBM), Brijesh Singh, Michael Roth,
	Tony Luck, Alexey Kardashevskiy, Alexander Shishkin,
	Jonathan Corbet, Ingo Molnar, Pawan Gupta, Daniel Sneddon,
	Kai Huang, Sandipan Das, Breno Leitao, Rick Edgecombe,
	Alexei Starovoitov, Hou Tao, Juergen Gross, Vegard Nossum,
	Kees Cook, Eric Biggers, Jason Gunthorpe,
	Masami Hiramatsu (Google), Andrew Morton, Luis Chamberlain,
	Yuntao Wang, Rasmus Villemoes, Christophe Leroy, Tejun Heo,
	Changbin Du, Huang Shijie, Geert Uytterhoeven, Namhyung Kim,
	Arnaldo Carvalho de Melo, linux-doc, linux-kernel, linux-efi,
	linux-mm

Hi Kirill,

On Wed, 9 Jul 2025 at 11:31, Kirill A. Shutemov
<kirill.shutemov@linux.intel.com> wrote:
> On Tue, Jul 08, 2025 at 07:40:35PM -0700, Sohil Mehta wrote:
> > > @@ -664,14 +673,23 @@ static enum kernel_gp_hint get_kernel_gp_address(struct pt_regs *regs,
> > >             return GP_NO_HINT;
> > >
> > >  #ifdef CONFIG_X86_64
> >
> > Might as well get rid of the #ifdef in C code, if possible.
> >
> > if (!IS_ENABLED(CONFIG_X86_64))
> >       return GP_CANONICAL;
> >
> > or combine it with the next check.
>
> I tried this before. It triggers a compiler error on 32-bit:
>
> arch/x86/kernel/traps.c:673:16: error: shift count >= width of type [-Werror,-Wshift-count-overflow]
>   673 |         if (*addr >= ~__VIRTUAL_MASK)
>       |                       ^~~~~~~~~~~~~~
>
> __VIRTUAL_MASK is not usable on 32-bit configs.

arch/x86/include/asm/page_32_types.h:#define __VIRTUAL_MASK_SHIFT       32
arch/x86/include/asm/page_32_types.h:#define __VIRTUAL_MASK_SHIFT       32
arch/x86/include/asm/page_64_types.h:#define __VIRTUAL_MASK_SHIFT  (pgtable_l5_enabled() ? 56 : 47)
arch/x86/include/asm/page_types.h:#define __VIRTUAL_MASK  ((1UL << __VIRTUAL_MASK_SHIFT) - 1)

Given __VIRTUAL_MASK_SHIFT is 32 on 32-bit platforms, perhaps
__VIRTUAL_MASK should just be changed to shift 1ULL instead?
Or better, use GENMASK(__VIRTUAL_MASK_SHIFT - 1, 0), so the
resulting type is still unsigned long.

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds



* Re: [PATCHv9 04/16] x86/cpu: Defer CR pinning setup until core initcall
  2025-07-09  1:19   ` [PATCHv9 04/16] x86/cpu: Defer CR pinning setup until core initcall Sohil Mehta
@ 2025-07-09  9:38     ` Kirill A. Shutemov
  0 siblings, 0 replies; 30+ messages in thread
From: Kirill A. Shutemov @ 2025-07-09  9:38 UTC (permalink / raw)
  To: Sohil Mehta
  Cc: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra, Ard Biesheuvel,
	Paul E. McKenney, Josh Poimboeuf, Xiongwei Song, Xin Li,
	Mike Rapoport (IBM), Brijesh Singh, Michael Roth, Tony Luck,
	Alexey Kardashevskiy, Alexander Shishkin, Jonathan Corbet,
	Ingo Molnar, Pawan Gupta, Daniel Sneddon, Kai Huang, Sandipan Das,
	Breno Leitao, Rick Edgecombe, Alexei Starovoitov, Hou Tao,
	Juergen Gross, Vegard Nossum, Kees Cook, Eric Biggers,
	Jason Gunthorpe, Masami Hiramatsu (Google), Andrew Morton,
	Luis Chamberlain, Yuntao Wang, Rasmus Villemoes, Christophe Leroy,
	Tejun Heo, Changbin Du, Huang Shijie, Geert Uytterhoeven,
	Namhyung Kim, Arnaldo Carvalho de Melo, linux-doc, linux-kernel,
	linux-efi, linux-mm

On Tue, Jul 08, 2025 at 06:19:03PM -0700, Sohil Mehta wrote:
> On 7/7/2025 1:03 AM, Kirill A. Shutemov wrote:
> > From: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> > 
> > In order to map the EFI runtime services, set_virtual_address_map()
> > needs to be called, which resides in the lower half of the address
> > space. This means that LASS needs to be temporarily disabled around
> > this call. This can only be done before the CR pinning is set up.
> > 
> > Instead of moving setup_cr_pinning() below efi_enter_virtual_mode() in
> > arch_cpu_finalize_init(), defer it until core initcall.
> > 
> > Wrapping efi_enter_virtual_mode() into lass_stac()/clac() is not enough
> > because AC flag gates data accesses, but not instruction fetch. Clearing
> > the CR4 bit is required.
> > 
> 
> I think the wording might need to be reordered. How about?
> 
> In order to map the EFI runtime services, set_virtual_address_map()
> needs to be called, which resides in the lower half of the address
> space. This means that LASS needs to be temporarily disabled around
> this call.
> 
> Wrapping efi_enter_virtual_mode() into lass_stac()/clac() is not enough
> because AC flag gates data accesses, but not instruction fetch. Clearing
> the CR4 bit is required.
> 
> However, this must be done before the CR pinning is set up. Instead of
> arbitrarily moving setup_cr_pinning() after efi_enter_virtual_mode() in
> arch_cpu_finalize_init(), defer it until core initcall.
> 
> Other than that,
> Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>

Okay, looks good, thanks!

-- 
  Kiryl Shutsemau / Kirill A. Shutemov



* Re: [PATCHv9 11/16] x86/traps: Communicate a LASS violation in #GP message
  2025-07-09  9:36       ` Geert Uytterhoeven
@ 2025-07-09  9:51         ` Kirill A. Shutemov
  0 siblings, 0 replies; 30+ messages in thread
From: Kirill A. Shutemov @ 2025-07-09  9:51 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Sohil Mehta, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra,
	Ard Biesheuvel, Paul E. McKenney, Josh Poimboeuf, Xiongwei Song,
	Xin Li, Mike Rapoport (IBM), Brijesh Singh, Michael Roth,
	Tony Luck, Alexey Kardashevskiy, Alexander Shishkin,
	Jonathan Corbet, Ingo Molnar, Pawan Gupta, Daniel Sneddon,
	Kai Huang, Sandipan Das, Breno Leitao, Rick Edgecombe,
	Alexei Starovoitov, Hou Tao, Juergen Gross, Vegard Nossum,
	Kees Cook, Eric Biggers, Jason Gunthorpe,
	Masami Hiramatsu (Google), Andrew Morton, Luis Chamberlain,
	Yuntao Wang, Rasmus Villemoes, Christophe Leroy, Tejun Heo,
	Changbin Du, Huang Shijie, Geert Uytterhoeven, Namhyung Kim,
	Arnaldo Carvalho de Melo, linux-doc, linux-kernel, linux-efi,
	linux-mm

On Wed, Jul 09, 2025 at 11:36:27AM +0200, Geert Uytterhoeven wrote:
> Hi Kirill,
> 
> On Wed, 9 Jul 2025 at 11:31, Kirill A. Shutemov
> <kirill.shutemov@linux.intel.com> wrote:
> > On Tue, Jul 08, 2025 at 07:40:35PM -0700, Sohil Mehta wrote:
> > > > @@ -664,14 +673,23 @@ static enum kernel_gp_hint get_kernel_gp_address(struct pt_regs *regs,
> > > >             return GP_NO_HINT;
> > > >
> > > >  #ifdef CONFIG_X86_64
> > >
> > > Might as well get rid of the #ifdef in C code, if possible.
> > >
> > > if (!IS_ENABLED(CONFIG_X86_64))
> > >       return GP_CANONICAL;
> > >
> > > or combine it with the next check.
> >
> > I tried this before. It triggers a compiler error on 32-bit:
> >
> > arch/x86/kernel/traps.c:673:16: error: shift count >= width of type [-Werror,-Wshift-count-overflow]
> >   673 |         if (*addr >= ~__VIRTUAL_MASK)
> >       |                       ^~~~~~~~~~~~~~
> >
> > __VIRTUAL_MASK is not usable on 32-bit configs.
> 
> arch/x86/include/asm/page_32_types.h:#define __VIRTUAL_MASK_SHIFT       32
> arch/x86/include/asm/page_32_types.h:#define __VIRTUAL_MASK_SHIFT       32
> arch/x86/include/asm/page_64_types.h:#define __VIRTUAL_MASK_SHIFT  (pgtable_l5_enabled() ? 56 : 47)
> arch/x86/include/asm/page_types.h:#define __VIRTUAL_MASK  ((1UL << __VIRTUAL_MASK_SHIFT) - 1)
> 
> Given __VIRTUAL_MASK_SHIFT is 32 on 32-bit platforms, perhaps
> __VIRTUAL_MASK should just be changed to shift 1ULL instead?
> Or better, use GENMASK(__VIRTUAL_MASK_SHIFT - 1, 0), so the
> resulting type is still unsigned long.

Making __VIRTUAL_MASK unsigned long long is a no-go: virtual addresses
are unsigned long. I guess GENMASK() would work.

I think re-defining __VIRTUAL_MASK is out-of-scope for the patchset. Feel
free to prepare a separate patch to do it.

-- 
  Kiryl Shutsemau / Kirill A. Shutemov



* Re: [PATCHv9 13/16] x86/traps: Handle LASS thrown #SS
  2025-07-09  5:12   ` [PATCHv9 13/16] x86/traps: Handle LASS thrown #SS Sohil Mehta
@ 2025-07-09 10:38     ` Kirill A. Shutemov
  2025-07-11  1:22       ` Sohil Mehta
  0 siblings, 1 reply; 30+ messages in thread
From: Kirill A. Shutemov @ 2025-07-09 10:38 UTC (permalink / raw)
  To: Sohil Mehta
  Cc: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra, Ard Biesheuvel,
	Paul E. McKenney, Josh Poimboeuf, Xiongwei Song, Xin Li,
	Mike Rapoport (IBM), Brijesh Singh, Michael Roth, Tony Luck,
	Alexey Kardashevskiy, Alexander Shishkin, Jonathan Corbet,
	Ingo Molnar, Pawan Gupta, Daniel Sneddon, Kai Huang, Sandipan Das,
	Breno Leitao, Rick Edgecombe, Alexei Starovoitov, Hou Tao,
	Juergen Gross, Vegard Nossum, Kees Cook, Eric Biggers,
	Jason Gunthorpe, Masami Hiramatsu (Google), Andrew Morton,
	Luis Chamberlain, Yuntao Wang, Rasmus Villemoes, Christophe Leroy,
	Tejun Heo, Changbin Du, Huang Shijie, Geert Uytterhoeven,
	Namhyung Kim, Arnaldo Carvalho de Melo, linux-doc, linux-kernel,
	linux-efi, linux-mm

On Tue, Jul 08, 2025 at 10:12:28PM -0700, Sohil Mehta wrote:
> On 7/7/2025 1:03 AM, Kirill A. Shutemov wrote:
> 
> > +	hint = get_kernel_exc_address(regs, &exc_addr);
> > +	if (hint != EXC_NO_HINT)
> > +		printk(SSFSTR ", %s 0x%lx", kernel_exc_hint_help[hint], exc_addr);
> > +
> > +	if (hint != EXC_NON_CANONICAL)
> > +		exc_addr = 0;
> > +
> > +	die_addr(SSFSTR, regs, error_code, exc_addr);
> 
> I see a slight difference between the #GP handling and the #SS handling
> here. For the #GP case, we seem to pass the hint string to die_addr().
> 
> However, for the #SS, the hint is printed above and only SSFSTR gets
> passed onto die_addr(). I am curious about the reasoning.

I hate how 'desc' size is defined in #GP handler:

	char desc[sizeof(GPFSTR) + 50 + 2*sizeof(unsigned long) + 1] = GPFSTR;

Too much voodoo to my liking. And it will overflow if any hint string is
going to be longer than 50.

I don't want to repeat this magic for #SS.

I would argue we need to print directly in #GP handler as I do in #SS.
But, IMO, it is outside of the scope of this patchset.

-- 
  Kiryl Shutsemau / Kirill A. Shutemov


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCHv9 16/16] x86: Re-enable Linear Address Masking
  2025-07-09  5:31   ` [PATCHv9 16/16] x86: Re-enable Linear Address Masking Sohil Mehta
@ 2025-07-09 11:00     ` Kirill A. Shutemov
  2025-07-11  0:42       ` Sohil Mehta
  0 siblings, 1 reply; 30+ messages in thread
From: Kirill A. Shutemov @ 2025-07-09 11:00 UTC (permalink / raw)
  To: Sohil Mehta
  Cc: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra, Ard Biesheuvel,
	Paul E. McKenney, Josh Poimboeuf, Xiongwei Song, Xin Li,
	Mike Rapoport (IBM), Brijesh Singh, Michael Roth, Tony Luck,
	Alexey Kardashevskiy, Alexander Shishkin, Jonathan Corbet,
	Ingo Molnar, Pawan Gupta, Daniel Sneddon, Kai Huang, Sandipan Das,
	Breno Leitao, Rick Edgecombe, Alexei Starovoitov, Hou Tao,
	Juergen Gross, Vegard Nossum, Kees Cook, Eric Biggers,
	Jason Gunthorpe, Masami Hiramatsu (Google), Andrew Morton,
	Luis Chamberlain, Yuntao Wang, Rasmus Villemoes, Christophe Leroy,
	Tejun Heo, Changbin Du, Huang Shijie, Geert Uytterhoeven,
	Namhyung Kim, Arnaldo Carvalho de Melo, linux-doc, linux-kernel,
	linux-efi, linux-mm

On Tue, Jul 08, 2025 at 10:31:05PM -0700, Sohil Mehta wrote:
> On 7/7/2025 1:03 AM, Kirill A. Shutemov wrote:
> > This reverts commit 3267cb6d3a174ff83d6287dcd5b0047bbd912452.
> > 
> > LASS mitigates the Spectre based on LAM (SLAM) [1] and the previous
> > commit made LAM depend on LASS, so we no longer need to disable LAM at
> > compile time, so revert the commit that disables LAM.
> > 
> 
> Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>
> 
> You may have missed my comments in the previous revision.
> https://lore.kernel.org/all/af709ffa-eb11-4de5-9ae8-a179cb99750c@intel.com/
> 
> Mainly, x86 maintainers prefer imperative tone and references such as
> "previous commit" can be confusing sometimes.

Indeed, missed. My bad.

I've merged the last two patches and updated the commit message:

https://git.kernel.org/pub/scm/linux/kernel/git/kas/linux.git/commit/?h=x86/lass

I hope it is still okay to use your Reviewed-by tag.

-- 
  Kiryl Shutsemau / Kirill A. Shutemov


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCHv9 02/16] x86/alternatives: Disable LASS when patching kernel alternatives
       [not found] ` <20250707080317.3791624-3-kirill.shutemov@linux.intel.com>
  2025-07-09  1:08   ` [PATCHv9 02/16] x86/alternatives: Disable LASS when patching kernel alternatives Sohil Mehta
@ 2025-07-09 16:58   ` Dave Hansen
  2025-07-25  2:35     ` Sohil Mehta
  2025-07-28 19:11   ` David Laight
  2 siblings, 1 reply; 30+ messages in thread
From: Dave Hansen @ 2025-07-09 16:58 UTC (permalink / raw)
  To: Kirill A. Shutemov, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra,
	Ard Biesheuvel, Paul E. McKenney, Josh Poimboeuf, Xiongwei Song,
	Xin Li, Mike Rapoport (IBM), Brijesh Singh, Michael Roth,
	Tony Luck, Alexey Kardashevskiy, Alexander Shishkin
  Cc: Jonathan Corbet, Sohil Mehta, Ingo Molnar, Pawan Gupta,
	Daniel Sneddon, Kai Huang, Sandipan Das, Breno Leitao,
	Rick Edgecombe, Alexei Starovoitov, Hou Tao, Juergen Gross,
	Vegard Nossum, Kees Cook, Eric Biggers, Jason Gunthorpe,
	Masami Hiramatsu (Google), Andrew Morton, Luis Chamberlain,
	Yuntao Wang, Rasmus Villemoes, Christophe Leroy, Tejun Heo,
	Changbin Du, Huang Shijie, Geert Uytterhoeven, Namhyung Kim,
	Arnaldo Carvalho de Melo, linux-doc, linux-kernel, linux-efi,
	linux-mm

On 7/7/25 01:03, Kirill A. Shutemov wrote:
> From: Sohil Mehta <sohil.mehta@intel.com>
> 
> For patching, the kernel initializes a temporary mm area in the lower
> half of the address range. See commit 4fc19708b165 ("x86/alternatives:
> Initialize temporary mm for patching").
> 
> Disable LASS enforcement during patching to avoid triggering a #GP
> fault.
> 
> The objtool warns due to a call to a non-allowed function that exists
> outside of the stac/clac guard, or references to any function with a
> dynamic function pointer inside the guard. See the Objtool warnings
> section #9 in the document tools/objtool/Documentation/objtool.txt.
> 
> Considering that patching is usually small, replace the memcpy() and
> memset() functions in the text poking functions with their open coded
> versions.
> 
> Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
> Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---
>  arch/x86/include/asm/smap.h   | 33 +++++++++++++++++++++++++++++++--
>  arch/x86/kernel/alternative.c | 28 ++++++++++++++++++++++++++--
>  2 files changed, 57 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/include/asm/smap.h b/arch/x86/include/asm/smap.h
> index 4f84d421d1cf..d0cc24348641 100644
> --- a/arch/x86/include/asm/smap.h
> +++ b/arch/x86/include/asm/smap.h
> @@ -23,18 +23,47 @@
>  
>  #else /* __ASSEMBLER__ */
>  
> +/*
> + * The CLAC/STAC instructions toggle the enforcement of X86_FEATURE_SMAP and
> + * X86_FEATURE_LASS.
> + *
> + * SMAP enforcement is based on the _PAGE_BIT_USER bit in the page tables: the
> + * kernel is not allowed to touch pages with the bit set unless the AC bit is
> + * set.
> + *
> + * LASS enforcement is based on bit 63 of the virtual address. The kernel is
> + * not allowed to touch memory in the lower half of the virtual address space
> + * unless the AC bit is set.
> + *
> + * Use stac()/clac() when accessing userspace (_PAGE_USER) mappings,
> + * regardless of location.
> + *
> + * Use lass_stac()/lass_clac() when accessing kernel mappings (!_PAGE_USER)
> + * in the lower half of the address space.
> + *
> + * Note: a barrier is implicit in alternative().
> + */
> +
>  static __always_inline void clac(void)
>  {
> -	/* Note: a barrier is implicit in alternative() */
>  	alternative("", "clac", X86_FEATURE_SMAP);
>  }
>  
>  static __always_inline void stac(void)
>  {
> -	/* Note: a barrier is implicit in alternative() */
>  	alternative("", "stac", X86_FEATURE_SMAP);
>  }
>  
> +static __always_inline void lass_clac(void)
> +{
> +	alternative("", "clac", X86_FEATURE_LASS);
> +}
> +
> +static __always_inline void lass_stac(void)
> +{
> +	alternative("", "stac", X86_FEATURE_LASS);
> +}

Could we please move the comments about lass_*() closer to the LASS
functions?

>  static __always_inline unsigned long smap_save(void)
>  {
>  	unsigned long flags;
> diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
> index ea1d984166cd..992ece0e879a 100644
> --- a/arch/x86/kernel/alternative.c
> +++ b/arch/x86/kernel/alternative.c
> @@ -2447,16 +2447,40 @@ void __init_or_module text_poke_early(void *addr, const void *opcode,
>  __ro_after_init struct mm_struct *text_poke_mm;
>  __ro_after_init unsigned long text_poke_mm_addr;
>  
> +/*
> + * Text poking creates and uses a mapping in the lower half of the
> + * address space. Relax LASS enforcement when accessing the poking
> + * address.
> + */
> +
>  static void text_poke_memcpy(void *dst, const void *src, size_t len)
>  {
> -	memcpy(dst, src, len);
> +	lass_stac();
> +
> +	/*
> +	 * Objtool is picky about what occurs within the STAC/CLAC region
> +	 * because this code runs with protection disabled. Objtool typically
> +	 * does not permit function calls in this area.
> +	 *
> +	 * Avoid using memcpy() here. Instead, open code it.
> +	 */
> +	asm volatile("rep movsb"
> +		     : "+D" (dst), "+S" (src), "+c" (len) : : "memory");
> +
> +	lass_clac();
>  }

This didn't turn out great. At the _very_ least, we could have a:

	inline_memcpy_i_really_mean_it()

with the rep mov. Or even a #define if we were super paranoid the
compiler is out to get us.

But _actually_ open-coding inline assembly is far too ugly to live.

We can also be a bit more compact about the comments:

	/*
	 * objtool enforces a strict policy of "no function calls within
	 * AC=1 regions". Adhere to the policy by doing a memcpy() that
	 * will never result in a function call.
	 */



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCHv9 04/16] x86/cpu: Defer CR pinning setup until core initcall
       [not found] ` <20250707080317.3791624-5-kirill.shutemov@linux.intel.com>
  2025-07-09  1:19   ` [PATCHv9 04/16] x86/cpu: Defer CR pinning setup until core initcall Sohil Mehta
@ 2025-07-09 17:00   ` Dave Hansen
  2025-07-31 23:45     ` Sohil Mehta
  1 sibling, 1 reply; 30+ messages in thread
From: Dave Hansen @ 2025-07-09 17:00 UTC (permalink / raw)
  To: Kirill A. Shutemov, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra,
	Ard Biesheuvel, Paul E. McKenney, Josh Poimboeuf, Xiongwei Song,
	Xin Li, Mike Rapoport (IBM), Brijesh Singh, Michael Roth,
	Tony Luck, Alexey Kardashevskiy, Alexander Shishkin
  Cc: Jonathan Corbet, Sohil Mehta, Ingo Molnar, Pawan Gupta,
	Daniel Sneddon, Kai Huang, Sandipan Das, Breno Leitao,
	Rick Edgecombe, Alexei Starovoitov, Hou Tao, Juergen Gross,
	Vegard Nossum, Kees Cook, Eric Biggers, Jason Gunthorpe,
	Masami Hiramatsu (Google), Andrew Morton, Luis Chamberlain,
	Yuntao Wang, Rasmus Villemoes, Christophe Leroy, Tejun Heo,
	Changbin Du, Huang Shijie, Geert Uytterhoeven, Namhyung Kim,
	Arnaldo Carvalho de Melo, linux-doc, linux-kernel, linux-efi,
	linux-mm

On 7/7/25 01:03, Kirill A. Shutemov wrote:
> Instead of moving setup_cr_pinning() below efi_enter_virtual_mode() in
> arch_cpu_finalize_init(), defer it until core initcall.

What are the side effects of this move? Are there other benefits? What
are the risks?

BTW, ideally, you'd get an ack from one of the folks who put the CR
pinning in the kernel in the first place to make sure this isn't
breaking the mechanism in any important ways.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCHv9 16/16] x86: Re-enable Linear Address Masking
  2025-07-09 11:00     ` Kirill A. Shutemov
@ 2025-07-11  0:42       ` Sohil Mehta
  0 siblings, 0 replies; 30+ messages in thread
From: Sohil Mehta @ 2025-07-11  0:42 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra, Ard Biesheuvel,
	Paul E. McKenney, Josh Poimboeuf, Xiongwei Song, Xin Li,
	Mike Rapoport (IBM), Brijesh Singh, Michael Roth, Tony Luck,
	Alexey Kardashevskiy, Alexander Shishkin, Jonathan Corbet,
	Ingo Molnar, Pawan Gupta, Daniel Sneddon, Kai Huang, Sandipan Das,
	Breno Leitao, Rick Edgecombe, Alexei Starovoitov, Hou Tao,
	Juergen Gross, Vegard Nossum, Kees Cook, Eric Biggers,
	Jason Gunthorpe, Masami Hiramatsu (Google), Andrew Morton,
	Luis Chamberlain, Yuntao Wang, Rasmus Villemoes, Christophe Leroy,
	Tejun Heo, Changbin Du, Huang Shijie, Geert Uytterhoeven,
	Namhyung Kim, Arnaldo Carvalho de Melo, linux-doc, linux-kernel,
	linux-efi, linux-mm

On 7/9/2025 4:00 AM, Kirill A. Shutemov wrote:

> I've merged the last two patches and updated the commit message:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/kas/linux.git/commit/?h=x86/lass
> 
> I hope it is still okay to use your Reviewed-by tag.
> 

Yes, that should be fine.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCHv9 13/16] x86/traps: Handle LASS thrown #SS
  2025-07-09 10:38     ` Kirill A. Shutemov
@ 2025-07-11  1:22       ` Sohil Mehta
  0 siblings, 0 replies; 30+ messages in thread
From: Sohil Mehta @ 2025-07-11  1:22 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra, Ard Biesheuvel,
	Paul E. McKenney, Josh Poimboeuf, Xiongwei Song, Xin Li,
	Mike Rapoport (IBM), Brijesh Singh, Michael Roth, Tony Luck,
	Alexey Kardashevskiy, Alexander Shishkin, Jonathan Corbet,
	Ingo Molnar, Pawan Gupta, Daniel Sneddon, Kai Huang, Sandipan Das,
	Breno Leitao, Rick Edgecombe, Alexei Starovoitov, Hou Tao,
	Juergen Gross, Vegard Nossum, Kees Cook, Eric Biggers,
	Jason Gunthorpe, Masami Hiramatsu (Google), Andrew Morton,
	Luis Chamberlain, Yuntao Wang, Rasmus Villemoes, Christophe Leroy,
	Tejun Heo, Changbin Du, Huang Shijie, Geert Uytterhoeven,
	Namhyung Kim, Arnaldo Carvalho de Melo, linux-doc, linux-kernel,
	linux-efi, linux-mm

On 7/9/2025 3:38 AM, Kirill A. Shutemov wrote:
> On Tue, Jul 08, 2025 at 10:12:28PM -0700, Sohil Mehta wrote:
>> On 7/7/2025 1:03 AM, Kirill A. Shutemov wrote:
>>
>>> +	hint = get_kernel_exc_address(regs, &exc_addr);
>>> +	if (hint != EXC_NO_HINT)
>>> +		printk(SSFSTR ", %s 0x%lx", kernel_exc_hint_help[hint], exc_addr);
>>> +
>>> +	if (hint != EXC_NON_CANONICAL)
>>> +		exc_addr = 0;
>>> +
>>> +	die_addr(SSFSTR, regs, error_code, exc_addr);
>>
>> I see a slight difference between the #GP handling and the #SS handling
>> here. For the #GP case, we seem to pass the hint string to die_addr().
>>
>> However, for the #SS, the hint is printed above and only SSFSTR gets
>> passed onto die_addr(). I am curious about the reasoning.
> 
> I hate how 'desc' size is defined in #GP handler:
> 
> 	char desc[sizeof(GPFSTR) + 50 + 2*sizeof(unsigned long) + 1] = GPFSTR;
> 
> Too much voodoo to my liking. And it will overflow if any hint string is
> going to be longer than 50.
> 
> I don't want to repeat this magic for #SS.
> 

Thanks, that makes sense.

> I would argue we need to print directly in #GP handler as I do in #SS.
> But, IMO, it is outside of the scope of this patchset.
> 




^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCHv9 13/16] x86/traps: Handle LASS thrown #SS
       [not found] ` <20250707080317.3791624-14-kirill.shutemov@linux.intel.com>
  2025-07-09  5:12   ` [PATCHv9 13/16] x86/traps: Handle LASS thrown #SS Sohil Mehta
@ 2025-07-11  1:23   ` Sohil Mehta
  1 sibling, 0 replies; 30+ messages in thread
From: Sohil Mehta @ 2025-07-11  1:23 UTC (permalink / raw)
  To: Kirill A. Shutemov, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra,
	Ard Biesheuvel, Paul E. McKenney, Josh Poimboeuf, Xiongwei Song,
	Xin Li, Mike Rapoport (IBM), Brijesh Singh, Michael Roth,
	Tony Luck, Alexey Kardashevskiy, Alexander Shishkin
  Cc: Jonathan Corbet, Ingo Molnar, Pawan Gupta, Daniel Sneddon,
	Kai Huang, Sandipan Das, Breno Leitao, Rick Edgecombe,
	Alexei Starovoitov, Hou Tao, Juergen Gross, Vegard Nossum,
	Kees Cook, Eric Biggers, Jason Gunthorpe,
	Masami Hiramatsu (Google), Andrew Morton, Luis Chamberlain,
	Yuntao Wang, Rasmus Villemoes, Christophe Leroy, Tejun Heo,
	Changbin Du, Huang Shijie, Geert Uytterhoeven, Namhyung Kim,
	Arnaldo Carvalho de Melo, linux-doc, linux-kernel, linux-efi,
	linux-mm

On 7/7/2025 1:03 AM, Kirill A. Shutemov wrote:
> LASS throws a #GP for any violations except for stack register accesses,
> in which case it throws a #SS instead. Handle this similarly to how other
> LASS violations are handled.
> 
> In case of FRED, before handling #SS as LASS violation, kernel has to
> check if there's a fixup for the exception. It can address #SS due to
> invalid user context on ERETU. See 5105e7687ad3 ("x86/fred: Fixup
> fault on ERETU by jumping to fred_entrypoint_user") for more details.
> 
> Co-developed-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---
>  arch/x86/kernel/traps.c | 41 +++++++++++++++++++++++++++++++++++------
>  1 file changed, 35 insertions(+), 6 deletions(-)
> 

Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCHv9 02/16] x86/alternatives: Disable LASS when patching kernel alternatives
  2025-07-09 16:58   ` Dave Hansen
@ 2025-07-25  2:35     ` Sohil Mehta
  0 siblings, 0 replies; 30+ messages in thread
From: Sohil Mehta @ 2025-07-25  2:35 UTC (permalink / raw)
  To: Dave Hansen, Kirill A. Shutemov, Andy Lutomirski, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	Peter Zijlstra, Ard Biesheuvel, Paul E. McKenney, Josh Poimboeuf,
	Xiongwei Song, Xin Li, Mike Rapoport (IBM), Brijesh Singh,
	Michael Roth, Tony Luck, Alexey Kardashevskiy, Alexander Shishkin
  Cc: Jonathan Corbet, Ingo Molnar, Pawan Gupta, Daniel Sneddon,
	Kai Huang, Sandipan Das, Breno Leitao, Rick Edgecombe,
	Alexei Starovoitov, Hou Tao, Juergen Gross, Vegard Nossum,
	Kees Cook, Eric Biggers, Jason Gunthorpe,
	Masami Hiramatsu (Google), Andrew Morton, Luis Chamberlain,
	Yuntao Wang, Rasmus Villemoes, Christophe Leroy, Tejun Heo,
	Changbin Du, Huang Shijie, Geert Uytterhoeven, Namhyung Kim,
	Arnaldo Carvalho de Melo, linux-doc, linux-kernel, linux-efi,
	linux-mm, Kirill A. Shutemov

[-- Attachment #1: Type: text/plain, Size: 783 bytes --]

On 7/9/2025 9:58 AM, Dave Hansen wrote:

>> +	 * Avoid using memcpy() here. Instead, open code it.
>> +	 */
>> +	asm volatile("rep movsb"
>> +		     : "+D" (dst), "+S" (src), "+c" (len) : : "memory");
>> +
>> +	lass_clac();
>>  }
> 
> This didn't turn out great. At the _very_ least, we could have a:
> 
> 	inline_memcpy_i_really_mean_it()
> 

It looks like we should go back to the __inline_memcpy()/__inline_memset()
implementation that PeterZ initially proposed. It seems to fit all of the
requirements, right? Patch attached.

https://lore.kernel.org/lkml/20241028160917.1380714-3-alexander.shishkin@linux.intel.com/

> with the rep mov. Or even a #define if we were super paranoid the
> compiler is out to get us.
> 
> But _actually_ open-coding inline assembly is far too ugly to live.
> 

[-- Attachment #2: x86-asm-Introduce-inline-memcpy-and-memset.patch --]
[-- Type: text/plain, Size: 1419 bytes --]

From eb3b45b377df90d3b367e2b3fddfff1a72624a4e Mon Sep 17 00:00:00 2001
From: Peter Zijlstra <peterz@infradead.org>
Date: Mon, 28 Oct 2024 18:07:50 +0200
Subject: [PATCH] x86/asm: Introduce inline memcpy and memset

Provide inline memcpy and memset functions that can be used instead of
the GCC builtins whenever necessary.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
---
 arch/x86/include/asm/string.h | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/arch/x86/include/asm/string.h b/arch/x86/include/asm/string.h
index c3c2c1914d65..9cb5aae7fba9 100644
--- a/arch/x86/include/asm/string.h
+++ b/arch/x86/include/asm/string.h
@@ -1,6 +1,32 @@
 /* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_STRING_H
+#define _ASM_X86_STRING_H
+
 #ifdef CONFIG_X86_32
 # include <asm/string_32.h>
 #else
 # include <asm/string_64.h>
 #endif
+
+static __always_inline void *__inline_memcpy(void *to, const void *from, size_t len)
+{
+	void *ret = to;
+
+	asm volatile("rep movsb"
+		     : "+D" (to), "+S" (from), "+c" (len)
+		     : : "memory");
+	return ret;
+}
+
+static __always_inline void *__inline_memset(void *s, int v, size_t n)
+{
+	void *ret = s;
+
+	asm volatile("rep stosb"
+		     : "+D" (s), "+c" (n)
+		     : "a" ((uint8_t)v)
+		     : "memory");
+	return ret;
+}
+
+#endif /* _ASM_X86_STRING_H */
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCHv9 02/16] x86/alternatives: Disable LASS when patching kernel alternatives
       [not found] ` <20250707080317.3791624-3-kirill.shutemov@linux.intel.com>
  2025-07-09  1:08   ` [PATCHv9 02/16] x86/alternatives: Disable LASS when patching kernel alternatives Sohil Mehta
  2025-07-09 16:58   ` Dave Hansen
@ 2025-07-28 19:11   ` David Laight
  2025-07-28 19:28     ` H. Peter Anvin
  2 siblings, 1 reply; 30+ messages in thread
From: David Laight @ 2025-07-28 19:11 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra, Ard Biesheuvel,
	Paul E. McKenney, Josh Poimboeuf, Xiongwei Song, Xin Li,
	Mike Rapoport (IBM), Brijesh Singh, Michael Roth, Tony Luck,
	Alexey Kardashevskiy, Alexander Shishkin, Jonathan Corbet,
	Sohil Mehta, Ingo Molnar, Pawan Gupta, Daniel Sneddon, Kai Huang,
	Sandipan Das, Breno Leitao, Rick Edgecombe, Alexei Starovoitov,
	Hou Tao, Juergen Gross, Vegard Nossum, Kees Cook, Eric Biggers,
	Jason Gunthorpe, Masami Hiramatsu (Google), Andrew Morton,
	Luis Chamberlain, Yuntao Wang, Rasmus Villemoes, Christophe Leroy,
	Tejun Heo, Changbin Du, Huang Shijie, Geert Uytterhoeven,
	Namhyung Kim, Arnaldo Carvalho de Melo, linux-doc, linux-kernel,
	linux-efi, linux-mm

On Mon,  7 Jul 2025 11:03:02 +0300
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> wrote:

> From: Sohil Mehta <sohil.mehta@intel.com>
> 
> For patching, the kernel initializes a temporary mm area in the lower
> half of the address range. See commit 4fc19708b165 ("x86/alternatives:
> Initialize temporary mm for patching").
> 
> Disable LASS enforcement during patching to avoid triggering a #GP
> fault.
> 
> The objtool warns due to a call to a non-allowed function that exists
> outside of the stac/clac guard, or references to any function with a
> dynamic function pointer inside the guard. See the Objtool warnings
> section #9 in the document tools/objtool/Documentation/objtool.txt.
> 
> Considering that patching is usually small, replace the memcpy() and
> memset() functions in the text poking functions with their open coded
> versions.
...

Or just write a byte copy loop in C with (eg) barrier() inside it
to stop gcc converting it to memcpy().

	David


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCHv9 02/16] x86/alternatives: Disable LASS when patching kernel alternatives
  2025-07-28 19:11   ` David Laight
@ 2025-07-28 19:28     ` H. Peter Anvin
  2025-07-28 19:38       ` David Laight
  0 siblings, 1 reply; 30+ messages in thread
From: H. Peter Anvin @ 2025-07-28 19:28 UTC (permalink / raw)
  To: David Laight, Kirill A. Shutemov
  Cc: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, Peter Zijlstra, Ard Biesheuvel,
	Paul E. McKenney, Josh Poimboeuf, Xiongwei Song, Xin Li,
	Mike Rapoport (IBM), Brijesh Singh, Michael Roth, Tony Luck,
	Alexey Kardashevskiy, Alexander Shishkin, Jonathan Corbet,
	Sohil Mehta, Ingo Molnar, Pawan Gupta, Daniel Sneddon, Kai Huang,
	Sandipan Das, Breno Leitao, Rick Edgecombe, Alexei Starovoitov,
	Hou Tao, Juergen Gross, Vegard Nossum, Kees Cook, Eric Biggers,
	Jason Gunthorpe, Masami Hiramatsu (Google), Andrew Morton,
	Luis Chamberlain, Yuntao Wang, Rasmus Villemoes, Christophe Leroy,
	Tejun Heo, Changbin Du, Huang Shijie, Geert Uytterhoeven,
	Namhyung Kim, Arnaldo Carvalho de Melo, linux-doc, linux-kernel,
	linux-efi, linux-mm

On July 28, 2025 12:11:37 PM PDT, David Laight <david.laight.linux@gmail.com> wrote:
>On Mon,  7 Jul 2025 11:03:02 +0300
>"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> wrote:
>
>> From: Sohil Mehta <sohil.mehta@intel.com>
>> 
>> For patching, the kernel initializes a temporary mm area in the lower
>> half of the address range. See commit 4fc19708b165 ("x86/alternatives:
>> Initialize temporary mm for patching").
>> 
>> Disable LASS enforcement during patching to avoid triggering a #GP
>> fault.
>> 
>> The objtool warns due to a call to a non-allowed function that exists
>> outside of the stac/clac guard, or references to any function with a
>> dynamic function pointer inside the guard. See the Objtool warnings
>> section #9 in the document tools/objtool/Documentation/objtool.txt.
>> 
>> Considering that patching is usually small, replace the memcpy() and
>> memset() functions in the text poking functions with their open coded
>> versions.
>...
>
>Or just write a byte copy loop in C with (eg) barrier() inside it
>to stop gcc converting it to memcpy().
>
>	David

Great. It's rep movsb without any of the performance.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCHv9 02/16] x86/alternatives: Disable LASS when patching kernel alternatives
  2025-07-28 19:28     ` H. Peter Anvin
@ 2025-07-28 19:38       ` David Laight
  2025-08-01  0:15         ` Sohil Mehta
  0 siblings, 1 reply; 30+ messages in thread
From: David Laight @ 2025-07-28 19:38 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Kirill A. Shutemov, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, Peter Zijlstra, Ard Biesheuvel,
	Paul E. McKenney, Josh Poimboeuf, Xiongwei Song, Xin Li,
	Mike Rapoport (IBM), Brijesh Singh, Michael Roth, Tony Luck,
	Alexey Kardashevskiy, Alexander Shishkin, Jonathan Corbet,
	Sohil Mehta, Ingo Molnar, Pawan Gupta, Daniel Sneddon, Kai Huang,
	Sandipan Das, Breno Leitao, Rick Edgecombe, Alexei Starovoitov,
	Hou Tao, Juergen Gross, Vegard Nossum, Kees Cook, Eric Biggers,
	Jason Gunthorpe, Masami Hiramatsu (Google), Andrew Morton,
	Luis Chamberlain, Yuntao Wang, Rasmus Villemoes, Christophe Leroy,
	Tejun Heo, Changbin Du, Huang Shijie, Geert Uytterhoeven,
	Namhyung Kim, Arnaldo Carvalho de Melo, linux-doc, linux-kernel,
	linux-efi, linux-mm

On Mon, 28 Jul 2025 12:28:33 -0700
"H. Peter Anvin" <hpa@zytor.com> wrote:

> On July 28, 2025 12:11:37 PM PDT, David Laight <david.laight.linux@gmail.com> wrote:
> >On Mon,  7 Jul 2025 11:03:02 +0300
> >"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> wrote:
> >  
> >> From: Sohil Mehta <sohil.mehta@intel.com>
> >> 
> >> For patching, the kernel initializes a temporary mm area in the lower
> >> half of the address range. See commit 4fc19708b165 ("x86/alternatives:
> >> Initialize temporary mm for patching").
> >> 
> >> Disable LASS enforcement during patching to avoid triggering a #GP
> >> fault.
> >> 
> >> The objtool warns due to a call to a non-allowed function that exists
> >> outside of the stac/clac guard, or references to any function with a
> >> dynamic function pointer inside the guard. See the Objtool warnings
> >> section #9 in the document tools/objtool/Documentation/objtool.txt.
> >> 
> >> Considering that patching is usually small, replace the memcpy() and
> >> memset() functions in the text poking functions with their open coded
> >> versions.  
> >...
> >
> >Or just write a byte copy loop in C with (eg) barrier() inside it
> >to stop gcc converting it to memcpy().
> >
> >	David  
> 
> Great. It's rep movsb without any of the performance.

And without the massive setup overhead that dominates short copies.
Given the rest of the code I'm sure a byte copy loop won't make
any difference to the overall performance.

	David


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCHv9 04/16] x86/cpu: Defer CR pinning setup until core initcall
  2025-07-09 17:00   ` Dave Hansen
@ 2025-07-31 23:45     ` Sohil Mehta
  2025-08-01  0:01       ` Dave Hansen
  0 siblings, 1 reply; 30+ messages in thread
From: Sohil Mehta @ 2025-07-31 23:45 UTC (permalink / raw)
  To: Dave Hansen, Thomas Gleixner, Dave Hansen, Kees Cook
  Cc: Jonathan Corbet, Ingo Molnar, Pawan Gupta, Daniel Sneddon,
	Kai Huang, Sandipan Das, Breno Leitao, Rick Edgecombe,
	Alexei Starovoitov, Hou Tao, Juergen Gross, Vegard Nossum,
	Eric Biggers, Jason Gunthorpe, Masami Hiramatsu (Google),
	Andrew Morton, Luis Chamberlain, Yuntao Wang, Rasmus Villemoes,
	Christophe Leroy, Tejun Heo, Changbin Du, Huang Shijie,
	Geert Uytterhoeven, Namhyung Kim, Arnaldo Carvalho de Melo,
	linux-doc, linux-kernel, linux-efi, linux-mm, Kirill A. Shutemov,
	Kirill A. Shutemov, Andy Lutomirski, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Peter Zijlstra, Ard Biesheuvel, Paul E. McKenney,
	Josh Poimboeuf, Xiongwei Song, Xin Li, Mike Rapoport (IBM),
	Brijesh Singh, Michael Roth, Tony Luck, Alexey Kardashevskiy,
	Alexander Shishkin, X86-kernel

On 7/9/2025 10:00 AM, Dave Hansen wrote:
> On 7/7/25 01:03, Kirill A. Shutemov wrote:
>> Instead of moving setup_cr_pinning() below efi_enter_virtual_mode() in
>> arch_cpu_finalize_init(), defer it until core initcall.
> 
> What are the side effects of this move? Are there other benefits? What
> are the risks?
> 

Picking this up from Kirill. Reevaluating this, core_initcall() seems
too late for setup_cr_pinning().

We need to have CR pinning completed, and the associated static key
enabled before AP bring up. start_secondary()->cr4_init() depends on the
cr_pinning static key to initialize CR4 for APs.

To find the optimal location for CR pinning, here are the constraints:

1) The initialization of all the CPU-specific security features such as
SMAP, SMEP, UMIP and LASS must be done.

2) Since EFI needs to toggle CR4.LASS, EFI initialization must be completed.

3) Since APs depend on the BSP for CR initialization, CR pinning should
happen before AP bringup.

4) CR pinning should happen before userspace comes up, since that's what
we are protecting against.

I shortlisted two locations, arch_cpu_finalize_init() and early_initcall().

a) start_kernel()
     arch_cpu_finalize_init()

arch_cpu_finalize_init() seems like the logical fit, since CR pinning
can be considered as the "finalizing" the security sensitive control
registers. Doing it at the conclusion of CPU initialization makes sense.


b) start_kernel()
     rest_init()
       kernel_init()
         kernel_init_freeable()
           do_pre_smp_initcalls()
             early_initcall()

We could push the pinning out to early_initcall(), since that happens just
before SMP bringup as well as before the init process is executed. But I
don't see any benefit to doing that.

Most of the stuff between arch_cpu_finalize_init() and rest_init() seems
to be arch-agnostic (except maybe ACPI). Though the likelihood of
anything touching the pinned bits is low, it would be better to have the
bits pinned and complain if someone modifies them.

I am inclined towards arch_cpu_finalize_init() but I don't have a strong
preference. Dave, is there any other aspect I should consider?


> BTW, ideally, you'd get an ack from one of the folks who put the CR
> pinning in the kernel in the first place to make sure this isn't
> breaking the mechanism in any important ways.

Kees, do you have any preference or suggestions?




^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCHv9 04/16] x86/cpu: Defer CR pinning setup until core initcall
  2025-07-31 23:45     ` Sohil Mehta
@ 2025-08-01  0:01       ` Dave Hansen
  2025-08-01  4:43         ` Sohil Mehta
  2025-08-02 18:51         ` Kees Cook
  0 siblings, 2 replies; 30+ messages in thread
From: Dave Hansen @ 2025-08-01  0:01 UTC (permalink / raw)
  To: Sohil Mehta, Thomas Gleixner, Dave Hansen, Kees Cook
  Cc: Jonathan Corbet, Ingo Molnar, Pawan Gupta, Daniel Sneddon,
	Kai Huang, Sandipan Das, Breno Leitao, Rick Edgecombe,
	Alexei Starovoitov, Hou Tao, Juergen Gross, Vegard Nossum,
	Eric Biggers, Jason Gunthorpe, Masami Hiramatsu (Google),
	Andrew Morton, Luis Chamberlain, Yuntao Wang, Rasmus Villemoes,
	Christophe Leroy, Tejun Heo, Changbin Du, Huang Shijie,
	Geert Uytterhoeven, Namhyung Kim, Arnaldo Carvalho de Melo,
	linux-doc, linux-kernel, linux-efi, linux-mm, Kirill A. Shutemov,
	Kirill A. Shutemov, Andy Lutomirski, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Peter Zijlstra, Ard Biesheuvel, Paul E. McKenney,
	Josh Poimboeuf, Xiongwei Song, Xin Li, Mike Rapoport (IBM),
	Brijesh Singh, Michael Roth, Tony Luck, Alexey Kardashevskiy,
	Alexander Shishkin, X86-kernel

On 7/31/25 16:45, Sohil Mehta wrote:
> On 7/9/2025 10:00 AM, Dave Hansen wrote:
>> On 7/7/25 01:03, Kirill A. Shutemov wrote:
>>> Instead of moving setup_cr_pinning() below efi_enter_virtual_mode() in
>>> arch_cpu_finalize_init(), defer it until core initcall.
>> What are the side effects of this move? Are there other benefits? What
>> are the risks?
>>
> Picking this up from Kirill.. Reevaluating this, core_initcall() seems
> too late for setup_cr_pinning().
> 
> We need to have CR pinning completed, and the associated static key
> enabled before AP bring up. start_secondary()->cr4_init() depends on the
> cr_pinning static key to initialize CR4 for APs.

Sure, if you leave cr4_init() completely as-is.

'cr4_pinned_bits' should be set by the boot CPU. Secondary CPUs should
also read 'cr4_pinned_bits' when setting up their own cr4's,
unconditionally, independent of 'cr_pinning'.

The thing I think we should change is the pinning _enforcement_. The
easiest way to do that is to remove the static_branch_likely() in
cr4_init() and then delay flipping the static branch until just before
userspace starts.

Basically, split up the:

static void __init setup_cr_pinning(void)
{
    cr4_pinned_bits = this_cpu_read(cpu_tlbstate.cr4) & cr4_pinned_mask;
    static_key_enable(&cr_pinning.key);
}

code into its two logical pieces:

 1. Populate 'cr4_pinned_bits' from the boot CPU so the secondaries can
    use it
 2. Enable the static key so pinning enforcement is enabled



* Re: [PATCHv9 02/16] x86/alternatives: Disable LASS when patching kernel alternatives
  2025-07-28 19:38       ` David Laight
@ 2025-08-01  0:15         ` Sohil Mehta
  0 siblings, 0 replies; 30+ messages in thread
From: Sohil Mehta @ 2025-08-01  0:15 UTC (permalink / raw)
  To: David Laight, H. Peter Anvin, Dave Hansen
  Cc: Kirill A. Shutemov, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, Peter Zijlstra, Ard Biesheuvel,
	Paul E. McKenney, Josh Poimboeuf, Xiongwei Song, Xin Li,
	Mike Rapoport (IBM), Brijesh Singh, Michael Roth, Tony Luck,
	Alexey Kardashevskiy, Alexander Shishkin, Jonathan Corbet,
	Ingo Molnar, Pawan Gupta, Daniel Sneddon, Kai Huang, Sandipan Das,
	Breno Leitao, Rick Edgecombe, Alexei Starovoitov, Hou Tao,
	Juergen Gross, Vegard Nossum, Kees Cook, Eric Biggers,
	Jason Gunthorpe, Masami Hiramatsu (Google), Andrew Morton,
	Luis Chamberlain, Yuntao Wang, Rasmus Villemoes, Christophe Leroy,
	Tejun Heo, Changbin Du, Huang Shijie, Geert Uytterhoeven,
	Namhyung Kim, Arnaldo Carvalho de Melo, linux-doc, linux-kernel,
	linux-efi, linux-mm

On 7/28/2025 12:38 PM, David Laight wrote:
>>> ...
>>>
>>> Or just write a byte copy loop in C with (eg) barrier() inside it
>>> to stop gcc converting it to memcpy().
>>>
>>> 	David  
>>
>> Great. It's rep movsb without any of the performance.
> 
> And without the massive setup overhead that dominates short copies.
> Given the rest of the code I'm sure a byte copy loop won't make
> any difference to the overall performance.
>

Wouldn't it be better to introduce a generic mechanism rather than
something customized for this scenario?

PeterZ had suggested that inline memcpy could have more usages:
https://lore.kernel.org/lkml/20241029113611.GS14555@noisy.programming.kicks-ass.net/

Is there a concern that the inline versions might get optimized into
standard memcpy/memset calls by GCC? Wouldn't the volatile keyword
prevent that?

static __always_inline void *__inline_memcpy(void *to, const void *from,
					     size_t len)
{
	void *ret = to;

	asm volatile("rep movsb"
		     : "+D" (to), "+S" (from), "+c" (len)
		     : : "memory");
	return ret;
}

static __always_inline void *__inline_memset(void *s, int v, size_t n)
{
	void *ret = s;

	asm volatile("rep stosb"
		     : "+D" (s), "+c" (n)
		     : "a" ((uint8_t)v)
		     : "memory");
	return ret;
}




* Re: [PATCHv9 04/16] x86/cpu: Defer CR pinning setup until core initcall
  2025-08-01  0:01       ` Dave Hansen
@ 2025-08-01  4:43         ` Sohil Mehta
  2025-08-01 14:22           ` Dave Hansen
  2025-08-02 18:51         ` Kees Cook
  1 sibling, 1 reply; 30+ messages in thread
From: Sohil Mehta @ 2025-08-01  4:43 UTC (permalink / raw)
  To: Dave Hansen, Thomas Gleixner, Dave Hansen, Kees Cook
  Cc: Jonathan Corbet, Ingo Molnar, Pawan Gupta, Daniel Sneddon,
	Kai Huang, Sandipan Das, Breno Leitao, Rick Edgecombe,
	Alexei Starovoitov, Hou Tao, Juergen Gross, Vegard Nossum,
	Eric Biggers, Jason Gunthorpe, Masami Hiramatsu (Google),
	Andrew Morton, Luis Chamberlain, Yuntao Wang, Rasmus Villemoes,
	Christophe Leroy, Tejun Heo, Changbin Du, Huang Shijie,
	Geert Uytterhoeven, Namhyung Kim, Arnaldo Carvalho de Melo,
	linux-doc, linux-kernel, linux-efi, linux-mm, Kirill A. Shutemov,
	Kirill A. Shutemov, Andy Lutomirski, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Peter Zijlstra, Ard Biesheuvel, Paul E. McKenney,
	Josh Poimboeuf, Xiongwei Song, Xin Li, Mike Rapoport (IBM),
	Brijesh Singh, Michael Roth, Tony Luck, Alexey Kardashevskiy,
	Alexander Shishkin, X86-kernel

On 7/31/2025 5:01 PM, Dave Hansen wrote:
> 
> 'cr4_pinned_bits' should be set by the boot CPU. Secondary CPUs should
> also read 'cr4_pinned_bits' when setting up their own cr4's,
> unconditionally, independent of 'cr_pinning'.
> 
> The thing I think we should change is the pinning _enforcement_. The
> easiest way to do that is to remove the static_branch_likely() in
> cr4_init() and then delay flipping the static branch until just before
> userspace starts.
> 

Based on the current implementation and some git history [1], it seems
that cr_pinning is expected to catch two types of issues.

One is a malicious user trying to change the CR registers from
userspace. The other is a regular caller of native_write_cr4()
*mistakenly* clearing a pinned bit.

[1]:
https://lore.kernel.org/all/alpine.DEB.2.21.1906141646320.1722@nanos.tec.linutronix.de/

Could deferring enforcement lead to a scenario where we end up with
different CR4 values on different CPUs? Maybe I am misinterpreting this
and protecting against in-kernel errors is not a goal.

In general, you seem to want to delay CR pinning enforcement until
absolutely needed, and I am curious about the motivation. I understand
we should avoid doing it at arbitrary points in the boot, but
arch_cpu_finalize_init() and early_initcall() seem like decent
mileposts to me.

Are you anticipating that we would need to move setup_cr_pinning() again
when another user similar to EFI shows up?
> Basically, split up the:
> 
> static void __init setup_cr_pinning(void)
> {
>     cr4_pinned_bits = this_cpu_read(cpu_tlbstate.cr4) & cr4_pinned_mask;
>     static_key_enable(&cr_pinning.key);
> }
> 
> code into its two logical pieces:
> 
>  1. Populate 'cr4_pinned_bits' from the boot CPU so the secondaries can
>     use it
>  2. Enable the static key so pinning enforcement is enabled





* Re: [PATCHv9 04/16] x86/cpu: Defer CR pinning setup until core initcall
  2025-08-01  4:43         ` Sohil Mehta
@ 2025-08-01 14:22           ` Dave Hansen
  0 siblings, 0 replies; 30+ messages in thread
From: Dave Hansen @ 2025-08-01 14:22 UTC (permalink / raw)
  To: Sohil Mehta, Thomas Gleixner, Dave Hansen, Kees Cook
  Cc: Jonathan Corbet, Ingo Molnar, Pawan Gupta, Daniel Sneddon,
	Kai Huang, Sandipan Das, Breno Leitao, Rick Edgecombe,
	Alexei Starovoitov, Hou Tao, Juergen Gross, Vegard Nossum,
	Eric Biggers, Jason Gunthorpe, Masami Hiramatsu (Google),
	Andrew Morton, Luis Chamberlain, Yuntao Wang, Rasmus Villemoes,
	Christophe Leroy, Tejun Heo, Changbin Du, Huang Shijie,
	Geert Uytterhoeven, Namhyung Kim, Arnaldo Carvalho de Melo,
	linux-doc, linux-kernel, linux-efi, linux-mm, Kirill A. Shutemov,
	Kirill A. Shutemov, Andy Lutomirski, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Peter Zijlstra, Ard Biesheuvel, Paul E. McKenney,
	Josh Poimboeuf, Xiongwei Song, Xin Li, Mike Rapoport (IBM),
	Brijesh Singh, Michael Roth, Tony Luck, Alexey Kardashevskiy,
	Alexander Shishkin, X86-kernel

On 7/31/25 21:43, Sohil Mehta wrote:
...
> Could deferring enforcement lead to a scenario where we end up with
> different CR4 values on different CPUs? Maybe I am misinterpreting this
> and protecting against in-kernel errors is not a goal.

Sure, theoretically.

But if that's a concern, it can be checked at the time that enforcement
starts:

	for_each_online_cpu(cpu) {
		unsigned long cr4 = per_cpu(cpu_tlbstate.cr4, cpu);

		if ((cr4 & cr4_pinned_mask) == cr4_pinned_bits)
			continue;
		WARN(1, "blah blah");
	}

Or use smp_call_function() to check each CPU's CR4 directly.

Or, the next time that CPU does a TLB flush that toggles X86_CR4_PGE,
it'll get the warning from the regular pinning path.

So, sure, this does widen the window during boot where a secondary CPU
might get a bad CR4 value, and it would make it harder to track down
where it happened. We _could_ print a pr_debug() message when the bit
gets cleared but not enforce things if anyone is super worried about this.

> In general, you want to delay the CR pinning enforcement until
> absolutely needed. I am curious about the motivation. I understand we
> should avoid doing it at arbitrary points in the boot. But,
> arch_cpu_finalize_init() and early_initcall() seem to be decent
> mileposts to me.
> 
> Are you anticipating that we would need to move setup_cr_pinning() again
> when another user similar to EFI shows up?
Yep.



* Re: [PATCHv9 04/16] x86/cpu: Defer CR pinning setup until core initcall
  2025-08-01  0:01       ` Dave Hansen
  2025-08-01  4:43         ` Sohil Mehta
@ 2025-08-02 18:51         ` Kees Cook
  2025-08-04  6:55           ` H. Peter Anvin
  1 sibling, 1 reply; 30+ messages in thread
From: Kees Cook @ 2025-08-02 18:51 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Sohil Mehta, Thomas Gleixner, Dave Hansen, Jonathan Corbet,
	Ingo Molnar, Pawan Gupta, Daniel Sneddon, Kai Huang, Sandipan Das,
	Breno Leitao, Rick Edgecombe, Alexei Starovoitov, Hou Tao,
	Juergen Gross, Vegard Nossum, Eric Biggers, Jason Gunthorpe,
	Masami Hiramatsu (Google), Andrew Morton, Luis Chamberlain,
	Yuntao Wang, Rasmus Villemoes, Christophe Leroy, Tejun Heo,
	Changbin Du, Huang Shijie, Geert Uytterhoeven, Namhyung Kim,
	Arnaldo Carvalho de Melo, linux-doc, linux-kernel, linux-efi,
	linux-mm, Kirill A. Shutemov, Kirill A. Shutemov, Andy Lutomirski,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin, Peter Zijlstra,
	Ard Biesheuvel, Paul E. McKenney, Josh Poimboeuf, Xiongwei Song,
	Xin Li, Mike Rapoport (IBM), Brijesh Singh, Michael Roth,
	Tony Luck, Alexey Kardashevskiy, Alexander Shishkin, X86-kernel

On Thu, Jul 31, 2025 at 05:01:37PM -0700, Dave Hansen wrote:
> On 7/31/25 16:45, Sohil Mehta wrote:
> > On 7/9/2025 10:00 AM, Dave Hansen wrote:
> >> On 7/7/25 01:03, Kirill A. Shutemov wrote:
> >>> Instead of moving setup_cr_pinning() below efi_enter_virtual_mode() in
> >>> arch_cpu_finalize_init(), defer it until core initcall.
> >> What are the side effects of this move? Are there other benefits? What
> >> are the risks?
> >>
> > Picking this up from Kirill.. Reevaluating this, core_initcall() seems
> > too late for setup_cr_pinning().
> > 
> > We need to have CR pinning completed, and the associated static key
> > enabled before AP bring up. start_secondary()->cr4_init() depends on the
> > cr_pinning static key to initialize CR4 for APs.
> 
> Sure, if you leave cr4_init() completely as-is.
> 
> 'cr4_pinned_bits' should be set by the boot CPU. Secondary CPUs should
> also read 'cr4_pinned_bits' when setting up their own cr4's,
> unconditionally, independent of 'cr_pinning'.
> 
> The thing I think we should change is the pinning _enforcement_. The
> easiest way to do that is to remove the static_branch_likely() in
> cr4_init() and then delay flipping the static branch until just before
> userspace starts.

Yeah, this is fine from my perspective. The goal with the pinning was
about keeping things safe in the face of an attack from userspace that
managed to get at MSR values and keeping them from being trivially
changed.

-- 
Kees Cook



* Re: [PATCHv9 04/16] x86/cpu: Defer CR pinning setup until core initcall
  2025-08-02 18:51         ` Kees Cook
@ 2025-08-04  6:55           ` H. Peter Anvin
  0 siblings, 0 replies; 30+ messages in thread
From: H. Peter Anvin @ 2025-08-04  6:55 UTC (permalink / raw)
  To: Kees Cook, Dave Hansen
  Cc: Sohil Mehta, Thomas Gleixner, Dave Hansen, Jonathan Corbet,
	Ingo Molnar, Pawan Gupta, Daniel Sneddon, Kai Huang, Sandipan Das,
	Breno Leitao, Rick Edgecombe, Alexei Starovoitov, Hou Tao,
	Juergen Gross, Vegard Nossum, Eric Biggers, Jason Gunthorpe,
	Masami Hiramatsu (Google), Andrew Morton, Luis Chamberlain,
	Yuntao Wang, Rasmus Villemoes, Christophe Leroy, Tejun Heo,
	Changbin Du, Huang Shijie, Geert Uytterhoeven, Namhyung Kim,
	Arnaldo Carvalho de Melo, linux-doc, linux-kernel, linux-efi,
	linux-mm, Kirill A. Shutemov, Kirill A. Shutemov, Andy Lutomirski,
	Ingo Molnar, Borislav Petkov, Peter Zijlstra, Ard Biesheuvel,
	Paul E. McKenney, Josh Poimboeuf, Xiongwei Song, Xin Li,
	Mike Rapoport (IBM), Brijesh Singh, Michael Roth, Tony Luck,
	Alexey Kardashevskiy, Alexander Shishkin, X86-kernel

On August 2, 2025 11:51:28 AM PDT, Kees Cook <kees@kernel.org> wrote:
>On Thu, Jul 31, 2025 at 05:01:37PM -0700, Dave Hansen wrote:
>> On 7/31/25 16:45, Sohil Mehta wrote:
>> > On 7/9/2025 10:00 AM, Dave Hansen wrote:
>> >> On 7/7/25 01:03, Kirill A. Shutemov wrote:
>> >>> Instead of moving setup_cr_pinning() below efi_enter_virtual_mode() in
>> >>> arch_cpu_finalize_init(), defer it until core initcall.
>> >> What are the side effects of this move? Are there other benefits? What
>> >> are the risks?
>> >>
>> > Picking this up from Kirill.. Reevaluating this, core_initcall() seems
>> > too late for setup_cr_pinning().
>> > 
>> > We need to have CR pinning completed, and the associated static key
>> > enabled before AP bring up. start_secondary()->cr4_init() depends on the
>> > cr_pinning static key to initialize CR4 for APs.
>> 
>> Sure, if you leave cr4_init() completely as-is.
>> 
>> 'cr4_pinned_bits' should be set by the boot CPU. Secondary CPUs should
>> also read 'cr4_pinned_bits' when setting up their own cr4's,
>> unconditionally, independent of 'cr_pinning'.
>> 
>> The thing I think we should change is the pinning _enforcement_. The
>> easiest way to do that is to remove the static_branch_likely() in
>> cr4_init() and then delay flipping the static branch until just before
>> userspace starts.
>
>Yeah, this is fine from my perspective. The goal with the pinning was
>about keeping things safe in the face of an attack from userspace that
>managed to get at MSR values and keeping them from being trivially
>changed.
>

I have mentioned this before: I would like to see CR4-pinning use a patchable immediate to make it harder to manipulate. If the mask is final when alternatives are run, that would be a good time to install it; the code can just contain a zero immediate (no pinning) or a very limited set of bits that must never be changed at all up to that point.



end of thread, other threads:[~2025-08-04  6:56 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20250707080317.3791624-1-kirill.shutemov@linux.intel.com>
     [not found] ` <20250707080317.3791624-5-kirill.shutemov@linux.intel.com>
2025-07-09  1:19   ` [PATCHv9 04/16] x86/cpu: Defer CR pinning setup until core initcall Sohil Mehta
2025-07-09  9:38     ` Kirill A. Shutemov
2025-07-09 17:00   ` Dave Hansen
2025-07-31 23:45     ` Sohil Mehta
2025-08-01  0:01       ` Dave Hansen
2025-08-01  4:43         ` Sohil Mehta
2025-08-01 14:22           ` Dave Hansen
2025-08-02 18:51         ` Kees Cook
2025-08-04  6:55           ` H. Peter Anvin
     [not found] ` <20250707080317.3791624-6-kirill.shutemov@linux.intel.com>
2025-07-09  1:27   ` [PATCHv9 05/16] efi: Disable LASS around set_virtual_address_map() EFI call Sohil Mehta
     [not found] ` <20250707080317.3791624-12-kirill.shutemov@linux.intel.com>
2025-07-09  2:40   ` [PATCHv9 11/16] x86/traps: Communicate a LASS violation in #GP message Sohil Mehta
2025-07-09  9:31     ` Kirill A. Shutemov
2025-07-09  9:36       ` Geert Uytterhoeven
2025-07-09  9:51         ` Kirill A. Shutemov
     [not found] ` <20250707080317.3791624-13-kirill.shutemov@linux.intel.com>
2025-07-09  4:59   ` [PATCHv9 12/16] x86/traps: Generalize #GP address decode and hint code Sohil Mehta
     [not found] ` <20250707080317.3791624-17-kirill.shutemov@linux.intel.com>
2025-07-09  5:31   ` [PATCHv9 16/16] x86: Re-enable Linear Address Masking Sohil Mehta
2025-07-09 11:00     ` Kirill A. Shutemov
2025-07-11  0:42       ` Sohil Mehta
     [not found] ` <20250707080317.3791624-14-kirill.shutemov@linux.intel.com>
2025-07-09  5:12   ` [PATCHv9 13/16] x86/traps: Handle LASS thrown #SS Sohil Mehta
2025-07-09 10:38     ` Kirill A. Shutemov
2025-07-11  1:22       ` Sohil Mehta
2025-07-11  1:23   ` Sohil Mehta
     [not found] ` <20250707080317.3791624-3-kirill.shutemov@linux.intel.com>
2025-07-09  1:08   ` [PATCHv9 02/16] x86/alternatives: Disable LASS when patching kernel alternatives Sohil Mehta
2025-07-09  9:35     ` Kirill A. Shutemov
2025-07-09 16:58   ` Dave Hansen
2025-07-25  2:35     ` Sohil Mehta
2025-07-28 19:11   ` David Laight
2025-07-28 19:28     ` H. Peter Anvin
2025-07-28 19:38       ` David Laight
2025-08-01  0:15         ` Sohil Mehta
