* Re: [PATCHv9 02/16] x86/alternatives: Disable LASS when patching kernel alternatives
[not found] ` <7d93b343-b275-4edb-ae26-4578ae53652f@intel.com>
@ 2025-07-25 2:35 ` Sohil Mehta
0 siblings, 0 replies; 11+ messages in thread
From: Sohil Mehta @ 2025-07-25 2:35 UTC (permalink / raw)
To: Dave Hansen, Kirill A. Shutemov, Andy Lutomirski, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
Peter Zijlstra, Ard Biesheuvel, Paul E. McKenney, Josh Poimboeuf,
Xiongwei Song, Xin Li, Mike Rapoport (IBM), Brijesh Singh,
Michael Roth, Tony Luck, Alexey Kardashevskiy, Alexander Shishkin
Cc: Jonathan Corbet, Ingo Molnar, Pawan Gupta, Daniel Sneddon,
Kai Huang, Sandipan Das, Breno Leitao, Rick Edgecombe,
Alexei Starovoitov, Hou Tao, Juergen Gross, Vegard Nossum,
Kees Cook, Eric Biggers, Jason Gunthorpe,
Masami Hiramatsu (Google), Andrew Morton, Luis Chamberlain,
Yuntao Wang, Rasmus Villemoes, Christophe Leroy, Tejun Heo,
Changbin Du, Huang Shijie, Geert Uytterhoeven, Namhyung Kim,
Arnaldo Carvalho de Melo, linux-doc, linux-kernel, linux-efi,
linux-mm, Kirill A. Shutemov
[-- Attachment #1: Type: text/plain, Size: 783 bytes --]
On 7/9/2025 9:58 AM, Dave Hansen wrote:
>> + * Avoid using memcpy() here. Instead, open code it.
>> + */
>> + asm volatile("rep movsb"
>> + : "+D" (dst), "+S" (src), "+c" (len) : : "memory");
>> +
>> + lass_clac();
>> }
>
> This didn't turn out great. At the _very_ least, we could have a:
>
> inline_memcpy_i_really_mean_it()
>
It looks like we should go back to the __inline_memcpy()/__inline_memset()
implementation that PeterZ had initially proposed. It seems to fit all
the requirements, right? Patch attached.
https://lore.kernel.org/lkml/20241028160917.1380714-3-alexander.shishkin@linux.intel.com/
> with the rep mov. Or even a #define if we were super paranoid the
> compiler is out to get us.
>
> But _actually_ open-coding inline assembly is far too ugly to live.
>
[-- Attachment #2: x86-asm-Introduce-inline-memcpy-and-memset.patch --]
[-- Type: text/plain, Size: 1419 bytes --]
From eb3b45b377df90d3b367e2b3fddfff1a72624a4e Mon Sep 17 00:00:00 2001
From: Peter Zijlstra <peterz@infradead.org>
Date: Mon, 28 Oct 2024 18:07:50 +0200
Subject: [PATCH] x86/asm: Introduce inline memcpy and memset
Provide inline memcpy and memset functions that can be used instead of
the GCC builtins whenever necessary.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
---
arch/x86/include/asm/string.h | 26 ++++++++++++++++++++++++++
1 file changed, 26 insertions(+)
diff --git a/arch/x86/include/asm/string.h b/arch/x86/include/asm/string.h
index c3c2c1914d65..9cb5aae7fba9 100644
--- a/arch/x86/include/asm/string.h
+++ b/arch/x86/include/asm/string.h
@@ -1,6 +1,32 @@
/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_STRING_H
+#define _ASM_X86_STRING_H
+
#ifdef CONFIG_X86_32
# include <asm/string_32.h>
#else
# include <asm/string_64.h>
#endif
+
+static __always_inline void *__inline_memcpy(void *to, const void *from, size_t len)
+{
+ void *ret = to;
+
+ asm volatile("rep movsb"
+ : "+D" (to), "+S" (from), "+c" (len)
+ : : "memory");
+ return ret;
+}
+
+static __always_inline void *__inline_memset(void *s, int v, size_t n)
+{
+ void *ret = s;
+
+ asm volatile("rep stosb"
+ : "+D" (s), "+c" (n)
+ : "a" ((uint8_t)v)
+ : "memory");
+ return ret;
+}
+
+#endif /* _ASM_X86_STRING_H */
--
2.43.0
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCHv9 02/16] x86/alternatives: Disable LASS when patching kernel alternatives
[not found] ` <20250707080317.3791624-3-kirill.shutemov@linux.intel.com>
[not found] ` <7d93b343-b275-4edb-ae26-4578ae53652f@intel.com>
@ 2025-07-28 19:11 ` David Laight
2025-07-28 19:28 ` H. Peter Anvin
1 sibling, 1 reply; 11+ messages in thread
From: David Laight @ 2025-07-28 19:11 UTC (permalink / raw)
To: Kirill A. Shutemov
Cc: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra, Ard Biesheuvel,
Paul E. McKenney, Josh Poimboeuf, Xiongwei Song, Xin Li,
Mike Rapoport (IBM), Brijesh Singh, Michael Roth, Tony Luck,
Alexey Kardashevskiy, Alexander Shishkin, Jonathan Corbet,
Sohil Mehta, Ingo Molnar, Pawan Gupta, Daniel Sneddon, Kai Huang,
Sandipan Das, Breno Leitao, Rick Edgecombe, Alexei Starovoitov,
Hou Tao, Juergen Gross, Vegard Nossum, Kees Cook, Eric Biggers,
Jason Gunthorpe, Masami Hiramatsu (Google), Andrew Morton,
Luis Chamberlain, Yuntao Wang, Rasmus Villemoes, Christophe Leroy,
Tejun Heo, Changbin Du, Huang Shijie, Geert Uytterhoeven,
Namhyung Kim, Arnaldo Carvalho de Melo, linux-doc, linux-kernel,
linux-efi, linux-mm
On Mon, 7 Jul 2025 11:03:02 +0300
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> wrote:
> From: Sohil Mehta <sohil.mehta@intel.com>
>
> For patching, the kernel initializes a temporary mm area in the lower
> half of the address range. See commit 4fc19708b165 ("x86/alternatives:
> Initialize temporary mm for patching").
>
> Disable LASS enforcement during patching to avoid triggering a #GP
> fault.
>
> The objtool warns due to a call to a non-allowed function that exists
> outside of the stac/clac guard, or references to any function with a
> dynamic function pointer inside the guard. See the Objtool warnings
> section #9 in the document tools/objtool/Documentation/objtool.txt.
>
> Considering that patching is usually small, replace the memcpy() and
> memset() functions in the text poking functions with their open coded
> versions.
...
Or just write a byte copy loop in C with (eg) barrier() inside it
to stop gcc converting it to memcpy().
David
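A byte-copy loop along those lines might look like the following sketch. (This is illustrative only: barrier() is stubbed here as the usual empty asm with a "memory" clobber so the example builds outside the kernel; the function name byte_copy() is hypothetical.)

```c
#include <stddef.h>

/* User-space stand-in for the kernel's barrier() from <linux/compiler.h>:
 * an empty asm statement with a "memory" clobber. Placed inside the loop,
 * it stops GCC from recognizing the pattern and replacing it with a call
 * to the real memcpy(). */
#define barrier() __asm__ __volatile__("" ::: "memory")

static void *byte_copy(void *dst, const void *src, size_t len)
{
	char *d = dst;
	const char *s = src;

	while (len--) {
		*d++ = *s++;
		barrier();	/* defeat memcpy() pattern recognition */
	}
	return dst;
}
```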
* Re: [PATCHv9 02/16] x86/alternatives: Disable LASS when patching kernel alternatives
2025-07-28 19:11 ` David Laight
@ 2025-07-28 19:28 ` H. Peter Anvin
2025-07-28 19:38 ` David Laight
0 siblings, 1 reply; 11+ messages in thread
From: H. Peter Anvin @ 2025-07-28 19:28 UTC (permalink / raw)
To: David Laight, Kirill A. Shutemov
Cc: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, Peter Zijlstra, Ard Biesheuvel,
Paul E. McKenney, Josh Poimboeuf, Xiongwei Song, Xin Li,
Mike Rapoport (IBM), Brijesh Singh, Michael Roth, Tony Luck,
Alexey Kardashevskiy, Alexander Shishkin, Jonathan Corbet,
Sohil Mehta, Ingo Molnar, Pawan Gupta, Daniel Sneddon, Kai Huang,
Sandipan Das, Breno Leitao, Rick Edgecombe, Alexei Starovoitov,
Hou Tao, Juergen Gross, Vegard Nossum, Kees Cook, Eric Biggers,
Jason Gunthorpe, Masami Hiramatsu (Google), Andrew Morton,
Luis Chamberlain, Yuntao Wang, Rasmus Villemoes, Christophe Leroy,
Tejun Heo, Changbin Du, Huang Shijie, Geert Uytterhoeven,
Namhyung Kim, Arnaldo Carvalho de Melo, linux-doc, linux-kernel,
linux-efi, linux-mm
On July 28, 2025 12:11:37 PM PDT, David Laight <david.laight.linux@gmail.com> wrote:
>On Mon, 7 Jul 2025 11:03:02 +0300
>"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> wrote:
>
>> From: Sohil Mehta <sohil.mehta@intel.com>
>>
>> For patching, the kernel initializes a temporary mm area in the lower
>> half of the address range. See commit 4fc19708b165 ("x86/alternatives:
>> Initialize temporary mm for patching").
>>
>> Disable LASS enforcement during patching to avoid triggering a #GP
>> fault.
>>
>> The objtool warns due to a call to a non-allowed function that exists
>> outside of the stac/clac guard, or references to any function with a
>> dynamic function pointer inside the guard. See the Objtool warnings
>> section #9 in the document tools/objtool/Documentation/objtool.txt.
>>
>> Considering that patching is usually small, replace the memcpy() and
>> memset() functions in the text poking functions with their open coded
>> versions.
>...
>
>Or just write a byte copy loop in C with (eg) barrier() inside it
>to stop gcc converting it to memcpy().
>
> David
Great. It's rep movsb without any of the performance.
* Re: [PATCHv9 02/16] x86/alternatives: Disable LASS when patching kernel alternatives
2025-07-28 19:28 ` H. Peter Anvin
@ 2025-07-28 19:38 ` David Laight
2025-08-01 0:15 ` Sohil Mehta
0 siblings, 1 reply; 11+ messages in thread
From: David Laight @ 2025-07-28 19:38 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Kirill A. Shutemov, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, x86, Peter Zijlstra, Ard Biesheuvel,
Paul E. McKenney, Josh Poimboeuf, Xiongwei Song, Xin Li,
Mike Rapoport (IBM), Brijesh Singh, Michael Roth, Tony Luck,
Alexey Kardashevskiy, Alexander Shishkin, Jonathan Corbet,
Sohil Mehta, Ingo Molnar, Pawan Gupta, Daniel Sneddon, Kai Huang,
Sandipan Das, Breno Leitao, Rick Edgecombe, Alexei Starovoitov,
Hou Tao, Juergen Gross, Vegard Nossum, Kees Cook, Eric Biggers,
Jason Gunthorpe, Masami Hiramatsu (Google), Andrew Morton,
Luis Chamberlain, Yuntao Wang, Rasmus Villemoes, Christophe Leroy,
Tejun Heo, Changbin Du, Huang Shijie, Geert Uytterhoeven,
Namhyung Kim, Arnaldo Carvalho de Melo, linux-doc, linux-kernel,
linux-efi, linux-mm
On Mon, 28 Jul 2025 12:28:33 -0700
"H. Peter Anvin" <hpa@zytor.com> wrote:
> On July 28, 2025 12:11:37 PM PDT, David Laight <david.laight.linux@gmail.com> wrote:
> >On Mon, 7 Jul 2025 11:03:02 +0300
> >"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> wrote:
> >
> >> From: Sohil Mehta <sohil.mehta@intel.com>
> >>
> >> For patching, the kernel initializes a temporary mm area in the lower
> >> half of the address range. See commit 4fc19708b165 ("x86/alternatives:
> >> Initialize temporary mm for patching").
> >>
> >> Disable LASS enforcement during patching to avoid triggering a #GP
> >> fault.
> >>
> >> The objtool warns due to a call to a non-allowed function that exists
> >> outside of the stac/clac guard, or references to any function with a
> >> dynamic function pointer inside the guard. See the Objtool warnings
> >> section #9 in the document tools/objtool/Documentation/objtool.txt.
> >>
> >> Considering that patching is usually small, replace the memcpy() and
> >> memset() functions in the text poking functions with their open coded
> >> versions.
> >...
> >
> >Or just write a byte copy loop in C with (eg) barrier() inside it
> >to stop gcc converting it to memcpy().
> >
> > David
>
> Great. It's rep movsb without any of the performance.
And without the massive setup overhead that dominates short copies.
Given the rest of the code I'm sure a byte copy loop won't make
any difference to the overall performance.
David
* Re: [PATCHv9 04/16] x86/cpu: Defer CR pinning setup until core initcall
[not found] ` <6075af69-299f-43d2-a3c8-353a2a3b7ee7@intel.com>
@ 2025-07-31 23:45 ` Sohil Mehta
2025-08-01 0:01 ` Dave Hansen
0 siblings, 1 reply; 11+ messages in thread
From: Sohil Mehta @ 2025-07-31 23:45 UTC (permalink / raw)
To: Dave Hansen, Thomas Gleixner, Dave Hansen, Kees Cook
Cc: Jonathan Corbet, Ingo Molnar, Pawan Gupta, Daniel Sneddon,
Kai Huang, Sandipan Das, Breno Leitao, Rick Edgecombe,
Alexei Starovoitov, Hou Tao, Juergen Gross, Vegard Nossum,
Eric Biggers, Jason Gunthorpe, Masami Hiramatsu (Google),
Andrew Morton, Luis Chamberlain, Yuntao Wang, Rasmus Villemoes,
Christophe Leroy, Tejun Heo, Changbin Du, Huang Shijie,
Geert Uytterhoeven, Namhyung Kim, Arnaldo Carvalho de Melo,
linux-doc, linux-kernel, linux-efi, linux-mm, Kirill A. Shutemov,
Kirill A. Shutemov, Andy Lutomirski, Ingo Molnar, Borislav Petkov,
H. Peter Anvin, Peter Zijlstra, Ard Biesheuvel, Paul E. McKenney,
Josh Poimboeuf, Xiongwei Song, Xin Li, Mike Rapoport (IBM),
Brijesh Singh, Michael Roth, Tony Luck, Alexey Kardashevskiy,
Alexander Shishkin, X86-kernel
On 7/9/2025 10:00 AM, Dave Hansen wrote:
> On 7/7/25 01:03, Kirill A. Shutemov wrote:
>> Instead of moving setup_cr_pinning() below efi_enter_virtual_mode() in
>> arch_cpu_finalize_init(), defer it until core initcall.
>
> What are the side effects of this move? Are there other benefits? What
> are the risks?
>
Picking this up from Kirill. Reevaluating this, core_initcall() seems
too late for setup_cr_pinning().
We need to have CR pinning completed, and the associated static key
enabled before AP bring up. start_secondary()->cr4_init() depends on the
cr_pinning static key to initialize CR4 for APs.
To find the optimal location for CR pinning, here are the constraints:
1) The initialization of all the CPU-specific security features such as
SMAP, SMEP, UMIP and LASS must be done.
2) Since EFI needs to toggle CR4.LASS, EFI initialization must be completed.
3) Since APs depend on the BSP for CR initialization, CR pinning should
happen before AP bringup.
4) CR pinning should happen before userspace comes up, since that's what
we are protecting against.
I shortlisted two locations, arch_cpu_finalize_init() and early_initcall().
a) start_kernel()
arch_cpu_finalize_init()
arch_cpu_finalize_init() seems like the logical fit, since CR pinning
can be considered as "finalizing" the security-sensitive control
registers. Doing it at the conclusion of CPU initialization makes sense.
b) start_kernel()
rest_init()
kernel_init()
kernel_init_freeable()
do_pre_smp_initcalls()
early_initcall()
We could push the pinning out to early_initcall(), since that happens
just before SMP bringup as well as before the init process is executed.
But I don't see any benefit to doing that.
Most of the stuff between arch_cpu_finalize_init() and rest_init() seems
to be arch agnostic (except maybe ACPI). Though the likelihood of
anything touching the pinned bits is low, it would be better to have the
bits pinned and complain if someone modifies them.
I am inclined towards arch_cpu_finalize_init() but I don't have a strong
preference. Dave, is there any other aspect I should consider?
> BTW, ideally, you'd get an ack from one of the folks who put the CR
> pinning in the kernel in the first place to make sure this isn't
> breaking the mechanism in any important ways.
Kees, do you have any preference or suggestions?
* Re: [PATCHv9 04/16] x86/cpu: Defer CR pinning setup until core initcall
2025-07-31 23:45 ` [PATCHv9 04/16] x86/cpu: Defer CR pinning setup until core initcall Sohil Mehta
@ 2025-08-01 0:01 ` Dave Hansen
2025-08-01 4:43 ` Sohil Mehta
2025-08-02 18:51 ` Kees Cook
0 siblings, 2 replies; 11+ messages in thread
From: Dave Hansen @ 2025-08-01 0:01 UTC (permalink / raw)
To: Sohil Mehta, Thomas Gleixner, Dave Hansen, Kees Cook
Cc: Jonathan Corbet, Ingo Molnar, Pawan Gupta, Daniel Sneddon,
Kai Huang, Sandipan Das, Breno Leitao, Rick Edgecombe,
Alexei Starovoitov, Hou Tao, Juergen Gross, Vegard Nossum,
Eric Biggers, Jason Gunthorpe, Masami Hiramatsu (Google),
Andrew Morton, Luis Chamberlain, Yuntao Wang, Rasmus Villemoes,
Christophe Leroy, Tejun Heo, Changbin Du, Huang Shijie,
Geert Uytterhoeven, Namhyung Kim, Arnaldo Carvalho de Melo,
linux-doc, linux-kernel, linux-efi, linux-mm, Kirill A. Shutemov,
Kirill A. Shutemov, Andy Lutomirski, Ingo Molnar, Borislav Petkov,
H. Peter Anvin, Peter Zijlstra, Ard Biesheuvel, Paul E. McKenney,
Josh Poimboeuf, Xiongwei Song, Xin Li, Mike Rapoport (IBM),
Brijesh Singh, Michael Roth, Tony Luck, Alexey Kardashevskiy,
Alexander Shishkin, X86-kernel
On 7/31/25 16:45, Sohil Mehta wrote:
> On 7/9/2025 10:00 AM, Dave Hansen wrote:
>> On 7/7/25 01:03, Kirill A. Shutemov wrote:
>>> Instead of moving setup_cr_pinning() below efi_enter_virtual_mode() in
>>> arch_cpu_finalize_init(), defer it until core initcall.
>> What are the side effects of this move? Are there other benefits? What
>> are the risks?
>>
> Picking this up from Kirill.. Reevaluating this, core_initcall() seems
> too late for setup_cr_pinning().
>
> We need to have CR pinning completed, and the associated static key
> enabled before AP bring up. start_secondary()->cr4_init() depends on the
> cr_pinning static key to initialize CR4 for APs.
Sure, if you leave cr4_init() completely as-is.
'cr4_pinned_bits' should be set by the boot CPU. Secondary CPUs should
also read 'cr4_pinned_bits' when setting up their own cr4's,
unconditionally, independent of 'cr_pinning'.
The thing I think we should change is the pinning _enforcement_. The
easiest way to do that is to remove the static_branch_likely() in
cr4_init() and then delay flipping the static branch until just before
userspace starts.
Basically, split up the:
static void __init setup_cr_pinning(void)
{
cr4_pinned_bits = this_cpu_read(cpu_tlbstate.cr4) & cr4_pinned_mask;
static_key_enable(&cr_pinning.key);
}
code into its two logical pieces:
1. Populate 'cr4_pinned_bits' from the boot CPU so the secondaries can
use it
2. Enable the static key so pinning enforcement is enabled
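The split might look roughly like the following user-space model. (A sketch only: the mask value, the bool standing in for the cr_pinning static key, and all function names here are illustrative stand-ins for the kernel's cr4_pinned_mask, cr4_init(), and native_write_cr4() logic, not the actual implementation.)

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's CR4-pinning state. */
static const unsigned long cr4_pinned_mask = 0x3UL << 20;  /* example bits */
static unsigned long cr4_pinned_bits;
static bool cr_pinning_enabled;  /* plays the role of the static key */

/* Piece 1: the boot CPU records which bits must stay set; no enforcement. */
static void populate_cr4_pinned_bits(unsigned long boot_cr4)
{
	cr4_pinned_bits = boot_cr4 & cr4_pinned_mask;
}

/* Secondary CPUs adopt the pinned bits unconditionally, independent of
 * whether enforcement has been switched on yet. */
static unsigned long cr4_init_secondary(unsigned long cr4)
{
	return (cr4 & ~cr4_pinned_mask) | cr4_pinned_bits;
}

/* Piece 2: flipped just before userspace starts. */
static void enable_cr_pinning_enforcement(void)
{
	cr_pinning_enabled = true;
}

/* Enforcement path, modeled on native_write_cr4(): once the key is on,
 * a write that drops a pinned bit gets the bit reasserted. */
static unsigned long write_cr4_checked(unsigned long val)
{
	if (cr_pinning_enabled && (val & cr4_pinned_mask) != cr4_pinned_bits)
		val = (val & ~cr4_pinned_mask) | cr4_pinned_bits;
	return val;
}
```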
* Re: [PATCHv9 02/16] x86/alternatives: Disable LASS when patching kernel alternatives
2025-07-28 19:38 ` David Laight
@ 2025-08-01 0:15 ` Sohil Mehta
0 siblings, 0 replies; 11+ messages in thread
From: Sohil Mehta @ 2025-08-01 0:15 UTC (permalink / raw)
To: David Laight, H. Peter Anvin, Dave Hansen
Cc: Kirill A. Shutemov, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, x86, Peter Zijlstra, Ard Biesheuvel,
Paul E. McKenney, Josh Poimboeuf, Xiongwei Song, Xin Li,
Mike Rapoport (IBM), Brijesh Singh, Michael Roth, Tony Luck,
Alexey Kardashevskiy, Alexander Shishkin, Jonathan Corbet,
Ingo Molnar, Pawan Gupta, Daniel Sneddon, Kai Huang, Sandipan Das,
Breno Leitao, Rick Edgecombe, Alexei Starovoitov, Hou Tao,
Juergen Gross, Vegard Nossum, Kees Cook, Eric Biggers,
Jason Gunthorpe, Masami Hiramatsu (Google), Andrew Morton,
Luis Chamberlain, Yuntao Wang, Rasmus Villemoes, Christophe Leroy,
Tejun Heo, Changbin Du, Huang Shijie, Geert Uytterhoeven,
Namhyung Kim, Arnaldo Carvalho de Melo, linux-doc, linux-kernel,
linux-efi, linux-mm
On 7/28/2025 12:38 PM, David Laight wrote:
>>> ...
>>>
>>> Or just write a byte copy loop in C with (eg) barrier() inside it
>>> to stop gcc converting it to memcpy().
>>>
>>> David
>>
>> Great. It's rep movsb without any of the performance.
>
> And without the massive setup overhead that dominates short copies.
> Given the rest of the code I'm sure a byte copy loop won't make
> any difference to the overall performance.
>
Wouldn't it be better to introduce a generic mechanism rather than
something customized for this scenario?
PeterZ had suggested that inline memcpy could have more usages:
https://lore.kernel.org/lkml/20241029113611.GS14555@noisy.programming.kicks-ass.net/
Is there a concern that the inline versions might get optimized into
standard memcpy/memset calls by GCC? Wouldn't the volatile keyword
prevent that?
static __always_inline void *__inline_memcpy(void *to, const void *from,
size_t len)
{
void *ret = to;
asm volatile("rep movsb"
: "+D" (to), "+S" (from), "+c" (len)
: : "memory");
return ret;
}
static __always_inline void *__inline_memset(void *s, int v, size_t n)
{
void *ret = s;
asm volatile("rep stosb"
: "+D" (s), "+c" (n)
: "a" ((uint8_t)v)
: "memory");
return ret;
}
* Re: [PATCHv9 04/16] x86/cpu: Defer CR pinning setup until core initcall
2025-08-01 0:01 ` Dave Hansen
@ 2025-08-01 4:43 ` Sohil Mehta
2025-08-01 14:22 ` Dave Hansen
2025-08-02 18:51 ` Kees Cook
1 sibling, 1 reply; 11+ messages in thread
From: Sohil Mehta @ 2025-08-01 4:43 UTC (permalink / raw)
To: Dave Hansen, Thomas Gleixner, Dave Hansen, Kees Cook
Cc: Jonathan Corbet, Ingo Molnar, Pawan Gupta, Daniel Sneddon,
Kai Huang, Sandipan Das, Breno Leitao, Rick Edgecombe,
Alexei Starovoitov, Hou Tao, Juergen Gross, Vegard Nossum,
Eric Biggers, Jason Gunthorpe, Masami Hiramatsu (Google),
Andrew Morton, Luis Chamberlain, Yuntao Wang, Rasmus Villemoes,
Christophe Leroy, Tejun Heo, Changbin Du, Huang Shijie,
Geert Uytterhoeven, Namhyung Kim, Arnaldo Carvalho de Melo,
linux-doc, linux-kernel, linux-efi, linux-mm, Kirill A. Shutemov,
Kirill A. Shutemov, Andy Lutomirski, Ingo Molnar, Borislav Petkov,
H. Peter Anvin, Peter Zijlstra, Ard Biesheuvel, Paul E. McKenney,
Josh Poimboeuf, Xiongwei Song, Xin Li, Mike Rapoport (IBM),
Brijesh Singh, Michael Roth, Tony Luck, Alexey Kardashevskiy,
Alexander Shishkin, X86-kernel
On 7/31/2025 5:01 PM, Dave Hansen wrote:
>
> 'cr4_pinned_bits' should be set by the boot CPU. Secondary CPUs should
> also read 'cr4_pinned_bits' when setting up their own cr4's,
> unconditionally, independent of 'cr_pinning'.
>
> The thing I think we should change is the pinning _enforcement_. The
> easiest way to do that is to remove the static_branch_likely() in
> cr4_init() and then delay flipping the static branch until just before
> userspace starts.
>
Based on the current implementation and some git history [1], it seems
that cr_pinning is expected to catch two types of issues.
One is a malicious user trying to change the CR registers from
userspace. Another is a regular caller of native_write_cr4()
*mistakenly* clearing a pinned bit.
[1]:
https://lore.kernel.org/all/alpine.DEB.2.21.1906141646320.1722@nanos.tec.linutronix.de/
Could deferring enforcement lead to a scenario where we end up with
different CR4 values on different CPUs? Maybe I am misinterpreting this
and protecting against in-kernel errors is not a goal.
In general, you seem to want to delay CR pinning enforcement until
absolutely needed, and I am curious about the motivation. I understand
we should avoid doing it at arbitrary points in the boot. But
arch_cpu_finalize_init() and early_initcall() seem like decent
mileposts to me.
Are you anticipating that we would need to move setup_cr_pinning() again
when another user similar to EFI shows up?
> Basically, split up the:
>
> static void __init setup_cr_pinning(void)
> {
> cr4_pinned_bits = this_cpu_read(cpu_tlbstate.cr4) & cr4_pinned_mask;
> static_key_enable(&cr_pinning.key);
> }
>
> code into its two logical pieces:
>
> 1. Populate 'cr4_pinned_bits' from the boot CPU so the secondaries can
> use it
> 2. Enable the static key so pinning enforcement is enabled
* Re: [PATCHv9 04/16] x86/cpu: Defer CR pinning setup until core initcall
2025-08-01 4:43 ` Sohil Mehta
@ 2025-08-01 14:22 ` Dave Hansen
0 siblings, 0 replies; 11+ messages in thread
From: Dave Hansen @ 2025-08-01 14:22 UTC (permalink / raw)
To: Sohil Mehta, Thomas Gleixner, Dave Hansen, Kees Cook
Cc: Jonathan Corbet, Ingo Molnar, Pawan Gupta, Daniel Sneddon,
Kai Huang, Sandipan Das, Breno Leitao, Rick Edgecombe,
Alexei Starovoitov, Hou Tao, Juergen Gross, Vegard Nossum,
Eric Biggers, Jason Gunthorpe, Masami Hiramatsu (Google),
Andrew Morton, Luis Chamberlain, Yuntao Wang, Rasmus Villemoes,
Christophe Leroy, Tejun Heo, Changbin Du, Huang Shijie,
Geert Uytterhoeven, Namhyung Kim, Arnaldo Carvalho de Melo,
linux-doc, linux-kernel, linux-efi, linux-mm, Kirill A. Shutemov,
Kirill A. Shutemov, Andy Lutomirski, Ingo Molnar, Borislav Petkov,
H. Peter Anvin, Peter Zijlstra, Ard Biesheuvel, Paul E. McKenney,
Josh Poimboeuf, Xiongwei Song, Xin Li, Mike Rapoport (IBM),
Brijesh Singh, Michael Roth, Tony Luck, Alexey Kardashevskiy,
Alexander Shishkin, X86-kernel
On 7/31/25 21:43, Sohil Mehta wrote:
...
> Could deferring enforcement lead to a scenario where we end up with
> different CR4 values on different CPUs? Maybe I am misinterpreting this
> and protecting against in-kernel errors is not a goal.
Sure, theoretically.
But if that's a concern, it can be checked at the time that enforcement
starts:
for_each_online_cpu(cpu) {
unsigned long cr4 = per_cpu(cpu_tlbstate.cr4, cpu);
if ((cr4 & cr4_pinned_mask) == cr4_pinned_bits))
continue;
WARN("blah blah");
}
Or use smp_call_function() to check each CPU's CR4 directly.
Or, the next time that CPU does a TLB flush that toggles X86_CR4_PGE,
it'll get the warning from the regular pinning path.
So, sure, this does widen the window during boot where a secondary CPU
might get a bad CR4 value, and it would make it harder to track down
where it happened. We _could_ print a pr_debug() message when the bit
gets cleared but not enforce things if anyone is super worried about this.
> In general, you want to delay the CR pinning enforcement until
> absolutely needed. I am curious about the motivation. I understand we
> should avoid doing it at arbitrary points in the boot. But,
> arch_cpu_finalize_init() and early_initcall() seem to be decent
> mileposts to me.
>
> Are you anticipating that we would need to move setup_cr_pinning() again
> when another user similar to EFI shows up?
Yep.
* Re: [PATCHv9 04/16] x86/cpu: Defer CR pinning setup until core initcall
2025-08-01 0:01 ` Dave Hansen
2025-08-01 4:43 ` Sohil Mehta
@ 2025-08-02 18:51 ` Kees Cook
2025-08-04 6:55 ` H. Peter Anvin
1 sibling, 1 reply; 11+ messages in thread
From: Kees Cook @ 2025-08-02 18:51 UTC (permalink / raw)
To: Dave Hansen
Cc: Sohil Mehta, Thomas Gleixner, Dave Hansen, Jonathan Corbet,
Ingo Molnar, Pawan Gupta, Daniel Sneddon, Kai Huang, Sandipan Das,
Breno Leitao, Rick Edgecombe, Alexei Starovoitov, Hou Tao,
Juergen Gross, Vegard Nossum, Eric Biggers, Jason Gunthorpe,
Masami Hiramatsu (Google), Andrew Morton, Luis Chamberlain,
Yuntao Wang, Rasmus Villemoes, Christophe Leroy, Tejun Heo,
Changbin Du, Huang Shijie, Geert Uytterhoeven, Namhyung Kim,
Arnaldo Carvalho de Melo, linux-doc, linux-kernel, linux-efi,
linux-mm, Kirill A. Shutemov, Kirill A. Shutemov, Andy Lutomirski,
Ingo Molnar, Borislav Petkov, H. Peter Anvin, Peter Zijlstra,
Ard Biesheuvel, Paul E. McKenney, Josh Poimboeuf, Xiongwei Song,
Xin Li, Mike Rapoport (IBM), Brijesh Singh, Michael Roth,
Tony Luck, Alexey Kardashevskiy, Alexander Shishkin, X86-kernel
On Thu, Jul 31, 2025 at 05:01:37PM -0700, Dave Hansen wrote:
> On 7/31/25 16:45, Sohil Mehta wrote:
> > On 7/9/2025 10:00 AM, Dave Hansen wrote:
> >> On 7/7/25 01:03, Kirill A. Shutemov wrote:
> >>> Instead of moving setup_cr_pinning() below efi_enter_virtual_mode() in
> >>> arch_cpu_finalize_init(), defer it until core initcall.
> >> What are the side effects of this move? Are there other benefits? What
> >> are the risks?
> >>
> > Picking this up from Kirill.. Reevaluating this, core_initcall() seems
> > too late for setup_cr_pinning().
> >
> > We need to have CR pinning completed, and the associated static key
> > enabled before AP bring up. start_secondary()->cr4_init() depends on the
> > cr_pinning static key to initialize CR4 for APs.
>
> Sure, if you leave cr4_init() completely as-is.
>
> 'cr4_pinned_bits' should be set by the boot CPU. Secondary CPUs should
> also read 'cr4_pinned_bits' when setting up their own cr4's,
> unconditionally, independent of 'cr_pinning'.
>
> The thing I think we should change is the pinning _enforcement_. The
> easiest way to do that is to remove the static_branch_likely() in
> cr4_init() and then delay flipping the static branch until just before
> userspace starts.
Yeah, this is fine from my perspective. The goal with the pinning was
about keeping things safe in the face of an attack from userspace that
managed to get at MSR values and keeping them from being trivially
changed.
--
Kees Cook
* Re: [PATCHv9 04/16] x86/cpu: Defer CR pinning setup until core initcall
2025-08-02 18:51 ` Kees Cook
@ 2025-08-04 6:55 ` H. Peter Anvin
0 siblings, 0 replies; 11+ messages in thread
From: H. Peter Anvin @ 2025-08-04 6:55 UTC (permalink / raw)
To: Kees Cook, Dave Hansen
Cc: Sohil Mehta, Thomas Gleixner, Dave Hansen, Jonathan Corbet,
Ingo Molnar, Pawan Gupta, Daniel Sneddon, Kai Huang, Sandipan Das,
Breno Leitao, Rick Edgecombe, Alexei Starovoitov, Hou Tao,
Juergen Gross, Vegard Nossum, Eric Biggers, Jason Gunthorpe,
Masami Hiramatsu (Google), Andrew Morton, Luis Chamberlain,
Yuntao Wang, Rasmus Villemoes, Christophe Leroy, Tejun Heo,
Changbin Du, Huang Shijie, Geert Uytterhoeven, Namhyung Kim,
Arnaldo Carvalho de Melo, linux-doc, linux-kernel, linux-efi,
linux-mm, Kirill A. Shutemov, Kirill A. Shutemov, Andy Lutomirski,
Ingo Molnar, Borislav Petkov, Peter Zijlstra, Ard Biesheuvel,
Paul E. McKenney, Josh Poimboeuf, Xiongwei Song, Xin Li,
Mike Rapoport (IBM), Brijesh Singh, Michael Roth, Tony Luck,
Alexey Kardashevskiy, Alexander Shishkin, X86-kernel
On August 2, 2025 11:51:28 AM PDT, Kees Cook <kees@kernel.org> wrote:
>On Thu, Jul 31, 2025 at 05:01:37PM -0700, Dave Hansen wrote:
>> On 7/31/25 16:45, Sohil Mehta wrote:
>> > On 7/9/2025 10:00 AM, Dave Hansen wrote:
>> >> On 7/7/25 01:03, Kirill A. Shutemov wrote:
>> >>> Instead of moving setup_cr_pinning() below efi_enter_virtual_mode() in
>> >>> arch_cpu_finalize_init(), defer it until core initcall.
>> >> What are the side effects of this move? Are there other benefits? What
>> >> are the risks?
>> >>
>> > Picking this up from Kirill.. Reevaluating this, core_initcall() seems
>> > too late for setup_cr_pinning().
>> >
>> > We need to have CR pinning completed, and the associated static key
>> > enabled before AP bring up. start_secondary()->cr4_init() depends on the
>> > cr_pinning static key to initialize CR4 for APs.
>>
>> Sure, if you leave cr4_init() completely as-is.
>>
>> 'cr4_pinned_bits' should be set by the boot CPU. Secondary CPUs should
>> also read 'cr4_pinned_bits' when setting up their own cr4's,
>> unconditionally, independent of 'cr_pinning'.
>>
>> The thing I think we should change is the pinning _enforcement_. The
>> easiest way to do that is to remove the static_branch_likely() in
>> cr4_init() and then delay flipping the static branch until just before
>> userspace starts.
>
>Yeah, this is fine from my perspective. The goal with the pinning was
>about keeping things safe in the face of an attack from userspace that
>managed to get at MSR values and keeping them from being trivially
>changed.
>
I have mentioned this before: I would like to see CR4-pinning use a patchable immediate to make it harder to manipulate. If the mask is final when alternatives are run, that would be a good time to install it; the code can just contain a zero immediate (no pinning) or a very limited set of bits that must never be changed at all up to that point.
end of thread, other threads:[~2025-08-04 6:56 UTC | newest]
Thread overview: 11+ messages
[not found] <20250707080317.3791624-1-kirill.shutemov@linux.intel.com>
[not found] ` <20250707080317.3791624-3-kirill.shutemov@linux.intel.com>
[not found] ` <7d93b343-b275-4edb-ae26-4578ae53652f@intel.com>
2025-07-25 2:35 ` [PATCHv9 02/16] x86/alternatives: Disable LASS when patching kernel alternatives Sohil Mehta
2025-07-28 19:11 ` David Laight
2025-07-28 19:28 ` H. Peter Anvin
2025-07-28 19:38 ` David Laight
2025-08-01 0:15 ` Sohil Mehta
[not found] ` <20250707080317.3791624-5-kirill.shutemov@linux.intel.com>
[not found] ` <6075af69-299f-43d2-a3c8-353a2a3b7ee7@intel.com>
2025-07-31 23:45 ` [PATCHv9 04/16] x86/cpu: Defer CR pinning setup until core initcall Sohil Mehta
2025-08-01 0:01 ` Dave Hansen
2025-08-01 4:43 ` Sohil Mehta
2025-08-01 14:22 ` Dave Hansen
2025-08-02 18:51 ` Kees Cook
2025-08-04 6:55 ` H. Peter Anvin