Linux Hardening
* [RFC PATCH v2 00/15] pkeys-based page table hardening
@ 2025-01-08 10:32 Kevin Brodsky
  2025-01-08 10:32 ` [RFC PATCH v2 01/15] mm: Introduce kpkeys Kevin Brodsky
                   ` (15 more replies)
  0 siblings, 16 replies; 20+ messages in thread
From: Kevin Brodsky @ 2025-01-08 10:32 UTC (permalink / raw)
  To: linux-hardening
  Cc: linux-kernel, Kevin Brodsky, Andrew Morton, Mark Brown,
	Catalin Marinas, Dave Hansen, Jann Horn, Jeff Xu, Joey Gouly,
	Kees Cook, Linus Walleij, Andy Lutomirski, Marc Zyngier,
	Peter Zijlstra, Pierre Langlois, Quentin Perret,
	Mike Rapoport (IBM), Ryan Roberts, Thomas Gleixner, Will Deacon,
	Matthew Wilcox, Qi Zheng, linux-arm-kernel, x86

This is a proposal to leverage protection keys (pkeys) to harden
critical kernel data by making it mostly read-only. The series includes
a simple framework called "kpkeys" to manipulate pkeys for in-kernel use,
as well as a page table hardening feature based on that framework
(kpkeys_hardened_pgtables). Both are implemented on arm64 as a proof of
concept, but they are designed to be compatible with any architecture
implementing pkeys.

The proposed approach is a typical use of pkeys: the data to protect is
mapped with a given pkey P, and the pkey register is initially configured
to grant read-only access to P. Where the protected data needs to be
written to, the pkey register is temporarily switched to grant write
access to P on the current CPU.

The key fact this approach relies on is that the target data is
only written to via a limited and well-defined API. This makes it
possible to explicitly switch the pkey register where needed, without
introducing excessively invasive changes, and only for a small amount of
trusted code.

Page tables were chosen as they are a popular (and critical) target for
attacks, but there are of course many others - this is only a starting
point (see section "Further use-cases"). It has become more and more
common for accesses to such target data to be mediated by a hypervisor
in vendor kernels; the hope is that kpkeys can provide much of that
protection in a simpler manner. No benchmarking has been performed at
this stage, but the runtime overhead should also be lower (though likely
not negligible).

# kpkeys

The use of pkeys involves two separate mechanisms: assigning a pkey to
pages, and defining the pkeys -> permissions mapping via the pkey
register. This is implemented through the following interface:

- Pages in the linear mapping are assigned a pkey using set_memory_pkey().
  This is sufficient for this series, but of course higher-level
  interfaces can be introduced later to ask allocators to return pages
  marked with a given pkey. It should also be possible to extend this to
  vmalloc() if needed.

- The pkey register is configured based on a *kpkeys level*. kpkeys
  levels are simple integers that correspond to a given configuration,
  for instance:

  KPKEYS_LVL_DEFAULT:
        RW access to KPKEYS_PKEY_DEFAULT
        RO access to any other KPKEYS_PKEY_*

  KPKEYS_LVL_<FEAT>:
        RW access to KPKEYS_PKEY_DEFAULT
        RW access to KPKEYS_PKEY_<FEAT>
        RO access to any other KPKEYS_PKEY_*

  Only pkeys that are managed by the kpkeys framework are impacted;
  permissions for other pkeys are left unchanged (this allows for other
  schemes using pkeys to be used in parallel, and arch-specific use of
  certain pkeys).

  The kpkeys level is changed by calling kpkeys_set_level(), setting the
  pkey register accordingly and returning the original value. A
  subsequent call to kpkeys_restore_pkey_reg() restores the kpkeys
  level. The numeric value of KPKEYS_LVL_* (kpkeys level) is purely
  symbolic and thus generic; each architecture is, however, free to
  define KPKEYS_PKEY_* (pkey value).

# kpkeys_hardened_pgtables

The kpkeys_hardened_pgtables feature uses the interface above to make
the (kernel and user) page tables read-only by default, enabling write
access only in helpers such as set_pte(). One complication is that those
helpers, as well as page table allocators, are used very early, before
kpkeys become available. kpkeys_hardened_pgtables is therefore enabled
later, if and when kpkeys become available, as follows:

1. A static key is turned on. This enables a transition to
   KPKEYS_LVL_PGTABLES in all helpers writing to page tables, and also
   impacts page table allocators (see step 3).

2. All pages holding kernel page tables are set to KPKEYS_PKEY_PGTABLES.
   This ensures they can only be written when running at
   KPKEYS_LVL_PGTABLES.

3. Page table allocators set the returned pages to KPKEYS_PKEY_PGTABLES
   (and the pkey is reset upon freeing). This ensures that all page
   tables are mapped with that privileged pkey.

# Threat model

The proposed scheme aims at mitigating data-only attacks (e.g.
use-after-free/cross-cache attacks). In other words, it is assumed that
control flow is not corrupted, and that the attacker does not achieve
arbitrary code execution. Nothing prevents the pkey register from being
set to its most permissive state - the assumption is that the register
is only modified on legitimate code paths.

A few related notes:

- Functions that set the pkey register are all implemented inline.
  Besides performance considerations, this is meant to avoid creating
  a function that can be used as a straightforward gadget to set the
  pkey register to an arbitrary value.

- kpkeys_set_level() only accepts a compile-time constant as argument,
  as a variable could be manipulated by an attacker. This could be
  relaxed but it seems unlikely that a variable kpkeys level would be
  needed in practice.

# Further use-cases

It should be possible to harden various targets using kpkeys, including:

- struct cred (enforcing a "mostly read-only" state once committed)

- fixmap (occasionally used even after early boot, e.g.
  set_swapper_pgd() in arch/arm64/mm/mmu.c)

- SELinux state (e.g. struct selinux_state::initialized)

... and many others.

kpkeys could also be used to strengthen the confidentiality of secret
data by making it completely inaccessible by default, and granting
read-only or read-write access as needed. This requires such data to be
rarely accessed (or via a limited interface only). One example on arm64
is the pointer authentication keys in thread_struct, whose leakage to
userspace would lead to pointer authentication being easily defeated.

# This series

The series is composed of two parts:

- The kpkeys framework (patch 1-7). The main API is introduced in
  <linux/kpkeys.h>, and it is implemented on arm64 using the POE
  (Permission Overlay Extension) feature.

- The kpkeys_hardened_pgtables feature (patch 8-15). <linux/kpkeys.h> is
  extended with an API to set page table pages to a given pkey and a
  guard object to switch kpkeys level accordingly, both gated on a
  static key. This is then used in generic and arm64 pgtable handling
  code as needed. Finally a simple KUnit-based test suite is added to
  demonstrate the page table protection.

The arm64 implementation should be considered a proof of concept only.
The enablement of POE for in-kernel use is incomplete; in particular
POR_EL1 (pkey register) should be reset on exception entry and restored
on exception return.

# Performance

No particular efforts were made to optimise the use of kpkeys at this
stage (and no benchmarking was performed either). There are two obvious
low-hanging fruits in the kpkeys_hardened_pgtables feature:

- Always switching kpkeys level in leaf helpers such as set_pte() can be
  very inefficient if many page table entries are updated in a row. Some
  sort of batching may be desirable.

- On arm64 specifically, the page table helpers typically perform an
  expensive ISB (Instruction Synchronisation Barrier) after writing to
  page tables. Since most of the cost of switching the arm64 pkey
  register (POR_EL1) comes from the following ISB, the overhead incurred
  by kpkeys_restore_pkey_reg() would be significantly reduced by merging
  its ISB with the pgtable helper's. That would however require more
  invasive changes, beyond simply adding a guard object.

# Open questions

A few aspects in this RFC that are debatable and/or worth discussing:

- There is currently no restriction on how kpkeys levels map to pkeys
  permissions. A typical approach is to allocate one pkey per level and
  make it writable at that level only. As the number of levels
  increases, we may however run out of pkeys, especially on arm64 (just
  8 pkeys with POE). Depending on the use-cases, it may be acceptable to
  use the same pkey for the data associated to multiple levels.

  Another potential concern is that a given piece of code may require
  write access to multiple privileged pkeys. This could be addressed by
  introducing a notion of hierarchy in trust levels, where Tn is able to
  write to memory owned by Tm if n >= m, for instance.

- kpkeys_set_level() and kpkeys_restore_pkey_reg() are not symmetric:
  the former takes a kpkeys level and returns a pkey register value, to
  be consumed by the latter. It would be more intuitive to manipulate
  kpkeys levels only. However this assumes that there is a 1:1 mapping
  between kpkeys levels and pkey register values, while in principle
  the mapping is 1:n (certain pkeys may be used outside the kpkeys
  framework).

- An architecture that supports kpkeys is expected to select
  CONFIG_ARCH_HAS_KPKEYS and always enable them if available - there is
  no CONFIG_KPKEYS to control this behaviour. Since this creates no
  significant overhead (at least on arm64), it seemed better to keep it
  simple. Each hardening feature does have its own option and arch
  opt-in if needed (CONFIG_KPKEYS_HARDENED_PGTABLES,
  CONFIG_ARCH_HAS_KPKEYS_HARDENED_PGTABLES).


Any comment or feedback will be highly appreciated, be it on the
high-level approach or implementation choices!

- Kevin

---
Changelog RFC v1..v2:

- A new approach is used to set the pkey of page table pages. Thanks to
  Qi Zheng's and my own series [1][2], pagetable_*_ctor is
  systematically called when a PTP is allocated at any level (PTE to
  PGD), and pagetable_*_dtor when it is freed, on all architectures.
  Patch 11 makes use of this to call kpkeys_{,un}protect_pgtable_memory
  from the common ctor/dtor helper. The arm64 patches from v1 (patch 12
  and 13) are dropped as they are no longer needed. Patch 10 is
  introduced to allow pagetable_*_ctor to fail at all levels, since
  kpkeys_protect_pgtable_memory may itself fail.
  [Original suggestion by Peter Zijlstra]

- Changed the prototype of kpkeys_{,un}protect_pgtable_memory in patch 9
  to take a struct folio * for more convenience, and implemented them
  out-of-line to avoid a circular dependency with <linux/mm.h>.

- Rebased on next-20250107, which includes [1] and [2].

- Added locking in patch 8. [Peter Zijlstra's suggestion]

RFC v1: https://lore.kernel.org/linux-hardening/20241206101110.1646108-1-kevin.brodsky@arm.com/

[1] https://lore.kernel.org/linux-mm/cover.1736317725.git.zhengqi.arch@bytedance.com/
[2] https://lore.kernel.org/linux-mm/20250103184415.2744423-1-kevin.brodsky@arm.com/
---
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jeff Xu <jeffxu@chromium.org>
Cc: Joey Gouly <joey.gouly@arm.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Pierre Langlois <pierre.langlois@arm.com>
Cc: Quentin Perret <qperret@google.com>
Cc: "Mike Rapoport (IBM)" <rppt@kernel.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: x86@kernel.org
---
Kevin Brodsky (15):
  mm: Introduce kpkeys
  set_memory: Introduce set_memory_pkey() stub
  arm64: mm: Enable overlays for all EL1 indirect permissions
  arm64: Introduce por_set_pkey_perms() helper
  arm64: Implement asm/kpkeys.h using POE
  arm64: set_memory: Implement set_memory_pkey()
  arm64: Enable kpkeys
  mm: Introduce kernel_pgtables_set_pkey()
  mm: Introduce kpkeys_hardened_pgtables
  mm: Allow __pagetable_ctor() to fail
  mm: Map page tables with privileged pkey
  arm64: kpkeys: Support KPKEYS_LVL_PGTABLES
  arm64: mm: Guard page table writes with kpkeys
  arm64: Enable kpkeys_hardened_pgtables support
  mm: Add basic tests for kpkeys_hardened_pgtables

 arch/arm64/Kconfig                    |   2 +
 arch/arm64/include/asm/kpkeys.h       |  45 +++++++++
 arch/arm64/include/asm/pgtable-prot.h |  16 +--
 arch/arm64/include/asm/pgtable.h      |  19 +++-
 arch/arm64/include/asm/por.h          |   9 ++
 arch/arm64/include/asm/set_memory.h   |   4 +
 arch/arm64/kernel/cpufeature.c        |   5 +-
 arch/arm64/kernel/smp.c               |   2 +
 arch/arm64/mm/fault.c                 |   2 +
 arch/arm64/mm/mmu.c                   |  28 ++----
 arch/arm64/mm/pageattr.c              |  21 ++++
 include/asm-generic/kpkeys.h          |  21 ++++
 include/asm-generic/pgalloc.h         |  15 ++-
 include/linux/kpkeys.h                | 112 +++++++++++++++++++++
 include/linux/mm.h                    |  27 ++---
 include/linux/set_memory.h            |   7 ++
 mm/Kconfig                            |   5 +
 mm/Makefile                           |   2 +
 mm/kpkeys_hardened_pgtables.c         |  44 +++++++++
 mm/kpkeys_hardened_pgtables_test.c    |  72 ++++++++++++++
 mm/memory.c                           | 137 ++++++++++++++++++++++++++
 security/Kconfig.hardening            |  24 +++++
 22 files changed, 576 insertions(+), 43 deletions(-)
 create mode 100644 arch/arm64/include/asm/kpkeys.h
 create mode 100644 include/asm-generic/kpkeys.h
 create mode 100644 include/linux/kpkeys.h
 create mode 100644 mm/kpkeys_hardened_pgtables.c
 create mode 100644 mm/kpkeys_hardened_pgtables_test.c


base-commit: 7b4b9bf203da94fbeac75ed3116c84aa03e74578
-- 
2.47.0


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [RFC PATCH v2 01/15] mm: Introduce kpkeys
  2025-01-08 10:32 [RFC PATCH v2 00/15] pkeys-based page table hardening Kevin Brodsky
@ 2025-01-08 10:32 ` Kevin Brodsky
  2025-01-08 10:32 ` [RFC PATCH v2 02/15] set_memory: Introduce set_memory_pkey() stub Kevin Brodsky
                   ` (14 subsequent siblings)
  15 siblings, 0 replies; 20+ messages in thread
From: Kevin Brodsky @ 2025-01-08 10:32 UTC (permalink / raw)
  To: linux-hardening
  Cc: linux-kernel, Kevin Brodsky, Andrew Morton, Mark Brown,
	Catalin Marinas, Dave Hansen, Jann Horn, Jeff Xu, Joey Gouly,
	Kees Cook, Linus Walleij, Andy Lutomirski, Marc Zyngier,
	Peter Zijlstra, Pierre Langlois, Quentin Perret,
	Mike Rapoport (IBM), Ryan Roberts, Thomas Gleixner, Will Deacon,
	Matthew Wilcox, Qi Zheng, linux-arm-kernel, x86

kpkeys is a simple framework to enable the use of protection keys
(pkeys) to harden the kernel itself. This patch introduces the basic
API in <linux/kpkeys.h>: a couple of functions to set and restore
the pkey register and a macro to define guard objects.

kpkeys introduces a new concept on top of pkeys: the kpkeys level.
Each level is associated with a set of permissions for the pkeys
managed by the kpkeys framework. kpkeys_set_level(lvl) sets those
permissions according to lvl, and returns the original pkey
register, to be later restored by kpkeys_restore_pkey_reg(). To
start with, only KPKEYS_LVL_DEFAULT is available, which is meant
to grant RW access to KPKEYS_PKEY_DEFAULT (i.e. all memory since
this is the only available pkey for now).

Because each architecture implementing pkeys uses a different
representation for the pkey register, and may reserve certain pkeys
for specific uses, support for kpkeys must be explicitly indicated
by selecting ARCH_HAS_KPKEYS and defining the following functions in
<asm/kpkeys.h>, in addition to the macros provided in
<asm-generic/kpkeys.h>:

- arch_kpkeys_set_level()
- arch_kpkeys_restore_pkey_reg()
- arch_kpkeys_enabled()

Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
---
 include/asm-generic/kpkeys.h |  9 +++++
 include/linux/kpkeys.h       | 66 ++++++++++++++++++++++++++++++++++++
 mm/Kconfig                   |  2 ++
 3 files changed, 77 insertions(+)
 create mode 100644 include/asm-generic/kpkeys.h
 create mode 100644 include/linux/kpkeys.h

diff --git a/include/asm-generic/kpkeys.h b/include/asm-generic/kpkeys.h
new file mode 100644
index 000000000000..3404ce249757
--- /dev/null
+++ b/include/asm-generic/kpkeys.h
@@ -0,0 +1,9 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef __ASM_GENERIC_KPKEYS_H
+#define __ASM_GENERIC_KPKEYS_H
+
+#ifndef KPKEYS_PKEY_DEFAULT
+#define KPKEYS_PKEY_DEFAULT	0
+#endif
+
+#endif	/* __ASM_GENERIC_KPKEYS_H */
diff --git a/include/linux/kpkeys.h b/include/linux/kpkeys.h
new file mode 100644
index 000000000000..70e44b0db150
--- /dev/null
+++ b/include/linux/kpkeys.h
@@ -0,0 +1,66 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef _LINUX_KPKEYS_H
+#define _LINUX_KPKEYS_H
+
+#include <linux/bug.h>
+#include <linux/cleanup.h>
+
+#define KPKEYS_LVL_DEFAULT	0
+
+#define KPKEYS_LVL_MIN		KPKEYS_LVL_DEFAULT
+#define KPKEYS_LVL_MAX		KPKEYS_LVL_DEFAULT
+
+#define KPKEYS_GUARD(_name, set_level, restore_pkey_reg)		\
+	__DEFINE_CLASS_IS_CONDITIONAL(_name, false);			\
+	DEFINE_CLASS(_name, u64,					\
+		     restore_pkey_reg, set_level, void);		\
+	static inline void *class_##_name##_lock_ptr(u64 *_T)		\
+	{ return _T; }
+
+#ifdef CONFIG_ARCH_HAS_KPKEYS
+
+#include <asm/kpkeys.h>
+
+/**
+ * kpkeys_set_level() - switch kpkeys level
+ * @level: the level to switch to
+ *
+ * Switches the kpkeys level to the specified value. @level must be a
+ * compile-time constant. The arch-specific pkey register will be updated
+ * accordingly, and the original value returned.
+ *
+ * Return: the original pkey register value.
+ */
+static inline u64 kpkeys_set_level(int level)
+{
+	BUILD_BUG_ON_MSG(!__builtin_constant_p(level),
+			 "kpkeys_set_level() only takes constant levels");
+	BUILD_BUG_ON_MSG(level < KPKEYS_LVL_MIN || level > KPKEYS_LVL_MAX,
+			 "Invalid level passed to kpkeys_set_level()");
+
+	return arch_kpkeys_set_level(level);
+}
+
+/**
+ * kpkeys_restore_pkey_reg() - restores a pkey register value
+ * @pkey_reg: the pkey register value to restore
+ *
+ * This function is meant to be passed the value returned by kpkeys_set_level(),
+ * in order to restore the pkey register to its original value (thus restoring
+ * the original kpkeys level).
+ */
+static inline void kpkeys_restore_pkey_reg(u64 pkey_reg)
+{
+	arch_kpkeys_restore_pkey_reg(pkey_reg);
+}
+
+#else /* CONFIG_ARCH_HAS_KPKEYS */
+
+static inline bool arch_kpkeys_enabled(void)
+{
+	return false;
+}
+
+#endif /* CONFIG_ARCH_HAS_KPKEYS */
+
+#endif /* _LINUX_KPKEYS_H */
diff --git a/mm/Kconfig b/mm/Kconfig
index 1b501db06417..71edc478f111 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -1147,6 +1147,8 @@ config ARCH_USES_HIGH_VMA_FLAGS
 	bool
 config ARCH_HAS_PKEYS
 	bool
+config ARCH_HAS_KPKEYS
+	bool
 
 config ARCH_USES_PG_ARCH_2
 	bool
-- 
2.47.0



* [RFC PATCH v2 02/15] set_memory: Introduce set_memory_pkey() stub
  2025-01-08 10:32 [RFC PATCH v2 00/15] pkeys-based page table hardening Kevin Brodsky
  2025-01-08 10:32 ` [RFC PATCH v2 01/15] mm: Introduce kpkeys Kevin Brodsky
@ 2025-01-08 10:32 ` Kevin Brodsky
  2025-01-08 10:32 ` [RFC PATCH v2 03/15] arm64: mm: Enable overlays for all EL1 indirect permissions Kevin Brodsky
                   ` (13 subsequent siblings)
  15 siblings, 0 replies; 20+ messages in thread
From: Kevin Brodsky @ 2025-01-08 10:32 UTC (permalink / raw)
  To: linux-hardening
  Cc: linux-kernel, Kevin Brodsky, Andrew Morton, Mark Brown,
	Catalin Marinas, Dave Hansen, Jann Horn, Jeff Xu, Joey Gouly,
	Kees Cook, Linus Walleij, Andy Lutomirski, Marc Zyngier,
	Peter Zijlstra, Pierre Langlois, Quentin Perret,
	Mike Rapoport (IBM), Ryan Roberts, Thomas Gleixner, Will Deacon,
	Matthew Wilcox, Qi Zheng, linux-arm-kernel, x86

Introduce a new function, set_memory_pkey(), which sets the
protection key (pkey) of pages in the specified linear mapping
range. Architectures implementing kernel pkeys (kpkeys) must
provide a suitable implementation; an empty stub is added as
fallback.

Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
---
 include/linux/set_memory.h | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/include/linux/set_memory.h b/include/linux/set_memory.h
index 3030d9245f5a..7b3a8bfde3c6 100644
--- a/include/linux/set_memory.h
+++ b/include/linux/set_memory.h
@@ -84,4 +84,11 @@ static inline int set_memory_decrypted(unsigned long addr, int numpages)
 }
 #endif /* CONFIG_ARCH_HAS_MEM_ENCRYPT */
 
+#ifndef CONFIG_ARCH_HAS_KPKEYS
+static inline int set_memory_pkey(unsigned long addr, int numpages, int pkey)
+{
+	return 0;
+}
+#endif
+
 #endif /* _LINUX_SET_MEMORY_H_ */
-- 
2.47.0



* [RFC PATCH v2 03/15] arm64: mm: Enable overlays for all EL1 indirect permissions
  2025-01-08 10:32 [RFC PATCH v2 00/15] pkeys-based page table hardening Kevin Brodsky
  2025-01-08 10:32 ` [RFC PATCH v2 01/15] mm: Introduce kpkeys Kevin Brodsky
  2025-01-08 10:32 ` [RFC PATCH v2 02/15] set_memory: Introduce set_memory_pkey() stub Kevin Brodsky
@ 2025-01-08 10:32 ` Kevin Brodsky
  2025-01-08 10:32 ` [RFC PATCH v2 04/15] arm64: Introduce por_set_pkey_perms() helper Kevin Brodsky
                   ` (12 subsequent siblings)
  15 siblings, 0 replies; 20+ messages in thread
From: Kevin Brodsky @ 2025-01-08 10:32 UTC (permalink / raw)
  To: linux-hardening
  Cc: linux-kernel, Kevin Brodsky, Andrew Morton, Mark Brown,
	Catalin Marinas, Dave Hansen, Jann Horn, Jeff Xu, Joey Gouly,
	Kees Cook, Linus Walleij, Andy Lutomirski, Marc Zyngier,
	Peter Zijlstra, Pierre Langlois, Quentin Perret,
	Mike Rapoport (IBM), Ryan Roberts, Thomas Gleixner, Will Deacon,
	Matthew Wilcox, Qi Zheng, linux-arm-kernel, x86

In preparation of using POE inside the kernel, enable "Overlay
applied" for all stage 1 base permissions in PIR_EL1. This ensures
that the permissions set in POR_EL1 affect all kernel mappings.

Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
---
 arch/arm64/include/asm/pgtable-prot.h | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable-prot.h b/arch/arm64/include/asm/pgtable-prot.h
index a95f1f77bb39..7c0c30460900 100644
--- a/arch/arm64/include/asm/pgtable-prot.h
+++ b/arch/arm64/include/asm/pgtable-prot.h
@@ -181,13 +181,13 @@ static inline bool __pure lpa2_is_enabled(void)
 	PIRx_ELx_PERM(pte_pi_index(_PAGE_GCS),           PIE_NONE_O) | \
 	PIRx_ELx_PERM(pte_pi_index(_PAGE_GCS_RO),        PIE_NONE_O) | \
 	PIRx_ELx_PERM(pte_pi_index(_PAGE_EXECONLY),      PIE_NONE_O) | \
-	PIRx_ELx_PERM(pte_pi_index(_PAGE_READONLY_EXEC), PIE_R)      | \
-	PIRx_ELx_PERM(pte_pi_index(_PAGE_SHARED_EXEC),   PIE_RW)     | \
-	PIRx_ELx_PERM(pte_pi_index(_PAGE_READONLY),      PIE_R)      | \
-	PIRx_ELx_PERM(pte_pi_index(_PAGE_SHARED),        PIE_RW)     | \
-	PIRx_ELx_PERM(pte_pi_index(_PAGE_KERNEL_ROX),    PIE_RX)     | \
-	PIRx_ELx_PERM(pte_pi_index(_PAGE_KERNEL_EXEC),   PIE_RWX)    | \
-	PIRx_ELx_PERM(pte_pi_index(_PAGE_KERNEL_RO),     PIE_R)      | \
-	PIRx_ELx_PERM(pte_pi_index(_PAGE_KERNEL),        PIE_RW))
+	PIRx_ELx_PERM(pte_pi_index(_PAGE_READONLY_EXEC), PIE_R_O)    | \
+	PIRx_ELx_PERM(pte_pi_index(_PAGE_SHARED_EXEC),   PIE_RW_O)   | \
+	PIRx_ELx_PERM(pte_pi_index(_PAGE_READONLY),      PIE_R_O)    | \
+	PIRx_ELx_PERM(pte_pi_index(_PAGE_SHARED),        PIE_RW_O)   | \
+	PIRx_ELx_PERM(pte_pi_index(_PAGE_KERNEL_ROX),    PIE_RX_O)   | \
+	PIRx_ELx_PERM(pte_pi_index(_PAGE_KERNEL_EXEC),   PIE_RWX_O)  | \
+	PIRx_ELx_PERM(pte_pi_index(_PAGE_KERNEL_RO),     PIE_R_O)    | \
+	PIRx_ELx_PERM(pte_pi_index(_PAGE_KERNEL),        PIE_RW_O))
 
 #endif /* __ASM_PGTABLE_PROT_H */
-- 
2.47.0



* [RFC PATCH v2 04/15] arm64: Introduce por_set_pkey_perms() helper
  2025-01-08 10:32 [RFC PATCH v2 00/15] pkeys-based page table hardening Kevin Brodsky
                   ` (2 preceding siblings ...)
  2025-01-08 10:32 ` [RFC PATCH v2 03/15] arm64: mm: Enable overlays for all EL1 indirect permissions Kevin Brodsky
@ 2025-01-08 10:32 ` Kevin Brodsky
  2025-01-08 10:32 ` [RFC PATCH v2 05/15] arm64: Implement asm/kpkeys.h using POE Kevin Brodsky
                   ` (11 subsequent siblings)
  15 siblings, 0 replies; 20+ messages in thread
From: Kevin Brodsky @ 2025-01-08 10:32 UTC (permalink / raw)
  To: linux-hardening
  Cc: linux-kernel, Kevin Brodsky, Andrew Morton, Mark Brown,
	Catalin Marinas, Dave Hansen, Jann Horn, Jeff Xu, Joey Gouly,
	Kees Cook, Linus Walleij, Andy Lutomirski, Marc Zyngier,
	Peter Zijlstra, Pierre Langlois, Quentin Perret,
	Mike Rapoport (IBM), Ryan Roberts, Thomas Gleixner, Will Deacon,
	Matthew Wilcox, Qi Zheng, linux-arm-kernel, x86

Introduce a helper that sets the permissions of a given pkey
(POIndex) in the POR_ELx format, and make use of it in
arch_set_user_pkey_access().

Also ensure that <asm/sysreg.h> is included in asm/por.h to provide
the POE_* definitions.

Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
---
 arch/arm64/include/asm/por.h |  9 +++++++++
 arch/arm64/mm/mmu.c          | 28 ++++++++++------------------
 2 files changed, 19 insertions(+), 18 deletions(-)

diff --git a/arch/arm64/include/asm/por.h b/arch/arm64/include/asm/por.h
index e06e9f473675..7f0d73980cce 100644
--- a/arch/arm64/include/asm/por.h
+++ b/arch/arm64/include/asm/por.h
@@ -6,6 +6,8 @@
 #ifndef _ASM_ARM64_POR_H
 #define _ASM_ARM64_POR_H
 
+#include <asm/sysreg.h>
+
 #define POR_BITS_PER_PKEY		4
 #define POR_ELx_IDX(por_elx, idx)	(((por_elx) >> ((idx) * POR_BITS_PER_PKEY)) & 0xf)
 
@@ -30,4 +32,11 @@ static inline bool por_elx_allows_exec(u64 por, u8 pkey)
 	return perm & POE_X;
 }
 
+static inline u64 por_set_pkey_perms(u64 por, u8 pkey, u64 perms)
+{
+	u64 shift = pkey * POR_BITS_PER_PKEY;
+
+	return (por & ~(POE_MASK << shift)) | (perms << shift);
+}
+
 #endif /* _ASM_ARM64_POR_H */
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index e2739b69e11b..20e0390ee382 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1554,9 +1554,8 @@ void __cpu_replace_ttbr1(pgd_t *pgdp, bool cnp)
 #ifdef CONFIG_ARCH_HAS_PKEYS
 int arch_set_user_pkey_access(struct task_struct *tsk, int pkey, unsigned long init_val)
 {
-	u64 new_por = POE_RXW;
-	u64 old_por;
-	u64 pkey_shift;
+	u64 new_perms;
+	u64 por;
 
 	if (!system_supports_poe())
 		return -ENOSPC;
@@ -1570,26 +1569,19 @@ int arch_set_user_pkey_access(struct task_struct *tsk, int pkey, unsigned long i
 		return -EINVAL;
 
 	/* Set the bits we need in POR:  */
-	new_por = POE_RXW;
+	new_perms = POE_RXW;
 	if (init_val & PKEY_DISABLE_WRITE)
-		new_por &= ~POE_W;
+		new_perms &= ~POE_W;
 	if (init_val & PKEY_DISABLE_ACCESS)
-		new_por &= ~POE_RW;
+		new_perms &= ~POE_RW;
 	if (init_val & PKEY_DISABLE_READ)
-		new_por &= ~POE_R;
+		new_perms &= ~POE_R;
 	if (init_val & PKEY_DISABLE_EXECUTE)
-		new_por &= ~POE_X;
+		new_perms &= ~POE_X;
 
-	/* Shift the bits in to the correct place in POR for pkey: */
-	pkey_shift = pkey * POR_BITS_PER_PKEY;
-	new_por <<= pkey_shift;
-
-	/* Get old POR and mask off any old bits in place: */
-	old_por = read_sysreg_s(SYS_POR_EL0);
-	old_por &= ~(POE_MASK << pkey_shift);
-
-	/* Write old part along with new part: */
-	write_sysreg_s(old_por | new_por, SYS_POR_EL0);
+	por = read_sysreg_s(SYS_POR_EL0);
+	por = por_set_pkey_perms(por, pkey, new_perms);
+	write_sysreg_s(por, SYS_POR_EL0);
 
 	return 0;
 }
-- 
2.47.0



* [RFC PATCH v2 05/15] arm64: Implement asm/kpkeys.h using POE
  2025-01-08 10:32 [RFC PATCH v2 00/15] pkeys-based page table hardening Kevin Brodsky
                   ` (3 preceding siblings ...)
  2025-01-08 10:32 ` [RFC PATCH v2 04/15] arm64: Introduce por_set_pkey_perms() helper Kevin Brodsky
@ 2025-01-08 10:32 ` Kevin Brodsky
  2025-01-08 10:32 ` [RFC PATCH v2 06/15] arm64: set_memory: Implement set_memory_pkey() Kevin Brodsky
                   ` (10 subsequent siblings)
  15 siblings, 0 replies; 20+ messages in thread
From: Kevin Brodsky @ 2025-01-08 10:32 UTC (permalink / raw)
  To: linux-hardening
  Cc: linux-kernel, Kevin Brodsky, Andrew Morton, Mark Brown,
	Catalin Marinas, Dave Hansen, Jann Horn, Jeff Xu, Joey Gouly,
	Kees Cook, Linus Walleij, Andy Lutomirski, Marc Zyngier,
	Peter Zijlstra, Pierre Langlois, Quentin Perret,
	Mike Rapoport (IBM), Ryan Roberts, Thomas Gleixner, Will Deacon,
	Matthew Wilcox, Qi Zheng, linux-arm-kernel, x86

Implement the kpkeys interface if CONFIG_ARM64_POE is enabled.
The permissions for KPKEYS_PKEY_DEFAULT (pkey 0) are set to RWX as
this pkey is also used for code mappings.

Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
---
 arch/arm64/include/asm/kpkeys.h | 43 +++++++++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)
 create mode 100644 arch/arm64/include/asm/kpkeys.h

diff --git a/arch/arm64/include/asm/kpkeys.h b/arch/arm64/include/asm/kpkeys.h
new file mode 100644
index 000000000000..e17f6df41873
--- /dev/null
+++ b/arch/arm64/include/asm/kpkeys.h
@@ -0,0 +1,43 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef __ASM_KPKEYS_H
+#define __ASM_KPKEYS_H
+
+#include <asm/barrier.h>
+#include <asm/cpufeature.h>
+#include <asm/por.h>
+
+#include <asm-generic/kpkeys.h>
+
+static inline bool arch_kpkeys_enabled(void)
+{
+	return system_supports_poe();
+}
+
+#ifdef CONFIG_ARM64_POE
+
+static inline u64 por_set_kpkeys_level(u64 por, int level)
+{
+	por = por_set_pkey_perms(por, KPKEYS_PKEY_DEFAULT, POE_RXW);
+
+	return por;
+}
+
+static inline int arch_kpkeys_set_level(int level)
+{
+	u64 prev_por = read_sysreg_s(SYS_POR_EL1);
+
+	write_sysreg_s(por_set_kpkeys_level(prev_por, level), SYS_POR_EL1);
+	isb();
+
+	return prev_por;
+}
+
+static inline void arch_kpkeys_restore_pkey_reg(u64 pkey_reg)
+{
+	write_sysreg_s(pkey_reg, SYS_POR_EL1);
+	isb();
+}
+
+#endif /* CONFIG_ARM64_POE */
+
+#endif	/* __ASM_KPKEYS_H */
-- 
2.47.0



* [RFC PATCH v2 06/15] arm64: set_memory: Implement set_memory_pkey()
  2025-01-08 10:32 [RFC PATCH v2 00/15] pkeys-based page table hardening Kevin Brodsky
                   ` (4 preceding siblings ...)
  2025-01-08 10:32 ` [RFC PATCH v2 05/15] arm64: Implement asm/kpkeys.h using POE Kevin Brodsky
@ 2025-01-08 10:32 ` Kevin Brodsky
  2025-01-08 10:32 ` [RFC PATCH v2 07/15] arm64: Enable kpkeys Kevin Brodsky
                   ` (9 subsequent siblings)
  15 siblings, 0 replies; 20+ messages in thread
From: Kevin Brodsky @ 2025-01-08 10:32 UTC (permalink / raw)
  To: linux-hardening
  Cc: linux-kernel, Kevin Brodsky, Andrew Morton, Mark Brown,
	Catalin Marinas, Dave Hansen, Jann Horn, Jeff Xu, Joey Gouly,
	Kees Cook, Linus Walleij, Andy Lutomirski, Marc Zyngier,
	Peter Zijlstra, Pierre Langlois, Quentin Perret,
	Mike Rapoport (IBM), Ryan Roberts, Thomas Gleixner, Will Deacon,
	Matthew Wilcox, Qi Zheng, linux-arm-kernel, x86

Implement set_memory_pkey() using POE if supported.

Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
---
 arch/arm64/include/asm/set_memory.h |  4 ++++
 arch/arm64/mm/pageattr.c            | 21 +++++++++++++++++++++
 2 files changed, 25 insertions(+)

diff --git a/arch/arm64/include/asm/set_memory.h b/arch/arm64/include/asm/set_memory.h
index 90f61b17275e..b6cd6de34abf 100644
--- a/arch/arm64/include/asm/set_memory.h
+++ b/arch/arm64/include/asm/set_memory.h
@@ -19,4 +19,8 @@ bool kernel_page_present(struct page *page);
 int set_memory_encrypted(unsigned long addr, int numpages);
 int set_memory_decrypted(unsigned long addr, int numpages);
 
+#ifdef CONFIG_ARCH_HAS_KPKEYS
+int set_memory_pkey(unsigned long addr, int numpages, int pkey);
+#endif
+
 #endif /* _ASM_ARM64_SET_MEMORY_H */
diff --git a/arch/arm64/mm/pageattr.c b/arch/arm64/mm/pageattr.c
index 39fd1f7ff02a..3b8fec532b18 100644
--- a/arch/arm64/mm/pageattr.c
+++ b/arch/arm64/mm/pageattr.c
@@ -292,6 +292,27 @@ int set_direct_map_valid_noflush(struct page *page, unsigned nr, bool valid)
 	return set_memory_valid(addr, nr, valid);
 }
 
+#ifdef CONFIG_ARCH_HAS_KPKEYS
+int set_memory_pkey(unsigned long addr, int numpages, int pkey)
+{
+	unsigned long set_prot = 0;
+
+	if (!system_supports_poe())
+		return 0;
+
+	if (!__is_lm_address(addr))
+		return -EINVAL;
+
+	set_prot |= pkey & BIT(0) ? PTE_PO_IDX_0 : 0;
+	set_prot |= pkey & BIT(1) ? PTE_PO_IDX_1 : 0;
+	set_prot |= pkey & BIT(2) ? PTE_PO_IDX_2 : 0;
+
+	return __change_memory_common(addr, PAGE_SIZE * numpages,
+				      __pgprot(set_prot),
+				      __pgprot(PTE_PO_IDX_MASK));
+}
+#endif
+
 #ifdef CONFIG_DEBUG_PAGEALLOC
 /*
  * This is - apart from the return value - doing the same
-- 
2.47.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [RFC PATCH v2 07/15] arm64: Enable kpkeys
  2025-01-08 10:32 [RFC PATCH v2 00/15] pkeys-based page table hardening Kevin Brodsky
                   ` (5 preceding siblings ...)
  2025-01-08 10:32 ` [RFC PATCH v2 06/15] arm64: set_memory: Implement set_memory_pkey() Kevin Brodsky
@ 2025-01-08 10:32 ` Kevin Brodsky
  2025-01-08 10:32 ` [RFC PATCH v2 08/15] mm: Introduce kernel_pgtables_set_pkey() Kevin Brodsky
                   ` (8 subsequent siblings)
  15 siblings, 0 replies; 20+ messages in thread
From: Kevin Brodsky @ 2025-01-08 10:32 UTC (permalink / raw)
  To: linux-hardening
  Cc: linux-kernel, Kevin Brodsky, Andrew Morton, Mark Brown,
	Catalin Marinas, Dave Hansen, Jann Horn, Jeff Xu, Joey Gouly,
	Kees Cook, Linus Walleij, Andy Lutomirski, Marc Zyngier,
	Peter Zijlstra, Pierre Langlois, Quentin Perret,
	Mike Rapoport (IBM), Ryan Roberts, Thomas Gleixner, Will Deacon,
	Matthew Wilcox, Qi Zheng, linux-arm-kernel, x86

This is the final step to enable kpkeys on arm64. We enable
POE at EL1 by setting TCR2_EL1.POE, and initialise POR_EL1 so that
it grants access to the default pkey/POIndex (the default kpkeys
level). An ISB is added so that POE restrictions are enforced
immediately.

Having done this, we can now select ARCH_HAS_KPKEYS if ARM64_POE is
enabled.

Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
---
 arch/arm64/Kconfig             | 1 +
 arch/arm64/kernel/cpufeature.c | 5 ++++-
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 8bd31e754e79..688ffd9bf503 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -2184,6 +2184,7 @@ config ARM64_POE
 	def_bool y
 	select ARCH_USES_HIGH_VMA_FLAGS
 	select ARCH_HAS_PKEYS
+	select ARCH_HAS_KPKEYS
 	help
 	  The Permission Overlay Extension is used to implement Memory
 	  Protection Keys. Memory Protection Keys provides a mechanism for
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 829c9f6d326a..94735c91b980 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -76,6 +76,7 @@
 #include <linux/kasan.h>
 #include <linux/percpu.h>
 #include <linux/sched/isolation.h>
+#include <linux/kpkeys.h>
 
 #include <asm/cpu.h>
 #include <asm/cpufeature.h>
@@ -2387,8 +2388,10 @@ static void cpu_enable_mops(const struct arm64_cpu_capabilities *__unused)
 #ifdef CONFIG_ARM64_POE
 static void cpu_enable_poe(const struct arm64_cpu_capabilities *__unused)
 {
-	sysreg_clear_set(REG_TCR2_EL1, 0, TCR2_EL1_E0POE);
+	write_sysreg_s(por_set_kpkeys_level(0, KPKEYS_LVL_DEFAULT), SYS_POR_EL1);
+	sysreg_clear_set(REG_TCR2_EL1, 0, TCR2_EL1_E0POE | TCR2_EL1_POE);
 	sysreg_clear_set(CPACR_EL1, 0, CPACR_EL1_E0POE);
+	isb();
 }
 #endif
 
-- 
2.47.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [RFC PATCH v2 08/15] mm: Introduce kernel_pgtables_set_pkey()
  2025-01-08 10:32 [RFC PATCH v2 00/15] pkeys-based page table hardening Kevin Brodsky
                   ` (6 preceding siblings ...)
  2025-01-08 10:32 ` [RFC PATCH v2 07/15] arm64: Enable kpkeys Kevin Brodsky
@ 2025-01-08 10:32 ` Kevin Brodsky
  2025-01-08 10:32 ` [RFC PATCH v2 09/15] mm: Introduce kpkeys_hardened_pgtables Kevin Brodsky
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 20+ messages in thread
From: Kevin Brodsky @ 2025-01-08 10:32 UTC (permalink / raw)
  To: linux-hardening
  Cc: linux-kernel, Kevin Brodsky, Andrew Morton, Mark Brown,
	Catalin Marinas, Dave Hansen, Jann Horn, Jeff Xu, Joey Gouly,
	Kees Cook, Linus Walleij, Andy Lutomirski, Marc Zyngier,
	Peter Zijlstra, Pierre Langlois, Quentin Perret,
	Mike Rapoport (IBM), Ryan Roberts, Thomas Gleixner, Will Deacon,
	Matthew Wilcox, Qi Zheng, linux-arm-kernel, x86

kernel_pgtables_set_pkey() allows setting the pkey of all page table
pages in swapper_pg_dir, recursively. This will be needed by
kpkeys_hardened_pgtables, which relies on all page table pages
(PTPs) being mapped with a non-default pkey. The initial kernel page
tables cannot practically be assigned a non-default pkey when they
are allocated, so mutating them during (early) boot is required.
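
The walk below visits every table page top-down, skipping absent and
leaf entries, and propagates errors upward. A toy two-level version of
that shape (structures and names are illustrative stand-ins, not the
kernel's types):

```c
#include <assert.h>
#include <stddef.h>

#define ENTRIES 4

/* Toy stand-ins for two adjacent page table levels. */
struct level1 { int tagged; };
struct level0 { int tagged; struct level1 *next[ENTRIES]; };

static int tag_level1(struct level1 *t)
{
	t->tagged = 1;  /* stands in for set_memory_pkey() on the page */
	return 0;
}

static int tag_level0(struct level0 *t)
{
	int i, err;

	t->tagged = 1;  /* tag this table page first... */
	for (i = 0; i < ENTRIES; i++) {
		if (!t->next[i])  /* ...then recurse into present entries */
			continue;
		err = tag_level1(t->next[i]);
		if (err)
			return err;  /* propagate the first failure */
	}
	return 0;
}
```

The real implementation adds one such function per level and collapses
folded levels by casting, as shown in the patch.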

Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
---
 include/linux/mm.h |   2 +
 mm/memory.c        | 137 +++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 139 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index f2a144c4734d..453a26bcad1a 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -4239,4 +4239,6 @@ int arch_get_shadow_stack_status(struct task_struct *t, unsigned long __user *st
 int arch_set_shadow_stack_status(struct task_struct *t, unsigned long status);
 int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
 
+int kernel_pgtables_set_pkey(int pkey);
+
 #endif /* _LINUX_MM_H */
diff --git a/mm/memory.c b/mm/memory.c
index e3f34a179f4a..fbc4ac25d19b 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -77,6 +77,8 @@
 #include <linux/vmalloc.h>
 #include <linux/sched/sysctl.h>
 #include <linux/fsnotify.h>
+#include <linux/kpkeys.h>
+#include <linux/set_memory.h>
 
 #include <trace/events/kmem.h>
 
@@ -7129,3 +7131,138 @@ void vma_pgtable_walk_end(struct vm_area_struct *vma)
 	if (is_vm_hugetlb_page(vma))
 		hugetlb_vma_unlock_read(vma);
 }
+
+static int set_page_pkey(void *p, int pkey)
+{
+	unsigned long addr = (unsigned long)p;
+
+	/*
+	 * swapper_pg_dir itself will be made read-only by mark_rodata_ro()
+	 * so there is no point in changing its pkey.
+	 */
+	if (p == swapper_pg_dir)
+		return 0;
+
+	return set_memory_pkey(addr, 1, pkey);
+}
+
+static int set_pkey_pte(pmd_t *pmd, int pkey)
+{
+	pte_t *pte;
+	int err;
+
+	pte = pte_offset_kernel(pmd, 0);
+	err = set_page_pkey(pte, pkey);
+
+	return err;
+}
+
+static int set_pkey_pmd(pud_t *pud, int pkey)
+{
+	pmd_t *pmd;
+	int i, err = 0;
+
+	pmd = pmd_offset(pud, 0);
+
+	err = set_page_pkey(pmd, pkey);
+	if (err)
+		return err;
+
+	for (i = 0; i < PTRS_PER_PMD; i++) {
+		if (pmd_none(pmd[i]) || pmd_bad(pmd[i]) || pmd_leaf(pmd[i]))
+			continue;
+		err = set_pkey_pte(&pmd[i], pkey);
+		if (err)
+			break;
+	}
+
+	return err;
+}
+
+static int set_pkey_pud(p4d_t *p4d, int pkey)
+{
+	pud_t *pud;
+	int i, err = 0;
+
+	if (mm_pmd_folded(&init_mm))
+		return set_pkey_pmd((pud_t *)p4d, pkey);
+
+	pud = pud_offset(p4d, 0);
+
+	err = set_page_pkey(pud, pkey);
+	if (err)
+		return err;
+
+	for (i = 0; i < PTRS_PER_PUD; i++) {
+		if (pud_none(pud[i]) || pud_bad(pud[i]) || pud_leaf(pud[i]))
+			continue;
+		err = set_pkey_pmd(&pud[i], pkey);
+		if (err)
+			break;
+	}
+
+	return err;
+}
+
+static int set_pkey_p4d(pgd_t *pgd, int pkey)
+{
+	p4d_t *p4d;
+	int i, err = 0;
+
+	if (mm_pud_folded(&init_mm))
+		return set_pkey_pud((p4d_t *)pgd, pkey);
+
+	p4d = p4d_offset(pgd, 0);
+
+	err = set_page_pkey(p4d, pkey);
+	if (err)
+		return err;
+
+	for (i = 0; i < PTRS_PER_P4D; i++) {
+		if (p4d_none(p4d[i]) || p4d_bad(p4d[i]) || p4d_leaf(p4d[i]))
+			continue;
+		err = set_pkey_pud(&p4d[i], pkey);
+		if (err)
+			break;
+	}
+
+	return err;
+}
+
+/**
+ * kernel_pgtables_set_pkey - set pkey for all kernel page table pages
+ * @pkey: pkey to set the page table pages to
+ *
+ * Walks swapper_pg_dir setting the protection key of every page table page (at
+ * all levels) to @pkey. swapper_pg_dir itself is left untouched as it is
+ * expected to be mapped read-only by mark_rodata_ro().
+ *
+ * No-op if the architecture does not support kpkeys.
+ */
+int kernel_pgtables_set_pkey(int pkey)
+{
+	pgd_t *pgd = swapper_pg_dir;
+	int i, err = 0;
+
+	if (!arch_kpkeys_enabled())
+		return 0;
+
+	spin_lock(&init_mm.page_table_lock);
+
+	if (mm_p4d_folded(&init_mm)) {
+		err = set_pkey_p4d(pgd, pkey);
+		goto out;
+	}
+
+	for (i = 0; i < PTRS_PER_PGD; i++) {
+		if (pgd_none(pgd[i]) || pgd_bad(pgd[i]) || pgd_leaf(pgd[i]))
+			continue;
+		err = set_pkey_p4d(&pgd[i], pkey);
+		if (err)
+			break;
+	}
+
+out:
+	spin_unlock(&init_mm.page_table_lock);
+	return err;
+}
-- 
2.47.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [RFC PATCH v2 09/15] mm: Introduce kpkeys_hardened_pgtables
  2025-01-08 10:32 [RFC PATCH v2 00/15] pkeys-based page table hardening Kevin Brodsky
                   ` (7 preceding siblings ...)
  2025-01-08 10:32 ` [RFC PATCH v2 08/15] mm: Introduce kernel_pgtables_set_pkey() Kevin Brodsky
@ 2025-01-08 10:32 ` Kevin Brodsky
  2025-01-08 10:32 ` [RFC PATCH v2 10/15] mm: Allow __pagetable_ctor() to fail Kevin Brodsky
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 20+ messages in thread
From: Kevin Brodsky @ 2025-01-08 10:32 UTC (permalink / raw)
  To: linux-hardening
  Cc: linux-kernel, Kevin Brodsky, Andrew Morton, Mark Brown,
	Catalin Marinas, Dave Hansen, Jann Horn, Jeff Xu, Joey Gouly,
	Kees Cook, Linus Walleij, Andy Lutomirski, Marc Zyngier,
	Peter Zijlstra, Pierre Langlois, Quentin Perret,
	Mike Rapoport (IBM), Ryan Roberts, Thomas Gleixner, Will Deacon,
	Matthew Wilcox, Qi Zheng, linux-arm-kernel, x86

kpkeys_hardened_pgtables is a hardening feature based on kpkeys. It
aims to prevent the corruption of page tables by: 1. mapping all
page table pages, both kernel and user, with a privileged pkey
(KPKEYS_PKEY_PGTABLES), and 2. granting write access to that pkey
only when running at a higher kpkeys level (KPKEYS_LVL_PGTABLES).

The feature is exposed as CONFIG_KPKEYS_HARDENED_PGTABLES; it
requires explicit architecture opt-in by selecting
ARCH_HAS_KPKEYS_HARDENED_PGTABLES, since much of the page table
handling is arch-specific.

This patch introduces an API to modify the PTPs' pkey and switch
kpkeys level using a guard object. Because this API is going to be
called from low-level pgtable helpers (setters, allocators), it must
be inactive on boot and explicitly switched on if and when kpkeys
become available. A static key is used for that purpose; it is the
responsibility of each architecture supporting
kpkeys_hardened_pgtables to call kpkeys_hardened_pgtables_enable()
as early as possible to switch on that static key. The initial
kernel page tables are also walked to set their pkey, since they
have already been allocated at that point.

The kpkeys_hardened_pgtables guard class deliberately does not use
the static key on the restore path, to avoid mismatched set/restore
pairs. Enabling the static key itself involves modifying page
tables, so a guard object may be created while the static key still
reads as false, yet destroyed after it has flipped to true. To
handle this, we reserve an invalid value for the pkey register and
use it to disable the restore path.
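
A minimal model of that sentinel pattern, with all names and the
sentinel value assumed for illustration (they are not the kernel's
actual definitions):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sentinel: a pkey register value that can never be a valid save. */
#define PKEY_REG_INVAL 0ULL

static bool hardening_enabled;   /* stands in for the static key */
static uint64_t pkey_reg = 0x7;  /* stands in for POR_EL1 */

/* Set path: only touches the pkey register when the feature is on;
 * returns the sentinel otherwise. */
static uint64_t guard_enter(void)
{
	uint64_t prev;

	if (!hardening_enabled)
		return PKEY_REG_INVAL;
	prev = pkey_reg;
	pkey_reg = 0xf;  /* grant write access to the pgtable pkey */
	return prev;
}

/* Restore path: keyed on the saved value, not the static key, so a
 * guard created before the key flips is still handled correctly. */
static void guard_exit(uint64_t saved)
{
	if (saved != PKEY_REG_INVAL)
		pkey_reg = saved;
}
```

Because guard_exit() ignores the static key entirely, a guard created
while the feature was off remains a no-op on destruction even if the
key is enabled in between.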

Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
---
 include/asm-generic/kpkeys.h  | 12 +++++++++
 include/linux/kpkeys.h        | 48 ++++++++++++++++++++++++++++++++++-
 mm/Kconfig                    |  3 +++
 mm/Makefile                   |  1 +
 mm/kpkeys_hardened_pgtables.c | 44 ++++++++++++++++++++++++++++++++
 security/Kconfig.hardening    | 12 +++++++++
 6 files changed, 119 insertions(+), 1 deletion(-)
 create mode 100644 mm/kpkeys_hardened_pgtables.c

diff --git a/include/asm-generic/kpkeys.h b/include/asm-generic/kpkeys.h
index 3404ce249757..cec92334a9f3 100644
--- a/include/asm-generic/kpkeys.h
+++ b/include/asm-generic/kpkeys.h
@@ -2,8 +2,20 @@
 #ifndef __ASM_GENERIC_KPKEYS_H
 #define __ASM_GENERIC_KPKEYS_H
 
+#ifndef KPKEYS_PKEY_PGTABLES
+#define KPKEYS_PKEY_PGTABLES	1
+#endif
+
 #ifndef KPKEYS_PKEY_DEFAULT
 #define KPKEYS_PKEY_DEFAULT	0
 #endif
 
+/*
+ * Represents a pkey register value that cannot be used, typically disabling
+ * access to all keys.
+ */
+#ifndef KPKEYS_PKEY_REG_INVAL
+#define KPKEYS_PKEY_REG_INVAL	0
+#endif
+
 #endif	/* __ASM_GENERIC_KPKEYS_H */
diff --git a/include/linux/kpkeys.h b/include/linux/kpkeys.h
index 70e44b0db150..587cf8b4bd33 100644
--- a/include/linux/kpkeys.h
+++ b/include/linux/kpkeys.h
@@ -4,11 +4,13 @@
 
 #include <linux/bug.h>
 #include <linux/cleanup.h>
+#include <linux/jump_label.h>
 
 #define KPKEYS_LVL_DEFAULT	0
+#define KPKEYS_LVL_PGTABLES	1
 
 #define KPKEYS_LVL_MIN		KPKEYS_LVL_DEFAULT
-#define KPKEYS_LVL_MAX		KPKEYS_LVL_DEFAULT
+#define KPKEYS_LVL_MAX		KPKEYS_LVL_PGTABLES
 
 #define KPKEYS_GUARD(_name, set_level, restore_pkey_reg)		\
 	__DEFINE_CLASS_IS_CONDITIONAL(_name, false);			\
@@ -63,4 +65,48 @@ static inline bool arch_kpkeys_enabled(void)
 
 #endif /* CONFIG_ARCH_HAS_KPKEYS */
 
+#ifdef CONFIG_KPKEYS_HARDENED_PGTABLES
+
+DECLARE_STATIC_KEY_FALSE(kpkeys_hardened_pgtables_enabled);
+
+/*
+ * Use guard(kpkeys_hardened_pgtables)() to temporarily grant write access
+ * to page tables.
+ */
+KPKEYS_GUARD(kpkeys_hardened_pgtables,
+	     static_branch_unlikely(&kpkeys_hardened_pgtables_enabled) ?
+		     kpkeys_set_level(KPKEYS_LVL_PGTABLES) :
+		     KPKEYS_PKEY_REG_INVAL,
+	     _T != KPKEYS_PKEY_REG_INVAL ?
+		     kpkeys_restore_pkey_reg(_T) :
+		     (void)0)
+
+int kpkeys_protect_pgtable_memory(struct folio *folio);
+int kpkeys_unprotect_pgtable_memory(struct folio *folio);
+
+/*
+ * Enables kpkeys_hardened_pgtables and switches existing kernel page tables to
+ * a privileged pkey (KPKEYS_PKEY_PGTABLES).
+ *
+ * Should be called as early as possible by architecture code, after (k)pkeys
+ * are initialised and before any user task is spawned.
+ */
+void kpkeys_hardened_pgtables_enable(void);
+
+#else /* CONFIG_KPKEYS_HARDENED_PGTABLES */
+
+KPKEYS_GUARD(kpkeys_hardened_pgtables, 0, (void)_T)
+
+static inline int kpkeys_protect_pgtable_memory(struct folio *folio)
+{
+	return 0;
+}
+static inline int kpkeys_unprotect_pgtable_memory(struct folio *folio)
+{
+	return 0;
+}
+static inline void kpkeys_hardened_pgtables_enable(void) {}
+
+#endif /* CONFIG_KPKEYS_HARDENED_PGTABLES */
+
 #endif /* _LINUX_KPKEYS_H */
diff --git a/mm/Kconfig b/mm/Kconfig
index 71edc478f111..2a8ebe780e64 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -1149,6 +1149,9 @@ config ARCH_HAS_PKEYS
 	bool
 config ARCH_HAS_KPKEYS
 	bool
+# ARCH_HAS_KPKEYS must be selected when selecting this option
+config ARCH_HAS_KPKEYS_HARDENED_PGTABLES
+	bool
 
 config ARCH_USES_PG_ARCH_2
 	bool
diff --git a/mm/Makefile b/mm/Makefile
index 850386a67b3e..130691364172 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -147,3 +147,4 @@ obj-$(CONFIG_SHRINKER_DEBUG) += shrinker_debug.o
 obj-$(CONFIG_EXECMEM) += execmem.o
 obj-$(CONFIG_TMPFS_QUOTA) += shmem_quota.o
 obj-$(CONFIG_PT_RECLAIM) += pt_reclaim.o
+obj-$(CONFIG_KPKEYS_HARDENED_PGTABLES) += kpkeys_hardened_pgtables.o
diff --git a/mm/kpkeys_hardened_pgtables.c b/mm/kpkeys_hardened_pgtables.c
new file mode 100644
index 000000000000..c6eb7fb6ae56
--- /dev/null
+++ b/mm/kpkeys_hardened_pgtables.c
@@ -0,0 +1,44 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#include <linux/mm.h>
+#include <linux/kpkeys.h>
+#include <linux/set_memory.h>
+
+DEFINE_STATIC_KEY_FALSE(kpkeys_hardened_pgtables_enabled);
+
+int kpkeys_protect_pgtable_memory(struct folio *folio)
+{
+	unsigned long addr = (unsigned long)folio_address(folio);
+	unsigned int order = folio_order(folio);
+	int ret = 0;
+
+	if (static_branch_unlikely(&kpkeys_hardened_pgtables_enabled))
+		ret = set_memory_pkey(addr, 1 << order, KPKEYS_PKEY_PGTABLES);
+
+	WARN_ON(ret);
+	return ret;
+}
+
+int kpkeys_unprotect_pgtable_memory(struct folio *folio)
+{
+	unsigned long addr = (unsigned long)folio_address(folio);
+	unsigned int order = folio_order(folio);
+	int ret = 0;
+
+	if (static_branch_unlikely(&kpkeys_hardened_pgtables_enabled))
+		ret = set_memory_pkey(addr, 1 << order, KPKEYS_PKEY_DEFAULT);
+
+	WARN_ON(ret);
+	return ret;
+}
+
+void __init kpkeys_hardened_pgtables_enable(void)
+{
+	int ret;
+
+	if (!arch_kpkeys_enabled())
+		return;
+
+	static_branch_enable(&kpkeys_hardened_pgtables_enabled);
+	ret = kernel_pgtables_set_pkey(KPKEYS_PKEY_PGTABLES);
+	WARN_ON(ret);
+}
diff --git a/security/Kconfig.hardening b/security/Kconfig.hardening
index c9d5ca3d8d08..95f93f1d4055 100644
--- a/security/Kconfig.hardening
+++ b/security/Kconfig.hardening
@@ -300,6 +300,18 @@ config BUG_ON_DATA_CORRUPTION
 
 	  If unsure, say N.
 
+config KPKEYS_HARDENED_PGTABLES
+	bool "Harden page tables using kernel pkeys"
+	depends on ARCH_HAS_KPKEYS_HARDENED_PGTABLES
+	help
+	  This option makes all page tables mostly read-only by
+	  allocating them with a non-default protection key (pkey) and
+	  only enabling write access to that pkey in routines that are
+	  expected to write to page table entries.
+
+	  This option has no effect if the system does not support
+	  kernel pkeys.
+
 endmenu
 
 config CC_HAS_RANDSTRUCT
-- 
2.47.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [RFC PATCH v2 10/15] mm: Allow __pagetable_ctor() to fail
  2025-01-08 10:32 [RFC PATCH v2 00/15] pkeys-based page table hardening Kevin Brodsky
                   ` (8 preceding siblings ...)
  2025-01-08 10:32 ` [RFC PATCH v2 09/15] mm: Introduce kpkeys_hardened_pgtables Kevin Brodsky
@ 2025-01-08 10:32 ` Kevin Brodsky
  2025-01-08 10:32 ` [RFC PATCH v2 11/15] mm: Map page tables with privileged pkey Kevin Brodsky
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 20+ messages in thread
From: Kevin Brodsky @ 2025-01-08 10:32 UTC (permalink / raw)
  To: linux-hardening
  Cc: linux-kernel, Kevin Brodsky, Andrew Morton, Mark Brown,
	Catalin Marinas, Dave Hansen, Jann Horn, Jeff Xu, Joey Gouly,
	Kees Cook, Linus Walleij, Andy Lutomirski, Marc Zyngier,
	Peter Zijlstra, Pierre Langlois, Quentin Perret,
	Mike Rapoport (IBM), Ryan Roberts, Thomas Gleixner, Will Deacon,
	Matthew Wilcox, Qi Zheng, linux-arm-kernel, x86

In preparation for adding construction hooks (that may fail) to
__pagetable_ctor(), make __pagetable_ctor() return a bool,
propagate it to pagetable_*_ctor(), and handle failure in the
generic {pud,p4d,pgd}_alloc helpers.
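
The resulting allocator shape, as a self-contained sketch (names are
illustrative stand-ins for the kernel helpers changed below):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

static bool ctor_should_fail;  /* models a failing construction hook */

/* The constructor may now fail, e.g. if setting a pkey fails. */
static bool pagetable_ctor(void *ptdesc)
{
	(void)ptdesc;
	return !ctor_should_fail;
}

/* The allocator must undo the allocation when the ctor fails. */
static void *pud_alloc_one(void)
{
	void *ptdesc = malloc(64);

	if (!ptdesc)
		return NULL;
	if (!pagetable_ctor(ptdesc)) {
		free(ptdesc);  /* free the page table page on failure */
		return NULL;
	}
	return ptdesc;
}
```

The caller sees a single NULL return for both allocation and
construction failure, matching the existing pte/pmd ctor convention.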

Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
---
 include/asm-generic/pgalloc.h | 15 ++++++++++++---
 include/linux/mm.h            | 21 ++++++++++-----------
 2 files changed, 22 insertions(+), 14 deletions(-)

diff --git a/include/asm-generic/pgalloc.h b/include/asm-generic/pgalloc.h
index 892ece4558a2..9962f7454d0c 100644
--- a/include/asm-generic/pgalloc.h
+++ b/include/asm-generic/pgalloc.h
@@ -173,7 +173,10 @@ static inline pud_t *__pud_alloc_one_noprof(struct mm_struct *mm, unsigned long
 	if (!ptdesc)
 		return NULL;
 
-	pagetable_pud_ctor(ptdesc);
+	if (!pagetable_pud_ctor(ptdesc)) {
+		pagetable_free(ptdesc);
+		return NULL;
+	}
 	return ptdesc_address(ptdesc);
 }
 #define __pud_alloc_one(...)	alloc_hooks(__pud_alloc_one_noprof(__VA_ARGS__))
@@ -227,7 +230,10 @@ static inline p4d_t *__p4d_alloc_one_noprof(struct mm_struct *mm, unsigned long
 	if (!ptdesc)
 		return NULL;
 
-	pagetable_p4d_ctor(ptdesc);
+	if (!pagetable_p4d_ctor(ptdesc)) {
+		pagetable_free(ptdesc);
+		return NULL;
+	}
 	return ptdesc_address(ptdesc);
 }
 #define __p4d_alloc_one(...)	alloc_hooks(__p4d_alloc_one_noprof(__VA_ARGS__))
@@ -271,7 +277,10 @@ static inline pgd_t *__pgd_alloc_noprof(struct mm_struct *mm, unsigned int order
 	if (!ptdesc)
 		return NULL;
 
-	pagetable_pgd_ctor(ptdesc);
+	if (!pagetable_pgd_ctor(ptdesc)) {
+		pagetable_free(ptdesc);
+		return NULL;
+	}
 	return ptdesc_address(ptdesc);
 }
 #define __pgd_alloc(...)	alloc_hooks(__pgd_alloc_noprof(__VA_ARGS__))
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 453a26bcad1a..e99040be477f 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3076,12 +3076,13 @@ static inline bool ptlock_init(struct ptdesc *ptdesc) { return true; }
 static inline void ptlock_free(struct ptdesc *ptdesc) {}
 #endif /* defined(CONFIG_SPLIT_PTE_PTLOCKS) */
 
-static inline void __pagetable_ctor(struct ptdesc *ptdesc)
+static inline bool __pagetable_ctor(struct ptdesc *ptdesc)
 {
 	struct folio *folio = ptdesc_folio(ptdesc);
 
 	__folio_set_pgtable(folio);
 	lruvec_stat_add_folio(folio, NR_PAGETABLE);
+	return true;
 }
 
 static inline void pagetable_dtor(struct ptdesc *ptdesc)
@@ -3103,8 +3104,7 @@ static inline bool pagetable_pte_ctor(struct ptdesc *ptdesc)
 {
 	if (!ptlock_init(ptdesc))
 		return false;
-	__pagetable_ctor(ptdesc);
-	return true;
+	return __pagetable_ctor(ptdesc);
 }
 
 pte_t *___pte_offset_map(pmd_t *pmd, unsigned long addr, pmd_t *pmdvalp);
@@ -3210,8 +3210,7 @@ static inline bool pagetable_pmd_ctor(struct ptdesc *ptdesc)
 	if (!pmd_ptlock_init(ptdesc))
 		return false;
 	ptdesc_pmd_pts_init(ptdesc);
-	__pagetable_ctor(ptdesc);
-	return true;
+	return __pagetable_ctor(ptdesc);
 }
 
 /*
@@ -3233,19 +3232,19 @@ static inline spinlock_t *pud_lock(struct mm_struct *mm, pud_t *pud)
 	return ptl;
 }
 
-static inline void pagetable_pud_ctor(struct ptdesc *ptdesc)
+static inline bool pagetable_pud_ctor(struct ptdesc *ptdesc)
 {
-	__pagetable_ctor(ptdesc);
+	return __pagetable_ctor(ptdesc);
 }
 
-static inline void pagetable_p4d_ctor(struct ptdesc *ptdesc)
+static inline bool pagetable_p4d_ctor(struct ptdesc *ptdesc)
 {
-	__pagetable_ctor(ptdesc);
+	return __pagetable_ctor(ptdesc);
 }
 
-static inline void pagetable_pgd_ctor(struct ptdesc *ptdesc)
+static inline bool pagetable_pgd_ctor(struct ptdesc *ptdesc)
 {
-	__pagetable_ctor(ptdesc);
+	return __pagetable_ctor(ptdesc);
 }
 
 extern void __init pagecache_init(void);
-- 
2.47.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [RFC PATCH v2 11/15] mm: Map page tables with privileged pkey
  2025-01-08 10:32 [RFC PATCH v2 00/15] pkeys-based page table hardening Kevin Brodsky
                   ` (9 preceding siblings ...)
  2025-01-08 10:32 ` [RFC PATCH v2 10/15] mm: Allow __pagetable_ctor() to fail Kevin Brodsky
@ 2025-01-08 10:32 ` Kevin Brodsky
  2025-01-08 10:32 ` [RFC PATCH v2 12/15] arm64: kpkeys: Support KPKEYS_LVL_PGTABLES Kevin Brodsky
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 20+ messages in thread
From: Kevin Brodsky @ 2025-01-08 10:32 UTC (permalink / raw)
  To: linux-hardening
  Cc: linux-kernel, Kevin Brodsky, Andrew Morton, Mark Brown,
	Catalin Marinas, Dave Hansen, Jann Horn, Jeff Xu, Joey Gouly,
	Kees Cook, Linus Walleij, Andy Lutomirski, Marc Zyngier,
	Peter Zijlstra, Pierre Langlois, Quentin Perret,
	Mike Rapoport (IBM), Ryan Roberts, Thomas Gleixner, Will Deacon,
	Matthew Wilcox, Qi Zheng, linux-arm-kernel, x86

If CONFIG_KPKEYS_HARDENED_PGTABLES is enabled, map allocated page
table pages using a privileged pkey (KPKEYS_PKEY_PGTABLES), so that
page tables can only be written under guard(kpkeys_hardened_pgtables).

This patch is a no-op if CONFIG_KPKEYS_HARDENED_PGTABLES is disabled
(default).

Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
---
 include/linux/mm.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index e99040be477f..714e1af91752 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -33,6 +33,7 @@
 #include <linux/slab.h>
 #include <linux/cacheinfo.h>
 #include <linux/rcuwait.h>
+#include <linux/kpkeys.h>
 
 struct mempolicy;
 struct anon_vma;
@@ -3082,6 +3083,8 @@ static inline bool __pagetable_ctor(struct ptdesc *ptdesc)
 
 	__folio_set_pgtable(folio);
 	lruvec_stat_add_folio(folio, NR_PAGETABLE);
+	if (kpkeys_protect_pgtable_memory(folio))
+		return false;
 	return true;
 }
 
@@ -3092,6 +3095,7 @@ static inline void pagetable_dtor(struct ptdesc *ptdesc)
 	ptlock_free(ptdesc);
 	__folio_clear_pgtable(folio);
 	lruvec_stat_sub_folio(folio, NR_PAGETABLE);
+	kpkeys_unprotect_pgtable_memory(folio);
 }
 
 static inline void pagetable_dtor_free(struct ptdesc *ptdesc)
-- 
2.47.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [RFC PATCH v2 12/15] arm64: kpkeys: Support KPKEYS_LVL_PGTABLES
  2025-01-08 10:32 [RFC PATCH v2 00/15] pkeys-based page table hardening Kevin Brodsky
                   ` (10 preceding siblings ...)
  2025-01-08 10:32 ` [RFC PATCH v2 11/15] mm: Map page tables with privileged pkey Kevin Brodsky
@ 2025-01-08 10:32 ` Kevin Brodsky
  2025-01-08 10:32 ` [RFC PATCH v2 13/15] arm64: mm: Guard page table writes with kpkeys Kevin Brodsky
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 20+ messages in thread
From: Kevin Brodsky @ 2025-01-08 10:32 UTC (permalink / raw)
  To: linux-hardening
  Cc: linux-kernel, Kevin Brodsky, Andrew Morton, Mark Brown,
	Catalin Marinas, Dave Hansen, Jann Horn, Jeff Xu, Joey Gouly,
	Kees Cook, Linus Walleij, Andy Lutomirski, Marc Zyngier,
	Peter Zijlstra, Pierre Langlois, Quentin Perret,
	Mike Rapoport (IBM), Ryan Roberts, Thomas Gleixner, Will Deacon,
	Matthew Wilcox, Qi Zheng, linux-arm-kernel, x86

Grant RW access to KPKEYS_PKEY_PGTABLES (used to map page table
pages) when switching to KPKEYS_LVL_PGTABLES; otherwise grant only
RO access.
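
A toy model of the level-dependent permission computation, using
assumed 4-bit permission fields and constants purely for illustration
(the architectural POE_* encodings differ):

```c
#include <assert.h>
#include <stdint.h>

#define PERM_R   0x1
#define PERM_W   0x2
#define PERM_X   0x4
#define PERM_RW  (PERM_R | PERM_W)
#define PERM_RXW (PERM_R | PERM_W | PERM_X)

#define PKEY_DEFAULT  0
#define PKEY_PGTABLES 1

#define LVL_DEFAULT  0
#define LVL_PGTABLES 1

/* Install a 4-bit permission field for one pkey in the register. */
static uint64_t set_pkey_perms(uint64_t por, int pkey, uint64_t perms)
{
	por &= ~(0xfULL << (4 * pkey));      /* clear the pkey's field */
	return por | (perms << (4 * pkey));  /* install new permissions */
}

/* Default pkey stays fully accessible; the pgtable pkey is writable
 * only at the pgtable level, read-only at any other level. */
static uint64_t set_kpkeys_level(uint64_t por, int level)
{
	por = set_pkey_perms(por, PKEY_DEFAULT, PERM_RXW);
	por = set_pkey_perms(por, PKEY_PGTABLES,
			     level == LVL_PGTABLES ? PERM_RW : PERM_R);
	return por;
}
```

This mirrors the two-line change to por_set_kpkeys_level() below: the
level argument now selects between RW and RO for the pgtable pkey.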

Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
---
 arch/arm64/include/asm/kpkeys.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/arm64/include/asm/kpkeys.h b/arch/arm64/include/asm/kpkeys.h
index e17f6df41873..4854e1f3babd 100644
--- a/arch/arm64/include/asm/kpkeys.h
+++ b/arch/arm64/include/asm/kpkeys.h
@@ -18,6 +18,8 @@ static inline bool arch_kpkeys_enabled(void)
 static inline u64 por_set_kpkeys_level(u64 por, int level)
 {
 	por = por_set_pkey_perms(por, KPKEYS_PKEY_DEFAULT, POE_RXW);
+	por = por_set_pkey_perms(por, KPKEYS_PKEY_PGTABLES,
+				 level == KPKEYS_LVL_PGTABLES ? POE_RW : POE_R);
 
 	return por;
 }
-- 
2.47.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [RFC PATCH v2 13/15] arm64: mm: Guard page table writes with kpkeys
  2025-01-08 10:32 [RFC PATCH v2 00/15] pkeys-based page table hardening Kevin Brodsky
                   ` (11 preceding siblings ...)
  2025-01-08 10:32 ` [RFC PATCH v2 12/15] arm64: kpkeys: Support KPKEYS_LVL_PGTABLES Kevin Brodsky
@ 2025-01-08 10:32 ` Kevin Brodsky
  2025-01-09  7:17   ` Qi Zheng
  2025-01-08 10:32 ` [RFC PATCH v2 14/15] arm64: Enable kpkeys_hardened_pgtables support Kevin Brodsky
                   ` (2 subsequent siblings)
  15 siblings, 1 reply; 20+ messages in thread
From: Kevin Brodsky @ 2025-01-08 10:32 UTC (permalink / raw)
  To: linux-hardening
  Cc: linux-kernel, Kevin Brodsky, Andrew Morton, Mark Brown,
	Catalin Marinas, Dave Hansen, Jann Horn, Jeff Xu, Joey Gouly,
	Kees Cook, Linus Walleij, Andy Lutomirski, Marc Zyngier,
	Peter Zijlstra, Pierre Langlois, Quentin Perret,
	Mike Rapoport (IBM), Ryan Roberts, Thomas Gleixner, Will Deacon,
	Matthew Wilcox, Qi Zheng, linux-arm-kernel, x86

When CONFIG_KPKEYS_HARDENED_PGTABLES is enabled, page tables (both
user and kernel) are mapped with a privileged pkey in the linear
mapping. As a result, they can only be written under the
kpkeys_hardened_pgtables guard, which sets POR_EL1 appropriately to
allow such writes.

Use this guard wherever page tables genuinely need to be written,
keeping its scope as small as possible (so that POR_EL1 is reset as
fast as possible). Where atomics are involved, the guard's scope
encompasses the whole loop to avoid switching POR_EL1 unnecessarily.

This patch is a no-op if CONFIG_KPKEYS_HARDENED_PGTABLES is disabled
(default).
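
The narrow-window pattern the patch applies can be modelled like this;
all names are stand-ins for the kernel's guard machinery, not actual
kernel APIs:

```c
#include <assert.h>
#include <stdint.h>

static int write_access;   /* models POR_EL1 granting write to pgtables */
static uint64_t fake_pte;  /* models a page table entry */

/* Widen permissions only around the single write that needs them,
 * then restore the previous state, as the guard does on scope exit. */
static void set_pte_guarded(uint64_t val)
{
	int saved = write_access;

	write_access = 1;      /* guard(kpkeys_hardened_pgtables)() */
	fake_pte = val;        /* the one write that needs the window */
	write_access = saved;  /* window closes with the guard's scope */
}
```

Outside set_pte_guarded(), the pgtable pkey remains read-only, so any
stray write to a page table page faults.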

Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
---
 arch/arm64/include/asm/pgtable.h | 19 +++++++++++++++++--
 arch/arm64/mm/fault.c            |  2 ++
 2 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index f8dac6673887..0d60a49dc234 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -39,6 +39,7 @@
 #include <linux/mm_types.h>
 #include <linux/sched.h>
 #include <linux/page_table_check.h>
+#include <linux/kpkeys.h>
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 #define __HAVE_ARCH_FLUSH_PMD_TLB_RANGE
@@ -314,6 +315,7 @@ static inline pte_t pte_clear_uffd_wp(pte_t pte)
 
 static inline void __set_pte_nosync(pte_t *ptep, pte_t pte)
 {
+	guard(kpkeys_hardened_pgtables)();
 	WRITE_ONCE(*ptep, pte);
 }
 
@@ -758,6 +760,7 @@ static inline void set_pmd(pmd_t *pmdp, pmd_t pmd)
 	}
 #endif /* __PAGETABLE_PMD_FOLDED */
 
+	guard(kpkeys_hardened_pgtables)();
 	WRITE_ONCE(*pmdp, pmd);
 
 	if (pmd_valid(pmd)) {
@@ -825,6 +828,7 @@ static inline void set_pud(pud_t *pudp, pud_t pud)
 		return;
 	}
 
+	guard(kpkeys_hardened_pgtables)();
 	WRITE_ONCE(*pudp, pud);
 
 	if (pud_valid(pud)) {
@@ -906,6 +910,7 @@ static inline void set_p4d(p4d_t *p4dp, p4d_t p4d)
 		return;
 	}
 
+	guard(kpkeys_hardened_pgtables)();
 	WRITE_ONCE(*p4dp, p4d);
 	dsb(ishst);
 	isb();
@@ -1033,6 +1038,7 @@ static inline void set_pgd(pgd_t *pgdp, pgd_t pgd)
 		return;
 	}
 
+	guard(kpkeys_hardened_pgtables)();
 	WRITE_ONCE(*pgdp, pgd);
 	dsb(ishst);
 	isb();
@@ -1233,6 +1239,7 @@ static inline int __ptep_test_and_clear_young(struct vm_area_struct *vma,
 {
 	pte_t old_pte, pte;
 
+	guard(kpkeys_hardened_pgtables)();
 	pte = __ptep_get(ptep);
 	do {
 		old_pte = pte;
@@ -1279,7 +1286,10 @@ static inline int pmdp_test_and_clear_young(struct vm_area_struct *vma,
 static inline pte_t __ptep_get_and_clear(struct mm_struct *mm,
 				       unsigned long address, pte_t *ptep)
 {
-	pte_t pte = __pte(xchg_relaxed(&pte_val(*ptep), 0));
+	pte_t pte;
+
+	scoped_guard(kpkeys_hardened_pgtables)
+		pte = __pte(xchg_relaxed(&pte_val(*ptep), 0));
 
 	page_table_check_pte_clear(mm, pte);
 
@@ -1322,7 +1332,10 @@ static inline pte_t __get_and_clear_full_ptes(struct mm_struct *mm,
 static inline pmd_t pmdp_huge_get_and_clear(struct mm_struct *mm,
 					    unsigned long address, pmd_t *pmdp)
 {
-	pmd_t pmd = __pmd(xchg_relaxed(&pmd_val(*pmdp), 0));
+	pmd_t pmd;
+
+	scoped_guard(kpkeys_hardened_pgtables)
+		pmd = __pmd(xchg_relaxed(&pmd_val(*pmdp), 0));
 
 	page_table_check_pmd_clear(mm, pmd);
 
@@ -1336,6 +1349,7 @@ static inline void ___ptep_set_wrprotect(struct mm_struct *mm,
 {
 	pte_t old_pte;
 
+	guard(kpkeys_hardened_pgtables)();
 	do {
 		old_pte = pte;
 		pte = pte_wrprotect(pte);
@@ -1416,6 +1430,7 @@ static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
 		unsigned long address, pmd_t *pmdp, pmd_t pmd)
 {
 	page_table_check_pmd_set(vma->vm_mm, pmdp, pmd);
+	guard(kpkeys_hardened_pgtables)();
 	return __pmd(xchg_relaxed(&pmd_val(*pmdp), pmd_val(pmd)));
 }
 #endif
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index ef63651099a9..ab45047155b9 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -220,6 +220,8 @@ int __ptep_set_access_flags(struct vm_area_struct *vma,
 	if (pte_same(pte, entry))
 		return 0;
 
+	guard(kpkeys_hardened_pgtables)();
+
 	/* only preserve the access flags and write permission */
 	pte_val(entry) &= PTE_RDONLY | PTE_AF | PTE_WRITE | PTE_DIRTY;
 
-- 
2.47.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [RFC PATCH v2 14/15] arm64: Enable kpkeys_hardened_pgtables support
  2025-01-08 10:32 [RFC PATCH v2 00/15] pkeys-based page table hardening Kevin Brodsky
                   ` (12 preceding siblings ...)
  2025-01-08 10:32 ` [RFC PATCH v2 13/15] arm64: mm: Guard page table writes with kpkeys Kevin Brodsky
@ 2025-01-08 10:32 ` Kevin Brodsky
  2025-01-08 10:32 ` [RFC PATCH v2 15/15] mm: Add basic tests for kpkeys_hardened_pgtables Kevin Brodsky
  2025-01-09 16:30 ` [RFC PATCH v2 00/15] pkeys-based page table hardening Dave Hansen
  15 siblings, 0 replies; 20+ messages in thread
From: Kevin Brodsky @ 2025-01-08 10:32 UTC (permalink / raw)
  To: linux-hardening
  Cc: linux-kernel, Kevin Brodsky, Andrew Morton, Mark Brown,
	Catalin Marinas, Dave Hansen, Jann Horn, Jeff Xu, Joey Gouly,
	Kees Cook, Linus Walleij, Andy Lutomirski, Marc Zyngier,
	Peter Zijlstra, Pierre Langlois, Quentin Perret,
	Mike Rapoport (IBM), Ryan Roberts, Thomas Gleixner, Will Deacon,
	Matthew Wilcox, Qi Zheng, linux-arm-kernel, x86

kpkeys_hardened_pgtables should be enabled as early as possible (if
selected). It does however require kpkeys to be available, which on
arm64 means POE being detected and enabled. POE is a boot
feature, so calling kpkeys_hardened_pgtables_enable() just after
setup_boot_cpu_features() in smp_prepare_boot_cpu() is the best we
can do.

With that done, all the bits are in place and we can advertise
support for kpkeys_hardened_pgtables by selecting
ARCH_HAS_KPKEYS_HARDENED_PGTABLES if ARM64_POE is enabled.

Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
---
 arch/arm64/Kconfig      | 1 +
 arch/arm64/kernel/smp.c | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 688ffd9bf503..654e6dff9a79 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -2185,6 +2185,7 @@ config ARM64_POE
 	select ARCH_USES_HIGH_VMA_FLAGS
 	select ARCH_HAS_PKEYS
 	select ARCH_HAS_KPKEYS
+	select ARCH_HAS_KPKEYS_HARDENED_PGTABLES
 	help
 	  The Permission Overlay Extension is used to implement Memory
 	  Protection Keys. Memory Protection Keys provides a mechanism for
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index 3b3f6b56e733..074cab55f9db 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -35,6 +35,7 @@
 #include <linux/kgdb.h>
 #include <linux/kvm_host.h>
 #include <linux/nmi.h>
+#include <linux/kpkeys.h>
 
 #include <asm/alternative.h>
 #include <asm/atomic.h>
@@ -468,6 +469,7 @@ void __init smp_prepare_boot_cpu(void)
 	if (system_uses_irq_prio_masking())
 		init_gic_priority_masking();
 
+	kpkeys_hardened_pgtables_enable();
 	kasan_init_hw_tags();
 	/* Init percpu seeds for random tags after cpus are set up. */
 	kasan_init_sw_tags();
-- 
2.47.0



* [RFC PATCH v2 15/15] mm: Add basic tests for kpkeys_hardened_pgtables
  2025-01-08 10:32 [RFC PATCH v2 00/15] pkeys-based page table hardening Kevin Brodsky
                   ` (13 preceding siblings ...)
  2025-01-08 10:32 ` [RFC PATCH v2 14/15] arm64: Enable kpkeys_hardened_pgtables support Kevin Brodsky
@ 2025-01-08 10:32 ` Kevin Brodsky
  2025-01-09 16:30 ` [RFC PATCH v2 00/15] pkeys-based page table hardening Dave Hansen
  15 siblings, 0 replies; 20+ messages in thread
From: Kevin Brodsky @ 2025-01-08 10:32 UTC (permalink / raw)
  To: linux-hardening
  Cc: linux-kernel, Kevin Brodsky, Andrew Morton, Mark Brown,
	Catalin Marinas, Dave Hansen, Jann Horn, Jeff Xu, Joey Gouly,
	Kees Cook, Linus Walleij, Andy Lutomirski, Marc Zyngier,
	Peter Zijlstra, Pierre Langlois, Quentin Perret,
	Mike Rapoport (IBM), Ryan Roberts, Thomas Gleixner, Will Deacon,
	Matthew Wilcox, Qi Zheng, linux-arm-kernel, x86

Add basic tests for the kpkeys_hardened_pgtables feature: try to
perform a direct write to some kernel and user page table entry and
ensure it fails.

Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
---
 mm/Makefile                        |  1 +
 mm/kpkeys_hardened_pgtables_test.c | 72 ++++++++++++++++++++++++++++++
 security/Kconfig.hardening         | 12 +++++
 3 files changed, 85 insertions(+)
 create mode 100644 mm/kpkeys_hardened_pgtables_test.c

diff --git a/mm/Makefile b/mm/Makefile
index 130691364172..f7263b7f45b8 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -148,3 +148,4 @@ obj-$(CONFIG_EXECMEM) += execmem.o
 obj-$(CONFIG_TMPFS_QUOTA) += shmem_quota.o
 obj-$(CONFIG_PT_RECLAIM) += pt_reclaim.o
 obj-$(CONFIG_KPKEYS_HARDENED_PGTABLES) += kpkeys_hardened_pgtables.o
+obj-$(CONFIG_KPKEYS_HARDENED_PGTABLES_TEST) += kpkeys_hardened_pgtables_test.o
diff --git a/mm/kpkeys_hardened_pgtables_test.c b/mm/kpkeys_hardened_pgtables_test.c
new file mode 100644
index 000000000000..86d862d43bea
--- /dev/null
+++ b/mm/kpkeys_hardened_pgtables_test.c
@@ -0,0 +1,72 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#include <kunit/test.h>
+#include <linux/pgtable.h>
+#include <linux/mman.h>
+
+static void write_kernel_pte(struct kunit *test)
+{
+	pte_t *ptep;
+	pte_t pte;
+	int ret;
+
+	/*
+	 * The choice of address is mostly arbitrary - we just need a page
+	 * that is definitely mapped, such as the current function.
+	 */
+	ptep = virt_to_kpte((unsigned long)&write_kernel_pte);
+	KUNIT_ASSERT_NOT_NULL_MSG(test, ptep, "Failed to get PTE");
+
+	pte = ptep_get(ptep);
+	pte = set_pte_bit(pte, __pgprot(PTE_WRITE));
+	ret = copy_to_kernel_nofault(ptep, &pte, sizeof(pte));
+	KUNIT_EXPECT_EQ_MSG(test, ret, -EFAULT,
+			    "Direct PTE write wasn't prevented");
+}
+
+static void write_user_pmd(struct kunit *test)
+{
+	pmd_t *pmdp;
+	pmd_t pmd;
+	unsigned long uaddr;
+	int ret;
+
+	uaddr = kunit_vm_mmap(test, NULL, 0, PAGE_SIZE, PROT_READ,
+			      MAP_ANONYMOUS | MAP_PRIVATE | MAP_POPULATE, 0);
+	KUNIT_ASSERT_NE_MSG(test, uaddr, 0, "Could not create userspace mm");
+
+	/* We passed MAP_POPULATE so a PMD should already be allocated */
+	pmdp = pmd_off(current->mm, uaddr);
+	KUNIT_ASSERT_NOT_NULL_MSG(test, pmdp, "Failed to get PMD");
+
+	pmd = pmdp_get(pmdp);
+	pmd = set_pmd_bit(pmd, __pgprot(PROT_SECT_NORMAL));
+	ret = copy_to_kernel_nofault(pmdp, &pmd, sizeof(pmd));
+	KUNIT_EXPECT_EQ_MSG(test, ret, -EFAULT,
+			    "Direct PMD write wasn't prevented");
+}
+
+static int kpkeys_hardened_pgtables_suite_init(struct kunit_suite *suite)
+{
+	if (!arch_kpkeys_enabled()) {
+		pr_err("Cannot run kpkeys_hardened_pgtables tests: kpkeys are not supported\n");
+		return 1;
+	}
+
+	return 0;
+}
+
+static struct kunit_case kpkeys_hardened_pgtables_test_cases[] = {
+	KUNIT_CASE(write_kernel_pte),
+	KUNIT_CASE(write_user_pmd),
+	{}
+};
+
+static struct kunit_suite kpkeys_hardened_pgtables_test_suite = {
+	.name = "Hardened pgtables using kpkeys",
+	.test_cases = kpkeys_hardened_pgtables_test_cases,
+	.suite_init = kpkeys_hardened_pgtables_suite_init,
+};
+kunit_test_suite(kpkeys_hardened_pgtables_test_suite);
+
+MODULE_DESCRIPTION("Tests for the kpkeys_hardened_pgtables feature");
+MODULE_LICENSE("GPL");
diff --git a/security/Kconfig.hardening b/security/Kconfig.hardening
index 95f93f1d4055..8bc5d7235f6d 100644
--- a/security/Kconfig.hardening
+++ b/security/Kconfig.hardening
@@ -312,6 +312,18 @@ config KPKEYS_HARDENED_PGTABLES
 	  This option has no effect if the system does not support
 	  kernel pkeys.
 
+config KPKEYS_HARDENED_PGTABLES_TEST
+	tristate "KUnit tests for kpkeys_hardened_pgtables" if !KUNIT_ALL_TESTS
+	depends on KPKEYS_HARDENED_PGTABLES
+	depends on KUNIT
+	default KUNIT_ALL_TESTS
+	help
+	  Enable this option to check that the kpkeys_hardened_pgtables feature
+	  functions as intended, i.e. prevents arbitrary writes to user and
+	  kernel page tables.
+
+	  If unsure, say N.
+
 endmenu
 
 config CC_HAS_RANDSTRUCT
-- 
2.47.0



* Re: [RFC PATCH v2 13/15] arm64: mm: Guard page table writes with kpkeys
  2025-01-08 10:32 ` [RFC PATCH v2 13/15] arm64: mm: Guard page table writes with kpkeys Kevin Brodsky
@ 2025-01-09  7:17   ` Qi Zheng
  2025-01-10 14:05     ` Kevin Brodsky
  0 siblings, 1 reply; 20+ messages in thread
From: Qi Zheng @ 2025-01-09  7:17 UTC (permalink / raw)
  To: Kevin Brodsky
  Cc: linux-hardening, linux-kernel, Andrew Morton, Mark Brown,
	Catalin Marinas, Dave Hansen, Jann Horn, Jeff Xu, Joey Gouly,
	Kees Cook, Linus Walleij, Andy Lutomirski, Marc Zyngier,
	Peter Zijlstra, Pierre Langlois, Quentin Perret,
	Mike Rapoport (IBM), Ryan Roberts, Thomas Gleixner, Will Deacon,
	Matthew Wilcox, linux-arm-kernel, x86

Hi Kevin,

On 2025/1/8 18:32, Kevin Brodsky wrote:
> When CONFIG_KPKEYS_HARDENED_PGTABLES is enabled, page tables (both
> user and kernel) are mapped with a privileged pkey in the linear
> mapping. As a result, they can only be written under the
> kpkeys_hardened_pgtables guard, which sets POR_EL1 appropriately to
> allow such writes.
> 
> Use this guard wherever page tables genuinely need to be written,
> keeping its scope as small as possible (so that POR_EL1 is reset as
> fast as possible). Where atomics are involved, the guard's scope
> encompasses the whole loop to avoid switching POR_EL1 unnecessarily.
> 
> This patch is a no-op if CONFIG_KPKEYS_HARDENED_PGTABLES is disabled
> (default).
> 
> Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
> ---
>   arch/arm64/include/asm/pgtable.h | 19 +++++++++++++++++--
>   arch/arm64/mm/fault.c            |  2 ++
>   2 files changed, 19 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index f8dac6673887..0d60a49dc234 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -39,6 +39,7 @@
>   #include <linux/mm_types.h>
>   #include <linux/sched.h>
>   #include <linux/page_table_check.h>
> +#include <linux/kpkeys.h>
>   
>   #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>   #define __HAVE_ARCH_FLUSH_PMD_TLB_RANGE
> @@ -314,6 +315,7 @@ static inline pte_t pte_clear_uffd_wp(pte_t pte)
>   
>   static inline void __set_pte_nosync(pte_t *ptep, pte_t pte)
>   {
> +	guard(kpkeys_hardened_pgtables)();
>   	WRITE_ONCE(*ptep, pte);
>   }
>   
> @@ -758,6 +760,7 @@ static inline void set_pmd(pmd_t *pmdp, pmd_t pmd)
>   	}
>   #endif /* __PAGETABLE_PMD_FOLDED */
>   
> +	guard(kpkeys_hardened_pgtables)();
>   	WRITE_ONCE(*pmdp, pmd);
>   
>   	if (pmd_valid(pmd)) {

I noticed a long time ago that set_pte/set_pmd/... are implemented
separately by each architecture, without a unified entry point. This
makes it difficult to add hooks for them.

Taking set_pte() as an example, is it possible to do the following:

1) add a generic set_pte() in include/asm-generic/tlb.h (Or other more
    appropriate files)

static inline void set_pte(pte_t *ptep, pte_t pte)
{
	arch_set_pte(ptep, pte);
}

2) let each architecture include this file and rename the original
    set_pte() to arch_set_pte().

3) then we can add hooks for generic set_pte():

static inline void set_pte(pte_t *ptep, pte_t pte)
{
	guard(kpkeys_hardened_pgtables)();
	arch_set_pte(ptep, pte);
}

4) in this way, the architecture that supports
    ARCH_HAS_KPKEYS_HARDENED_PGTABLES only needs to implement
    the kpkeys_hardened_pgtables(), otherwise it is no-op.

Just some rough ideas; the related set/clear interfaces
are admittedly quite messy at the moment. ;)

Of course, this does not affect the feature to be implemented
in this patch series.

Thanks!



* Re: [RFC PATCH v2 00/15] pkeys-based page table hardening
  2025-01-08 10:32 [RFC PATCH v2 00/15] pkeys-based page table hardening Kevin Brodsky
                   ` (14 preceding siblings ...)
  2025-01-08 10:32 ` [RFC PATCH v2 15/15] mm: Add basic tests for kpkeys_hardened_pgtables Kevin Brodsky
@ 2025-01-09 16:30 ` Dave Hansen
  2025-01-13 10:10   ` Kevin Brodsky
  15 siblings, 1 reply; 20+ messages in thread
From: Dave Hansen @ 2025-01-09 16:30 UTC (permalink / raw)
  To: Kevin Brodsky, linux-hardening
  Cc: linux-kernel, Andrew Morton, Mark Brown, Catalin Marinas,
	Dave Hansen, Jann Horn, Jeff Xu, Joey Gouly, Kees Cook,
	Linus Walleij, Andy Lutomirski, Marc Zyngier, Peter Zijlstra,
	Pierre Langlois, Quentin Perret, Mike Rapoport (IBM),
	Ryan Roberts, Thomas Gleixner, Will Deacon, Matthew Wilcox,
	Qi Zheng, linux-arm-kernel, x86

One of the stickier things in the x86 attempt to do the same thing was
context switching, both between normal tasks and in/out of exceptions
and interrupts.

The easiest place this manifested for us was a code chunk like this:

	kpkeys_set_level(KPKEYS_LVL_PGTABLES);
	// modify page tables here
	kpkeys_restore_pkey_reg();

We had to make sure that we didn't get preempted and context switch over
to some other task that _wasn't_ doing page table manipulation while
page table writes were allowed.

On x86, we had to basically start context-switching the kernel pkey
register the same way we do GPRs.

How is SYS_POR_EL0 being context switched?


* Re: [RFC PATCH v2 13/15] arm64: mm: Guard page table writes with kpkeys
  2025-01-09  7:17   ` Qi Zheng
@ 2025-01-10 14:05     ` Kevin Brodsky
  0 siblings, 0 replies; 20+ messages in thread
From: Kevin Brodsky @ 2025-01-10 14:05 UTC (permalink / raw)
  To: Qi Zheng
  Cc: linux-hardening, linux-kernel, Andrew Morton, Mark Brown,
	Catalin Marinas, Dave Hansen, Jann Horn, Jeff Xu, Joey Gouly,
	Kees Cook, Linus Walleij, Andy Lutomirski, Marc Zyngier,
	Peter Zijlstra, Pierre Langlois, Quentin Perret,
	Mike Rapoport (IBM), Ryan Roberts, Thomas Gleixner, Will Deacon,
	Matthew Wilcox, linux-arm-kernel, x86

On 09/01/2025 08:17, Qi Zheng wrote:
> [...]
>
>> @@ -314,6 +315,7 @@ static inline pte_t pte_clear_uffd_wp(pte_t pte)
>>     static inline void __set_pte_nosync(pte_t *ptep, pte_t pte)
>>   {
>> +    guard(kpkeys_hardened_pgtables)();
>>       WRITE_ONCE(*ptep, pte);
>>   }
>>   @@ -758,6 +760,7 @@ static inline void set_pmd(pmd_t *pmdp, pmd_t pmd)
>>       }
>>   #endif /* __PAGETABLE_PMD_FOLDED */
>>   +    guard(kpkeys_hardened_pgtables)();
>>       WRITE_ONCE(*pmdp, pmd);
>>         if (pmd_valid(pmd)) {
>
> I noticed a long time ago that set_pte/set_pmd/... are implemented
> separately by each architecture, without a unified entry point. This
> makes it difficult to add hooks for them.
>
> Taking set_pte() as an example, is it possible to do the following:
>
> 1) add a generic set_pte() in include/asm-generic/tlb.h (Or other more
>    appropriate files)
>
> static inline void set_pte(pte_t *ptep, pte_t pte)
> {
>     arch_set_pte(ptep, pte);
> }
>
> 2) let each architecture include this file and rename the original
>    set_pte() to arch_set_pte().
>
> 3) then we can add hooks for generic set_pte():
>
> static inline void set_pte(pte_t *ptep, pte_t pte)
> {
>     guard(kpkeys_hardened_pgtables)();
>     arch_set_pte(ptep, pte);
> }
>
> 4) in this way, the architecture that supports
>    ARCH_HAS_KPKEYS_HARDENED_PGTABLES only needs to implement
>    the kpkeys_hardened_pgtables(), otherwise it is no-op.

Thanks for chiming in, it is an interesting idea for sure. I think the
issue here might be the benefit/churn ratio, because this would simply
be adding a layer of (generic) function calls without unifying any
existing arch code. Unfortunately, unlike the pagetable_alloc/tlb stuff,
the majority of what happens in the page table modifiers is arch-specific.

For set_p*(), we could potentially have it call a generic __set_p*()
that does a WRITE_ONCE(), which is what most architectures do (in
addition to various other things, like DSB/ISB on arm64). Adding the
kpkeys_hardened_pgtables guard there would be better, as it reduces the
"privileged" window to just the write itself. However for the other
modifiers, say ptep_get_and_clear(), the implementation seems to vary
wildly between arch's, including the atomic operation itself. Any
unification of those seems therefore difficult.

That said, I'd be happy to look into adding that generic layer on top
(i.e. generic set_pte() calls arch_set_pte()) if there is enough
consensus that the churn is justified. We could potentially do a mix of
both as well (arch-defined set_pte() calls generic __set_pte(), while
generic ptep_get_and_clear() calls arch_ptep_get_and_clear()).

- Kevin


* Re: [RFC PATCH v2 00/15] pkeys-based page table hardening
  2025-01-09 16:30 ` [RFC PATCH v2 00/15] pkeys-based page table hardening Dave Hansen
@ 2025-01-13 10:10   ` Kevin Brodsky
  0 siblings, 0 replies; 20+ messages in thread
From: Kevin Brodsky @ 2025-01-13 10:10 UTC (permalink / raw)
  To: Dave Hansen, linux-hardening
  Cc: linux-kernel, Andrew Morton, Mark Brown, Catalin Marinas,
	Dave Hansen, Jann Horn, Jeff Xu, Joey Gouly, Kees Cook,
	Linus Walleij, Andy Lutomirski, Marc Zyngier, Peter Zijlstra,
	Pierre Langlois, Quentin Perret, Mike Rapoport (IBM),
	Ryan Roberts, Thomas Gleixner, Will Deacon, Matthew Wilcox,
	Qi Zheng, linux-arm-kernel, x86

On 09/01/2025 17:30, Dave Hansen wrote:
> One of the stickier things in the x86 attempt to do the same thing was
> context switching, both between normal tasks and in/out of exceptions
> and interrupts.
>
> The easiest place this manifested for us was a code chunk like this:
>
> 	kpkeys_set_level(KPKEYS_LVL_PGTABLES);
> 	// modify page tables here
> 	kpkeys_restore_pkey_reg();
>
> We had to make sure that we didn't get preempted and context switch over
> to some other task that _wasn't_ doing page table manipulation while
> page table writes were allowed.
>
> On x86, we had to basically start context-switching the kernel pkey
> register the same way we do GPRs.
>
> How is SYS_POR_EL0 being context switched?

I think this is pretty much the same situation with POR_EL1 on arm64. I
mentioned on the cover letter that resetting POR_EL1 on exception entry
is required (and not done yet), but in fact as you say it also needs to
be context-switched per-thread. This does sound pretty similar to GPRs
(unlike POR_EL0, which is switched in __switch_to() like the user TLS
register for instance).

Is there a particular concern about that extra switching? I don't expect
it to be a significant cost on arm64. In the vast majority of cases,
POR_EL1 will remain set to its default value, meaning that the overhead
is limited to reading POR_EL1, a load and a branch. The only situation
where an expensive write to POR_EL1 is needed is an interrupt firing
right in the middle of a page table setter - possible but pretty
unlikely. Writing to POR_EL1 on exception return isn't really a concern
either, as no additional barrier (ISB) is required in that case.

By the way, thank you for mentioning the x86 attempt; I wasn't aware of
it. I'll have a closer look and make sure to Cc anyone involved in that
work in future versions.

- Kevin


end of thread, other threads:[~2025-01-13 10:10 UTC | newest]

Thread overview: 20+ messages
2025-01-08 10:32 [RFC PATCH v2 00/15] pkeys-based page table hardening Kevin Brodsky
2025-01-08 10:32 ` [RFC PATCH v2 01/15] mm: Introduce kpkeys Kevin Brodsky
2025-01-08 10:32 ` [RFC PATCH v2 02/15] set_memory: Introduce set_memory_pkey() stub Kevin Brodsky
2025-01-08 10:32 ` [RFC PATCH v2 03/15] arm64: mm: Enable overlays for all EL1 indirect permissions Kevin Brodsky
2025-01-08 10:32 ` [RFC PATCH v2 04/15] arm64: Introduce por_set_pkey_perms() helper Kevin Brodsky
2025-01-08 10:32 ` [RFC PATCH v2 05/15] arm64: Implement asm/kpkeys.h using POE Kevin Brodsky
2025-01-08 10:32 ` [RFC PATCH v2 06/15] arm64: set_memory: Implement set_memory_pkey() Kevin Brodsky
2025-01-08 10:32 ` [RFC PATCH v2 07/15] arm64: Enable kpkeys Kevin Brodsky
2025-01-08 10:32 ` [RFC PATCH v2 08/15] mm: Introduce kernel_pgtables_set_pkey() Kevin Brodsky
2025-01-08 10:32 ` [RFC PATCH v2 09/15] mm: Introduce kpkeys_hardened_pgtables Kevin Brodsky
2025-01-08 10:32 ` [RFC PATCH v2 10/15] mm: Allow __pagetable_ctor() to fail Kevin Brodsky
2025-01-08 10:32 ` [RFC PATCH v2 11/15] mm: Map page tables with privileged pkey Kevin Brodsky
2025-01-08 10:32 ` [RFC PATCH v2 12/15] arm64: kpkeys: Support KPKEYS_LVL_PGTABLES Kevin Brodsky
2025-01-08 10:32 ` [RFC PATCH v2 13/15] arm64: mm: Guard page table writes with kpkeys Kevin Brodsky
2025-01-09  7:17   ` Qi Zheng
2025-01-10 14:05     ` Kevin Brodsky
2025-01-08 10:32 ` [RFC PATCH v2 14/15] arm64: Enable kpkeys_hardened_pgtables support Kevin Brodsky
2025-01-08 10:32 ` [RFC PATCH v2 15/15] mm: Add basic tests for kpkeys_hardened_pgtables Kevin Brodsky
2025-01-09 16:30 ` [RFC PATCH v2 00/15] pkeys-based page table hardening Dave Hansen
2025-01-13 10:10   ` Kevin Brodsky
