* [PATCH v6 0/4] rseq: Make rseq work with protection keys
@ 2025-02-27 14:03 Dmitry Vyukov
2025-02-27 14:03 ` [PATCH v6 1/4] pkeys: add API to switch to permissive/zero pkey register Dmitry Vyukov
` (3 more replies)
0 siblings, 4 replies; 18+ messages in thread
From: Dmitry Vyukov @ 2025-02-27 14:03 UTC (permalink / raw)
To: mathieu.desnoyers, peterz, boqun.feng, tglx, mingo, bp,
dave.hansen, hpa, aruna.ramakrishna, elver
Cc: Dmitry Vyukov, Paul E. McKenney, x86, linux-kernel
If an application registers rseq, and ever switches to another pkey
protection (such that the rseq becomes inaccessible), then any
context switch will cause failure in __rseq_handle_notify_resume()
attempting to read/write struct rseq and/or rseq_cs. Since context
switches are asynchronous and are outside of the application control
(not part of the restricted code scope), temporarily enable access
to 0 (default) PKEY to read/write rseq/rseq_cs.
0 is the only PKEY supported for rseq for now.
Theoretically other PKEYs can be supported, but it's unclear
how/if that can work. So for now we don't support that to simplify
code.
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Aruna Ramakrishna <aruna.ramakrishna@oracle.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Dmitry Vyukov (4):
pkeys: add API to switch to permissive/zero pkey register
x86/signal: Use write_permissive_pkey_val() helper
rseq: Make rseq work with protection keys
selftests/rseq: Add test for rseq+pkeys
arch/x86/Kconfig | 1 +
arch/x86/include/asm/pkeys.h | 30 ++++++++
arch/x86/include/asm/pkru.h | 10 ++-
arch/x86/kernel/signal.c | 6 +-
include/linux/pkeys.h | 31 ++++++++
include/uapi/linux/rseq.h | 4 +
kernel/rseq.c | 11 +++
mm/Kconfig | 2 +
tools/testing/selftests/rseq/Makefile | 2 +-
tools/testing/selftests/rseq/pkey_test.c | 98 ++++++++++++++++++++++++
tools/testing/selftests/rseq/rseq.h | 1 +
11 files changed, 188 insertions(+), 8 deletions(-)
create mode 100644 tools/testing/selftests/rseq/pkey_test.c
base-commit: dd83757f6e686a2188997cb58b5975f744bb7786
--
2.48.1.658.g4767266eb4-goog
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH v6 1/4] pkeys: add API to switch to permissive/zero pkey register
2025-02-27 14:03 [PATCH v6 0/4] rseq: Make rseq work with protection keys Dmitry Vyukov
@ 2025-02-27 14:03 ` Dmitry Vyukov
2025-02-27 14:03 ` [PATCH v6 2/4] x86/signal: Use write_permissive_pkey_val() helper Dmitry Vyukov
` (2 subsequent siblings)
3 siblings, 0 replies; 18+ messages in thread
From: Dmitry Vyukov @ 2025-02-27 14:03 UTC (permalink / raw)
To: mathieu.desnoyers, peterz, boqun.feng, tglx, mingo, bp,
dave.hansen, hpa, aruna.ramakrishna, elver
Cc: Dmitry Vyukov, Paul E. McKenney, x86, linux-kernel
The API allows to switch to permissive pkey register that allows accesses
to all PKEYs, and to a value that allows access to the 0 (default) PKEY.
x86 signal delivery already uses switching to permissive PKEY register
value, and rseq needs to allow access to PKEY 0 while accessing
struct rseq/rseq_cs.
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Aruna Ramakrishna <aruna.ramakrishna@oracle.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Fixes: d7822b1e24f2 ("rseq: Introduce restartable sequences system call")
---
Changes in v5:
- Removed leftover dead code in enable_zero_pkey_val
- Clarified commit message
Changes in v4:
- Added Fixes tag
Changes in v3:
- Renamed API functions to write_permissive_pkey_val/write_pkey_val
- Added enable_zero_pkey_val for rseq
- Added Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Changes in v2:
- Fixed typo in commit description
---
arch/x86/Kconfig | 1 +
arch/x86/include/asm/pkeys.h | 30 ++++++++++++++++++++++++++++++
arch/x86/include/asm/pkru.h | 10 +++++++---
include/linux/pkeys.h | 31 +++++++++++++++++++++++++++++++
mm/Kconfig | 2 ++
5 files changed, 71 insertions(+), 3 deletions(-)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index be2c311f5118d..43af2840d098f 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1881,6 +1881,7 @@ config X86_INTEL_MEMORY_PROTECTION_KEYS
depends on X86_64 && (CPU_SUP_INTEL || CPU_SUP_AMD)
select ARCH_USES_HIGH_VMA_FLAGS
select ARCH_HAS_PKEYS
+ select ARCH_HAS_PERMISSIVE_PKEY
help
Memory Protection Keys provides a mechanism for enforcing
page-based protections, but without requiring modification of the
diff --git a/arch/x86/include/asm/pkeys.h b/arch/x86/include/asm/pkeys.h
index 2e6c04d8a45b4..614099766d5f2 100644
--- a/arch/x86/include/asm/pkeys.h
+++ b/arch/x86/include/asm/pkeys.h
@@ -2,6 +2,8 @@
#ifndef _ASM_X86_PKEYS_H
#define _ASM_X86_PKEYS_H
+#include "pkru.h"
+
/*
* If more than 16 keys are ever supported, a thorough audit
* will be necessary to ensure that the types that store key
@@ -123,4 +125,32 @@ static inline int vma_pkey(struct vm_area_struct *vma)
return (vma->vm_flags & vma_pkey_mask) >> VM_PKEY_SHIFT;
}
+typedef u32 pkey_reg_t;
+
+static inline pkey_reg_t write_permissive_pkey_val(void)
+{
+ return write_pkru(0);
+}
+
+static inline pkey_reg_t enable_zero_pkey_val(void)
+{
+ u32 pkru;
+
+ if (!cpu_feature_enabled(X86_FEATURE_OSPKE))
+ return 0;
+ /*
+ * WRPKRU is relatively expensive compared to RDPKRU,
+ * avoid it if possible.
+ */
+ pkru = rdpkru();
+ if ((pkru & (PKRU_AD_BIT|PKRU_WD_BIT)) != 0)
+ wrpkru(pkru & ~(PKRU_AD_BIT|PKRU_WD_BIT));
+ return pkru;
+}
+
+static inline void write_pkey_val(pkey_reg_t val)
+{
+ write_pkru(val);
+}
+
#endif /*_ASM_X86_PKEYS_H */
diff --git a/arch/x86/include/asm/pkru.h b/arch/x86/include/asm/pkru.h
index 74f0a2d34ffdd..b9bf9b7f2753b 100644
--- a/arch/x86/include/asm/pkru.h
+++ b/arch/x86/include/asm/pkru.h
@@ -39,16 +39,20 @@ static inline u32 read_pkru(void)
return 0;
}
-static inline void write_pkru(u32 pkru)
+static inline u32 write_pkru(u32 pkru)
{
+ u32 old_pkru;
+
if (!cpu_feature_enabled(X86_FEATURE_OSPKE))
- return;
+ return 0;
/*
* WRPKRU is relatively expensive compared to RDPKRU.
* Avoid WRPKRU when it would not change the value.
*/
- if (pkru != rdpkru())
+ old_pkru = rdpkru();
+ if (pkru != old_pkru)
wrpkru(pkru);
+ return old_pkru;
}
static inline void pkru_write_default(void)
diff --git a/include/linux/pkeys.h b/include/linux/pkeys.h
index 86be8bf27b41b..262d60f6a15f8 100644
--- a/include/linux/pkeys.h
+++ b/include/linux/pkeys.h
@@ -48,4 +48,35 @@ static inline bool arch_pkeys_enabled(void)
#endif /* ! CONFIG_ARCH_HAS_PKEYS */
+#ifndef CONFIG_ARCH_HAS_PERMISSIVE_PKEY
+
+/*
+ * Common name for value of the register that controls access to PKEYs
+ * (called differently on different arches: PKRU, POR, AMR).
+ */
+typedef char pkey_reg_t;
+
+/*
+ * Sets PKEY access register to the most permissive value that allows
+ * accesses to all PKEYs. Returns the current value of PKEY register.
+ * Code should generally arrange switching back to the old value
+ * using write_pkey_val(old_value).
+ */
+static inline pkey_reg_t write_permissive_pkey_val(void)
+{
+ return 0;
+}
+
+/*
+ * Sets PKEY access register to a value that allows access to the 0 (default)
+ * PKEY. Returns the current value of PKEY register.
+ */
+static inline pkey_reg_t enable_zero_pkey_val(void)
+{
+ return 0;
+}
+
+static inline void write_pkey_val(pkey_reg_t val) {}
+#endif /* ! CONFIG_ARCH_HAS_PERMISSIVE_PKEY */
+
#endif /* _LINUX_PKEYS_H */
diff --git a/mm/Kconfig b/mm/Kconfig
index 1b501db064172..9e874f7713a2b 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -1147,6 +1147,8 @@ config ARCH_USES_HIGH_VMA_FLAGS
bool
config ARCH_HAS_PKEYS
bool
+config ARCH_HAS_PERMISSIVE_PKEY
+ bool
config ARCH_USES_PG_ARCH_2
bool
--
2.48.1.658.g4767266eb4-goog
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH v6 2/4] x86/signal: Use write_permissive_pkey_val() helper
2025-02-27 14:03 [PATCH v6 0/4] rseq: Make rseq work with protection keys Dmitry Vyukov
2025-02-27 14:03 ` [PATCH v6 1/4] pkeys: add API to switch to permissive/zero pkey register Dmitry Vyukov
@ 2025-02-27 14:03 ` Dmitry Vyukov
2025-02-27 14:03 ` [PATCH v6 3/4] rseq: Make rseq work with protection keys Dmitry Vyukov
2025-02-27 14:03 ` [PATCH v6 4/4] selftests/rseq: Add test for rseq+pkeys Dmitry Vyukov
3 siblings, 0 replies; 18+ messages in thread
From: Dmitry Vyukov @ 2025-02-27 14:03 UTC (permalink / raw)
To: mathieu.desnoyers, peterz, boqun.feng, tglx, mingo, bp,
dave.hansen, hpa, aruna.ramakrishna, elver
Cc: Dmitry Vyukov, Paul E. McKenney, x86, linux-kernel
Use the new switch_to_permissive_pkey_reg() helper instead of the
custom code. No functional changes intended.
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Aruna Ramakrishna <aruna.ramakrishna@oracle.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes in v3:
- restore sig_prepare_pkru with the large comment and
make it call the new write_permissive_pkey_val
---
arch/x86/kernel/signal.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
index 5f441039b5725..27a66a0697dd2 100644
--- a/arch/x86/kernel/signal.c
+++ b/arch/x86/kernel/signal.c
@@ -28,6 +28,7 @@
#include <linux/entry-common.h>
#include <linux/syscalls.h>
#include <linux/rseq.h>
+#include <linux/pkeys.h>
#include <asm/processor.h>
#include <asm/ucontext.h>
@@ -72,10 +73,7 @@ static inline int is_x32_frame(struct ksignal *ksig)
*/
static inline u32 sig_prepare_pkru(void)
{
- u32 orig_pkru = read_pkru();
-
- write_pkru(0);
- return orig_pkru;
+ return write_permissive_pkey_val();
}
/*
--
2.48.1.658.g4767266eb4-goog
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH v6 3/4] rseq: Make rseq work with protection keys
2025-02-27 14:03 [PATCH v6 0/4] rseq: Make rseq work with protection keys Dmitry Vyukov
2025-02-27 14:03 ` [PATCH v6 1/4] pkeys: add API to switch to permissive/zero pkey register Dmitry Vyukov
2025-02-27 14:03 ` [PATCH v6 2/4] x86/signal: Use write_permissive_pkey_val() helper Dmitry Vyukov
@ 2025-02-27 14:03 ` Dmitry Vyukov
2025-03-08 10:02 ` Dmitry Vyukov
2025-03-10 14:26 ` Mathieu Desnoyers
2025-02-27 14:03 ` [PATCH v6 4/4] selftests/rseq: Add test for rseq+pkeys Dmitry Vyukov
3 siblings, 2 replies; 18+ messages in thread
From: Dmitry Vyukov @ 2025-02-27 14:03 UTC (permalink / raw)
To: mathieu.desnoyers, peterz, boqun.feng, tglx, mingo, bp,
dave.hansen, hpa, aruna.ramakrishna, elver
Cc: Dmitry Vyukov, Paul E. McKenney, x86, linux-kernel
If an application registers rseq, and ever switches to another pkey
protection (such that the rseq becomes inaccessible), then any
context switch will cause failure in __rseq_handle_notify_resume()
attempting to read/write struct rseq and/or rseq_cs. Since context
switches are asynchronous and are outside of the application control
(not part of the restricted code scope), temporarily switch to
pkey value that allows access to the 0 (default) PKEY.
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Aruna Ramakrishna <aruna.ramakrishna@oracle.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Fixes: d7822b1e24f2 ("rseq: Introduce restartable sequences system call")
---
Changes in v6:
- Added a comment to struct rseq with MPK rules
Changes in v4:
- Added Fixes tag
Changes in v3:
- simplify control flow to always enable access to 0 pkey
Changes in v2:
- fixed typos and reworded the comment
---
include/uapi/linux/rseq.h | 4 ++++
kernel/rseq.c | 11 +++++++++++
2 files changed, 15 insertions(+)
diff --git a/include/uapi/linux/rseq.h b/include/uapi/linux/rseq.h
index c233aae5eac90..019fd248cf749 100644
--- a/include/uapi/linux/rseq.h
+++ b/include/uapi/linux/rseq.h
@@ -58,6 +58,10 @@ struct rseq_cs {
* contained within a single cache-line.
*
* A single struct rseq per thread is allowed.
+ *
+ * If struct rseq or struct rseq_cs is used with Memory Protection Keys,
+ * then the assigned pkey should either be accessible whenever these structs
+ * are registered/installed, or they should be protected with pkey 0.
*/
struct rseq {
/*
diff --git a/kernel/rseq.c b/kernel/rseq.c
index 2cb16091ec0ae..9d9c976d3b78c 100644
--- a/kernel/rseq.c
+++ b/kernel/rseq.c
@@ -10,6 +10,7 @@
#include <linux/sched.h>
#include <linux/uaccess.h>
+#include <linux/pkeys.h>
#include <linux/syscalls.h>
#include <linux/rseq.h>
#include <linux/types.h>
@@ -402,11 +403,19 @@ static int rseq_ip_fixup(struct pt_regs *regs)
void __rseq_handle_notify_resume(struct ksignal *ksig, struct pt_regs *regs)
{
struct task_struct *t = current;
+ pkey_reg_t saved_pkey;
int ret, sig;
if (unlikely(t->flags & PF_EXITING))
return;
+ /*
+ * Enable access to the default (0) pkey in case the thread has
+ * currently disabled access to it and struct rseq/rseq_cs has
+ * 0 pkey assigned (the only supported value for now).
+ */
+ saved_pkey = enable_zero_pkey_val();
+
/*
* regs is NULL if and only if the caller is in a syscall path. Skip
* fixup and leave rseq_cs as is so that rseq_sycall() will detect and
@@ -419,9 +428,11 @@ void __rseq_handle_notify_resume(struct ksignal *ksig, struct pt_regs *regs)
}
if (unlikely(rseq_update_cpu_node_id(t)))
goto error;
+ write_pkey_val(saved_pkey);
return;
error:
+ write_pkey_val(saved_pkey);
sig = ksig ? ksig->sig : 0;
force_sigsegv(sig);
}
--
2.48.1.658.g4767266eb4-goog
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH v6 4/4] selftests/rseq: Add test for rseq+pkeys
2025-02-27 14:03 [PATCH v6 0/4] rseq: Make rseq work with protection keys Dmitry Vyukov
` (2 preceding siblings ...)
2025-02-27 14:03 ` [PATCH v6 3/4] rseq: Make rseq work with protection keys Dmitry Vyukov
@ 2025-02-27 14:03 ` Dmitry Vyukov
2025-03-10 14:30 ` Mathieu Desnoyers
3 siblings, 1 reply; 18+ messages in thread
From: Dmitry Vyukov @ 2025-02-27 14:03 UTC (permalink / raw)
To: mathieu.desnoyers, peterz, boqun.feng, tglx, mingo, bp,
dave.hansen, hpa, aruna.ramakrishna, elver
Cc: Dmitry Vyukov, Paul E. McKenney, x86, linux-kernel
Add a test that ensures that PKEY-protected struct rseq_cs
works and does not lead to process kills.
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Aruna Ramakrishna <aruna.ramakrishna@oracle.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Fixes: d7822b1e24f2 ("rseq: Introduce restartable sequences system call")
---
Changes in v5:
- Use static for variables/functions
- Use RSEQ_READ/WRITE_ONCE instead of volatile
Changes in v4:
- Added Fixes tag
Changes in v3:
- added Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
- rework the test to work when only pkey 0 is supported for rseq
Changes in v2:
- change test to install protected rseq_cs instead of rseq
---
tools/testing/selftests/rseq/Makefile | 2 +-
tools/testing/selftests/rseq/pkey_test.c | 98 ++++++++++++++++++++++++
tools/testing/selftests/rseq/rseq.h | 1 +
3 files changed, 100 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/rseq/Makefile b/tools/testing/selftests/rseq/Makefile
index 5a3432fceb586..9111d25fea3af 100644
--- a/tools/testing/selftests/rseq/Makefile
+++ b/tools/testing/selftests/rseq/Makefile
@@ -16,7 +16,7 @@ OVERRIDE_TARGETS = 1
TEST_GEN_PROGS = basic_test basic_percpu_ops_test basic_percpu_ops_mm_cid_test param_test \
param_test_benchmark param_test_compare_twice param_test_mm_cid \
- param_test_mm_cid_benchmark param_test_mm_cid_compare_twice
+ param_test_mm_cid_benchmark param_test_mm_cid_compare_twice pkey_test
TEST_GEN_PROGS_EXTENDED = librseq.so
diff --git a/tools/testing/selftests/rseq/pkey_test.c b/tools/testing/selftests/rseq/pkey_test.c
new file mode 100644
index 0000000000000..cc4dd98190942
--- /dev/null
+++ b/tools/testing/selftests/rseq/pkey_test.c
@@ -0,0 +1,98 @@
+// SPDX-License-Identifier: LGPL-2.1
+/*
+ * Ensure that rseq works when rseq data is inaccessible due to PKEYs.
+ */
+
+#define _GNU_SOURCE
+#include <err.h>
+#include <errno.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/mman.h>
+#include <sys/syscall.h>
+#include <ucontext.h>
+#include <unistd.h>
+
+#include "rseq.h"
+#include "rseq-abi.h"
+
+static int pkey;
+static ucontext_t ucp0, ucp1;
+
+static void coroutine(void)
+{
+ int i, orig_pk0, old_pk0, old_pk1, pk0, pk1;
+ /*
+ * When we disable access to pkey 0, globals and TLS become
+ * inaccessible too, so we need to tread carefully.
+ * Pkey is global so we need to copy it onto the stack.
+ */
+ int pk = RSEQ_READ_ONCE(pkey);
+ struct timespec ts;
+
+ orig_pk0 = pkey_get(0);
+ if (pkey_set(0, PKEY_DISABLE_ACCESS))
+ err(1, "pkey_set failed");
+ old_pk0 = pkey_get(0);
+ old_pk1 = pkey_get(pk);
+
+ /*
+ * Prevent compiler from initializing it by loading a 16-global.
+ */
+ RSEQ_WRITE_ONCE(ts.tv_sec, 0);
+ RSEQ_WRITE_ONCE(ts.tv_nsec, 10 * 1000);
+ /*
+ * If the kernel misbehaves, context switches in the following loop
+ * will terminate the process with SIGSEGV.
+ * Trigger preemption w/o accessing TLS.
+ * Note that glibc's usleep touches errno always.
+ */
+ for (i = 0; i < 10; i++)
+ syscall(SYS_clock_nanosleep, CLOCK_MONOTONIC, 0, &ts, NULL);
+
+ pk0 = pkey_get(0);
+ pk1 = pkey_get(pk);
+ if (pkey_set(0, orig_pk0))
+ err(1, "pkey_set failed");
+
+ /*
+ * Ensure that the kernel has restored the previous value of pkeys
+ * register after changing them.
+ */
+ if (old_pk0 != pk0)
+ errx(1, "pkey 0 changed %d->%d", old_pk0, pk0);
+ if (old_pk1 != pk1)
+ errx(1, "pkey 1 changed %d->%d", old_pk1, pk1);
+
+ swapcontext(&ucp1, &ucp0);
+ abort();
+}
+
+int main(int argc, char **argv)
+{
+ pkey = pkey_alloc(0, 0);
+ if (pkey == -1) {
+ printf("[SKIP]\tKernel does not support PKEYs: %s\n",
+ strerror(errno));
+ return 0;
+ }
+
+ if (rseq_register_current_thread())
+ err(1, "rseq_register_current_thread failed");
+
+ if (getcontext(&ucp1))
+ err(1, "getcontext failed");
+ ucp1.uc_stack.ss_size = getpagesize() * 4;
+ ucp1.uc_stack.ss_sp = mmap(NULL, ucp1.uc_stack.ss_size,
+ PROT_READ | PROT_WRITE, MAP_ANON | MAP_PRIVATE, -1, 0);
+ if (ucp1.uc_stack.ss_sp == MAP_FAILED)
+ err(1, "mmap failed");
+ if (pkey_mprotect(ucp1.uc_stack.ss_sp, ucp1.uc_stack.ss_size,
+ PROT_READ | PROT_WRITE, pkey))
+ err(1, "pkey_mprotect failed");
+ makecontext(&ucp1, coroutine, 0);
+ if (swapcontext(&ucp0, &ucp1))
+ err(1, "swapcontext failed");
+ return 0;
+}
diff --git a/tools/testing/selftests/rseq/rseq.h b/tools/testing/selftests/rseq/rseq.h
index ba424ce80a719..65da4a727c550 100644
--- a/tools/testing/selftests/rseq/rseq.h
+++ b/tools/testing/selftests/rseq/rseq.h
@@ -8,6 +8,7 @@
#ifndef RSEQ_H
#define RSEQ_H
+#include <assert.h>
#include <stdint.h>
#include <stdbool.h>
#include <pthread.h>
--
2.48.1.658.g4767266eb4-goog
^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH v6 3/4] rseq: Make rseq work with protection keys
2025-02-27 14:03 ` [PATCH v6 3/4] rseq: Make rseq work with protection keys Dmitry Vyukov
@ 2025-03-08 10:02 ` Dmitry Vyukov
2025-03-10 14:31 ` Mathieu Desnoyers
2025-03-10 14:26 ` Mathieu Desnoyers
1 sibling, 1 reply; 18+ messages in thread
From: Dmitry Vyukov @ 2025-03-08 10:02 UTC (permalink / raw)
To: mathieu.desnoyers, peterz, boqun.feng, tglx, mingo, bp,
dave.hansen, hpa, aruna.ramakrishna, elver
Cc: Paul E. McKenney, x86, linux-kernel
On Thu, 27 Feb 2025 at 15:03, Dmitry Vyukov <dvyukov@google.com> wrote:
>
> If an application registers rseq, and ever switches to another pkey
> protection (such that the rseq becomes inaccessible), then any
> context switch will cause failure in __rseq_handle_notify_resume()
> attempting to read/write struct rseq and/or rseq_cs. Since context
> switches are asynchronous and are outside of the application control
> (not part of the restricted code scope), temporarily switch to
> pkey value that allows access to the 0 (default) PKEY.
>
> Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: "Paul E. McKenney" <paulmck@kernel.org>
> Cc: Boqun Feng <boqun.feng@gmail.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Aruna Ramakrishna <aruna.ramakrishna@oracle.com>
> Cc: x86@kernel.org
> Cc: linux-kernel@vger.kernel.org
> Fixes: d7822b1e24f2 ("rseq: Introduce restartable sequences system call")
Any remaining concerns with this series?
What tree should it go into?
> ---
> Changes in v6:
> - Added a comment to struct rseq with MPK rules
>
> Changes in v4:
> - Added Fixes tag
>
> Changes in v3:
> - simplify control flow to always enable access to 0 pkey
>
> Changes in v2:
> - fixed typos and reworded the comment
> ---
> include/uapi/linux/rseq.h | 4 ++++
> kernel/rseq.c | 11 +++++++++++
> 2 files changed, 15 insertions(+)
>
> diff --git a/include/uapi/linux/rseq.h b/include/uapi/linux/rseq.h
> index c233aae5eac90..019fd248cf749 100644
> --- a/include/uapi/linux/rseq.h
> +++ b/include/uapi/linux/rseq.h
> @@ -58,6 +58,10 @@ struct rseq_cs {
> * contained within a single cache-line.
> *
> * A single struct rseq per thread is allowed.
> + *
> + * If struct rseq or struct rseq_cs is used with Memory Protection Keys,
> + * then the assigned pkey should either be accessible whenever these structs
> + * are registered/installed, or they should be protected with pkey 0.
> */
> struct rseq {
> /*
> diff --git a/kernel/rseq.c b/kernel/rseq.c
> index 2cb16091ec0ae..9d9c976d3b78c 100644
> --- a/kernel/rseq.c
> +++ b/kernel/rseq.c
> @@ -10,6 +10,7 @@
>
> #include <linux/sched.h>
> #include <linux/uaccess.h>
> +#include <linux/pkeys.h>
> #include <linux/syscalls.h>
> #include <linux/rseq.h>
> #include <linux/types.h>
> @@ -402,11 +403,19 @@ static int rseq_ip_fixup(struct pt_regs *regs)
> void __rseq_handle_notify_resume(struct ksignal *ksig, struct pt_regs *regs)
> {
> struct task_struct *t = current;
> + pkey_reg_t saved_pkey;
> int ret, sig;
>
> if (unlikely(t->flags & PF_EXITING))
> return;
>
> + /*
> + * Enable access to the default (0) pkey in case the thread has
> + * currently disabled access to it and struct rseq/rseq_cs has
> + * 0 pkey assigned (the only supported value for now).
> + */
> + saved_pkey = enable_zero_pkey_val();
> +
> /*
> * regs is NULL if and only if the caller is in a syscall path. Skip
> * fixup and leave rseq_cs as is so that rseq_sycall() will detect and
> @@ -419,9 +428,11 @@ void __rseq_handle_notify_resume(struct ksignal *ksig, struct pt_regs *regs)
> }
> if (unlikely(rseq_update_cpu_node_id(t)))
> goto error;
> + write_pkey_val(saved_pkey);
> return;
>
> error:
> + write_pkey_val(saved_pkey);
> sig = ksig ? ksig->sig : 0;
> force_sigsegv(sig);
> }
> --
> 2.48.1.658.g4767266eb4-goog
>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v6 3/4] rseq: Make rseq work with protection keys
2025-02-27 14:03 ` [PATCH v6 3/4] rseq: Make rseq work with protection keys Dmitry Vyukov
2025-03-08 10:02 ` Dmitry Vyukov
@ 2025-03-10 14:26 ` Mathieu Desnoyers
1 sibling, 0 replies; 18+ messages in thread
From: Mathieu Desnoyers @ 2025-03-10 14:26 UTC (permalink / raw)
To: Dmitry Vyukov, peterz, boqun.feng, tglx, mingo, bp, dave.hansen,
hpa, aruna.ramakrishna, elver
Cc: Paul E. McKenney, x86, linux-kernel
On 2025-02-27 09:03, Dmitry Vyukov wrote:
> If an application registers rseq, and ever switches to another pkey
> protection (such that the rseq becomes inaccessible), then any
> context switch will cause failure in __rseq_handle_notify_resume()
> attempting to read/write struct rseq and/or rseq_cs. Since context
> switches are asynchronous and are outside of the application control
> (not part of the restricted code scope), temporarily switch to
> pkey value that allows access to the 0 (default) PKEY.
>
> Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: "Paul E. McKenney" <paulmck@kernel.org>
> Cc: Boqun Feng <boqun.feng@gmail.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Aruna Ramakrishna <aruna.ramakrishna@oracle.com>
> Cc: x86@kernel.org
> Cc: linux-kernel@vger.kernel.org
> Fixes: d7822b1e24f2 ("rseq: Introduce restartable sequences system call")
>
> ---
> Changes in v6:
> - Added a comment to struct rseq with MPK rules
>
> Changes in v4:
> - Added Fixes tag
>
> Changes in v3:
> - simplify control flow to always enable access to 0 pkey
>
> Changes in v2:
> - fixed typos and reworded the comment
> ---
> include/uapi/linux/rseq.h | 4 ++++
> kernel/rseq.c | 11 +++++++++++
> 2 files changed, 15 insertions(+)
>
> diff --git a/include/uapi/linux/rseq.h b/include/uapi/linux/rseq.h
> index c233aae5eac90..019fd248cf749 100644
> --- a/include/uapi/linux/rseq.h
> +++ b/include/uapi/linux/rseq.h
> @@ -58,6 +58,10 @@ struct rseq_cs {
> * contained within a single cache-line.
> *
> * A single struct rseq per thread is allowed.
> + *
> + * If struct rseq or struct rseq_cs is used with Memory Protection Keys,
> + * then the assigned pkey should either be accessible whenever these structs
> + * are registered/installed, or they should be protected with pkey 0.
> */
> struct rseq {
> /*
> diff --git a/kernel/rseq.c b/kernel/rseq.c
> index 2cb16091ec0ae..9d9c976d3b78c 100644
> --- a/kernel/rseq.c
> +++ b/kernel/rseq.c
> @@ -10,6 +10,7 @@
>
> #include <linux/sched.h>
> #include <linux/uaccess.h>
> +#include <linux/pkeys.h>
> #include <linux/syscalls.h>
> #include <linux/rseq.h>
> #include <linux/types.h>
> @@ -402,11 +403,19 @@ static int rseq_ip_fixup(struct pt_regs *regs)
> void __rseq_handle_notify_resume(struct ksignal *ksig, struct pt_regs *regs)
> {
> struct task_struct *t = current;
> + pkey_reg_t saved_pkey;
> int ret, sig;
>
> if (unlikely(t->flags & PF_EXITING))
> return;
>
> + /*
> + * Enable access to the default (0) pkey in case the thread has
> + * currently disabled access to it and struct rseq/rseq_cs has
> + * 0 pkey assigned (the only supported value for now).
> + */
> + saved_pkey = enable_zero_pkey_val();
> +
> /*
> * regs is NULL if and only if the caller is in a syscall path. Skip
> * fixup and leave rseq_cs as is so that rseq_sycall() will detect and
> @@ -419,9 +428,11 @@ void __rseq_handle_notify_resume(struct ksignal *ksig, struct pt_regs *regs)
> }
> if (unlikely(rseq_update_cpu_node_id(t)))
> goto error;
> + write_pkey_val(saved_pkey);
> return;
>
> error:
> + write_pkey_val(saved_pkey);
> sig = ksig ? ksig->sig : 0;
> force_sigsegv(sig);
> }
--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v6 4/4] selftests/rseq: Add test for rseq+pkeys
2025-02-27 14:03 ` [PATCH v6 4/4] selftests/rseq: Add test for rseq+pkeys Dmitry Vyukov
@ 2025-03-10 14:30 ` Mathieu Desnoyers
2025-03-10 14:36 ` Dmitry Vyukov
0 siblings, 1 reply; 18+ messages in thread
From: Mathieu Desnoyers @ 2025-03-10 14:30 UTC (permalink / raw)
To: Dmitry Vyukov, peterz, boqun.feng, tglx, mingo, bp, dave.hansen,
hpa, aruna.ramakrishna, elver
Cc: Paul E. McKenney, x86, linux-kernel
On 2025-02-27 09:03, Dmitry Vyukov wrote:
> Add a test that ensures that PKEY-protected struct rseq_cs
> works and does not lead to process kills.
>
> Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: "Paul E. McKenney" <paulmck@kernel.org>
> Cc: Boqun Feng <boqun.feng@gmail.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Aruna Ramakrishna <aruna.ramakrishna@oracle.com>
> Cc: x86@kernel.org
> Cc: linux-kernel@vger.kernel.org
> Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
> Fixes: d7822b1e24f2 ("rseq: Introduce restartable sequences system call")
>
> ---
> Changes in v5:
> - Use static for variables/functions
> - Use RSEQ_READ/WRITE_ONCE instead of volatile
>
> Changes in v4:
> - Added Fixes tag
>
> Changes in v3:
> - added Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
> - rework the test to work when only pkey 0 is supported for rseq
>
> Changes in v2:
> - change test to install protected rseq_cs instead of rseq
> ---
> tools/testing/selftests/rseq/Makefile | 2 +-
> tools/testing/selftests/rseq/pkey_test.c | 98 ++++++++++++++++++++++++
> tools/testing/selftests/rseq/rseq.h | 1 +
> 3 files changed, 100 insertions(+), 1 deletion(-)
>
> diff --git a/tools/testing/selftests/rseq/Makefile b/tools/testing/selftests/rseq/Makefile
> index 5a3432fceb586..9111d25fea3af 100644
> --- a/tools/testing/selftests/rseq/Makefile
> +++ b/tools/testing/selftests/rseq/Makefile
> @@ -16,7 +16,7 @@ OVERRIDE_TARGETS = 1
>
> TEST_GEN_PROGS = basic_test basic_percpu_ops_test basic_percpu_ops_mm_cid_test param_test \
> param_test_benchmark param_test_compare_twice param_test_mm_cid \
> - param_test_mm_cid_benchmark param_test_mm_cid_compare_twice
> + param_test_mm_cid_benchmark param_test_mm_cid_compare_twice pkey_test
>
> TEST_GEN_PROGS_EXTENDED = librseq.so
>
> diff --git a/tools/testing/selftests/rseq/pkey_test.c b/tools/testing/selftests/rseq/pkey_test.c
> new file mode 100644
> index 0000000000000..cc4dd98190942
> --- /dev/null
> +++ b/tools/testing/selftests/rseq/pkey_test.c
> @@ -0,0 +1,98 @@
> +// SPDX-License-Identifier: LGPL-2.1
> +/*
> + * Ensure that rseq works when rseq data is inaccessible due to PKEYs.
> + */
> +
> +#define _GNU_SOURCE
> +#include <err.h>
> +#include <errno.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <sys/mman.h>
> +#include <sys/syscall.h>
> +#include <ucontext.h>
> +#include <unistd.h>
> +
> +#include "rseq.h"
> +#include "rseq-abi.h"
> +
> +static int pkey;
> +static ucontext_t ucp0, ucp1;
> +
> +static void coroutine(void)
> +{
> + int i, orig_pk0, old_pk0, old_pk1, pk0, pk1;
> + /*
> + * When we disable access to pkey 0, globals and TLS become
> + * inaccessible too, so we need to tread carefully.
> + * Pkey is global so we need to copy it onto the stack.
> + */
> + int pk = RSEQ_READ_ONCE(pkey);
> + struct timespec ts;
> +
> + orig_pk0 = pkey_get(0);
> + if (pkey_set(0, PKEY_DISABLE_ACCESS))
> + err(1, "pkey_set failed");
> + old_pk0 = pkey_get(0);
> + old_pk1 = pkey_get(pk);
> +
> + /*
> + * Prevent compiler from initializing it by loading a 16-global.
> + */
> + RSEQ_WRITE_ONCE(ts.tv_sec, 0);
> + RSEQ_WRITE_ONCE(ts.tv_nsec, 10 * 1000);
> + /*
> + * If the kernel misbehaves, context switches in the following loop
> + * will terminate the process with SIGSEGV.
> + * Trigger preemption w/o accessing TLS.
> + * Note that glibc's usleep touches errno always.
> + */
> + for (i = 0; i < 10; i++)
> + syscall(SYS_clock_nanosleep, CLOCK_MONOTONIC, 0, &ts, NULL);
> +
> + pk0 = pkey_get(0);
> + pk1 = pkey_get(pk);
> + if (pkey_set(0, orig_pk0))
> + err(1, "pkey_set failed");
> +
> + /*
> + * Ensure that the kernel has restored the previous value of pkeys
> + * register after changing them.
> + */
> + if (old_pk0 != pk0)
> + errx(1, "pkey 0 changed %d->%d", old_pk0, pk0);
> + if (old_pk1 != pk1)
> + errx(1, "pkey 1 changed %d->%d", old_pk1, pk1);
> +
> + swapcontext(&ucp1, &ucp0);
> + abort();
> +}
> +
> +int main(int argc, char **argv)
> +{
> + pkey = pkey_alloc(0, 0);
> + if (pkey == -1) {
> + printf("[SKIP]\tKernel does not support PKEYs: %s\n",
> + strerror(errno));
> + return 0;
> + }
> +
> + if (rseq_register_current_thread())
> + err(1, "rseq_register_current_thread failed");
> +
> + if (getcontext(&ucp1))
> + err(1, "getcontext failed");
> + ucp1.uc_stack.ss_size = getpagesize() * 4;
> + ucp1.uc_stack.ss_sp = mmap(NULL, ucp1.uc_stack.ss_size,
> + PROT_READ | PROT_WRITE, MAP_ANON | MAP_PRIVATE, -1, 0);
> + if (ucp1.uc_stack.ss_sp == MAP_FAILED)
> + err(1, "mmap failed");
> + if (pkey_mprotect(ucp1.uc_stack.ss_sp, ucp1.uc_stack.ss_size,
> + PROT_READ | PROT_WRITE, pkey))
> + err(1, "pkey_mprotect failed");
> + makecontext(&ucp1, coroutine, 0);
> + if (swapcontext(&ucp0, &ucp1))
> + err(1, "swapcontext failed");
Should the rseq register be paired with a rseq unregister here ?
Thanks,
Mathieu
> + return 0;
> +}
> diff --git a/tools/testing/selftests/rseq/rseq.h b/tools/testing/selftests/rseq/rseq.h
> index ba424ce80a719..65da4a727c550 100644
> --- a/tools/testing/selftests/rseq/rseq.h
> +++ b/tools/testing/selftests/rseq/rseq.h
> @@ -8,6 +8,7 @@
> #ifndef RSEQ_H
> #define RSEQ_H
>
> +#include <assert.h>
> #include <stdint.h>
> #include <stdbool.h>
> #include <pthread.h>
--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v6 3/4] rseq: Make rseq work with protection keys
2025-03-08 10:02 ` Dmitry Vyukov
@ 2025-03-10 14:31 ` Mathieu Desnoyers
2025-03-10 14:39 ` Dmitry Vyukov
0 siblings, 1 reply; 18+ messages in thread
From: Mathieu Desnoyers @ 2025-03-10 14:31 UTC (permalink / raw)
To: Dmitry Vyukov, peterz, boqun.feng, tglx, mingo, bp, dave.hansen,
hpa, aruna.ramakrishna, elver
Cc: Paul E. McKenney, x86, linux-kernel
On 2025-03-08 05:02, Dmitry Vyukov wrote:
> On Thu, 27 Feb 2025 at 15:03, Dmitry Vyukov <dvyukov@google.com> wrote:
>>
>> If an application registers rseq, and ever switches to another pkey
>> protection (such that the rseq becomes inaccessible), then any
>> context switch will cause failure in __rseq_handle_notify_resume()
>> attempting to read/write struct rseq and/or rseq_cs. Since context
>> switches are asynchronous and are outside of the application control
>> (not part of the restricted code scope), temporarily switch to
>> pkey value that allows access to the 0 (default) PKEY.
>>
>> Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
>> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
>> Cc: Peter Zijlstra <peterz@infradead.org>
>> Cc: "Paul E. McKenney" <paulmck@kernel.org>
>> Cc: Boqun Feng <boqun.feng@gmail.com>
>> Cc: Thomas Gleixner <tglx@linutronix.de>
>> Cc: Ingo Molnar <mingo@redhat.com>
>> Cc: Borislav Petkov <bp@alien8.de>
>> Cc: Dave Hansen <dave.hansen@linux.intel.com>
>> Cc: "H. Peter Anvin" <hpa@zytor.com>
>> Cc: Aruna Ramakrishna <aruna.ramakrishna@oracle.com>
>> Cc: x86@kernel.org
>> Cc: linux-kernel@vger.kernel.org
>> Fixes: d7822b1e24f2 ("rseq: Introduce restartable sequences system call")
>
> Any remaining concerns with this series?
>
> What tree should it go into?
Usually the rseq bits go through the -tip tree.
Thanks,
Mathieu
>
>> ---
>> Changes in v6:
>> - Added a comment to struct rseq with MPK rules
>>
>> Changes in v4:
>> - Added Fixes tag
>>
>> Changes in v3:
>> - simplify control flow to always enable access to 0 pkey
>>
>> Changes in v2:
>> - fixed typos and reworded the comment
>> ---
>> include/uapi/linux/rseq.h | 4 ++++
>> kernel/rseq.c | 11 +++++++++++
>> 2 files changed, 15 insertions(+)
>>
>> diff --git a/include/uapi/linux/rseq.h b/include/uapi/linux/rseq.h
>> index c233aae5eac90..019fd248cf749 100644
>> --- a/include/uapi/linux/rseq.h
>> +++ b/include/uapi/linux/rseq.h
>> @@ -58,6 +58,10 @@ struct rseq_cs {
>> * contained within a single cache-line.
>> *
>> * A single struct rseq per thread is allowed.
>> + *
>> + * If struct rseq or struct rseq_cs is used with Memory Protection Keys,
>> + * then the assigned pkey should either be accessible whenever these structs
>> + * are registered/installed, or they should be protected with pkey 0.
>> */
>> struct rseq {
>> /*
>> diff --git a/kernel/rseq.c b/kernel/rseq.c
>> index 2cb16091ec0ae..9d9c976d3b78c 100644
>> --- a/kernel/rseq.c
>> +++ b/kernel/rseq.c
>> @@ -10,6 +10,7 @@
>>
>> #include <linux/sched.h>
>> #include <linux/uaccess.h>
>> +#include <linux/pkeys.h>
>> #include <linux/syscalls.h>
>> #include <linux/rseq.h>
>> #include <linux/types.h>
>> @@ -402,11 +403,19 @@ static int rseq_ip_fixup(struct pt_regs *regs)
>> void __rseq_handle_notify_resume(struct ksignal *ksig, struct pt_regs *regs)
>> {
>> struct task_struct *t = current;
>> + pkey_reg_t saved_pkey;
>> int ret, sig;
>>
>> if (unlikely(t->flags & PF_EXITING))
>> return;
>>
>> + /*
>> + * Enable access to the default (0) pkey in case the thread has
>> + * currently disabled access to it and struct rseq/rseq_cs has
>> + * 0 pkey assigned (the only supported value for now).
>> + */
>> + saved_pkey = enable_zero_pkey_val();
>> +
>> /*
>> * regs is NULL if and only if the caller is in a syscall path. Skip
>> * fixup and leave rseq_cs as is so that rseq_sycall() will detect and
>> @@ -419,9 +428,11 @@ void __rseq_handle_notify_resume(struct ksignal *ksig, struct pt_regs *regs)
>> }
>> if (unlikely(rseq_update_cpu_node_id(t)))
>> goto error;
>> + write_pkey_val(saved_pkey);
>> return;
>>
>> error:
>> + write_pkey_val(saved_pkey);
>> sig = ksig ? ksig->sig : 0;
>> force_sigsegv(sig);
>> }
>> --
>> 2.48.1.658.g4767266eb4-goog
>>
--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v6 4/4] selftests/rseq: Add test for rseq+pkeys
2025-03-10 14:30 ` Mathieu Desnoyers
@ 2025-03-10 14:36 ` Dmitry Vyukov
2025-03-10 14:38 ` Mathieu Desnoyers
0 siblings, 1 reply; 18+ messages in thread
From: Dmitry Vyukov @ 2025-03-10 14:36 UTC (permalink / raw)
To: Mathieu Desnoyers
Cc: peterz, boqun.feng, tglx, mingo, bp, dave.hansen, hpa,
aruna.ramakrishna, elver, Paul E. McKenney, x86, linux-kernel
On Mon, 10 Mar 2025 at 15:30, Mathieu Desnoyers
<mathieu.desnoyers@efficios.com> wrote:
>
> On 2025-02-27 09:03, Dmitry Vyukov wrote:
> > Add a test that ensures that PKEY-protected struct rseq_cs
> > works and does not lead to process kills.
> >
> > Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
> > Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: "Paul E. McKenney" <paulmck@kernel.org>
> > Cc: Boqun Feng <boqun.feng@gmail.com>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Ingo Molnar <mingo@redhat.com>
> > Cc: Borislav Petkov <bp@alien8.de>
> > Cc: Dave Hansen <dave.hansen@linux.intel.com>
> > Cc: "H. Peter Anvin" <hpa@zytor.com>
> > Cc: Aruna Ramakrishna <aruna.ramakrishna@oracle.com>
> > Cc: x86@kernel.org
> > Cc: linux-kernel@vger.kernel.org
> > Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
> > Fixes: d7822b1e24f2 ("rseq: Introduce restartable sequences system call")
> >
> > ---
> > Changes in v5:
> > - Use static for variables/functions
> > - Use RSEQ_READ/WRITE_ONCE instead of volatile
> >
> > Changes in v4:
> > - Added Fixes tag
> >
> > Changes in v3:
> > - added Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
> > - rework the test to work when only pkey 0 is supported for rseq
> >
> > Changes in v2:
> > - change test to install protected rseq_cs instead of rseq
> > ---
> > tools/testing/selftests/rseq/Makefile | 2 +-
> > tools/testing/selftests/rseq/pkey_test.c | 98 ++++++++++++++++++++++++
> > tools/testing/selftests/rseq/rseq.h | 1 +
> > 3 files changed, 100 insertions(+), 1 deletion(-)
> >
> > diff --git a/tools/testing/selftests/rseq/Makefile b/tools/testing/selftests/rseq/Makefile
> > index 5a3432fceb586..9111d25fea3af 100644
> > --- a/tools/testing/selftests/rseq/Makefile
> > +++ b/tools/testing/selftests/rseq/Makefile
> > @@ -16,7 +16,7 @@ OVERRIDE_TARGETS = 1
> >
> > TEST_GEN_PROGS = basic_test basic_percpu_ops_test basic_percpu_ops_mm_cid_test param_test \
> > param_test_benchmark param_test_compare_twice param_test_mm_cid \
> > - param_test_mm_cid_benchmark param_test_mm_cid_compare_twice
> > + param_test_mm_cid_benchmark param_test_mm_cid_compare_twice pkey_test
> >
> > TEST_GEN_PROGS_EXTENDED = librseq.so
> >
> > diff --git a/tools/testing/selftests/rseq/pkey_test.c b/tools/testing/selftests/rseq/pkey_test.c
> > new file mode 100644
> > index 0000000000000..cc4dd98190942
> > --- /dev/null
> > +++ b/tools/testing/selftests/rseq/pkey_test.c
> > @@ -0,0 +1,98 @@
> > +// SPDX-License-Identifier: LGPL-2.1
> > +/*
> > + * Ensure that rseq works when rseq data is inaccessible due to PKEYs.
> > + */
> > +
> > +#define _GNU_SOURCE
> > +#include <err.h>
> > +#include <errno.h>
> > +#include <stdio.h>
> > +#include <stdlib.h>
> > +#include <string.h>
> > +#include <sys/mman.h>
> > +#include <sys/syscall.h>
> > +#include <ucontext.h>
> > +#include <unistd.h>
> > +
> > +#include "rseq.h"
> > +#include "rseq-abi.h"
> > +
> > +static int pkey;
> > +static ucontext_t ucp0, ucp1;
> > +
> > +static void coroutine(void)
> > +{
> > + int i, orig_pk0, old_pk0, old_pk1, pk0, pk1;
> > + /*
> > + * When we disable access to pkey 0, globals and TLS become
> > + * inaccessible too, so we need to tread carefully.
> > + * Pkey is global so we need to copy it onto the stack.
> > + */
> > + int pk = RSEQ_READ_ONCE(pkey);
> > + struct timespec ts;
> > +
> > + orig_pk0 = pkey_get(0);
> > + if (pkey_set(0, PKEY_DISABLE_ACCESS))
> > + err(1, "pkey_set failed");
> > + old_pk0 = pkey_get(0);
> > + old_pk1 = pkey_get(pk);
> > +
> > + /*
> > + * Prevent compiler from initializing it by loading a 16-global.
> > + */
> > + RSEQ_WRITE_ONCE(ts.tv_sec, 0);
> > + RSEQ_WRITE_ONCE(ts.tv_nsec, 10 * 1000);
> > + /*
> > + * If the kernel misbehaves, context switches in the following loop
> > + * will terminate the process with SIGSEGV.
> > + * Trigger preemption w/o accessing TLS.
> > + * Note that glibc's usleep touches errno always.
> > + */
> > + for (i = 0; i < 10; i++)
> > + syscall(SYS_clock_nanosleep, CLOCK_MONOTONIC, 0, &ts, NULL);
> > +
> > + pk0 = pkey_get(0);
> > + pk1 = pkey_get(pk);
> > + if (pkey_set(0, orig_pk0))
> > + err(1, "pkey_set failed");
> > +
> > + /*
> > + * Ensure that the kernel has restored the previous value of pkeys
> > + * register after changing them.
> > + */
> > + if (old_pk0 != pk0)
> > + errx(1, "pkey 0 changed %d->%d", old_pk0, pk0);
> > + if (old_pk1 != pk1)
> > + errx(1, "pkey 1 changed %d->%d", old_pk1, pk1);
> > +
> > + swapcontext(&ucp1, &ucp0);
> > + abort();
> > +}
> > +
> > +int main(int argc, char **argv)
> > +{
> > + pkey = pkey_alloc(0, 0);
> > + if (pkey == -1) {
> > + printf("[SKIP]\tKernel does not support PKEYs: %s\n",
> > + strerror(errno));
> > + return 0;
> > + }
> > +
> > + if (rseq_register_current_thread())
> > + err(1, "rseq_register_current_thread failed");
> > +
> > + if (getcontext(&ucp1))
> > + err(1, "getcontext failed");
> > + ucp1.uc_stack.ss_size = getpagesize() * 4;
> > + ucp1.uc_stack.ss_sp = mmap(NULL, ucp1.uc_stack.ss_size,
> > + PROT_READ | PROT_WRITE, MAP_ANON | MAP_PRIVATE, -1, 0);
> > + if (ucp1.uc_stack.ss_sp == MAP_FAILED)
> > + err(1, "mmap failed");
> > + if (pkey_mprotect(ucp1.uc_stack.ss_sp, ucp1.uc_stack.ss_size,
> > + PROT_READ | PROT_WRITE, pkey))
> > + err(1, "pkey_mprotect failed");
> > + makecontext(&ucp1, coroutine, 0);
> > + if (swapcontext(&ucp0, &ucp1))
> > + err(1, "swapcontext failed");
>
> Should the rseq register be paired with a rseq unregister here ?
I dunno. It's necessary for this test and in general. Tcmalloc won't
unregister on thread exit. Even for a libc it may be a bad idea due to
signals.
> Thanks,
>
> Mathieu
>
> > + return 0;
> > +}
> > diff --git a/tools/testing/selftests/rseq/rseq.h b/tools/testing/selftests/rseq/rseq.h
> > index ba424ce80a719..65da4a727c550 100644
> > --- a/tools/testing/selftests/rseq/rseq.h
> > +++ b/tools/testing/selftests/rseq/rseq.h
> > @@ -8,6 +8,7 @@
> > #ifndef RSEQ_H
> > #define RSEQ_H
> >
> > +#include <assert.h>
> > #include <stdint.h>
> > #include <stdbool.h>
> > #include <pthread.h>
>
>
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> https://www.efficios.com
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v6 4/4] selftests/rseq: Add test for rseq+pkeys
2025-03-10 14:36 ` Dmitry Vyukov
@ 2025-03-10 14:38 ` Mathieu Desnoyers
2025-03-10 14:43 ` Dmitry Vyukov
0 siblings, 1 reply; 18+ messages in thread
From: Mathieu Desnoyers @ 2025-03-10 14:38 UTC (permalink / raw)
To: Dmitry Vyukov
Cc: peterz, boqun.feng, tglx, mingo, bp, dave.hansen, hpa,
aruna.ramakrishna, elver, Paul E. McKenney, x86, linux-kernel
On 2025-03-10 10:36, Dmitry Vyukov wrote:
> On Mon, 10 Mar 2025 at 15:30, Mathieu Desnoyers
> <mathieu.desnoyers@efficios.com> wrote:
>>
>> On 2025-02-27 09:03, Dmitry Vyukov wrote:
>>> Add a test that ensures that PKEY-protected struct rseq_cs
>>> works and does not lead to process kills.
>>>
>>> Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
>>> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
>>> Cc: Peter Zijlstra <peterz@infradead.org>
>>> Cc: "Paul E. McKenney" <paulmck@kernel.org>
>>> Cc: Boqun Feng <boqun.feng@gmail.com>
>>> Cc: Thomas Gleixner <tglx@linutronix.de>
>>> Cc: Ingo Molnar <mingo@redhat.com>
>>> Cc: Borislav Petkov <bp@alien8.de>
>>> Cc: Dave Hansen <dave.hansen@linux.intel.com>
>>> Cc: "H. Peter Anvin" <hpa@zytor.com>
>>> Cc: Aruna Ramakrishna <aruna.ramakrishna@oracle.com>
>>> Cc: x86@kernel.org
>>> Cc: linux-kernel@vger.kernel.org
>>> Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
>>> Fixes: d7822b1e24f2 ("rseq: Introduce restartable sequences system call")
>>>
>>> ---
>>> Changes in v5:
>>> - Use static for variables/functions
>>> - Use RSEQ_READ/WRITE_ONCE instead of volatile
>>>
>>> Changes in v4:
>>> - Added Fixes tag
>>>
>>> Changes in v3:
>>> - added Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
>>> - rework the test to work when only pkey 0 is supported for rseq
>>>
>>> Changes in v2:
>>> - change test to install protected rseq_cs instead of rseq
>>> ---
>>> tools/testing/selftests/rseq/Makefile | 2 +-
>>> tools/testing/selftests/rseq/pkey_test.c | 98 ++++++++++++++++++++++++
>>> tools/testing/selftests/rseq/rseq.h | 1 +
>>> 3 files changed, 100 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/tools/testing/selftests/rseq/Makefile b/tools/testing/selftests/rseq/Makefile
>>> index 5a3432fceb586..9111d25fea3af 100644
>>> --- a/tools/testing/selftests/rseq/Makefile
>>> +++ b/tools/testing/selftests/rseq/Makefile
>>> @@ -16,7 +16,7 @@ OVERRIDE_TARGETS = 1
>>>
>>> TEST_GEN_PROGS = basic_test basic_percpu_ops_test basic_percpu_ops_mm_cid_test param_test \
>>> param_test_benchmark param_test_compare_twice param_test_mm_cid \
>>> - param_test_mm_cid_benchmark param_test_mm_cid_compare_twice
>>> + param_test_mm_cid_benchmark param_test_mm_cid_compare_twice pkey_test
>>>
>>> TEST_GEN_PROGS_EXTENDED = librseq.so
>>>
>>> diff --git a/tools/testing/selftests/rseq/pkey_test.c b/tools/testing/selftests/rseq/pkey_test.c
>>> new file mode 100644
>>> index 0000000000000..cc4dd98190942
>>> --- /dev/null
>>> +++ b/tools/testing/selftests/rseq/pkey_test.c
>>> @@ -0,0 +1,98 @@
>>> +// SPDX-License-Identifier: LGPL-2.1
>>> +/*
>>> + * Ensure that rseq works when rseq data is inaccessible due to PKEYs.
>>> + */
>>> +
>>> +#define _GNU_SOURCE
>>> +#include <err.h>
>>> +#include <errno.h>
>>> +#include <stdio.h>
>>> +#include <stdlib.h>
>>> +#include <string.h>
>>> +#include <sys/mman.h>
>>> +#include <sys/syscall.h>
>>> +#include <ucontext.h>
>>> +#include <unistd.h>
>>> +
>>> +#include "rseq.h"
>>> +#include "rseq-abi.h"
>>> +
>>> +static int pkey;
>>> +static ucontext_t ucp0, ucp1;
>>> +
>>> +static void coroutine(void)
>>> +{
>>> + int i, orig_pk0, old_pk0, old_pk1, pk0, pk1;
>>> + /*
>>> + * When we disable access to pkey 0, globals and TLS become
>>> + * inaccessible too, so we need to tread carefully.
>>> + * Pkey is global so we need to copy it onto the stack.
>>> + */
>>> + int pk = RSEQ_READ_ONCE(pkey);
>>> + struct timespec ts;
>>> +
>>> + orig_pk0 = pkey_get(0);
>>> + if (pkey_set(0, PKEY_DISABLE_ACCESS))
>>> + err(1, "pkey_set failed");
>>> + old_pk0 = pkey_get(0);
>>> + old_pk1 = pkey_get(pk);
>>> +
>>> + /*
>>> + * Prevent compiler from initializing it by loading a 16-global.
>>> + */
>>> + RSEQ_WRITE_ONCE(ts.tv_sec, 0);
>>> + RSEQ_WRITE_ONCE(ts.tv_nsec, 10 * 1000);
>>> + /*
>>> + * If the kernel misbehaves, context switches in the following loop
>>> + * will terminate the process with SIGSEGV.
>>> + * Trigger preemption w/o accessing TLS.
>>> + * Note that glibc's usleep touches errno always.
>>> + */
>>> + for (i = 0; i < 10; i++)
>>> + syscall(SYS_clock_nanosleep, CLOCK_MONOTONIC, 0, &ts, NULL);
>>> +
>>> + pk0 = pkey_get(0);
>>> + pk1 = pkey_get(pk);
>>> + if (pkey_set(0, orig_pk0))
>>> + err(1, "pkey_set failed");
>>> +
>>> + /*
>>> + * Ensure that the kernel has restored the previous value of pkeys
>>> + * register after changing them.
>>> + */
>>> + if (old_pk0 != pk0)
>>> + errx(1, "pkey 0 changed %d->%d", old_pk0, pk0);
>>> + if (old_pk1 != pk1)
>>> + errx(1, "pkey 1 changed %d->%d", old_pk1, pk1);
>>> +
>>> + swapcontext(&ucp1, &ucp0);
>>> + abort();
>>> +}
>>> +
>>> +int main(int argc, char **argv)
>>> +{
>>> + pkey = pkey_alloc(0, 0);
>>> + if (pkey == -1) {
>>> + printf("[SKIP]\tKernel does not support PKEYs: %s\n",
>>> + strerror(errno));
>>> + return 0;
>>> + }
>>> +
>>> + if (rseq_register_current_thread())
>>> + err(1, "rseq_register_current_thread failed");
>>> +
>>> + if (getcontext(&ucp1))
>>> + err(1, "getcontext failed");
>>> + ucp1.uc_stack.ss_size = getpagesize() * 4;
>>> + ucp1.uc_stack.ss_sp = mmap(NULL, ucp1.uc_stack.ss_size,
>>> + PROT_READ | PROT_WRITE, MAP_ANON | MAP_PRIVATE, -1, 0);
>>> + if (ucp1.uc_stack.ss_sp == MAP_FAILED)
>>> + err(1, "mmap failed");
>>> + if (pkey_mprotect(ucp1.uc_stack.ss_sp, ucp1.uc_stack.ss_size,
>>> + PROT_READ | PROT_WRITE, pkey))
>>> + err(1, "pkey_mprotect failed");
>>> + makecontext(&ucp1, coroutine, 0);
>>> + if (swapcontext(&ucp0, &ucp1))
>>> + err(1, "swapcontext failed");
>>
>> Should the rseq register be paired with a rseq unregister here ?
>
> I dunno. It's necessary for this test and in general. Tcmalloc won't
> unregister on thread exit. Even for a libc it may be a bad idea due to
> signals.
The rseq register/unregister is only for the case where libc does not
support rseq. But if you want to use rseq_register_current_thread(),
then you'll want to pair it with unregister.
Thanks,
Mathieu
>
>> Thanks,
>>
>> Mathieu
>>
>>> + return 0;
>>> +}
>>> diff --git a/tools/testing/selftests/rseq/rseq.h b/tools/testing/selftests/rseq/rseq.h
>>> index ba424ce80a719..65da4a727c550 100644
>>> --- a/tools/testing/selftests/rseq/rseq.h
>>> +++ b/tools/testing/selftests/rseq/rseq.h
>>> @@ -8,6 +8,7 @@
>>> #ifndef RSEQ_H
>>> #define RSEQ_H
>>>
>>> +#include <assert.h>
>>> #include <stdint.h>
>>> #include <stdbool.h>
>>> #include <pthread.h>
>>
>>
>> --
>> Mathieu Desnoyers
>> EfficiOS Inc.
>> https://www.efficios.com
--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v6 3/4] rseq: Make rseq work with protection keys
2025-03-10 14:31 ` Mathieu Desnoyers
@ 2025-03-10 14:39 ` Dmitry Vyukov
2025-03-10 18:51 ` Ingo Molnar
0 siblings, 1 reply; 18+ messages in thread
From: Dmitry Vyukov @ 2025-03-10 14:39 UTC (permalink / raw)
To: Mathieu Desnoyers
Cc: peterz, boqun.feng, tglx, mingo, bp, dave.hansen, hpa,
aruna.ramakrishna, elver, Paul E. McKenney, x86, linux-kernel
On Mon, 10 Mar 2025 at 15:31, Mathieu Desnoyers
<mathieu.desnoyers@efficios.com> wrote:
>
> On 2025-03-08 05:02, Dmitry Vyukov wrote:
> > On Thu, 27 Feb 2025 at 15:03, Dmitry Vyukov <dvyukov@google.com> wrote:
> >>
> >> If an application registers rseq, and ever switches to another pkey
> >> protection (such that the rseq becomes inaccessible), then any
> >> context switch will cause failure in __rseq_handle_notify_resume()
> >> attempting to read/write struct rseq and/or rseq_cs. Since context
> >> switches are asynchronous and are outside of the application control
> >> (not part of the restricted code scope), temporarily switch to
> >> pkey value that allows access to the 0 (default) PKEY.
> >>
> >> Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
> >> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> >> Cc: Peter Zijlstra <peterz@infradead.org>
> >> Cc: "Paul E. McKenney" <paulmck@kernel.org>
> >> Cc: Boqun Feng <boqun.feng@gmail.com>
> >> Cc: Thomas Gleixner <tglx@linutronix.de>
> >> Cc: Ingo Molnar <mingo@redhat.com>
> >> Cc: Borislav Petkov <bp@alien8.de>
> >> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> >> Cc: "H. Peter Anvin" <hpa@zytor.com>
> >> Cc: Aruna Ramakrishna <aruna.ramakrishna@oracle.com>
> >> Cc: x86@kernel.org
> >> Cc: linux-kernel@vger.kernel.org
> >> Fixes: d7822b1e24f2 ("rseq: Introduce restartable sequences system call")
> >
> > Any remaining concerns with this series?
> >
> > What tree should it go into?
>
> Usually the rseq bits go through the -tip tree.
Thomas, Ingo, can you please take this to -tip tree? Or who would that be?
> Thanks,
>
> Mathieu
>
> >
> >> ---
> >> Changes in v6:
> >> - Added a comment to struct rseq with MPK rules
> >>
> >> Changes in v4:
> >> - Added Fixes tag
> >>
> >> Changes in v3:
> >> - simplify control flow to always enable access to 0 pkey
> >>
> >> Changes in v2:
> >> - fixed typos and reworded the comment
> >> ---
> >> include/uapi/linux/rseq.h | 4 ++++
> >> kernel/rseq.c | 11 +++++++++++
> >> 2 files changed, 15 insertions(+)
> >>
> >> diff --git a/include/uapi/linux/rseq.h b/include/uapi/linux/rseq.h
> >> index c233aae5eac90..019fd248cf749 100644
> >> --- a/include/uapi/linux/rseq.h
> >> +++ b/include/uapi/linux/rseq.h
> >> @@ -58,6 +58,10 @@ struct rseq_cs {
> >> * contained within a single cache-line.
> >> *
> >> * A single struct rseq per thread is allowed.
> >> + *
> >> + * If struct rseq or struct rseq_cs is used with Memory Protection Keys,
> >> + * then the assigned pkey should either be accessible whenever these structs
> >> + * are registered/installed, or they should be protected with pkey 0.
> >> */
> >> struct rseq {
> >> /*
> >> diff --git a/kernel/rseq.c b/kernel/rseq.c
> >> index 2cb16091ec0ae..9d9c976d3b78c 100644
> >> --- a/kernel/rseq.c
> >> +++ b/kernel/rseq.c
> >> @@ -10,6 +10,7 @@
> >>
> >> #include <linux/sched.h>
> >> #include <linux/uaccess.h>
> >> +#include <linux/pkeys.h>
> >> #include <linux/syscalls.h>
> >> #include <linux/rseq.h>
> >> #include <linux/types.h>
> >> @@ -402,11 +403,19 @@ static int rseq_ip_fixup(struct pt_regs *regs)
> >> void __rseq_handle_notify_resume(struct ksignal *ksig, struct pt_regs *regs)
> >> {
> >> struct task_struct *t = current;
> >> + pkey_reg_t saved_pkey;
> >> int ret, sig;
> >>
> >> if (unlikely(t->flags & PF_EXITING))
> >> return;
> >>
> >> + /*
> >> + * Enable access to the default (0) pkey in case the thread has
> >> + * currently disabled access to it and struct rseq/rseq_cs has
> >> + * 0 pkey assigned (the only supported value for now).
> >> + */
> >> + saved_pkey = enable_zero_pkey_val();
> >> +
> >> /*
> >> * regs is NULL if and only if the caller is in a syscall path. Skip
> >> * fixup and leave rseq_cs as is so that rseq_sycall() will detect and
> >> @@ -419,9 +428,11 @@ void __rseq_handle_notify_resume(struct ksignal *ksig, struct pt_regs *regs)
> >> }
> >> if (unlikely(rseq_update_cpu_node_id(t)))
> >> goto error;
> >> + write_pkey_val(saved_pkey);
> >> return;
> >>
> >> error:
> >> + write_pkey_val(saved_pkey);
> >> sig = ksig ? ksig->sig : 0;
> >> force_sigsegv(sig);
> >> }
> >> --
> >> 2.48.1.658.g4767266eb4-goog
> >>
>
>
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> https://www.efficios.com
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v6 4/4] selftests/rseq: Add test for rseq+pkeys
2025-03-10 14:38 ` Mathieu Desnoyers
@ 2025-03-10 14:43 ` Dmitry Vyukov
2025-03-10 15:41 ` Mathieu Desnoyers
0 siblings, 1 reply; 18+ messages in thread
From: Dmitry Vyukov @ 2025-03-10 14:43 UTC (permalink / raw)
To: Mathieu Desnoyers
Cc: peterz, boqun.feng, tglx, mingo, bp, dave.hansen, hpa,
aruna.ramakrishna, elver, Paul E. McKenney, x86, linux-kernel
On Mon, 10 Mar 2025 at 15:38, Mathieu Desnoyers
<mathieu.desnoyers@efficios.com> wrote:
>
> On 2025-03-10 10:36, Dmitry Vyukov wrote:
> > On Mon, 10 Mar 2025 at 15:30, Mathieu Desnoyers
> > <mathieu.desnoyers@efficios.com> wrote:
> >>
> >> On 2025-02-27 09:03, Dmitry Vyukov wrote:
> >>> Add a test that ensures that PKEY-protected struct rseq_cs
> >>> works and does not lead to process kills.
> >>>
> >>> Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
> >>> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> >>> Cc: Peter Zijlstra <peterz@infradead.org>
> >>> Cc: "Paul E. McKenney" <paulmck@kernel.org>
> >>> Cc: Boqun Feng <boqun.feng@gmail.com>
> >>> Cc: Thomas Gleixner <tglx@linutronix.de>
> >>> Cc: Ingo Molnar <mingo@redhat.com>
> >>> Cc: Borislav Petkov <bp@alien8.de>
> >>> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> >>> Cc: "H. Peter Anvin" <hpa@zytor.com>
> >>> Cc: Aruna Ramakrishna <aruna.ramakrishna@oracle.com>
> >>> Cc: x86@kernel.org
> >>> Cc: linux-kernel@vger.kernel.org
> >>> Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
> >>> Fixes: d7822b1e24f2 ("rseq: Introduce restartable sequences system call")
> >>>
> >>> ---
> >>> Changes in v5:
> >>> - Use static for variables/functions
> >>> - Use RSEQ_READ/WRITE_ONCE instead of volatile
> >>>
> >>> Changes in v4:
> >>> - Added Fixes tag
> >>>
> >>> Changes in v3:
> >>> - added Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
> >>> - rework the test to work when only pkey 0 is supported for rseq
> >>>
> >>> Changes in v2:
> >>> - change test to install protected rseq_cs instead of rseq
> >>> ---
> >>> tools/testing/selftests/rseq/Makefile | 2 +-
> >>> tools/testing/selftests/rseq/pkey_test.c | 98 ++++++++++++++++++++++++
> >>> tools/testing/selftests/rseq/rseq.h | 1 +
> >>> 3 files changed, 100 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/tools/testing/selftests/rseq/Makefile b/tools/testing/selftests/rseq/Makefile
> >>> index 5a3432fceb586..9111d25fea3af 100644
> >>> --- a/tools/testing/selftests/rseq/Makefile
> >>> +++ b/tools/testing/selftests/rseq/Makefile
> >>> @@ -16,7 +16,7 @@ OVERRIDE_TARGETS = 1
> >>>
> >>> TEST_GEN_PROGS = basic_test basic_percpu_ops_test basic_percpu_ops_mm_cid_test param_test \
> >>> param_test_benchmark param_test_compare_twice param_test_mm_cid \
> >>> - param_test_mm_cid_benchmark param_test_mm_cid_compare_twice
> >>> + param_test_mm_cid_benchmark param_test_mm_cid_compare_twice pkey_test
> >>>
> >>> TEST_GEN_PROGS_EXTENDED = librseq.so
> >>>
> >>> diff --git a/tools/testing/selftests/rseq/pkey_test.c b/tools/testing/selftests/rseq/pkey_test.c
> >>> new file mode 100644
> >>> index 0000000000000..cc4dd98190942
> >>> --- /dev/null
> >>> +++ b/tools/testing/selftests/rseq/pkey_test.c
> >>> @@ -0,0 +1,98 @@
> >>> +// SPDX-License-Identifier: LGPL-2.1
> >>> +/*
> >>> + * Ensure that rseq works when rseq data is inaccessible due to PKEYs.
> >>> + */
> >>> +
> >>> +#define _GNU_SOURCE
> >>> +#include <err.h>
> >>> +#include <errno.h>
> >>> +#include <stdio.h>
> >>> +#include <stdlib.h>
> >>> +#include <string.h>
> >>> +#include <sys/mman.h>
> >>> +#include <sys/syscall.h>
> >>> +#include <ucontext.h>
> >>> +#include <unistd.h>
> >>> +
> >>> +#include "rseq.h"
> >>> +#include "rseq-abi.h"
> >>> +
> >>> +static int pkey;
> >>> +static ucontext_t ucp0, ucp1;
> >>> +
> >>> +static void coroutine(void)
> >>> +{
> >>> + int i, orig_pk0, old_pk0, old_pk1, pk0, pk1;
> >>> + /*
> >>> + * When we disable access to pkey 0, globals and TLS become
> >>> + * inaccessible too, so we need to tread carefully.
> >>> + * Pkey is global so we need to copy it onto the stack.
> >>> + */
> >>> + int pk = RSEQ_READ_ONCE(pkey);
> >>> + struct timespec ts;
> >>> +
> >>> + orig_pk0 = pkey_get(0);
> >>> + if (pkey_set(0, PKEY_DISABLE_ACCESS))
> >>> + err(1, "pkey_set failed");
> >>> + old_pk0 = pkey_get(0);
> >>> + old_pk1 = pkey_get(pk);
> >>> +
> >>> + /*
> >>> + * Prevent compiler from initializing it by loading a 16-global.
> >>> + */
> >>> + RSEQ_WRITE_ONCE(ts.tv_sec, 0);
> >>> + RSEQ_WRITE_ONCE(ts.tv_nsec, 10 * 1000);
> >>> + /*
> >>> + * If the kernel misbehaves, context switches in the following loop
> >>> + * will terminate the process with SIGSEGV.
> >>> + * Trigger preemption w/o accessing TLS.
> >>> + * Note that glibc's usleep touches errno always.
> >>> + */
> >>> + for (i = 0; i < 10; i++)
> >>> + syscall(SYS_clock_nanosleep, CLOCK_MONOTONIC, 0, &ts, NULL);
> >>> +
> >>> + pk0 = pkey_get(0);
> >>> + pk1 = pkey_get(pk);
> >>> + if (pkey_set(0, orig_pk0))
> >>> + err(1, "pkey_set failed");
> >>> +
> >>> + /*
> >>> + * Ensure that the kernel has restored the previous value of pkeys
> >>> + * register after changing them.
> >>> + */
> >>> + if (old_pk0 != pk0)
> >>> + errx(1, "pkey 0 changed %d->%d", old_pk0, pk0);
> >>> + if (old_pk1 != pk1)
> >>> + errx(1, "pkey 1 changed %d->%d", old_pk1, pk1);
> >>> +
> >>> + swapcontext(&ucp1, &ucp0);
> >>> + abort();
> >>> +}
> >>> +
> >>> +int main(int argc, char **argv)
> >>> +{
> >>> + pkey = pkey_alloc(0, 0);
> >>> + if (pkey == -1) {
> >>> + printf("[SKIP]\tKernel does not support PKEYs: %s\n",
> >>> + strerror(errno));
> >>> + return 0;
> >>> + }
> >>> +
> >>> + if (rseq_register_current_thread())
> >>> + err(1, "rseq_register_current_thread failed");
> >>> +
> >>> + if (getcontext(&ucp1))
> >>> + err(1, "getcontext failed");
> >>> + ucp1.uc_stack.ss_size = getpagesize() * 4;
> >>> + ucp1.uc_stack.ss_sp = mmap(NULL, ucp1.uc_stack.ss_size,
> >>> + PROT_READ | PROT_WRITE, MAP_ANON | MAP_PRIVATE, -1, 0);
> >>> + if (ucp1.uc_stack.ss_sp == MAP_FAILED)
> >>> + err(1, "mmap failed");
> >>> + if (pkey_mprotect(ucp1.uc_stack.ss_sp, ucp1.uc_stack.ss_size,
> >>> + PROT_READ | PROT_WRITE, pkey))
> >>> + err(1, "pkey_mprotect failed");
> >>> + makecontext(&ucp1, coroutine, 0);
> >>> + if (swapcontext(&ucp0, &ucp1))
> >>> + err(1, "swapcontext failed");
> >>
> >> Should the rseq register be paired with a rseq unregister here ?
> >
> > I dunno. It's necessary for this test and in general. Tcmalloc won't
> > unregister on thread exit. Even for a libc it may be a bad idea due to
> > signals.
>
> The rseq register/unregister is only for the case where libc does not
> support rseq. But if you want to use rseq_register_current_thread(),
> then you'll want to pair it with unregister.
Why? Isn't it better to have tests more realitic?
> Thanks,
>
> Mathieu
>
> >
> >> Thanks,
> >>
> >> Mathieu
> >>
> >>> + return 0;
> >>> +}
> >>> diff --git a/tools/testing/selftests/rseq/rseq.h b/tools/testing/selftests/rseq/rseq.h
> >>> index ba424ce80a719..65da4a727c550 100644
> >>> --- a/tools/testing/selftests/rseq/rseq.h
> >>> +++ b/tools/testing/selftests/rseq/rseq.h
> >>> @@ -8,6 +8,7 @@
> >>> #ifndef RSEQ_H
> >>> #define RSEQ_H
> >>>
> >>> +#include <assert.h>
> >>> #include <stdint.h>
> >>> #include <stdbool.h>
> >>> #include <pthread.h>
> >>
> >>
> >> --
> >> Mathieu Desnoyers
> >> EfficiOS Inc.
> >> https://www.efficios.com
>
>
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> https://www.efficios.com
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v6 4/4] selftests/rseq: Add test for rseq+pkeys
2025-03-10 14:43 ` Dmitry Vyukov
@ 2025-03-10 15:41 ` Mathieu Desnoyers
2025-03-10 16:31 ` Dmitry Vyukov
0 siblings, 1 reply; 18+ messages in thread
From: Mathieu Desnoyers @ 2025-03-10 15:41 UTC (permalink / raw)
To: Dmitry Vyukov
Cc: peterz, boqun.feng, tglx, mingo, bp, dave.hansen, hpa,
aruna.ramakrishna, elver, Paul E. McKenney, x86, linux-kernel
On 2025-03-10 10:43, Dmitry Vyukov wrote:
> On Mon, 10 Mar 2025 at 15:38, Mathieu Desnoyers
> <mathieu.desnoyers@efficios.com> wrote:
>>
>> On 2025-03-10 10:36, Dmitry Vyukov wrote:
>>> On Mon, 10 Mar 2025 at 15:30, Mathieu Desnoyers
>>> <mathieu.desnoyers@efficios.com> wrote:
>>>>
>>>> On 2025-02-27 09:03, Dmitry Vyukov wrote:
>>>>> Add a test that ensures that PKEY-protected struct rseq_cs
>>>>> works and does not lead to process kills.
>>>>>
>>>>> Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
>>>>> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
>>>>> Cc: Peter Zijlstra <peterz@infradead.org>
>>>>> Cc: "Paul E. McKenney" <paulmck@kernel.org>
>>>>> Cc: Boqun Feng <boqun.feng@gmail.com>
>>>>> Cc: Thomas Gleixner <tglx@linutronix.de>
>>>>> Cc: Ingo Molnar <mingo@redhat.com>
>>>>> Cc: Borislav Petkov <bp@alien8.de>
>>>>> Cc: Dave Hansen <dave.hansen@linux.intel.com>
>>>>> Cc: "H. Peter Anvin" <hpa@zytor.com>
>>>>> Cc: Aruna Ramakrishna <aruna.ramakrishna@oracle.com>
>>>>> Cc: x86@kernel.org
>>>>> Cc: linux-kernel@vger.kernel.org
>>>>> Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
>>>>> Fixes: d7822b1e24f2 ("rseq: Introduce restartable sequences system call")
>>>>>
>>>>> ---
>>>>> Changes in v5:
>>>>> - Use static for variables/functions
>>>>> - Use RSEQ_READ/WRITE_ONCE instead of volatile
>>>>>
>>>>> Changes in v4:
>>>>> - Added Fixes tag
>>>>>
>>>>> Changes in v3:
>>>>> - added Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
>>>>> - rework the test to work when only pkey 0 is supported for rseq
>>>>>
>>>>> Changes in v2:
>>>>> - change test to install protected rseq_cs instead of rseq
>>>>> ---
>>>>> tools/testing/selftests/rseq/Makefile | 2 +-
>>>>> tools/testing/selftests/rseq/pkey_test.c | 98 ++++++++++++++++++++++++
>>>>> tools/testing/selftests/rseq/rseq.h | 1 +
>>>>> 3 files changed, 100 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/tools/testing/selftests/rseq/Makefile b/tools/testing/selftests/rseq/Makefile
>>>>> index 5a3432fceb586..9111d25fea3af 100644
>>>>> --- a/tools/testing/selftests/rseq/Makefile
>>>>> +++ b/tools/testing/selftests/rseq/Makefile
>>>>> @@ -16,7 +16,7 @@ OVERRIDE_TARGETS = 1
>>>>>
>>>>> TEST_GEN_PROGS = basic_test basic_percpu_ops_test basic_percpu_ops_mm_cid_test param_test \
>>>>> param_test_benchmark param_test_compare_twice param_test_mm_cid \
>>>>> - param_test_mm_cid_benchmark param_test_mm_cid_compare_twice
>>>>> + param_test_mm_cid_benchmark param_test_mm_cid_compare_twice pkey_test
>>>>>
>>>>> TEST_GEN_PROGS_EXTENDED = librseq.so
>>>>>
>>>>> diff --git a/tools/testing/selftests/rseq/pkey_test.c b/tools/testing/selftests/rseq/pkey_test.c
>>>>> new file mode 100644
>>>>> index 0000000000000..cc4dd98190942
>>>>> --- /dev/null
>>>>> +++ b/tools/testing/selftests/rseq/pkey_test.c
>>>>> @@ -0,0 +1,98 @@
>>>>> +// SPDX-License-Identifier: LGPL-2.1
>>>>> +/*
>>>>> + * Ensure that rseq works when rseq data is inaccessible due to PKEYs.
>>>>> + */
>>>>> +
>>>>> +#define _GNU_SOURCE
>>>>> +#include <err.h>
>>>>> +#include <errno.h>
>>>>> +#include <stdio.h>
>>>>> +#include <stdlib.h>
>>>>> +#include <string.h>
>>>>> +#include <sys/mman.h>
>>>>> +#include <sys/syscall.h>
>>>>> +#include <ucontext.h>
>>>>> +#include <unistd.h>
>>>>> +
>>>>> +#include "rseq.h"
>>>>> +#include "rseq-abi.h"
>>>>> +
>>>>> +static int pkey;
>>>>> +static ucontext_t ucp0, ucp1;
>>>>> +
>>>>> +static void coroutine(void)
>>>>> +{
>>>>> + int i, orig_pk0, old_pk0, old_pk1, pk0, pk1;
>>>>> + /*
>>>>> + * When we disable access to pkey 0, globals and TLS become
>>>>> + * inaccessible too, so we need to tread carefully.
>>>>> + * Pkey is global so we need to copy it onto the stack.
>>>>> + */
>>>>> + int pk = RSEQ_READ_ONCE(pkey);
>>>>> + struct timespec ts;
>>>>> +
>>>>> + orig_pk0 = pkey_get(0);
>>>>> + if (pkey_set(0, PKEY_DISABLE_ACCESS))
>>>>> + err(1, "pkey_set failed");
>>>>> + old_pk0 = pkey_get(0);
>>>>> + old_pk1 = pkey_get(pk);
>>>>> +
>>>>> + /*
>>>>> + * Prevent compiler from initializing it by loading a 16-global.
>>>>> + */
>>>>> + RSEQ_WRITE_ONCE(ts.tv_sec, 0);
>>>>> + RSEQ_WRITE_ONCE(ts.tv_nsec, 10 * 1000);
>>>>> + /*
>>>>> + * If the kernel misbehaves, context switches in the following loop
>>>>> + * will terminate the process with SIGSEGV.
>>>>> + * Trigger preemption w/o accessing TLS.
>>>>> + * Note that glibc's usleep touches errno always.
>>>>> + */
>>>>> + for (i = 0; i < 10; i++)
>>>>> + syscall(SYS_clock_nanosleep, CLOCK_MONOTONIC, 0, &ts, NULL);
>>>>> +
>>>>> + pk0 = pkey_get(0);
>>>>> + pk1 = pkey_get(pk);
>>>>> + if (pkey_set(0, orig_pk0))
>>>>> + err(1, "pkey_set failed");
>>>>> +
>>>>> + /*
>>>>> + * Ensure that the kernel has restored the previous value of pkeys
>>>>> + * register after changing them.
>>>>> + */
>>>>> + if (old_pk0 != pk0)
>>>>> + errx(1, "pkey 0 changed %d->%d", old_pk0, pk0);
>>>>> + if (old_pk1 != pk1)
>>>>> + errx(1, "pkey 1 changed %d->%d", old_pk1, pk1);
>>>>> +
>>>>> + swapcontext(&ucp1, &ucp0);
>>>>> + abort();
>>>>> +}
>>>>> +
>>>>> +int main(int argc, char **argv)
>>>>> +{
>>>>> + pkey = pkey_alloc(0, 0);
>>>>> + if (pkey == -1) {
>>>>> + printf("[SKIP]\tKernel does not support PKEYs: %s\n",
>>>>> + strerror(errno));
>>>>> + return 0;
>>>>> + }
>>>>> +
>>>>> + if (rseq_register_current_thread())
>>>>> + err(1, "rseq_register_current_thread failed");
>>>>> +
>>>>> + if (getcontext(&ucp1))
>>>>> + err(1, "getcontext failed");
>>>>> + ucp1.uc_stack.ss_size = getpagesize() * 4;
>>>>> + ucp1.uc_stack.ss_sp = mmap(NULL, ucp1.uc_stack.ss_size,
>>>>> + PROT_READ | PROT_WRITE, MAP_ANON | MAP_PRIVATE, -1, 0);
>>>>> + if (ucp1.uc_stack.ss_sp == MAP_FAILED)
>>>>> + err(1, "mmap failed");
>>>>> + if (pkey_mprotect(ucp1.uc_stack.ss_sp, ucp1.uc_stack.ss_size,
>>>>> + PROT_READ | PROT_WRITE, pkey))
>>>>> + err(1, "pkey_mprotect failed");
>>>>> + makecontext(&ucp1, coroutine, 0);
>>>>> + if (swapcontext(&ucp0, &ucp1))
>>>>> + err(1, "swapcontext failed");
>>>>
>>>> Should the rseq register be paired with a rseq unregister here ?
>>>
>>> I dunno. It's necessary for this test and in general. Tcmalloc won't
>>> unregister on thread exit. Even for a libc it may be a bad idea due to
>>> signals.
>>
>> The rseq register/unregister is only for the case where libc does not
>> support rseq. But if you want to use rseq_register_current_thread(),
>> then you'll want to pair it with unregister.
>
> Why? Isn't it better to have tests more realitic?
If you use rseq.c rseq_register_current_thread without
rseq_unregister_current_thread, then you rely on implicit
unregistration done by the kernel at thread exit.
Then you need to ensure your userspace runtime keep the TLS
that holds the rseq area allocated for the entire execution
of the thread until it exits in the kernel. Note that
disabling signals is not sufficient to prevent the kernel
from accessing the rseq area.
GNU libc gets away with automatic unregistration at
thread exit because it completely controls the lifetime
of the registered rseq area, keeping it allocated for as
long as the thread is executing.
So in order to minimize the dependency on specific libc
behavior in the kernel sefltests, the selftests rseq.h
requires explicit registration *and* unregistration.
Thanks,
Mathieu
>
>
>> Thanks,
>>
>> Mathieu
>>
>>>
>>>> Thanks,
>>>>
>>>> Mathieu
>>>>
>>>>> + return 0;
>>>>> +}
>>>>> diff --git a/tools/testing/selftests/rseq/rseq.h b/tools/testing/selftests/rseq/rseq.h
>>>>> index ba424ce80a719..65da4a727c550 100644
>>>>> --- a/tools/testing/selftests/rseq/rseq.h
>>>>> +++ b/tools/testing/selftests/rseq/rseq.h
>>>>> @@ -8,6 +8,7 @@
>>>>> #ifndef RSEQ_H
>>>>> #define RSEQ_H
>>>>>
>>>>> +#include <assert.h>
>>>>> #include <stdint.h>
>>>>> #include <stdbool.h>
>>>>> #include <pthread.h>
>>>>
>>>>
>>>> --
>>>> Mathieu Desnoyers
>>>> EfficiOS Inc.
>>>> https://www.efficios.com
>>
>>
>> --
>> Mathieu Desnoyers
>> EfficiOS Inc.
>> https://www.efficios.com
--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v6 4/4] selftests/rseq: Add test for rseq+pkeys
2025-03-10 15:41 ` Mathieu Desnoyers
@ 2025-03-10 16:31 ` Dmitry Vyukov
2025-03-10 17:26 ` Mathieu Desnoyers
0 siblings, 1 reply; 18+ messages in thread
From: Dmitry Vyukov @ 2025-03-10 16:31 UTC (permalink / raw)
To: Mathieu Desnoyers
Cc: peterz, boqun.feng, tglx, mingo, bp, dave.hansen, hpa,
aruna.ramakrishna, elver, Paul E. McKenney, x86, linux-kernel
On Mon, 10 Mar 2025 at 16:41, Mathieu Desnoyers
<mathieu.desnoyers@efficios.com> wrote:
>
> On 2025-03-10 10:43, Dmitry Vyukov wrote:
> > On Mon, 10 Mar 2025 at 15:38, Mathieu Desnoyers
> > <mathieu.desnoyers@efficios.com> wrote:
> >>
> >> On 2025-03-10 10:36, Dmitry Vyukov wrote:
> >>> On Mon, 10 Mar 2025 at 15:30, Mathieu Desnoyers
> >>> <mathieu.desnoyers@efficios.com> wrote:
> >>>>
> >>>> On 2025-02-27 09:03, Dmitry Vyukov wrote:
> >>>>> Add a test that ensures that PKEY-protected struct rseq_cs
> >>>>> works and does not lead to process kills.
> >>>>>
> >>>>> Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
> >>>>> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> >>>>> Cc: Peter Zijlstra <peterz@infradead.org>
> >>>>> Cc: "Paul E. McKenney" <paulmck@kernel.org>
> >>>>> Cc: Boqun Feng <boqun.feng@gmail.com>
> >>>>> Cc: Thomas Gleixner <tglx@linutronix.de>
> >>>>> Cc: Ingo Molnar <mingo@redhat.com>
> >>>>> Cc: Borislav Petkov <bp@alien8.de>
> >>>>> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> >>>>> Cc: "H. Peter Anvin" <hpa@zytor.com>
> >>>>> Cc: Aruna Ramakrishna <aruna.ramakrishna@oracle.com>
> >>>>> Cc: x86@kernel.org
> >>>>> Cc: linux-kernel@vger.kernel.org
> >>>>> Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
> >>>>> Fixes: d7822b1e24f2 ("rseq: Introduce restartable sequences system call")
> >>>>>
> >>>>> ---
> >>>>> Changes in v5:
> >>>>> - Use static for variables/functions
> >>>>> - Use RSEQ_READ/WRITE_ONCE instead of volatile
> >>>>>
> >>>>> Changes in v4:
> >>>>> - Added Fixes tag
> >>>>>
> >>>>> Changes in v3:
> >>>>> - added Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
> >>>>> - rework the test to work when only pkey 0 is supported for rseq
> >>>>>
> >>>>> Changes in v2:
> >>>>> - change test to install protected rseq_cs instead of rseq
> >>>>> ---
> >>>>> tools/testing/selftests/rseq/Makefile | 2 +-
> >>>>> tools/testing/selftests/rseq/pkey_test.c | 98 ++++++++++++++++++++++++
> >>>>> tools/testing/selftests/rseq/rseq.h | 1 +
> >>>>> 3 files changed, 100 insertions(+), 1 deletion(-)
> >>>>>
> >>>>> diff --git a/tools/testing/selftests/rseq/Makefile b/tools/testing/selftests/rseq/Makefile
> >>>>> index 5a3432fceb586..9111d25fea3af 100644
> >>>>> --- a/tools/testing/selftests/rseq/Makefile
> >>>>> +++ b/tools/testing/selftests/rseq/Makefile
> >>>>> @@ -16,7 +16,7 @@ OVERRIDE_TARGETS = 1
> >>>>>
> >>>>> TEST_GEN_PROGS = basic_test basic_percpu_ops_test basic_percpu_ops_mm_cid_test param_test \
> >>>>> param_test_benchmark param_test_compare_twice param_test_mm_cid \
> >>>>> - param_test_mm_cid_benchmark param_test_mm_cid_compare_twice
> >>>>> + param_test_mm_cid_benchmark param_test_mm_cid_compare_twice pkey_test
> >>>>>
> >>>>> TEST_GEN_PROGS_EXTENDED = librseq.so
> >>>>>
> >>>>> diff --git a/tools/testing/selftests/rseq/pkey_test.c b/tools/testing/selftests/rseq/pkey_test.c
> >>>>> new file mode 100644
> >>>>> index 0000000000000..cc4dd98190942
> >>>>> --- /dev/null
> >>>>> +++ b/tools/testing/selftests/rseq/pkey_test.c
> >>>>> @@ -0,0 +1,98 @@
> >>>>> +// SPDX-License-Identifier: LGPL-2.1
> >>>>> +/*
> >>>>> + * Ensure that rseq works when rseq data is inaccessible due to PKEYs.
> >>>>> + */
> >>>>> +
> >>>>> +#define _GNU_SOURCE
> >>>>> +#include <err.h>
> >>>>> +#include <errno.h>
> >>>>> +#include <stdio.h>
> >>>>> +#include <stdlib.h>
> >>>>> +#include <string.h>
> >>>>> +#include <sys/mman.h>
> >>>>> +#include <sys/syscall.h>
> >>>>> +#include <ucontext.h>
> >>>>> +#include <unistd.h>
> >>>>> +
> >>>>> +#include "rseq.h"
> >>>>> +#include "rseq-abi.h"
> >>>>> +
> >>>>> +static int pkey;
> >>>>> +static ucontext_t ucp0, ucp1;
> >>>>> +
> >>>>> +static void coroutine(void)
> >>>>> +{
> >>>>> + int i, orig_pk0, old_pk0, old_pk1, pk0, pk1;
> >>>>> + /*
> >>>>> + * When we disable access to pkey 0, globals and TLS become
> >>>>> + * inaccessible too, so we need to tread carefully.
> >>>>> + * Pkey is global so we need to copy it onto the stack.
> >>>>> + */
> >>>>> + int pk = RSEQ_READ_ONCE(pkey);
> >>>>> + struct timespec ts;
> >>>>> +
> >>>>> + orig_pk0 = pkey_get(0);
> >>>>> + if (pkey_set(0, PKEY_DISABLE_ACCESS))
> >>>>> + err(1, "pkey_set failed");
> >>>>> + old_pk0 = pkey_get(0);
> >>>>> + old_pk1 = pkey_get(pk);
> >>>>> +
> >>>>> + /*
> >>>>> + * Prevent compiler from initializing it by loading a 16-global.
> >>>>> + */
> >>>>> + RSEQ_WRITE_ONCE(ts.tv_sec, 0);
> >>>>> + RSEQ_WRITE_ONCE(ts.tv_nsec, 10 * 1000);
> >>>>> + /*
> >>>>> + * If the kernel misbehaves, context switches in the following loop
> >>>>> + * will terminate the process with SIGSEGV.
> >>>>> + * Trigger preemption w/o accessing TLS.
> >>>>> + * Note that glibc's usleep touches errno always.
> >>>>> + */
> >>>>> + for (i = 0; i < 10; i++)
> >>>>> + syscall(SYS_clock_nanosleep, CLOCK_MONOTONIC, 0, &ts, NULL);
> >>>>> +
> >>>>> + pk0 = pkey_get(0);
> >>>>> + pk1 = pkey_get(pk);
> >>>>> + if (pkey_set(0, orig_pk0))
> >>>>> + err(1, "pkey_set failed");
> >>>>> +
> >>>>> + /*
> >>>>> + * Ensure that the kernel has restored the previous value of pkeys
> >>>>> + * register after changing them.
> >>>>> + */
> >>>>> + if (old_pk0 != pk0)
> >>>>> + errx(1, "pkey 0 changed %d->%d", old_pk0, pk0);
> >>>>> + if (old_pk1 != pk1)
> >>>>> + errx(1, "pkey 1 changed %d->%d", old_pk1, pk1);
> >>>>> +
> >>>>> + swapcontext(&ucp1, &ucp0);
> >>>>> + abort();
> >>>>> +}
> >>>>> +
> >>>>> +int main(int argc, char **argv)
> >>>>> +{
> >>>>> + pkey = pkey_alloc(0, 0);
> >>>>> + if (pkey == -1) {
> >>>>> + printf("[SKIP]\tKernel does not support PKEYs: %s\n",
> >>>>> + strerror(errno));
> >>>>> + return 0;
> >>>>> + }
> >>>>> +
> >>>>> + if (rseq_register_current_thread())
> >>>>> + err(1, "rseq_register_current_thread failed");
> >>>>> +
> >>>>> + if (getcontext(&ucp1))
> >>>>> + err(1, "getcontext failed");
> >>>>> + ucp1.uc_stack.ss_size = getpagesize() * 4;
> >>>>> + ucp1.uc_stack.ss_sp = mmap(NULL, ucp1.uc_stack.ss_size,
> >>>>> + PROT_READ | PROT_WRITE, MAP_ANON | MAP_PRIVATE, -1, 0);
> >>>>> + if (ucp1.uc_stack.ss_sp == MAP_FAILED)
> >>>>> + err(1, "mmap failed");
> >>>>> + if (pkey_mprotect(ucp1.uc_stack.ss_sp, ucp1.uc_stack.ss_size,
> >>>>> + PROT_READ | PROT_WRITE, pkey))
> >>>>> + err(1, "pkey_mprotect failed");
> >>>>> + makecontext(&ucp1, coroutine, 0);
> >>>>> + if (swapcontext(&ucp0, &ucp1))
> >>>>> + err(1, "swapcontext failed");
> >>>>
> >>>> Should the rseq register be paired with a rseq unregister here ?
> >>>
> >>> I dunno. It's necessary for this test and in general. Tcmalloc won't
> >>> unregister on thread exit. Even for a libc it may be a bad idea due to
> >>> signals.
> >>
> >> The rseq register/unregister is only for the case where libc does not
> >> support rseq. But if you want to use rseq_register_current_thread(),
> >> then you'll want to pair it with unregister.
> >
> > Why? Isn't it better to have tests more realitic?
>
> If you use rseq.c rseq_register_current_thread without
> rseq_unregister_current_thread, then you rely on implicit
> unregistration done by the kernel at thread exit.
>
> Then you need to ensure your userspace runtime keep the TLS
> that holds the rseq area allocated for the entire execution
> of the thread until it exits in the kernel. Note that
> disabling signals is not sufficient to prevent the kernel
> from accessing the rseq area.
>
> GNU libc gets away with automatic unregistration at
> thread exit because it completely controls the lifetime
> of the registered rseq area, keeping it allocated for as
> long as the thread is executing.
>
> So in order to minimize the dependency on specific libc
> behavior in the kernel sefltests, the selftests rseq.h
> requires explicit registration *and* unregistration.
If libc registers rseq (!rseq_ownership), then
rseq_unregister_current_thread is a no-op. And if libc has not
registered rseq, then we are not relying on any libc behavior. I don't
see how calling rseq_unregister_current_thread helps to rely less on
libc. Am I missing something?
> Thanks,
>
> Mathieu
>
>
> >
> >
> >> Thanks,
> >>
> >> Mathieu
> >>
> >>>
> >>>> Thanks,
> >>>>
> >>>> Mathieu
> >>>>
> >>>>> + return 0;
> >>>>> +}
> >>>>> diff --git a/tools/testing/selftests/rseq/rseq.h b/tools/testing/selftests/rseq/rseq.h
> >>>>> index ba424ce80a719..65da4a727c550 100644
> >>>>> --- a/tools/testing/selftests/rseq/rseq.h
> >>>>> +++ b/tools/testing/selftests/rseq/rseq.h
> >>>>> @@ -8,6 +8,7 @@
> >>>>> #ifndef RSEQ_H
> >>>>> #define RSEQ_H
> >>>>>
> >>>>> +#include <assert.h>
> >>>>> #include <stdint.h>
> >>>>> #include <stdbool.h>
> >>>>> #include <pthread.h>
> >>>>
> >>>>
> >>>> --
> >>>> Mathieu Desnoyers
> >>>> EfficiOS Inc.
> >>>> https://www.efficios.com
> >>
> >>
> >> --
> >> Mathieu Desnoyers
> >> EfficiOS Inc.
> >> https://www.efficios.com
>
>
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> https://www.efficios.com
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v6 4/4] selftests/rseq: Add test for rseq+pkeys
2025-03-10 16:31 ` Dmitry Vyukov
@ 2025-03-10 17:26 ` Mathieu Desnoyers
2025-05-21 8:57 ` Dmitry Vyukov
0 siblings, 1 reply; 18+ messages in thread
From: Mathieu Desnoyers @ 2025-03-10 17:26 UTC (permalink / raw)
To: Dmitry Vyukov
Cc: peterz, boqun.feng, tglx, mingo, bp, dave.hansen, hpa,
aruna.ramakrishna, elver, Paul E. McKenney, x86, linux-kernel,
Michael Jeanson
On 2025-03-10 12:31, Dmitry Vyukov wrote:
> On Mon, 10 Mar 2025 at 16:41, Mathieu Desnoyers
> <mathieu.desnoyers@efficios.com> wrote:
>>
>> On 2025-03-10 10:43, Dmitry Vyukov wrote:
>>> On Mon, 10 Mar 2025 at 15:38, Mathieu Desnoyers
>>> <mathieu.desnoyers@efficios.com> wrote:
>>>>
>>>> On 2025-03-10 10:36, Dmitry Vyukov wrote:
>>>>> On Mon, 10 Mar 2025 at 15:30, Mathieu Desnoyers
>>>>> <mathieu.desnoyers@efficios.com> wrote:
>>>>>>
>>>>>> On 2025-02-27 09:03, Dmitry Vyukov wrote:
>>>>>>> Add a test that ensures that PKEY-protected struct rseq_cs
>>>>>>> works and does not lead to process kills.
>>>>>>>
>>>>>>> Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
>>>>>>> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
>>>>>>> Cc: Peter Zijlstra <peterz@infradead.org>
>>>>>>> Cc: "Paul E. McKenney" <paulmck@kernel.org>
>>>>>>> Cc: Boqun Feng <boqun.feng@gmail.com>
>>>>>>> Cc: Thomas Gleixner <tglx@linutronix.de>
>>>>>>> Cc: Ingo Molnar <mingo@redhat.com>
>>>>>>> Cc: Borislav Petkov <bp@alien8.de>
>>>>>>> Cc: Dave Hansen <dave.hansen@linux.intel.com>
>>>>>>> Cc: "H. Peter Anvin" <hpa@zytor.com>
>>>>>>> Cc: Aruna Ramakrishna <aruna.ramakrishna@oracle.com>
>>>>>>> Cc: x86@kernel.org
>>>>>>> Cc: linux-kernel@vger.kernel.org
>>>>>>> Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
>>>>>>> Fixes: d7822b1e24f2 ("rseq: Introduce restartable sequences system call")
>>>>>>>
>>>>>>> ---
>>>>>>> Changes in v5:
>>>>>>> - Use static for variables/functions
>>>>>>> - Use RSEQ_READ/WRITE_ONCE instead of volatile
>>>>>>>
>>>>>>> Changes in v4:
>>>>>>> - Added Fixes tag
>>>>>>>
>>>>>>> Changes in v3:
>>>>>>> - added Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
>>>>>>> - rework the test to work when only pkey 0 is supported for rseq
>>>>>>>
>>>>>>> Changes in v2:
>>>>>>> - change test to install protected rseq_cs instead of rseq
>>>>>>> ---
>>>>>>> tools/testing/selftests/rseq/Makefile | 2 +-
>>>>>>> tools/testing/selftests/rseq/pkey_test.c | 98 ++++++++++++++++++++++++
>>>>>>> tools/testing/selftests/rseq/rseq.h | 1 +
>>>>>>> 3 files changed, 100 insertions(+), 1 deletion(-)
>>>>>>>
>>>>>>> diff --git a/tools/testing/selftests/rseq/Makefile b/tools/testing/selftests/rseq/Makefile
>>>>>>> index 5a3432fceb586..9111d25fea3af 100644
>>>>>>> --- a/tools/testing/selftests/rseq/Makefile
>>>>>>> +++ b/tools/testing/selftests/rseq/Makefile
>>>>>>> @@ -16,7 +16,7 @@ OVERRIDE_TARGETS = 1
>>>>>>>
>>>>>>> TEST_GEN_PROGS = basic_test basic_percpu_ops_test basic_percpu_ops_mm_cid_test param_test \
>>>>>>> param_test_benchmark param_test_compare_twice param_test_mm_cid \
>>>>>>> - param_test_mm_cid_benchmark param_test_mm_cid_compare_twice
>>>>>>> + param_test_mm_cid_benchmark param_test_mm_cid_compare_twice pkey_test
>>>>>>>
>>>>>>> TEST_GEN_PROGS_EXTENDED = librseq.so
>>>>>>>
>>>>>>> diff --git a/tools/testing/selftests/rseq/pkey_test.c b/tools/testing/selftests/rseq/pkey_test.c
>>>>>>> new file mode 100644
>>>>>>> index 0000000000000..cc4dd98190942
>>>>>>> --- /dev/null
>>>>>>> +++ b/tools/testing/selftests/rseq/pkey_test.c
>>>>>>> @@ -0,0 +1,98 @@
>>>>>>> +// SPDX-License-Identifier: LGPL-2.1
>>>>>>> +/*
>>>>>>> + * Ensure that rseq works when rseq data is inaccessible due to PKEYs.
>>>>>>> + */
>>>>>>> +
>>>>>>> +#define _GNU_SOURCE
>>>>>>> +#include <err.h>
>>>>>>> +#include <errno.h>
>>>>>>> +#include <stdio.h>
>>>>>>> +#include <stdlib.h>
>>>>>>> +#include <string.h>
>>>>>>> +#include <sys/mman.h>
>>>>>>> +#include <sys/syscall.h>
>>>>>>> +#include <ucontext.h>
>>>>>>> +#include <unistd.h>
>>>>>>> +
>>>>>>> +#include "rseq.h"
>>>>>>> +#include "rseq-abi.h"
>>>>>>> +
>>>>>>> +static int pkey;
>>>>>>> +static ucontext_t ucp0, ucp1;
>>>>>>> +
>>>>>>> +static void coroutine(void)
>>>>>>> +{
>>>>>>> + int i, orig_pk0, old_pk0, old_pk1, pk0, pk1;
>>>>>>> + /*
>>>>>>> + * When we disable access to pkey 0, globals and TLS become
>>>>>>> + * inaccessible too, so we need to tread carefully.
>>>>>>> + * Pkey is global so we need to copy it onto the stack.
>>>>>>> + */
>>>>>>> + int pk = RSEQ_READ_ONCE(pkey);
>>>>>>> + struct timespec ts;
>>>>>>> +
>>>>>>> + orig_pk0 = pkey_get(0);
>>>>>>> + if (pkey_set(0, PKEY_DISABLE_ACCESS))
>>>>>>> + err(1, "pkey_set failed");
>>>>>>> + old_pk0 = pkey_get(0);
>>>>>>> + old_pk1 = pkey_get(pk);
>>>>>>> +
>>>>>>> + /*
>>>>>>> + * Prevent compiler from initializing it by loading a 16-global.
>>>>>>> + */
>>>>>>> + RSEQ_WRITE_ONCE(ts.tv_sec, 0);
>>>>>>> + RSEQ_WRITE_ONCE(ts.tv_nsec, 10 * 1000);
>>>>>>> + /*
>>>>>>> + * If the kernel misbehaves, context switches in the following loop
>>>>>>> + * will terminate the process with SIGSEGV.
>>>>>>> + * Trigger preemption w/o accessing TLS.
>>>>>>> + * Note that glibc's usleep touches errno always.
>>>>>>> + */
>>>>>>> + for (i = 0; i < 10; i++)
>>>>>>> + syscall(SYS_clock_nanosleep, CLOCK_MONOTONIC, 0, &ts, NULL);
>>>>>>> +
>>>>>>> + pk0 = pkey_get(0);
>>>>>>> + pk1 = pkey_get(pk);
>>>>>>> + if (pkey_set(0, orig_pk0))
>>>>>>> + err(1, "pkey_set failed");
>>>>>>> +
>>>>>>> + /*
>>>>>>> + * Ensure that the kernel has restored the previous value of pkeys
>>>>>>> + * register after changing them.
>>>>>>> + */
>>>>>>> + if (old_pk0 != pk0)
>>>>>>> + errx(1, "pkey 0 changed %d->%d", old_pk0, pk0);
>>>>>>> + if (old_pk1 != pk1)
>>>>>>> + errx(1, "pkey 1 changed %d->%d", old_pk1, pk1);
>>>>>>> +
>>>>>>> + swapcontext(&ucp1, &ucp0);
>>>>>>> + abort();
>>>>>>> +}
>>>>>>> +
>>>>>>> +int main(int argc, char **argv)
>>>>>>> +{
>>>>>>> + pkey = pkey_alloc(0, 0);
>>>>>>> + if (pkey == -1) {
>>>>>>> + printf("[SKIP]\tKernel does not support PKEYs: %s\n",
>>>>>>> + strerror(errno));
>>>>>>> + return 0;
>>>>>>> + }
>>>>>>> +
>>>>>>> + if (rseq_register_current_thread())
>>>>>>> + err(1, "rseq_register_current_thread failed");
>>>>>>> +
>>>>>>> + if (getcontext(&ucp1))
>>>>>>> + err(1, "getcontext failed");
>>>>>>> + ucp1.uc_stack.ss_size = getpagesize() * 4;
>>>>>>> + ucp1.uc_stack.ss_sp = mmap(NULL, ucp1.uc_stack.ss_size,
>>>>>>> + PROT_READ | PROT_WRITE, MAP_ANON | MAP_PRIVATE, -1, 0);
>>>>>>> + if (ucp1.uc_stack.ss_sp == MAP_FAILED)
>>>>>>> + err(1, "mmap failed");
>>>>>>> + if (pkey_mprotect(ucp1.uc_stack.ss_sp, ucp1.uc_stack.ss_size,
>>>>>>> + PROT_READ | PROT_WRITE, pkey))
>>>>>>> + err(1, "pkey_mprotect failed");
>>>>>>> + makecontext(&ucp1, coroutine, 0);
>>>>>>> + if (swapcontext(&ucp0, &ucp1))
>>>>>>> + err(1, "swapcontext failed");
>>>>>>
>>>>>> Should the rseq register be paired with a rseq unregister here ?
>>>>>
>>>>> I dunno. It's necessary for this test and in general. Tcmalloc won't
>>>>> unregister on thread exit. Even for a libc it may be a bad idea due to
>>>>> signals.
>>>>
>>>> The rseq register/unregister is only for the case where libc does not
>>>> support rseq. But if you want to use rseq_register_current_thread(),
>>>> then you'll want to pair it with unregister.
>>>
>>> Why? Isn't it better to have tests more realitic?
>>
>> If you use rseq.c rseq_register_current_thread without
>> rseq_unregister_current_thread, then you rely on implicit
>> unregistration done by the kernel at thread exit.
>>
>> Then you need to ensure your userspace runtime keep the TLS
>> that holds the rseq area allocated for the entire execution
>> of the thread until it exits in the kernel. Note that
>> disabling signals is not sufficient to prevent the kernel
>> from accessing the rseq area.
>>
>> GNU libc gets away with automatic unregistration at
>> thread exit because it completely controls the lifetime
>> of the registered rseq area, keeping it allocated for as
>> long as the thread is executing.
>>
>> So in order to minimize the dependency on specific libc
>> behavior in the kernel sefltests, the selftests rseq.h
>> requires explicit registration *and* unregistration.
>
> If libc registers rseq (!rseq_ownership), then
> rseq_unregister_current_thread is a no-op. And if libc has not
> registered rseq, then we are not relying on any libc behavior. I don't
> see how calling rseq_unregister_current_thread helps to rely less on
> libc. Am I missing something?
When libc does not support rseq (either because of the
glibc tunable, or having an old glibc, or running another
libc), rseq.c in the selftests registers an initial-exec
TLS rseq area. This TLS rseq area's lifetime is handled
by the libc. I don't want to depend on implementation-specific
libc behavior, unless they are documented and part of the
ABI.
In your case the area won't be re-used because the program
exits, but I'd rather use the same approach everywhere,
which is to unregister explicitly.
Note that in the librseq project, we are currently planning
to remove the explicit thread registration/unregistration
from the API, and will exclusively depend on having rseq support
provided by the libc. It simplifies the implementation, the API,
and will make it OK to link a dlopen'd .so against librseq.
There has been no official release of librseq yet, so now is
a good time to simplify the API and aim at what is becoming
the primary use-case.
Thanks,
Mathieu
>
>
>
>
>> Thanks,
>>
>> Mathieu
>>
>>
>>>
>>>
>>>> Thanks,
>>>>
>>>> Mathieu
>>>>
>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Mathieu
>>>>>>
>>>>>>> + return 0;
>>>>>>> +}
>>>>>>> diff --git a/tools/testing/selftests/rseq/rseq.h b/tools/testing/selftests/rseq/rseq.h
>>>>>>> index ba424ce80a719..65da4a727c550 100644
>>>>>>> --- a/tools/testing/selftests/rseq/rseq.h
>>>>>>> +++ b/tools/testing/selftests/rseq/rseq.h
>>>>>>> @@ -8,6 +8,7 @@
>>>>>>> #ifndef RSEQ_H
>>>>>>> #define RSEQ_H
>>>>>>>
>>>>>>> +#include <assert.h>
>>>>>>> #include <stdint.h>
>>>>>>> #include <stdbool.h>
>>>>>>> #include <pthread.h>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Mathieu Desnoyers
>>>>>> EfficiOS Inc.
>>>>>> https://www.efficios.com
>>>>
>>>>
>>>> --
>>>> Mathieu Desnoyers
>>>> EfficiOS Inc.
>>>> https://www.efficios.com
>>
>>
>> --
>> Mathieu Desnoyers
>> EfficiOS Inc.
>> https://www.efficios.com
--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v6 3/4] rseq: Make rseq work with protection keys
2025-03-10 14:39 ` Dmitry Vyukov
@ 2025-03-10 18:51 ` Ingo Molnar
0 siblings, 0 replies; 18+ messages in thread
From: Ingo Molnar @ 2025-03-10 18:51 UTC (permalink / raw)
To: Dmitry Vyukov, Dave Hansen
Cc: Mathieu Desnoyers, peterz, boqun.feng, tglx, mingo, bp,
dave.hansen, hpa, aruna.ramakrishna, elver, Paul E. McKenney, x86,
linux-kernel
* Dmitry Vyukov <dvyukov@google.com> wrote:
> > >> If an application registers rseq, and ever switches to another
> > >> pkey protection (such that the rseq becomes inaccessible), then
> > >> any context switch will cause failure in
> > >> __rseq_handle_notify_resume() attempting to read/write struct
> > >> rseq and/or rseq_cs. Since context switches are asynchronous and
> > >> are outside of the application control (not part of the
> > >> restricted code scope), temporarily switch to pkey value that
> > >> allows access to the 0 (default) PKEY.
> > >>
> > >> Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
> > >> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> > >> Cc: Peter Zijlstra <peterz@infradead.org>
> > >> Cc: "Paul E. McKenney" <paulmck@kernel.org>
> > >> Cc: Boqun Feng <boqun.feng@gmail.com>
> > >> Cc: Thomas Gleixner <tglx@linutronix.de>
> > >> Cc: Ingo Molnar <mingo@redhat.com>
> > >> Cc: Borislav Petkov <bp@alien8.de>
> > >> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> > >> Cc: "H. Peter Anvin" <hpa@zytor.com>
> > >> Cc: Aruna Ramakrishna <aruna.ramakrishna@oracle.com>
> > >> Cc: x86@kernel.org
> > >> Cc: linux-kernel@vger.kernel.org
> > >> Fixes: d7822b1e24f2 ("rseq: Introduce restartable sequences system call")
> > >
> > > Any remaining concerns with this series?
> > >
> > > What tree should it go into?
> >
> > Usually the rseq bits go through the -tip tree.
>
> Thomas, Ingo, can you please take this to -tip tree? Or who would that be?
I was waiting whether Dave Hansen would have an opinion on this series.
Also, could you please add all the new Reviewed-by tags for the next
version, plus there was still a bit of a discussion on patch #4, has
that been resolved?
Thanks,
Ingo
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v6 4/4] selftests/rseq: Add test for rseq+pkeys
2025-03-10 17:26 ` Mathieu Desnoyers
@ 2025-05-21 8:57 ` Dmitry Vyukov
0 siblings, 0 replies; 18+ messages in thread
From: Dmitry Vyukov @ 2025-05-21 8:57 UTC (permalink / raw)
To: Mathieu Desnoyers
Cc: peterz, boqun.feng, tglx, mingo, bp, dave.hansen, hpa,
aruna.ramakrishna, elver, Paul E. McKenney, x86, linux-kernel,
Michael Jeanson
On Mon, 10 Mar 2025 at 18:26, Mathieu Desnoyers
<mathieu.desnoyers@efficios.com> wrote:
>
> On 2025-03-10 12:31, Dmitry Vyukov wrote:
> > On Mon, 10 Mar 2025 at 16:41, Mathieu Desnoyers
> > <mathieu.desnoyers@efficios.com> wrote:
> >>
> >> On 2025-03-10 10:43, Dmitry Vyukov wrote:
> >>> On Mon, 10 Mar 2025 at 15:38, Mathieu Desnoyers
> >>> <mathieu.desnoyers@efficios.com> wrote:
> >>>>
> >>>> On 2025-03-10 10:36, Dmitry Vyukov wrote:
> >>>>> On Mon, 10 Mar 2025 at 15:30, Mathieu Desnoyers
> >>>>> <mathieu.desnoyers@efficios.com> wrote:
> >>>>>>
> >>>>>> On 2025-02-27 09:03, Dmitry Vyukov wrote:
> >>>>>>> Add a test that ensures that PKEY-protected struct rseq_cs
> >>>>>>> works and does not lead to process kills.
> >>>>>>>
> >>>>>>> Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
> >>>>>>> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> >>>>>>> Cc: Peter Zijlstra <peterz@infradead.org>
> >>>>>>> Cc: "Paul E. McKenney" <paulmck@kernel.org>
> >>>>>>> Cc: Boqun Feng <boqun.feng@gmail.com>
> >>>>>>> Cc: Thomas Gleixner <tglx@linutronix.de>
> >>>>>>> Cc: Ingo Molnar <mingo@redhat.com>
> >>>>>>> Cc: Borislav Petkov <bp@alien8.de>
> >>>>>>> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> >>>>>>> Cc: "H. Peter Anvin" <hpa@zytor.com>
> >>>>>>> Cc: Aruna Ramakrishna <aruna.ramakrishna@oracle.com>
> >>>>>>> Cc: x86@kernel.org
> >>>>>>> Cc: linux-kernel@vger.kernel.org
> >>>>>>> Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
> >>>>>>> Fixes: d7822b1e24f2 ("rseq: Introduce restartable sequences system call")
> >>>>>>>
> >>>>>>> ---
> >>>>>>> Changes in v5:
> >>>>>>> - Use static for variables/functions
> >>>>>>> - Use RSEQ_READ/WRITE_ONCE instead of volatile
> >>>>>>>
> >>>>>>> Changes in v4:
> >>>>>>> - Added Fixes tag
> >>>>>>>
> >>>>>>> Changes in v3:
> >>>>>>> - added Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
> >>>>>>> - rework the test to work when only pkey 0 is supported for rseq
> >>>>>>>
> >>>>>>> Changes in v2:
> >>>>>>> - change test to install protected rseq_cs instead of rseq
> >>>>>>> ---
> >>>>>>> tools/testing/selftests/rseq/Makefile | 2 +-
> >>>>>>> tools/testing/selftests/rseq/pkey_test.c | 98 ++++++++++++++++++++++++
> >>>>>>> tools/testing/selftests/rseq/rseq.h | 1 +
> >>>>>>> 3 files changed, 100 insertions(+), 1 deletion(-)
> >>>>>>>
> >>>>>>> diff --git a/tools/testing/selftests/rseq/Makefile b/tools/testing/selftests/rseq/Makefile
> >>>>>>> index 5a3432fceb586..9111d25fea3af 100644
> >>>>>>> --- a/tools/testing/selftests/rseq/Makefile
> >>>>>>> +++ b/tools/testing/selftests/rseq/Makefile
> >>>>>>> @@ -16,7 +16,7 @@ OVERRIDE_TARGETS = 1
> >>>>>>>
> >>>>>>> TEST_GEN_PROGS = basic_test basic_percpu_ops_test basic_percpu_ops_mm_cid_test param_test \
> >>>>>>> param_test_benchmark param_test_compare_twice param_test_mm_cid \
> >>>>>>> - param_test_mm_cid_benchmark param_test_mm_cid_compare_twice
> >>>>>>> + param_test_mm_cid_benchmark param_test_mm_cid_compare_twice pkey_test
> >>>>>>>
> >>>>>>> TEST_GEN_PROGS_EXTENDED = librseq.so
> >>>>>>>
> >>>>>>> diff --git a/tools/testing/selftests/rseq/pkey_test.c b/tools/testing/selftests/rseq/pkey_test.c
> >>>>>>> new file mode 100644
> >>>>>>> index 0000000000000..cc4dd98190942
> >>>>>>> --- /dev/null
> >>>>>>> +++ b/tools/testing/selftests/rseq/pkey_test.c
> >>>>>>> @@ -0,0 +1,98 @@
> >>>>>>> +// SPDX-License-Identifier: LGPL-2.1
> >>>>>>> +/*
> >>>>>>> + * Ensure that rseq works when rseq data is inaccessible due to PKEYs.
> >>>>>>> + */
> >>>>>>> +
> >>>>>>> +#define _GNU_SOURCE
> >>>>>>> +#include <err.h>
> >>>>>>> +#include <errno.h>
> >>>>>>> +#include <stdio.h>
> >>>>>>> +#include <stdlib.h>
> >>>>>>> +#include <string.h>
> >>>>>>> +#include <sys/mman.h>
> >>>>>>> +#include <sys/syscall.h>
> >>>>>>> +#include <ucontext.h>
> >>>>>>> +#include <unistd.h>
> >>>>>>> +
> >>>>>>> +#include "rseq.h"
> >>>>>>> +#include "rseq-abi.h"
> >>>>>>> +
> >>>>>>> +static int pkey;
> >>>>>>> +static ucontext_t ucp0, ucp1;
> >>>>>>> +
> >>>>>>> +static void coroutine(void)
> >>>>>>> +{
> >>>>>>> + int i, orig_pk0, old_pk0, old_pk1, pk0, pk1;
> >>>>>>> + /*
> >>>>>>> + * When we disable access to pkey 0, globals and TLS become
> >>>>>>> + * inaccessible too, so we need to tread carefully.
> >>>>>>> + * Pkey is global so we need to copy it onto the stack.
> >>>>>>> + */
> >>>>>>> + int pk = RSEQ_READ_ONCE(pkey);
> >>>>>>> + struct timespec ts;
> >>>>>>> +
> >>>>>>> + orig_pk0 = pkey_get(0);
> >>>>>>> + if (pkey_set(0, PKEY_DISABLE_ACCESS))
> >>>>>>> + err(1, "pkey_set failed");
> >>>>>>> + old_pk0 = pkey_get(0);
> >>>>>>> + old_pk1 = pkey_get(pk);
> >>>>>>> +
> >>>>>>> + /*
> >>>>>>> + * Prevent compiler from initializing it by loading a 16-global.
> >>>>>>> + */
> >>>>>>> + RSEQ_WRITE_ONCE(ts.tv_sec, 0);
> >>>>>>> + RSEQ_WRITE_ONCE(ts.tv_nsec, 10 * 1000);
> >>>>>>> + /*
> >>>>>>> + * If the kernel misbehaves, context switches in the following loop
> >>>>>>> + * will terminate the process with SIGSEGV.
> >>>>>>> + * Trigger preemption w/o accessing TLS.
> >>>>>>> + * Note that glibc's usleep touches errno always.
> >>>>>>> + */
> >>>>>>> + for (i = 0; i < 10; i++)
> >>>>>>> + syscall(SYS_clock_nanosleep, CLOCK_MONOTONIC, 0, &ts, NULL);
> >>>>>>> +
> >>>>>>> + pk0 = pkey_get(0);
> >>>>>>> + pk1 = pkey_get(pk);
> >>>>>>> + if (pkey_set(0, orig_pk0))
> >>>>>>> + err(1, "pkey_set failed");
> >>>>>>> +
> >>>>>>> + /*
> >>>>>>> + * Ensure that the kernel has restored the previous value of pkeys
> >>>>>>> + * register after changing them.
> >>>>>>> + */
> >>>>>>> + if (old_pk0 != pk0)
> >>>>>>> + errx(1, "pkey 0 changed %d->%d", old_pk0, pk0);
> >>>>>>> + if (old_pk1 != pk1)
> >>>>>>> + errx(1, "pkey 1 changed %d->%d", old_pk1, pk1);
> >>>>>>> +
> >>>>>>> + swapcontext(&ucp1, &ucp0);
> >>>>>>> + abort();
> >>>>>>> +}
> >>>>>>> +
> >>>>>>> +int main(int argc, char **argv)
> >>>>>>> +{
> >>>>>>> + pkey = pkey_alloc(0, 0);
> >>>>>>> + if (pkey == -1) {
> >>>>>>> + printf("[SKIP]\tKernel does not support PKEYs: %s\n",
> >>>>>>> + strerror(errno));
> >>>>>>> + return 0;
> >>>>>>> + }
> >>>>>>> +
> >>>>>>> + if (rseq_register_current_thread())
> >>>>>>> + err(1, "rseq_register_current_thread failed");
> >>>>>>> +
> >>>>>>> + if (getcontext(&ucp1))
> >>>>>>> + err(1, "getcontext failed");
> >>>>>>> + ucp1.uc_stack.ss_size = getpagesize() * 4;
> >>>>>>> + ucp1.uc_stack.ss_sp = mmap(NULL, ucp1.uc_stack.ss_size,
> >>>>>>> + PROT_READ | PROT_WRITE, MAP_ANON | MAP_PRIVATE, -1, 0);
> >>>>>>> + if (ucp1.uc_stack.ss_sp == MAP_FAILED)
> >>>>>>> + err(1, "mmap failed");
> >>>>>>> + if (pkey_mprotect(ucp1.uc_stack.ss_sp, ucp1.uc_stack.ss_size,
> >>>>>>> + PROT_READ | PROT_WRITE, pkey))
> >>>>>>> + err(1, "pkey_mprotect failed");
> >>>>>>> + makecontext(&ucp1, coroutine, 0);
> >>>>>>> + if (swapcontext(&ucp0, &ucp1))
> >>>>>>> + err(1, "swapcontext failed");
> >>>>>>
> >>>>>> Should the rseq register be paired with a rseq unregister here ?
> >>>>>
> >>>>> I dunno. It's necessary for this test and in general. Tcmalloc won't
> >>>>> unregister on thread exit. Even for a libc it may be a bad idea due to
> >>>>> signals.
> >>>>
> >>>> The rseq register/unregister is only for the case where libc does not
> >>>> support rseq. But if you want to use rseq_register_current_thread(),
> >>>> then you'll want to pair it with unregister.
> >>>
> >>> Why? Isn't it better to have tests more realitic?
> >>
> >> If you use rseq.c rseq_register_current_thread without
> >> rseq_unregister_current_thread, then you rely on implicit
> >> unregistration done by the kernel at thread exit.
> >>
> >> Then you need to ensure your userspace runtime keep the TLS
> >> that holds the rseq area allocated for the entire execution
> >> of the thread until it exits in the kernel. Note that
> >> disabling signals is not sufficient to prevent the kernel
> >> from accessing the rseq area.
> >>
> >> GNU libc gets away with automatic unregistration at
> >> thread exit because it completely controls the lifetime
> >> of the registered rseq area, keeping it allocated for as
> >> long as the thread is executing.
> >>
> >> So in order to minimize the dependency on specific libc
> >> behavior in the kernel sefltests, the selftests rseq.h
> >> requires explicit registration *and* unregistration.
> >
> > If libc registers rseq (!rseq_ownership), then
> > rseq_unregister_current_thread is a no-op. And if libc has not
> > registered rseq, then we are not relying on any libc behavior. I don't
> > see how calling rseq_unregister_current_thread helps to rely less on
> > libc. Am I missing something?
>
> When libc does not support rseq (either because of the
> glibc tunable, or having an old glibc, or running another
> libc), rseq.c in the selftests registers an initial-exec
> TLS rseq area. This TLS rseq area's lifetime is handled
> by the libc. I don't want to depend on implementation-specific
> libc behavior, unless they are documented and part of the
> ABI.
>
> In your case the area won't be re-used because the program
> exits, but I'd rather use the same approach everywhere,
> which is to unregister explicitly.
>
> Note that in the librseq project, we are currently planning
> to remove the explicit thread registration/unregistration
> from the API, and will exclusively depend on having rseq support
> provided by the libc. It simplifies the implementation, the API,
> and will make it OK to link a dlopen'd .so against librseq.
> There has been no official release of librseq yet, so now is
> a good time to simplify the API and aim at what is becoming
> the primary use-case.
Hi,
I was distracted by other things for a while.
But now resent rebased series v7 with this fix.
> >
> >> Thanks,
> >>
> >> Mathieu
> >>
> >>
> >>>
> >>>
> >>>> Thanks,
> >>>>
> >>>> Mathieu
> >>>>
> >>>>>
> >>>>>> Thanks,
> >>>>>>
> >>>>>> Mathieu
> >>>>>>
> >>>>>>> + return 0;
> >>>>>>> +}
> >>>>>>> diff --git a/tools/testing/selftests/rseq/rseq.h b/tools/testing/selftests/rseq/rseq.h
> >>>>>>> index ba424ce80a719..65da4a727c550 100644
> >>>>>>> --- a/tools/testing/selftests/rseq/rseq.h
> >>>>>>> +++ b/tools/testing/selftests/rseq/rseq.h
> >>>>>>> @@ -8,6 +8,7 @@
> >>>>>>> #ifndef RSEQ_H
> >>>>>>> #define RSEQ_H
> >>>>>>>
> >>>>>>> +#include <assert.h>
> >>>>>>> #include <stdint.h>
> >>>>>>> #include <stdbool.h>
> >>>>>>> #include <pthread.h>
> >>>>>>
> >>>>>>
> >>>>>> --
> >>>>>> Mathieu Desnoyers
> >>>>>> EfficiOS Inc.
> >>>>>> https://www.efficios.com
> >>>>
> >>>>
> >>>> --
> >>>> Mathieu Desnoyers
> >>>> EfficiOS Inc.
> >>>> https://www.efficios.com
> >>
> >>
> >> --
> >> Mathieu Desnoyers
> >> EfficiOS Inc.
> >> https://www.efficios.com
>
>
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> https://www.efficios.com
^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2025-05-21 8:57 UTC | newest]
Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-02-27 14:03 [PATCH v6 0/4] rseq: Make rseq work with protection keys Dmitry Vyukov
2025-02-27 14:03 ` [PATCH v6 1/4] pkeys: add API to switch to permissive/zero pkey register Dmitry Vyukov
2025-02-27 14:03 ` [PATCH v6 2/4] x86/signal: Use write_permissive_pkey_val() helper Dmitry Vyukov
2025-02-27 14:03 ` [PATCH v6 3/4] rseq: Make rseq work with protection keys Dmitry Vyukov
2025-03-08 10:02 ` Dmitry Vyukov
2025-03-10 14:31 ` Mathieu Desnoyers
2025-03-10 14:39 ` Dmitry Vyukov
2025-03-10 18:51 ` Ingo Molnar
2025-03-10 14:26 ` Mathieu Desnoyers
2025-02-27 14:03 ` [PATCH v6 4/4] selftests/rseq: Add test for rseq+pkeys Dmitry Vyukov
2025-03-10 14:30 ` Mathieu Desnoyers
2025-03-10 14:36 ` Dmitry Vyukov
2025-03-10 14:38 ` Mathieu Desnoyers
2025-03-10 14:43 ` Dmitry Vyukov
2025-03-10 15:41 ` Mathieu Desnoyers
2025-03-10 16:31 ` Dmitry Vyukov
2025-03-10 17:26 ` Mathieu Desnoyers
2025-05-21 8:57 ` Dmitry Vyukov
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.