* [RFC PATCH] rseq: Validate read-only fields under DEBUG_RSEQ config
@ 2024-10-18 17:12 Mathieu Desnoyers
2024-10-23 2:46 ` kernel test robot
0 siblings, 1 reply; 2+ messages in thread
From: Mathieu Desnoyers @ 2024-10-18 17:12 UTC (permalink / raw)
To: Peter Zijlstra
Cc: linux-kernel, Mathieu Desnoyers, Paul E . McKenney, Boqun Feng,
Andrew Morton, Andy Lutomirski, Ingo Molnar, Peter Oskolkov,
Dmitry Vyukov, Marco Elver, Florian Weimer, Carlos O'Donell,
DJ Delorie, libc-alpha
The rseq uapi requires cooperation between users of the rseq fields
to ensure that all libraries and applications using rseq within a
process do not interfere with each other.
This is especially important for fields which are meant to be read-only
from user-space, as documented in uapi/linux/rseq.h:
- cpu_id_start,
- cpu_id,
- node_id,
- mm_cid.
Storing to those fields from a user-space library prevents any sharing
of the rseq ABI with other libraries and applications, as other users
are not aware that the content of those fields has been altered by a
third-party library.
This is unfortunately the current behavior of tcmalloc: it purposefully
overlaps part of a cached value with the cpu_id_start upper bits to get
notified about preemption, because the kernel clears those upper bits
before returning to user-space. This behavior does not conform to the
rseq uapi header ABI.
This prevents tcmalloc from using rseq when rseq is registered by the
GNU C library 2.35+. It requires tcmalloc users to disable glibc rseq
registration with a glibc tunable, which is a sad state of affairs.
Considering that tcmalloc and the GNU C library are the two first
upstream projects using rseq, and that they are already incompatible due
to use of this hack, adding kernel-level validation of all read-only
fields content is necessary to ensure future users of rseq abide by the
rseq ABI requirements.
Validate that user-space does not corrupt the read-only fields and
conform to the rseq uapi header ABI when the kernel is built with
CONFIG_DEBUG_RSEQ=y. This is done by storing a copy of the read-only
fields in the task_struct, and validating the prior values present in
user-space before updating them. If the values do not match, print
a warning on the console (printk_ratelimited()).
This is a first step to identify misuses of the rseq ABI by printing
a warning on the console. After a giving some time to userspace to
correct its use of rseq, the plan is to eventually terminate offending
processes with SIGSEGV.
This change is expected to produce warnings for the upstream tcmalloc
implementation, but tcmalloc developers mentioned they were open to
adapt their implementation to kernel-level change.
Link: https://lore.kernel.org/all/CACT4Y+beLh1qnHF9bxhMUcva8KyuvZs7Mg_31SGK5xSoR=3m1A@mail.gmail.com/
Link: https://github.com/google/tcmalloc/issues/144
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Oskolkov <posk@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Marco Elver <elver@google.com>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Carlos O'Donell <carlos@redhat.com>
Cc: DJ Delorie <dj@redhat.com>
Cc: libc-alpha@sourceware.org
---
include/linux/sched.h | 3 ++
kernel/rseq.c | 87 +++++++++++++++++++++++++++++++++++++++++++
2 files changed, 90 insertions(+)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index f8d150343d42..041b7cd96423 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1340,6 +1340,9 @@ struct task_struct {
* with respect to preemption.
*/
unsigned long rseq_event_mask;
+# ifdef CONFIG_DEBUG_RSEQ
+ struct rseq rseq_fields;
+# endif
#endif
#ifdef CONFIG_SCHED_MM_CID
diff --git a/kernel/rseq.c b/kernel/rseq.c
index 9de6e35fe679..46af9b0fe8a2 100644
--- a/kernel/rseq.c
+++ b/kernel/rseq.c
@@ -25,6 +25,56 @@
RSEQ_CS_FLAG_NO_RESTART_ON_SIGNAL | \
RSEQ_CS_FLAG_NO_RESTART_ON_MIGRATE)
+#ifdef CONFIG_DEBUG_RSEQ
+static int rseq_validate_ro_fields(struct task_struct *t)
+{
+ u32 cpu_id_start, cpu_id, node_id, mm_cid;
+ struct rseq __user *rseq = t->rseq;
+
+ /*
+ * Validate fields which are required to be read-only by
+ * user-space.
+ */
+ if (!user_read_access_begin(rseq, t->rseq_len))
+ goto efault;
+ unsafe_get_user(cpu_id_start, &rseq->cpu_id_start, efault_end);
+ unsafe_get_user(cpu_id, &rseq->cpu_id, efault_end);
+ unsafe_get_user(node_id, &rseq->node_id, efault_end);
+ unsafe_get_user(mm_cid, &rseq->mm_cid, efault_end);
+ user_read_access_end();
+
+ if (cpu_id_start != t->rseq_fields.cpu_id_start)
+ printk_ratelimited(KERN_WARNING
+ "Detected rseq cpu_id_start field corruption. Value: %u, expecting: %u (pid=%d).\n",
+ cpu_id_start, t->rseq_fields.cpu_id_start, t->pid);
+ if (cpu_id != t->rseq_fields.cpu_id)
+ printk_ratelimited(KERN_WARNING
+ "Detected rseq cpu_id field corruption. Value: %u, expecting: %u (pid=%d).\n",
+ cpu_id, t->rseq_fields.cpu_id, t->pid);
+ if (node_id != t->rseq_fields.node_id)
+ printk_ratelimited(KERN_WARNING
+ "Detected rseq node_id field corruption. Value: %u, expecting: %u (pid=%d).\n",
+ node_id, t->rseq_fields.node_id, t->pid);
+ if (mm_cid != t->rseq_fields.mm_cid)
+ printk_ratelimited(KERN_WARNING
+ "Detected rseq mm_cid field corruption. Value: %u, expecting: %u (pid=%d).\n",
+ mm_cid, t->rseq_fields.mm_cid, t->pid);
+
+ /* For now, only print a console warning on mismatch. */
+ return 0;
+
+efault_end:
+ user_read_access_end();
+efault:
+ return -EFAULT;
+}
+#else
+static int rseq_validate_ro_fields(struct task_struct *t)
+{
+ return 0;
+}
+#endif
+
/*
*
* Restartable sequences are a lightweight interface that allows
@@ -92,6 +142,11 @@ static int rseq_update_cpu_node_id(struct task_struct *t)
u32 node_id = cpu_to_node(cpu_id);
u32 mm_cid = task_mm_cid(t);
+ /*
+ * Validate read-only rseq fields.
+ */
+ if (rseq_validate_ro_fields(t))
+ goto efault;
WARN_ON_ONCE((int) mm_cid < 0);
if (!user_write_access_begin(rseq, t->rseq_len))
goto efault;
@@ -105,6 +160,13 @@ static int rseq_update_cpu_node_id(struct task_struct *t)
* t->rseq_len != ORIG_RSEQ_SIZE.
*/
user_write_access_end();
+#ifdef CONFIG_DEBUG_RSEQ
+ /* Save a copy of the values which are read-only into kernel-space. */
+ t->rseq_fields.cpu_id_start = cpu_id;
+ t->rseq_fields.cpu_id = cpu_id;
+ t->rseq_fields.node_id = node_id;
+ t->rseq_fields.mm_cid = mm_cid;
+#endif
trace_rseq_update(t);
return 0;
@@ -119,6 +181,11 @@ static int rseq_reset_rseq_cpu_node_id(struct task_struct *t)
u32 cpu_id_start = 0, cpu_id = RSEQ_CPU_ID_UNINITIALIZED, node_id = 0,
mm_cid = 0;
+ /*
+ * Validate read-only rseq fields.
+ */
+ if (!rseq_validate_ro_fields(t))
+ return -EFAULT;
/*
* Reset cpu_id_start to its initial state (0).
*/
@@ -141,6 +208,15 @@ static int rseq_reset_rseq_cpu_node_id(struct task_struct *t)
*/
if (put_user(mm_cid, &t->rseq->mm_cid))
return -EFAULT;
+#ifdef CONFIG_DEBUG_RSEQ
+ /*
+ * Reset the in-kernel rseq fields copy.
+ */
+ t->rseq_fields.cpu_id_start = cpu_id_start;
+ t->rseq_fields.cpu_id = cpu_id;
+ t->rseq_fields.node_id = node_id;
+ t->rseq_fields.mm_cid = mm_cid;
+#endif
/*
* Additional feature fields added after ORIG_RSEQ_SIZE
* need to be conditionally reset only if
@@ -423,6 +499,17 @@ SYSCALL_DEFINE4(rseq, struct rseq __user *, rseq, u32, rseq_len,
current->rseq = rseq;
current->rseq_len = rseq_len;
current->rseq_sig = sig;
+#ifdef CONFIG_DEBUG_RSEQ
+ /*
+ * Initialize the in-kernel rseq fields copy for validation of
+ * read-only fields.
+ */
+ if (get_user(current->rseq_fields.cpu_id_start, &rseq->cpu_id_start) ||
+ get_user(current->rseq_fields.cpu_id, &rseq->cpu_id) ||
+ get_user(current->rseq_fields.node_id, &rseq->node_id) ||
+ get_user(current->rseq_fields.mm_cid, &rseq->mm_cid))
+ return -EFAULT;
+#endif
/*
* If rseq was previously inactive, and has just been
* registered, ensure the cpu_id_start and cpu_id fields
--
2.39.5
^ permalink raw reply related [flat|nested] 2+ messages in thread
* Re: [RFC PATCH] rseq: Validate read-only fields under DEBUG_RSEQ config
2024-10-18 17:12 [RFC PATCH] rseq: Validate read-only fields under DEBUG_RSEQ config Mathieu Desnoyers
@ 2024-10-23 2:46 ` kernel test robot
0 siblings, 0 replies; 2+ messages in thread
From: kernel test robot @ 2024-10-23 2:46 UTC (permalink / raw)
To: Mathieu Desnoyers; +Cc: llvm, oe-kbuild-all
Hi Mathieu,
[This is a private test report for your RFC patch.]
kernel test robot noticed the following build warnings:
[auto build test WARNING on tip/sched/core]
[also build test WARNING on akpm-mm/mm-everything linus/master v6.12-rc4 next-20241022]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Mathieu-Desnoyers/rseq-Validate-read-only-fields-under-DEBUG_RSEQ-config/20241019-011655
base: tip/sched/core
patch link: https://lore.kernel.org/r/20241018171219.256647-1-mathieu.desnoyers%40efficios.com
patch subject: [RFC PATCH] rseq: Validate read-only fields under DEBUG_RSEQ config
config: s390-allmodconfig (https://download.01.org/0day-ci/archive/20241023/202410231007.2hFp5Cwl-lkp@intel.com/config)
compiler: clang version 20.0.0git (https://github.com/llvm/llvm-project 5886454669c3c9026f7f27eab13509dd0241f2d6)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241023/202410231007.2hFp5Cwl-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202410231007.2hFp5Cwl-lkp@intel.com/
All warnings (new ones prefixed by >>):
In file included from arch/s390/purgatory/purgatory.c:10:
In file included from include/linux/kexec.h:18:
In file included from include/linux/vmcore_info.h:6:
In file included from include/linux/elfcore.h:7:
In file included from include/linux/sched/task_stack.h:9:
>> include/linux/sched.h:1368:16: warning: field 'rseq_fields' with variable sized type 'struct rseq' not at the end of a struct or class is a GNU extension [-Wgnu-variable-sized-type-not-at-end]
1368 | struct rseq rseq_fields;
| ^
1 warning generated.
vim +1368 include/linux/sched.h
1357
1358 #ifdef CONFIG_RSEQ
1359 struct rseq __user *rseq;
1360 u32 rseq_len;
1361 u32 rseq_sig;
1362 /*
1363 * RmW on rseq_event_mask must be performed atomically
1364 * with respect to preemption.
1365 */
1366 unsigned long rseq_event_mask;
1367 # ifdef CONFIG_DEBUG_RSEQ
> 1368 struct rseq rseq_fields;
1369 # endif
1370 #endif
1371
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 2+ messages in thread
end of thread, other threads:[~2024-10-23 2:47 UTC | newest]
Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-10-18 17:12 [RFC PATCH] rseq: Validate read-only fields under DEBUG_RSEQ config Mathieu Desnoyers
2024-10-23 2:46 ` kernel test robot
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.