The Linux Kernel Mailing List
* [PATCH] rseq: don't promote transient TLS faults to SIGSEGV
@ 2026-06-08  2:15 Yuanhe Shu
  2026-06-08  8:29 ` Peter Zijlstra
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Yuanhe Shu @ 2026-06-08  2:15 UTC
  To: Mathieu Desnoyers, Peter Zijlstra
  Cc: Paul E . McKenney, Boqun Feng, Thomas Gleixner, linux-kernel,
	Yuanhe Shu

On return to user space the rseq slow path writes the new cpu_id /
mm_cid into the user-space rseq TLS. rseq_update_usr() already
classifies its failures in rseq_event::fatal: the flag is set only
when corrupt user data is positively identified (e.g. a bad rseq_cs
signature or an out-of-bounds abort IP) and stays clear when the
access merely hit an unresolved page fault.

rseq_slowpath_update_usr() ignores that classification and calls
force_sig(SIGSEGV) on any failure, so a transient page fault on a
still-registered rseq area becomes a fatal SIGSEGV. This is reachable
in practice because glibc >= 2.35 registers rseq for every thread by
default: a memcg OOM victim can die of SIGSEGV (si_code=SI_KERNEL,
si_addr=NULL) shortly after fork, before returning to user space,
because the CoW of the inherited TLS page cannot be charged to the
OOM-locked memcg and the rseq write faults.

With oom_score_adj=-1000 the OOM killer finds no killable task, so the
rseq SIGSEGV is the sole outcome; otherwise the rseq SIGSEGV can be
delivered before the OOM killer queues SIGKILL, and the process exits
139 instead of 137, breaking OOMKilled detection in container
runtimes. LTP mm/oom03 and mm/oom05 reproduce it on v7.1-rc6+, and a
strace A/B with glibc.pthread.rseq as the sole variable shows the
SIGSEGV only when rseq is registered.

Only raise SIGSEGV when rseq_event::fatal is set. A non-fatal fault
leaves the cached IDs untouched and is retried on a later return to
user space; a genuinely unmapped area keeps faulting and user space
takes SIGSEGV through its own access. All corruption and ROP-hardening
checks keep their SIGSEGV.

Signal delivery is left untouched: it must abort the interrupted
critical section before the handler runs and therefore cannot safely
defer a fault.

Signed-off-by: Yuanhe Shu <xiangzao@linux.alibaba.com>
---
Tested on v7.1-rc6+ (vanilla):
 - LTP mm/oom03 (14/14) and mm/oom05 (8/8): pass with the patch (the
   victim is reaped with SIGKILL); without it the rseq SIGSEGV makes
   the same cases fail.
 - strace A/B on the oom03 binary with glibc.pthread.rseq as the sole
   variable: 2 SIGSEGV (SI_KERNEL, si_addr=NULL) with rseq registered,
   0 without -- isolates the cause to the rseq slow path.
 - tools/testing/selftests/rseq: run_param_test.sh,
   run_syscall_errors_test.sh, run_legacy_check.sh and
   run_timeslice_test.sh all pass.

 kernel/rseq.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/kernel/rseq.c b/kernel/rseq.c
index e75e3a5e312c..38a19cef4ad0 100644
--- a/kernel/rseq.c
+++ b/kernel/rseq.c
@@ -302,11 +302,18 @@ static void rseq_slowpath_update_usr(struct pt_regs *regs)
 
 	if (unlikely(!rseq_update_usr(t, regs, &ids))) {
 		/*
-		 * Clear the errors just in case this might survive magically, but
-		 * leave the rest intact.
+		 * rseq_update_usr() sets rseq_event::fatal only on corrupt
+		 * user data, which keeps its SIGSEGV. A clear fatal bit is an
+		 * unresolved page fault on a still-registered rseq area (e.g.
+		 * a CoW that cannot be charged to an OOM-locked memcg): that
+		 * is transient, so leave the cached IDs untouched and retry on
+		 * a later return to user instead of killing the task.
 		 */
+		bool fatal = t->rseq.event.fatal;
+
 		t->rseq.event.error = 0;
-		force_sig(SIGSEGV);
+		if (fatal)
+			force_sig(SIGSEGV);
 	}
 }
 
-- 
2.39.5 (Apple Git-154)




Thread overview: 5+ messages
2026-06-08  2:15 [PATCH] rseq: don't promote transient TLS faults to SIGSEGV Yuanhe Shu
2026-06-08  8:29 ` Peter Zijlstra
2026-06-08  9:15 ` Thomas Gleixner
2026-06-08 12:52 ` Mathieu Desnoyers
2026-06-08 22:20   ` Thomas Gleixner
