* Re: [PATCH] rseq: don't promote transient TLS faults to SIGSEGV
From: Thomas Gleixner @ 2026-06-08 22:20 UTC
To: Mathieu Desnoyers, Yuanhe Shu, Peter Zijlstra
Cc: Paul E . McKenney, Boqun Feng, linux-kernel, Michal Hocko,
David Rientjes, Shakeel Butt, linux-mm
On Mon, Jun 08 2026 at 08:52, Mathieu Desnoyers wrote:
> On 2026-06-07 22:15, Yuanhe Shu wrote:
>> With oom_score_adj=-1000 the OOM killer finds no killable task, so the
>> rseq SIGSEGV is the sole outcome; otherwise the rseq SIGSEGV can be
>> delivered before the OOM killer queues SIGKILL, and the process exits
>> 139 instead of 137, breaking OOMKilled detection in container
>> runtimes
> As Peter and Thomas said, this is not transient. We simply cannot return
> to userspace with an out-of-date value.
>
> It looks like an issue with the choice of which signal should be
> delivered in priority: rseq force signal enqueues SIGSEGV, and you
> would expect the OOM killer to issue SIGKILL, and somehow it's the
> forced SIGSEGV that wins.
>
> Perhaps look into fixing that instead if you really care about which
> signal is emitted ? (and that's a big _if_)
It's even worse. The proposed patch is actually creating an endless
loop unless there is really a signal pending at some point.
  exit_to_user()
    rseq_update_usr();        // faults and defers the fault handling to rseq_slowpath_update_usr()
    rseq_slowpath_update_usr()
      rseq_update_usr();      // Faults again and the fault cannot be resolved
      if (!fatal)             // Proposed solution....
        return;
So if there is no signal queued, then this will end up in exit_to_user()
again, which faults and defers the fault handling to
rseq_slowpath_update_usr() again, which just goes on in circles.
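To make the loop concrete, here is a rough user-space sketch of the flow described above. It is a toy model and not kernel code: struct toy_task, toy_fast_update(), toy_slow_update(), toy_slowpath() and toy_exit_to_user() are hypothetical stand-ins, and the pass cap exists only so that the model terminates where the real exit path would not.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only. Every name here is a hypothetical stand-in. */
struct toy_task {
	bool page_present;     /* rseq area can be accessed without faulting */
	bool fault_resolvable; /* the slow path is able to fault the page in */
	bool fatal;            /* models rseq_event::fatal (corrupt user data) */
	bool signal_pending;   /* models a queued signal, e.g. forced SIGSEGV */
	bool patched;          /* apply the proposed "if (!fatal) return;" */
};

/* Models the fast path update: it cannot handle faults itself. */
static bool toy_fast_update(const struct toy_task *t)
{
	return t->page_present;
}

/* Models the slow path update: it may resolve the fault, if that is possible. */
static bool toy_slow_update(struct toy_task *t)
{
	if (!t->page_present && t->fault_resolvable)
		t->page_present = true;
	return t->page_present;
}

/* Models the slow path including the proposed early return. */
static void toy_slowpath(struct toy_task *t)
{
	if (toy_slow_update(t))
		return;
	if (t->patched && !t->fatal)
		return;               /* proposed change: give up silently */
	t->signal_pending = true;     /* unpatched behaviour: force a signal */
}

/*
 * Models the exit to user loop. Returns the pass on which the task either
 * reached user space or got a signal queued, or -1 if the cap was hit,
 * which stands for "goes on in circles".
 */
static int toy_exit_to_user(struct toy_task *t, int max_passes)
{
	assert(max_passes > 0);
	for (int pass = 1; pass <= max_passes; pass++) {
		if (toy_fast_update(t))
			return pass;
		toy_slowpath(t);
		if (t->page_present || t->signal_pending)
			return pass;
		/* Nothing changed, so the next pass faults in the same way. */
	}
	return -1;
}
```

In this model a task with an unresolvable fault leaves the loop on the first pass when unpatched, because the signal gets queued. With patched set and fatal clear, no pass changes any state, so only the artificial cap stops it.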
IOW, this would create an unprivileged DoS attack - not a fatal one,
but at least one which eats up a full time slice in the kernel
forever. Just use enough tasks which register an rseq region and
unregister it after returning to user space ....
So no. And this comment in the patch does not make any sense at all:
> + * rseq_update_usr() sets rseq_event::fatal only on corrupt
> + * user data, which keeps its SIGSEGV. A clear fatal bit is an
> + * unresolved page fault on a still-registered rseq area (e.g.
> + * a CoW that cannot be charged to an OOM-locked memcg): that
> + * is transient, so leave the cached IDs untouched and retry on
> + * a later return to user instead of killing the task.
If the page fault handler fails to wait until the OOM-locked memcg has
figured out what to do, then that's a clear violation of what is
expected when resolving a page fault on user/kernel shared memory
with ABI constraints. It is definitely not some transient failure which
can be hand-waved away.
Not that it matters much whether the task dies from SIGSEGV or SIGKILL,
but that's clearly not a problem which can be papered over in the rseq
code.
Thanks,
tglx