From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from smtp.kernel.org (aws-us-west-2-korg-mail-alma10-1.taild15c8.ts.net [100.103.45.18])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id A4994352C4E
	for <linux-kernel@vger.kernel.org>; Mon,  8 Jun 2026 22:20:37 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=100.103.45.18
ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1780957238; cv=none; b=kwtd2ZBoIDWMhK7fBdOMxZINeS/+5O+sNdhO12uk7+79X5AMHX4y56vRyrH+eblNN8tgBZJXBxfbsjbtkZN7fQkTYjtFLQ7Ko0sgFvivulegKWX7B5xRb4S81BKhURYsvYtWd9/8J78ZfhhnAw/17VYPj7qaigV1oB2iZRxLXTE=
ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1780957238; c=relaxed/simple;
	bh=UV71TKcXkOOQIBHtGc1Jyc/H+9NchiQfPc9N/tJHOFU=;
	h=From:To:Cc:Subject:In-Reply-To:MIME-Version:Content-Type:
	 References:Date:Message-ID; b=rMWpmv6Rk6LlkwAopeFa+IoWPPT8MFiD1jEV4uRe6w4ydCXmz7Ko9WkTOsGSD1loOFlnoyyRY1CTFVf+8acQaJJGpvCTMZvLJZBV2jmEFZcodhnxgAlnw/+I5I8gkHuu0GMGhcaT3Sn6DYUScjkDW73W63o5ICl4d62hONKYI0M=
ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=f7JYR6oA; arc=none smtp.client-ip=100.103.45.18
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="f7JYR6oA"
Received: by smtp.kernel.org (Postfix) with ESMTPSA id B29761F00893;
	Mon,  8 Jun 2026 22:20:36 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel.org;
	s=k20260515; t=1780957237;
	bh=pj4zDASXAMVVqBLSsY3DUyz/qFy1Cgs1DkPKV67407I=;
	h=From:To:Cc:Subject:In-Reply-To:References:Date;
	b=f7JYR6oAy8KAeO+iTwXmQ9xwNVNMpDX5V/D+2ivFxaS4yfhM7h/SJxTGiUk8y0o1i
	 4QzgzwsNilI8AeZFnNPbDMY4jhyXgfI3uxYK9d4EbaFHTgHreAjjBTxeOvCXwTBgwA
	 abVPZPI8ODcUEwsKJVv78z7kmG4CsdK/M5c1b8zw3MdHrxgGL8qQTgWS4AN+VFkBn5
	 1ex55s0XaDQUz+bzMlSLt2HAU6cLBgYWr+9J4PcJW6G6ahCJDcLSI/U3ZBrJWxsoya
	 7m85ka535QGgnZDxv+se17Egh5iyXou0smZvydULJERheKMmg5txtM1mHyw4gF5m5+
	 cM9iEDyd1HWfA==
From: Thomas Gleixner <tglx@kernel.org>
To: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>, Yuanhe Shu
 <xiangzao@linux.alibaba.com>, Peter Zijlstra <peterz@infradead.org>
Cc: "Paul E . McKenney" <paulmck@kernel.org>, Boqun Feng <boqun@kernel.org>,
 linux-kernel@vger.kernel.org, Michal Hocko <mhocko@suse.com>, David
 Rientjes <rientjes@google.com>, Shakeel Butt <shakeel.butt@linux.dev>,
 linux-mm@kvack.org
Subject: Re: [PATCH] rseq: don't promote transient TLS faults to SIGSEGV
In-Reply-To: <22819c09-8d8f-4448-801e-742282e50693@efficios.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Type: text/plain
References: <20260608021553.1037128-1-xiangzao@linux.alibaba.com>
 <22819c09-8d8f-4448-801e-742282e50693@efficios.com>
Date: Tue, 09 Jun 2026 00:20:33 +0200
Message-ID: <87cxy04qtq.ffs@fw13>

On Mon, Jun 08 2026 at 08:52, Mathieu Desnoyers wrote:
> On 2026-06-07 22:15, Yuanhe Shu wrote:
>> With oom_score_adj=-1000 the OOM killer finds no killable task, so the
>> rseq SIGSEGV is the sole outcome; otherwise the rseq SIGSEGV can be
>> delivered before the OOM killer queues SIGKILL, and the process exits
>> 139 instead of 137, breaking OOMKilled detection in container
>> runtimes
> As Peter and Thomas said, this is not transient. We simply cannot return
> to userspace with an out-of-date value.
>
> It looks like an issue with the choice of which signal should be
> delivered in priority: rseq force signal enqueues SIGSEGV, and you
> would expect the OOM killer to issue SIGKILL, and somehow it's the
> forced SIGSEGV that wins.
>
> Perhaps look into fixing that instead if you really care about which
> signal is emitted ? (and that's a big _if_)

It's even worse. The proposed patch actually creates an endless loop
unless a signal happens to be pending at some point.

exit_to_user()
   rseq_update_usr();  // faults and defers the fault handling to rseq_slowpath_update_usr()

rseq_slowpath_update_usr()
   rseq_update_usr();  // Faults again and the fault cannot be resolved

   if (!fatal)         // Proposed solution....
      return;
      
So if there is no signal queued, then this will end up in exit_to_user()
again, which faults and defers the fault handling to
rseq_slowpath_update_usr() again, which just goes on in circles.

IOW, this would create an unprivileged DoS attack - not a fatal one,
but at least one which eats up a full time slice in the kernel
forever. Use enough tasks, which register a rseq region and unregister it
after returning to user space ....

So no. And this comment in the patch does not make any sense at all:

> +		 * rseq_update_usr() sets rseq_event::fatal only on corrupt
> +		 * user data, which keeps its SIGSEGV. A clear fatal bit is an
> +		 * unresolved page fault on a still-registered rseq area (e.g.
> +		 * a CoW that cannot be charged to an OOM-locked memcg): that
> +		 * is transient, so leave the cached IDs untouched and retry on
> +		 * a later return to user instead of killing the task.

If the page fault handler fails to wait until the OOM locked memcg has
figured out what to do, then that's a clear violation of the expectation
that a page fault on user/kernel shared memory with ABI constraints gets
resolved. It is definitely not some transient failure which can be hand
waved away.

Not that it matters much whether the task dies from SIGSEGV or SIGKILL,
but that's clearly not a problem which can be papered over in the rseq
code.

Thanks,

        tglx