From: Thomas Gleixner
To: Mathieu Desnoyers, Yuanhe Shu, Peter Zijlstra
Cc: "Paul E . McKenney", Boqun Feng, linux-kernel@vger.kernel.org,
 Michal Hocko, David Rientjes, Shakeel Butt, linux-mm@kvack.org
Subject: Re: [PATCH] rseq: don't promote transient TLS faults to SIGSEGV
In-Reply-To: <22819c09-8d8f-4448-801e-742282e50693@efficios.com>
References: <20260608021553.1037128-1-xiangzao@linux.alibaba.com>
 <22819c09-8d8f-4448-801e-742282e50693@efficios.com>
Date: Tue, 09 Jun 2026 00:20:33 +0200
Message-ID: <87cxy04qtq.ffs@fw13>

On Mon, Jun 08 2026 at 08:52, Mathieu Desnoyers wrote:
> On 2026-06-07 22:15, Yuanhe Shu wrote:
>> With oom_score_adj=-1000 the OOM killer finds no killable task, so the
>> rseq SIGSEGV is the sole outcome; otherwise the rseq SIGSEGV can be
>> delivered before the OOM killer queues SIGKILL, and the process exits
>> 139 instead of 137, breaking OOMKilled detection in container
>> runtimes
>
> As Peter and Thomas said, this is not transient. We simply cannot return
> to userspace with an out-of-date value.
>
> It looks like an issue with the choice of which signal should be
> delivered in priority: rseq force signal enqueues SIGSEGV, and you
> would expect the OOM killer to issue SIGKILL, and somehow it's the
> forced SIGSEGV that wins.
>
> Perhaps look into fixing that instead if you really care about which
> signal is emitted ? (and that's a big _if_)

It's even worse. The proposed patch is actually creating an endless loop
unless there is really a signal pending at some point.

  exit_to_user()
    rseq_update_usr();    // faults and defers the fault handling to rseq_slowpath_update_usr()
    rseq_slowpath_update_usr()
      rseq_update_usr();  // Faults again and the fault cannot be resolved
      if (!fatal)         // Proposed solution....
        return;

So if there is no signal queued, then this will end up in exit_to_user()
again, which faults and defers the fault handling to
rseq_slowpath_update_usr() again, which just goes on in circles.

IOW, this would create an unprivileged DoS attack - not a fatal one, but
at least one which eats up a full time slice in the kernel forever. Use
enough tasks, which register a rseq region and unregister it after
returning to user space ....

So no.

And this comment in the patch does not make any sense at all:

> +	 * rseq_update_usr() sets rseq_event::fatal only on corrupt
> +	 * user data, which keeps its SIGSEGV. A clear fatal bit is an
> +	 * unresolved page fault on a still-registered rseq area (e.g.
> +	 * a CoW that cannot be charged to an OOM-locked memcg): that
> +	 * is transient, so leave the cached IDs untouched and retry on
> +	 * a later return to user instead of killing the task.

If the page fault handler fails to wait until the OOM locked memcg
figured out what to do, then that's a clear violation of expectation
vs. resolving a page fault in the context of user/kernel shared memory
with ABI constraints. But definitely not some transient failure which
can be hand waved away.

Not that it matters much whether the task dies from SIGSEGV or SIGKILL,
but that's clearly not a problem which can be papered over in the rseq
code.

Thanks,

        tglx
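
For illustration, the loop described above can be modelled in user space.
This is a toy sketch, not kernel code: every toy_* name, the
retry_on_fault flag and the pass limit are assumptions made up for the
example. It also assumes that a queued signal is what ends the loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names exist in the kernel. */
struct toy_task {
	bool fault_resolvable;	/* can the fault on the rseq area be resolved? */
	bool signal_pending;	/* is a signal queued for this task? */
};

enum toy_result {
	TOY_RETURNED_TO_USER,	/* update succeeded, task runs in user space */
	TOY_FORCED_SIGSEGV,	/* no retry: the unresolved fault kills the task */
	TOY_SIGNAL_DELIVERED,	/* a queued signal broke the loop */
	TOY_STILL_LOOPING	/* pass limit hit without leaving the loop */
};

/* Stand-in for the user space update: fails while the fault is unresolved. */
static bool toy_update_usr(const struct toy_task *t)
{
	return t->fault_resolvable;
}

/*
 * Model repeated passes through the exit-to-user path.
 * retry_on_fault == true mimics the early return for a non-fatal fault,
 * retry_on_fault == false mimics forcing SIGSEGV on the first failure.
 * *passes reports how many passes were made before returning.
 */
static enum toy_result toy_exit_to_user(const struct toy_task *t,
					bool retry_on_fault,
					size_t max_passes, size_t *passes)
{
	assert(t && passes && max_passes > 0);

	for (size_t i = 1; i <= max_passes; i++) {
		*passes = i;
		/* Fast path attempt, then the slow path retries the update. */
		if (toy_update_usr(t) || toy_update_usr(t))
			return TOY_RETURNED_TO_USER;
		if (!retry_on_fault)
			return TOY_FORCED_SIGSEGV;
		if (t->signal_pending)
			return TOY_SIGNAL_DELIVERED;
		/* Early return modelled here: nothing changed, go around again. */
	}
	return TOY_STILL_LOOPING;
}
```

With a task whose fault never resolves and no signal queued,
toy_exit_to_user(&t, true, 1000, &n) runs into the pass limit, while the
same call with retry_on_fault set to false ends on the first pass.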