From: Thomas Gleixner
To: Florian Weimer
Cc: Kevin Brodsky, Dmitry Vyukov, mathieu.desnoyers@efficios.com,
 peterz@infradead.org, boqun.feng@gmail.com, mingo@redhat.com,
 bp@alien8.de, dave.hansen@linux.intel.com, hpa@zytor.com,
 aruna.ramakrishna@oracle.com, elver@google.com, "Paul E. McKenney",
 x86@kernel.org, linux-kernel@vger.kernel.org, Jens Axboe
Subject: Re: [PATCH v7 3/4] rseq: Make rseq work with protection keys
References: <138c29bd5f5a0a22270c9384ecc721c40b7d8fbd.1747817128.git.dvyukov@google.com>
 <8079f564-cec0-45e4-857b-74b2e630a9d5@arm.com>
 <87ikexhbah.ffs@tglx> <87a508he4h.ffs@tglx>
Date: Wed, 26 Nov 2025 21:52:44 +0100
Message-ID: <873460h5yb.ffs@tglx>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Nov 26 2025 at 20:06, Florian Weimer wrote:
> * Thomas Gleixner:
>> But like with signals just blindly enabling key0 and hope that it works
>> is not really a solution. Nothing prevents me from disabling RSEQ for
>> glibc. Then install my own RSEQ page and mprotect it. When that key
>> becomes disabled in PKRU and the code section is interrupted then exit
>> to user space will fault and die in exactly the same way as
>> today. That's progress...
>
> But does that matter?  If I mprotect the stack and a signal arrives,
> that results in a crash, too.  Some things just don't work.

They can be made to work when we have a dedicated permission setting
for signals, which can be used for rseq access too. And having the
explicit signal permissions makes a lot of sense independent of the
above absurd use case which I just used for illustration.

>> So we really need to sit down and actually define a proper programming
>> model first instead of trying to duct tape the current ill defined mess
>> forever.
>>
>> What do we have to take into account:
>>
>>   1) signals
>>
>>      Broken as we know already.
>>
>>      IMO, the proper solution is to provide a mechanism to register a
>>      set of permissions which are used for signal delivery. The
>>      resulting hardware value should expand the permission, but keep
>>      the current active ones enabled.
>>
>>      That can be kinda kept backwards compatible as the signal perms
>>      would default to PKEY0.
>
> I had validated at one point that this works (although the patch that
> enables internal pkeys usage in glibc did not exist back then).
>
>   pkeys: Support setting access rights for signal handlers
>

That looks about right and is what I had in mind. Seems I missed that
back in the day and that discussion unfortunately ran into a dead
end :(

>>   2) rseq
>>
>>      The option of having a separate key which needs to be always
>>      enabled is definitely simple, but it wastes a key just for
>>      that. There are only 16 of them :(
>>
>>      If we solve the signal case with an explicit permission set, we
>>      can just reuse those signal permissions. They are maybe wider than
>>      what's required to access RSEQ, but the signal permissions have to
>>      include the TLS/RSEQ area to actually work.
>
> Would it address the use case for single-colored memory access?  Or
> would that still crash if the process gets descheduled while the access
> rights register is set to the restricted value?

It would just work the same way as signals. Assume

    signal_perms = [PK0=RW, PK1=R, PK2=RW]

    set_pkey(PK0..6=NONE, PK7=R)
    access()            <- can fault
                        <- or interrupt can happen
    set_pkey(normal)

So when the fault or interrupt results in a signal and/or the return to
user space needs to access RSEQ we have in signal delivery:

    cur = pkey_extend(signal_perms);
    --> Perms are now [PK0=RW, PK1=R, PK2=RW, PK7=R]
    access_user_stack();
    ....
    // Return with the extended permissions to deliver the signal
    // Will be restored on sigreturn

and in rseq:

    cur = pkey_extend(signal_perms);
    --> Perms are now [PK0=RW, PK1=R, PK2=RW, PK7=R]
    access_user_rseq();
    pkey_set(cur);

If the RSEQ access is nested in the signal delivery return then nothing
happens as the permissions are not changing because they are already
extended: A | A = A :).

The kernel does not care about the PKEY permissions when the user to
kernel transition is due to an interrupt/exception except for the
signal and rseq case.

In fact the above also works with my made-up example.
Just assume the RSEQ page is protected by PK2. :)

Syscalls are a different story as copy_to/from_user() obviously
requires the proper permissions and the kernel can rightfully expect
that stack and rseq are accessible, but that's not what we are debating
here.

Thanks,

        tglx
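
For illustration only, the permission extension discussed above can be
sketched in user-space C. This is a minimal sketch, assuming the x86
PKRU layout (two bits per key: access-disable and write-disable). All
helper names (pk_set, pk_get, pk_normalize, demo_signal_perms,
demo_restricted) are hypothetical, and pkey_extend() here takes the
current value as an explicit argument, unlike the pseudocode; none of
this is an existing kernel interface. Because PKRU stores *disable*
bits, the union of two permission sets is a bitwise AND of the
register values, which is also why extending twice changes nothing.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumed x86 PKRU layout: 2 bits per key, bit 0 = access disable (AD),
 * bit 1 = write disable (WD). 16 keys fit into 32 bits.
 */
#define PK_NKEYS	16u
#define PKRU_ALL_NONE	UINT32_C(0xFFFFFFFF)

enum pk_perm {
	PK_RW   = 0x0,	/* AD=0, WD=0 */
	PK_R    = 0x2,	/* AD=0, WD=1 */
	PK_NONE = 0x3,	/* AD=1, WD=1 */
};

/* Return @pkru with the permission of @key replaced by @perm. */
static uint32_t pk_set(uint32_t pkru, unsigned int key, enum pk_perm perm)
{
	assert(key < PK_NKEYS);
	unsigned int shift = key * 2;

	return (pkru & ~(UINT32_C(3) << shift)) | ((uint32_t)perm << shift);
}

/* Decode the permission of @key. AD set means no access, whatever WD says. */
static enum pk_perm pk_get(uint32_t pkru, unsigned int key)
{
	assert(key < PK_NKEYS);
	uint32_t bits = (pkru >> (key * 2)) & 3;

	return (bits & 1) ? PK_NONE : (enum pk_perm)bits;
}

/*
 * Force WD wherever AD is set, so that "no access" has one encoding.
 * Without this, AND-ing AD-only (01) with read-only (10) would yield
 * read-write (00), which is more than either side allowed.
 */
static uint32_t pk_normalize(uint32_t pkru)
{
	uint32_t ad = pkru & UINT32_C(0x55555555);

	return pkru | (ad << 1);
}

/* Union of permissions: a right stays disabled only if both disable it. */
static uint32_t pkey_extend(uint32_t cur, uint32_t signal_perms)
{
	return pk_normalize(cur) & pk_normalize(signal_perms);
}

/* signal_perms = [PK0=RW, PK1=R, PK2=RW], everything else NONE */
static uint32_t demo_signal_perms(void)
{
	uint32_t p = PKRU_ALL_NONE;

	p = pk_set(p, 0, PK_RW);
	p = pk_set(p, 1, PK_R);
	p = pk_set(p, 2, PK_RW);
	return p;
}

/* Restricted section: only PK7=R is enabled, everything else NONE */
static uint32_t demo_restricted(void)
{
	return pk_set(PKRU_ALL_NONE, 7, PK_R);
}
```

Calling pkey_extend(demo_restricted(), demo_signal_perms()) gives
[PK0=RW, PK1=R, PK2=RW, PK7=R] as in the example above, and extending
that result with the same signal permissions again returns it
unchanged, which corresponds to the nested rseq-in-signal-delivery
case.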