Date: Tue, 24 Feb 2026 10:27:28 +0000
From: David Laight
To: Peter Zijlstra
Cc: Mathieu Desnoyers, linux-kernel@vger.kernel.org, Thomas Gleixner,
 Mark Rutland, cmarinas@kernel.org, maddy@linux.ibm.com,
 hca@linux.ibm.com, ryan.roberts@arm.com
Subject: Re: [RFC] in-kernel rseq
Message-ID: <20260224102728.1273c080@pumpkin>
In-Reply-To: <20260223215436.GS1282955@noisy.programming.kicks-ass.net>
References: <20260223163843.GR1282955@noisy.programming.kicks-ass.net>
 <20260223175357.481c161e@pumpkin>
 <20260223215436.GS1282955@noisy.programming.kicks-ass.net>

On Mon, 23 Feb 2026 22:54:36 +0100
Peter Zijlstra wrote:

> On Mon, Feb 23, 2026 at 01:22:18PM -0500, Mathieu Desnoyers wrote:
>
> > > I think it would be better as the address of the instruction after
> > > the 'store'.
> >
> > That's indeed what we do for userspace rseq.
>
> Either works I suppose. The only thing to be careful about is that you
> must not restart once the store has happened.
>
> > > You probably don't need separate 'begin' and 'restart' addresses.
> >
> > It's not needed as long as the abort behavior is only restart. It
> > becomes useful if another behavior is wanted on abort.
> > But since this is kernel code and not ABI, it can change if the need
> > arises.
>
> Right, didn't want to limit to restart, although that is what is used
> here.
>
> > > It might be enough to save the 'restart' address and a byte length
> > > directly in 'current', much simpler code.
> >
> > That would make it two stores to the task struct. Those would not be
> > single-instruction, so we'd have to deal with preemption coming between
> > those two stores. Also this would be more code: two stores compared
> > to a single pointer store to the task struct to begin the critical
> > section. AFAIU Peter's proposed approach is more efficient.
>
> Must indeed be a single store. Either we have it set in full, or we
> don't.

Not really, you can do two stores (to the task struct) provided you
check the second one - remember the data is being looked at by the
cpu that did the writes.

> > We could turn the end address into a length if we want, this would
> > make it more alike the userspace rseq ABI counterpart.
>
> I find 3 distinct addresses easier to fill out, but again it doesn't
> matter.

Actually if you save the end address you only need to check if the
current %pc is less than that address; if it is, you back it up to
the start of the sequence.

> > > How much it helps is another matter.
> > > I'm sure I remember something about per-cpu data being used for something
> > > because it was faster than using 'current' - not sure of the context.
> >
> > The problem with per-cpu data for this is how to handle migration ?
> > The whole point of this is to replace preempt disable.
>
> This; it cannot be a per-cpu address, if you need it to implement
> per-cpu ops.

Sorry yes, you are replacing a per-cpu data operation with a per-task one.
But I'm sure I remember something where the opposite was done because
it was unexpectedly faster to use per-cpu data.
I'm not sure where arm gets 'current' from, x86 'has it easy' because
of %fs and %gs.
(If current is loaded from per-cpu data that might explain why using
per-cpu data is faster.)

That makes me think (a bad sign)...
Are the compilers 'clever' enough to use %fs for current->member while
current()->member uses a #define to get the actual address?

preempt_disable() itself can be implemented using per-cpu or per-task data.
I think it varies between architectures, not sure which asm uses.

> > > The real problem with rseq is they don't scale.
> >
> > Not sure what you mean. They don't scale with respect to what ?
>
> He might be talking about forward progress instead of scaling. There are
> indeed forward progress concerns with rseq -- as there are with trivial
> LL/SC. But given the length of a slice vs the length of a rseq section,
> this should be a non-issue.

No, scaling; in this case it is fine to add the rseq just before needing it.
But if they have to be set in advance then you start getting a long list
to check - I'm sure that must happen with userspace rseq?

> Doing the restart on interrupt would be a bigger issue. Although even
> there I think that since the operations we're talking about are but a
> few instructions, it should all just work well enough.
> ...
> > > I think that is just unlocked RMW of a per-cpu/thread variable.
>
> That's missing the point entirely. He might be stuck in x86_64 or
> something.

Not entirely, it doesn't matter if code is preempted between the read
and write in preempt_disable() because that can only happen when the
count is changing from 0 to 1.
What does matter is that the 1 is written to the correct place.

	David
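
For illustration, the end-address check discussed in this thread (if the
interrupted %pc is below the saved end address, back it up to the start of
the sequence) might look roughly like the user-space model below. It is
only a sketch and not code from the RFC: struct krseq_cs, krseq_fixup_pc()
and the explicit lower-bound test are assumptions made for the example.
The 'end' field is taken to be the address of the instruction after the
committing store, so a %pc equal to 'end' is never restarted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical descriptor for one in-kernel restartable sequence.
 * The names and layout are illustrative only.
 */
struct krseq_cs {
	uintptr_t start;	/* first instruction; also the restart point */
	uintptr_t end;		/* instruction just after the committing store */
};

/*
 * Model of the fixup done when a task is preempted or interrupted:
 * given the descriptor the task registered (NULL if none) and the
 * interrupted pc, return the pc the task should resume at.
 *
 * pc < end means the committing store has not executed yet, so the
 * sequence can be restarted safely.  pc >= end must be left alone,
 * because restarting after the store would repeat the operation.
 * The pc >= start test is a conservative addition in this sketch; it
 * covers a descriptor that is still registered while the task runs
 * unrelated code at a lower address.
 */
uintptr_t krseq_fixup_pc(const struct krseq_cs *cs, uintptr_t pc)
{
	if (cs == NULL)
		return pc;

	assert(cs->start < cs->end);

	if (pc >= cs->start && pc < cs->end)
		return cs->start;

	return pc;
}
```

With start = 0x1000 and end = 0x1010, a pc of 0x1008 is rewound to 0x1000,
while a pc of 0x1010 (the store has completed) is returned unchanged.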