Date: Sat, 5 Jul 2025 16:37:37 -0500
From: Segher Boessenkool
To: David Laight
Cc: Christophe Leroy, Michael Ellerman, Nicholas Piggin, Naveen N Rao,
 Madhavan Srinivasan, Alexander Viro, Christian Brauner, Jan Kara,
 Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Darren Hart,
 Davidlohr Bueso, Andre Almeida, Andrew Morton, Dave Hansen,
 Linus Torvalds, linux-kernel@vger.kernel.org,
 linuxppc-dev@lists.ozlabs.org, linux-fsdevel@vger.kernel.org,
 linux-mm@kvack.org
Subject: Re: [PATCH 0/5] powerpc: Implement masked user access
References: <20250624131714.GG17294@gate.crashing.org>
 <20250624175001.148a768f@pumpkin>
 <20250624182505.GH17294@gate.crashing.org>
 <20250624220816.078f960d@pumpkin>
 <83fb5685-a206-477c-bff3-03e0ebf4c40c@csgroup.eu>
 <20250626220148.GR17294@gate.crashing.org>
 <20250705193332.251e0b1f@pumpkin>
 <20250705220538.1bbe5195@pumpkin>
In-Reply-To: <20250705220538.1bbe5195@pumpkin>

Hi!

On Sat, Jul 05, 2025 at 10:05:38PM +0100, David Laight wrote:
> On Sat, 5 Jul 2025 15:15:57 -0500
> Segher Boessenkool wrote:
>
> ...
> > The isel machine instruction is super expensive on p8: it is marked as
> > first in an instruction group, and has latency 5 for the GPR sources,
> > and 8 for the CR field source.
> >
> > On p7 it wasn't great either, it was actually converted to a branch
> > sequence internally!
>
> Ugg...
>
> You'd think they'd add instructions that can be implemented.
> It isn't as though isel is any harder than 'add with carry'.

It is though!  isel was the first instruction that takes both GPR inputs
and a CR field input.  We now have more, the ISA 3.0 (p9) setb insn, and
esp. the extremely useful ISA 3.1 (p10) set[n]bc[r] insns -- well, those
don't take any GPR inputs actually, but like isel their output is a
GPR :-)

> Not that uncommon, IIRC amd added adox/adcx (add carry using the
> overflow/carry flag and without changing any other flags) as very

We have a similar "addex" insn since p9, which allows using the OV bit
instead of the CA bit, and prepares to allow an extra three possible
bits as carry bits, too.  Using it you can run multiple carry chains in
parallel using insns very close to the traditional stuff.

The compiler still doesn't ever generate this, it is mostly useful for
handcoded assembler routines.

The carry bits are stored in the XER register, the "fixed-point
exception register", while results from comparison instructions are
stored in the CR, which holds eight four-bit CR fields, which you can
use in conditional jumps, or in isel and the like, or in the crlogical
insns (which can do any logic function on two CR field inputs and store
in a third, just like the logical insns on GPRs that also have the full
complement of 14 two source functions).

> slow instructions. Intel invented them without making jcxz (dec %cx
> and jump non-zero) fast - so you can't (easily) put them in a loop.
> Not to mention all the AVX512 fubars.

Sounds lovely :-)

> Conditional move is more of a problem with a mips-like cpu where
> alu ops read two registers and write a third.

Like most Power implementations as well.

> You don't want to do a conditional write because it messes up
> the decision of whether to forward the alu result to the following
> instruction.
> So I think you might need to do 'cmov odd/even' and read the LSB
> from a third copy (or third read port) of the registers indexed
> by what would normally be the 'output' register number.
> Then tweak the register numbers early in the pipeline so that the
> result goes to one of the 'input' registers rather than the normal
> 'output' one.
> Not really that hard - could add to the cpu I did in 1/2 a day :-)

On p9 and later both GPR (or constant) inputs are fed into the execution
unit as well as some CR bit, and it just writes to a GPR.  Easier for
the hardware, easier for the compiler, and easier for the programmer!
Win-win-win.  The kind of tradeoffs I like best :-)


Segher
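
For illustration only: the select described above for p9 and later (two
GPR or constant inputs plus one condition bit in, one GPR out) can be
modelled in plain C roughly as below.  This is a minimal sketch, not code
from the patch series and not the kernel's masked user access helpers.
The names select_u64 and clamp_addr are hypothetical, and whether a
compiler turns this into isel, a mask sequence, or a branch is entirely
up to the compiler and target.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper modelling a select: both data inputs are always
 * read, and the single condition bit picks which one reaches the
 * result.  There is no conditional write to the destination.
 */
static inline uint64_t select_u64(bool cond, uint64_t a, uint64_t b)
{
	/* All ones when cond is true, all zeros when it is false. */
	uint64_t mask = -(uint64_t)cond;

	return (a & mask) | (b & ~mask);
}

/*
 * Hypothetical use: clamp an address to a limit without branching on
 * the comparison result.  The comparison plays the role of the CR bit.
 */
static inline uint64_t clamp_addr(uint64_t addr, uint64_t limit)
{
	return select_u64(addr < limit, addr, limit);
}
```

The point of the shape is the same as in the paragraph above: the result
register is always written, so nothing downstream has to guess whether a
write happened.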