Re: [RFC PATCH 2/2] treewide, bits: use ffs_val() where it is open-coded

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: David Laight <david.laight.linux@gmail.com>
To: "Maciej W. Rozycki" <macro@orcam.me.uk>
Cc: Yury Norov <ynorov@nvidia.com>, Petr Tesarik <ptesarik@suse.com>,
	Yury Norov <yury.norov@gmail.com>,
	Rasmus Villemoes <linux@rasmusvillemoes.dk>,
	Richard Henderson <richard.henderson@linaro.org>,
	Matt Turner <mattst88@gmail.com>,
	Magnus Lindholm <linmag7@gmail.com>,
	Vineet Gupta <vgupta@kernel.org>,
	Geert Uytterhoeven <geert@linux-m68k.org>,
	Thomas Bogendoerfer <tsbogend@alpha.franken.de>,
	Madhavan Srinivasan <maddy@linux.ibm.com>,
	Michael Ellerman <mpe@ellerman.id.au>,
	Heiko Carstens <hca@linux.ibm.com>,
	Vasily Gorbik <gor@linux.ibm.com>,
	Alexander Gordeev <agordeev@linux.ibm.com>,
	Chris Zankel <chris@zankel.net>,
	Max Filippov <jcmvbkbc@gmail.com>,
	Patrik Jakobsson <patrik.r.jakobsson@gmail.com>,
	Maarten Lankhorst <maarten.lankhorst@linux.intel.com>,
	Maxime Ripard <mripard@kernel.org>,
	Thomas Zimmermann <tzimmermann@suse.de>,
	David Airlie <airlied@gmail.com>, Simona Vetter <simona@ffwll.ch>,
	Robin Murphy <robin.murphy@arm.com>,
	Joerg Roedel <joro@8bytes.org>, Will Deacon <will@kernel.org>,
	Jakub Kicinski <kuba@kernel.org>,
	Andrew Lunn <andrew+netdev@lunn.ch>,
	"David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Paolo Abeni <pabeni@redhat.com>,
	Oliver Neukum <oliver@neukum.org>, Arnd Bergmann <arnd@arndb.de>,
	Kuan-Wei Chiu <visitorckw@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Marcel Holtmann <marcel@holtmann.org>,
	Johan Hedberg <johan.hedberg@gmail.com>,
	Luiz Augusto von Dentz <luiz.dentz@gmail.com>,
	Pablo Neira Ayuso <pablo@netfilter.org>,
	Florian Westphal <fw@strlen.de>,
	linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH 2/2] treewide, bits: use ffs_val() where it is open-coded
Date: Sun, 11 Jan 2026 23:57:45 +0000	[thread overview]
Message-ID: <20260111235745.53d16a59@pumpkin> (raw)
In-Reply-To: <alpine.DEB.2.21.2601111450500.30566@angie.orcam.me.uk>

On Sun, 11 Jan 2026 21:22:03 +0000 (GMT)
"Maciej W. Rozycki" <macro@orcam.me.uk> wrote:

> On Sun, 11 Jan 2026, David Laight wrote:
> 
> > >  It's 4 MIPS instructions and exactly the same machine code produced as 
> > > compiler output from `is_power_of_2'.  
> > 
> > The compiler must be detecting that x == (x & -x) is the same as
> > (x & (x - 1)) == 0.  
> 
>  There are 3 basic operations either way, so unless a CPU has an odd 
> instruction that combines any, there's no way to reduce the instruction 
> count.
> 
> > Although for MIPS the former is negate, and, beq(x,y) - so 3 instructions.
> > On x86 it is negate, and, cmp, beq(zero flag) - one extra instruction.
> > (The 4th MIPS instruction will be beq(syn,0).)  
> 
>  This code only ever runs on R2000/R3000 and R4000/R4400 processors, on 
> which the cost of an untaken branch is nil.  The cost of a taken branch on 
> the former ones is nil too, with their 4-stage pipelines, and while there 
> is a fixed penalty on the latter ones, the compiler should be able to 
> arrange code here such that at most one branch is taken anyway.
> 
> > So maybe s/Badly/In an unusual way/  
> 
>  What's unusual in this way of determining that exactly one bit is set in 
> a word?  It's as good as any I suppose.
> 
> > > If you know of a better alternative  
> > 
> > Doing is_power_of_2() with only one conditional branch would be a gain.
> > But I've not thought about it hard enough to find one.  
> 
>  Sure you can, either of:
> 
> 	(x != 0) & (x == (x & -x))
> 
> and:
> 
> 	(x != 0) & ((x & (x - 1)) == 0)
> 
> will do, but it's 5 instructions on MIPS, so no gain here.
> 
>  If you have a population count instruction such as CTPOP on EV67 Alpha, 
> you can do it in two instructions, such as:
> 
> 	ctpop	$16, $0
> 	cmpeq	$0, 1, $0
> 
> so there seems to be room for improvement here, but GCC stubbornly refuses 
> to produce this instruction sequence for this code:
> 
> bool is_power_of_2(unsigned long n)
> {
> 	return __builtin_popcountl(n) == 1;
> }
> 
> apparently owing to no insn cost set for the POPCOUNT RTX in the backend 
> causing the middle end to pick up some default.  Instead GCC comes up with 
> what can be coded in C as:
> 
> bool is_power_of_2(unsigned long n)
> {
> 	return n - 1 < (n ^ (n - 1));
> }

Not seen that one - and not thought of popcnt, far too new for me.
(I learnt asm for a pdp11 first...)

> which seems an interesting alternative that produces 3 instructions only 
> across Alpha, MIPS and RISC-V targets.  It's 5 instructions on POWER and 
> x86-64 vs 8 and 9 respectively with our current implementation.  There are 
> no branches produced here, although an inline expansion will likely end 
> with one.

I'm seeing:
is_power_of_2:
        leaq    -1(%rdi), %rax
        xorq    %rax, %rdi
        cmpq    %rdi, %rax
        setb    %al
        retq
on x86-64 - which is really 3 instructions.

While using the popcnt instruction would reduce it by one, it will only
be faster on amd zen cpu, on intel cpu popcnt has a latency of 3 and only
executes on p1 (according to Agner).

> 
>  Overall it seems a worthwhile improvement even if still an instruction 
> longer than necessary for EV67 Alpha (and also for POWER, which also has 
> a population count operation, and which GCC does pick up in this case).
> 
>  FWIW on VAX, notorious for the lack of conditional set operations, it's 7 
> instructions and 1 branch, down from 14 and 2 respectively.

I worked for ICL through the 1980s so missed out on VAX.
Picked up 8085, x86, sparc32 and m68k.

> > I suspect there may not be one, but all these 'tricks' come from the 1970s
> > (or earlier) and don't allow for the behaviour of modern cpu with multiple
> > execution units and long pipelines.  
> 
>  As we can see you never know.  I've sent a code update proposal.

I'd suggest moving is_power_of_2() to bits.h as:
#define is_power_of_2(n) ({ auto _n = n; _n - 1 < _n ^ (_n - 1); })
and add:
#define is_zero_or_power_of_2(n) ({ auto _n = n; _n & (_n - 1) != 0; })

> 
>  The use of the population count operation can be considered separately, 
> but we'd have to be careful here as the quality of code produced by the 
> intrinsic varies, e.g. with pre-EV67 Alpha, RISC-V and VAX (!) GCC emits 
> code equivalent to my proposal, but with MIPS or x86-64 it resorts to a 
> libcall, so a guarding condition would be required.  So I'd leave it for 
> another occasion.

I'd guess that many of those popcnt instructions aren't actually as fast
as the dec/xor pair.
But the libcall is just plain stupid.
I think that happens elsewhere - which is one reason __ffs() is x86 asm.

> 
>  Thanks for sparking this discussion overall!

Nice wet Sunday bikeshed...

	David

> 
>   Maciej
>

next prev parent reply	other threads:[~2026-01-11 23:57 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-01-09 16:41 [RFC PATCH 0/2] Helper to isolate least-significant bit Petr Tesarik
2026-01-09 16:37 ` [RFC PATCH 1/2] bits: introduce ffs_val() Petr Tesarik
2026-01-09 17:01   ` Arnd Bergmann
2026-01-09 23:54     ` David Laight
2026-01-09 17:16   ` Yury Norov
2026-01-09 17:46     ` Petr Tesarik
2026-01-09 18:26       ` Yury Norov
2026-01-09 19:27         ` Petr Tesarik
2026-01-09 19:44           ` Yury Norov
2026-01-09 18:50       ` Arnd Bergmann
2026-01-10 10:50   ` David Laight
2026-01-12  8:15   ` Thomas Zimmermann
2026-01-12  8:58     ` Petr Tesarik
2026-01-12 11:22     ` David Laight
2026-01-13  1:40       ` Maciej W. Rozycki
2026-01-09 16:37 ` [RFC PATCH 2/2] treewide, bits: use ffs_val() where it is open-coded Petr Tesarik
2026-01-09 18:19   ` Yury Norov
2026-01-09 19:32     ` Petr Tesarik
2026-01-09 20:19       ` Yury Norov
2026-01-10 10:36     ` Maciej W. Rozycki
2026-01-10 11:54       ` David Laight
2026-01-11  3:15         ` Maciej W. Rozycki
2026-01-11 10:40           ` David Laight
2026-01-11 21:22             ` Maciej W. Rozycki
2026-01-11 23:57               ` David Laight [this message]
2026-01-12 11:21                 ` Maciej W. Rozycki
2026-01-12 13:23                   ` David Laight
2026-01-10 16:42       ` Kuan-Wei Chiu
2026-01-10 22:23         ` David Laight
2026-01-11 14:41           ` Kuan-Wei Chiu
2026-01-11 21:22             ` Maciej W. Rozycki
2026-01-09 17:20 ` [RFC PATCH 0/2] Helper to isolate least-significant bit Kuan-Wei Chiu
2026-01-09 18:59   ` Petr Tesarik
2026-01-09 19:26 ` Andrew Cooper

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260111235745.53d16a59@pumpkin \
    --to=david.laight.linux@gmail.com \
    --cc=agordeev@linux.ibm.com \
    --cc=airlied@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=andrew+netdev@lunn.ch \
    --cc=arnd@arndb.de \
    --cc=chris@zankel.net \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=fw@strlen.de \
    --cc=geert@linux-m68k.org \
    --cc=gor@linux.ibm.com \
    --cc=hca@linux.ibm.com \
    --cc=jcmvbkbc@gmail.com \
    --cc=johan.hedberg@gmail.com \
    --cc=joro@8bytes.org \
    --cc=kuba@kernel.org \
    --cc=linmag7@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux@rasmusvillemoes.dk \
    --cc=luiz.dentz@gmail.com \
    --cc=maarten.lankhorst@linux.intel.com \
    --cc=macro@orcam.me.uk \
    --cc=maddy@linux.ibm.com \
    --cc=marcel@holtmann.org \
    --cc=mattst88@gmail.com \
    --cc=mpe@ellerman.id.au \
    --cc=mripard@kernel.org \
    --cc=oliver@neukum.org \
    --cc=pabeni@redhat.com \
    --cc=pablo@netfilter.org \
    --cc=patrik.r.jakobsson@gmail.com \
    --cc=ptesarik@suse.com \
    --cc=richard.henderson@linaro.org \
    --cc=robin.murphy@arm.com \
    --cc=simona@ffwll.ch \
    --cc=tsbogend@alpha.franken.de \
    --cc=tzimmermann@suse.de \
    --cc=vgupta@kernel.org \
    --cc=visitorckw@gmail.com \
    --cc=will@kernel.org \
    --cc=ynorov@nvidia.com \
    --cc=yury.norov@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox