All of lore.kernel.org
 help / color / mirror / Atom feed
From: David Laight <david.laight.linux@gmail.com>
To: "Maciej W. Rozycki" <macro@orcam.me.uk>
Cc: Yury Norov <ynorov@nvidia.com>, Petr Tesarik <ptesarik@suse.com>,
	Yury Norov <yury.norov@gmail.com>,
	Rasmus Villemoes <linux@rasmusvillemoes.dk>,
	Richard Henderson <richard.henderson@linaro.org>,
	Matt Turner <mattst88@gmail.com>,
	Magnus Lindholm <linmag7@gmail.com>,
	Vineet Gupta <vgupta@kernel.org>,
	Geert Uytterhoeven <geert@linux-m68k.org>,
	Thomas Bogendoerfer <tsbogend@alpha.franken.de>,
	Madhavan Srinivasan <maddy@linux.ibm.com>,
	Michael Ellerman <mpe@ellerman.id.au>,
	Heiko Carstens <hca@linux.ibm.com>,
	Vasily Gorbik <gor@linux.ibm.com>,
	Alexander Gordeev <agordeev@linux.ibm.com>,
	Chris Zankel <chris@zankel.net>,
	Max Filippov <jcmvbkbc@gmail.com>,
	Patrik Jakobsson <patrik.r.jakobsson@gmail.com>,
	Maarten Lankhorst <maarten.lankhorst@linux.intel.com>,
	Maxime Ripard <mripard@kernel.org>,
	Thomas Zimmermann <tzimmermann@suse.de>,
	David Airlie <airlied@gmail.com>, Simona Vetter <simona@ffwll.ch>,
	Robin Murphy <robin.murphy@arm.com>,
	Joerg Roedel <joro@8bytes.org>, Will Deacon <will@kernel.org>,
	Jakub Kicinski <kuba@kernel.org>,
	Andrew Lunn <andrew+netdev@lunn.ch>,
	"David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Paolo Abeni <pabeni@redhat.com>,
	Oliver Neukum <oliver@neukum.org>, Arnd Bergmann <arnd@arndb.de>,
	Kuan-Wei Chiu <visitorckw@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Marcel Holtmann <marcel@holtmann.org>,
	Johan Hedberg <johan.hedberg@gmail.com>,
	Luiz Augusto von Dentz <luiz.dentz@gmail.com>,
	Pablo Neira Ayuso <pablo@netfilter.org>,
	Florian Westphal <fw@strlen.de>,
	linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH 2/2] treewide, bits: use ffs_val() where it is open-coded
Date: Sun, 11 Jan 2026 23:57:45 +0000	[thread overview]
Message-ID: <20260111235745.53d16a59@pumpkin> (raw)
In-Reply-To: <alpine.DEB.2.21.2601111450500.30566@angie.orcam.me.uk>

On Sun, 11 Jan 2026 21:22:03 +0000 (GMT)
"Maciej W. Rozycki" <macro@orcam.me.uk> wrote:

> On Sun, 11 Jan 2026, David Laight wrote:
> 
> > >  It's 4 MIPS instructions and exactly the same machine code produced as 
> > > compiler output from `is_power_of_2'.  
> > 
> > The compiler must be detecting that x == (x & -x) is the same as
> > (x & (x - 1)) == 0.  
> 
>  There are 3 basic operations either way, so unless a CPU has an odd 
> instruction that combines any, there's no way to reduce the instruction 
> count.
> 
> > Although for MIPS the former is negate, and, beq(x,y) - so 3 instructions.
> > On x86 it is negate, and, cmp, beq(zero flag) - one extra instruction.
> > (The 4th MIPS instruction will be beq(syn,0).)  
> 
>  This code only ever runs on R2000/R3000 and R4000/R4400 processors, on 
> which the cost of an untaken branch is nil.  The cost of a taken branch on 
> the former ones is nil too, with their 4-stage pipelines, and while there 
> is a fixed penalty on the latter ones, the compiler should be able to 
> arrange code here such that at most one branch is taken anyway.
> 
> > So maybe s/Badly/In an unusual way/  
> 
>  What's unusual in this way of determining that exactly one bit is set in 
> a word?  It's as good as any I suppose.
> 
> > > If you know of a better alternative  
> > 
> > Doing is_power_of_2() with only one conditional branch would be a gain.
> > But I've not thought about it hard enough to find one.  
> 
>  Sure you can, either of:
> 
> 	(x != 0) & (x == (x & -x))
> 
> and:
> 
> 	(x != 0) & ((x & (x - 1)) == 0)
> 
> will do, but it's 5 instructions on MIPS, so no gain here.
> 
>  If you have a population count instruction such as CTPOP on EV67 Alpha, 
> you can do it in two instructions, such as:
> 
> 	ctpop	$16, $0
> 	cmpeq	$0, 1, $0
> 
> so there seems to be room for improvement here, but GCC stubbornly refuses 
> to produce this instruction sequence for this code:
> 
> bool is_power_of_2(unsigned long n)
> {
> 	return __builtin_popcountl(n) == 1;
> }
> 
> apparently owing to no insn cost set for the POPCOUNT RTX in the backend 
> causing the middle end to pick up some default.  Instead GCC comes up with 
> what can be coded in C as:
> 
> bool is_power_of_2(unsigned long n)
> {
> 	return n - 1 < (n ^ (n - 1));
> }

Not seen that one - and not thought of popcnt, far too new for me.
(I learnt asm for a pdp11 first...)

> which seems an interesting alternative that produces 3 instructions only 
> across Alpha, MIPS and RISC-V targets.  It's 5 instructions on POWER and 
> x86-64 vs 8 and 9 respectively with our current implementation.  There are 
> no branches produced here, although an inline expansion will likely end 
> with one.

I'm seeing:
is_power_of_2:
        leaq    -1(%rdi), %rax
        xorq    %rax, %rdi
        cmpq    %rdi, %rax
        setb    %al
        retq
on x86-64 - which is really 3 instructions.

While using the popcnt instruction would reduce it by one, it will only
be faster on amd zen cpu, on intel cpu popcnt has a latency of 3 and only
executes on p1 (according to Agner).

> 
>  Overall it seems a worthwhile improvement even if still an instruction 
> longer than necessary for EV67 Alpha (and also for POWER, which also has 
> a population count operation, and which GCC does pick up in this case).
> 
>  FWIW on VAX, notorious for the lack of conditional set operations, it's 7 
> instructions and 1 branch, down from 14 and 2 respectively.

I worked for ICL through the 1980s so missed out on VAX.
Picked up 8085, x86, sparc32 and m68k.

> > I suspect there may not be one, but all these 'tricks' come from the 1970s
> > (or earlier) and don't allow for the behaviour of modern cpu with multiple
> > execution units and long pipelines.  
> 
>  As we can see you never know.  I've sent a code update proposal.

I'd suggest moving is_power_of_2() to bits.h as:
#define is_power_of_2(n) ({ auto _n = n; _n - 1 < _n ^ (_n - 1); })
and add:
#define is_zero_or_power_of_2(n) ({ auto _n = n; _n & (_n - 1) != 0; })

> 
>  The use of the population count operation can be considered separately, 
> but we'd have to be careful here as the quality of code produced by the 
> intrinsic varies, e.g. with pre-EV67 Alpha, RISC-V and VAX (!) GCC emits 
> code equivalent to my proposal, but with MIPS or x86-64 it resorts to a 
> libcall, so a guarding condition would be required.  So I'd leave it for 
> another occasion.

I'd guess that many of those popcnt instructions aren't actually as fast
as the dec/xor pair.
But the libcall is just plain stupid.
I think that happens elsewhere - which is one reason __ffs() is x86 asm.

> 
>  Thanks for sparking this discussion overall!

Nice wet Sunday bikeshed...

	David

> 
>   Maciej
> 


  reply	other threads:[~2026-01-11 23:57 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-01-09 16:41 [RFC PATCH 0/2] Helper to isolate least-significant bit Petr Tesarik
2026-01-09 16:37 ` [RFC PATCH 1/2] bits: introduce ffs_val() Petr Tesarik
2026-01-09 17:01   ` Arnd Bergmann
2026-01-09 23:54     ` David Laight
2026-01-09 17:16   ` Yury Norov
2026-01-09 17:46     ` Petr Tesarik
2026-01-09 18:26       ` Yury Norov
2026-01-09 19:27         ` Petr Tesarik
2026-01-09 19:44           ` Yury Norov
2026-01-09 18:50       ` Arnd Bergmann
2026-01-10 10:50   ` David Laight
2026-01-12  8:15   ` Thomas Zimmermann
2026-01-12  8:58     ` Petr Tesarik
2026-01-12 11:22     ` David Laight
2026-01-13  1:40       ` Maciej W. Rozycki
2026-01-09 16:37 ` [RFC PATCH 2/2] treewide, bits: use ffs_val() where it is open-coded Petr Tesarik
2026-01-09 18:19   ` Yury Norov
2026-01-09 19:32     ` Petr Tesarik
2026-01-09 20:19       ` Yury Norov
2026-01-10 10:36     ` Maciej W. Rozycki
2026-01-10 11:54       ` David Laight
2026-01-11  3:15         ` Maciej W. Rozycki
2026-01-11 10:40           ` David Laight
2026-01-11 21:22             ` Maciej W. Rozycki
2026-01-11 23:57               ` David Laight [this message]
2026-01-12 11:21                 ` Maciej W. Rozycki
2026-01-12 13:23                   ` David Laight
2026-01-10 16:42       ` Kuan-Wei Chiu
2026-01-10 22:23         ` David Laight
2026-01-11 14:41           ` Kuan-Wei Chiu
2026-01-11 21:22             ` Maciej W. Rozycki
2026-01-09 17:20 ` [RFC PATCH 0/2] Helper to isolate least-significant bit Kuan-Wei Chiu
2026-01-09 18:59   ` Petr Tesarik
2026-01-09 19:26 ` Andrew Cooper

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260111235745.53d16a59@pumpkin \
    --to=david.laight.linux@gmail.com \
    --cc=agordeev@linux.ibm.com \
    --cc=airlied@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=andrew+netdev@lunn.ch \
    --cc=arnd@arndb.de \
    --cc=chris@zankel.net \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=fw@strlen.de \
    --cc=geert@linux-m68k.org \
    --cc=gor@linux.ibm.com \
    --cc=hca@linux.ibm.com \
    --cc=jcmvbkbc@gmail.com \
    --cc=johan.hedberg@gmail.com \
    --cc=joro@8bytes.org \
    --cc=kuba@kernel.org \
    --cc=linmag7@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux@rasmusvillemoes.dk \
    --cc=luiz.dentz@gmail.com \
    --cc=maarten.lankhorst@linux.intel.com \
    --cc=macro@orcam.me.uk \
    --cc=maddy@linux.ibm.com \
    --cc=marcel@holtmann.org \
    --cc=mattst88@gmail.com \
    --cc=mpe@ellerman.id.au \
    --cc=mripard@kernel.org \
    --cc=oliver@neukum.org \
    --cc=pabeni@redhat.com \
    --cc=pablo@netfilter.org \
    --cc=patrik.r.jakobsson@gmail.com \
    --cc=ptesarik@suse.com \
    --cc=richard.henderson@linaro.org \
    --cc=robin.murphy@arm.com \
    --cc=simona@ffwll.ch \
    --cc=tsbogend@alpha.franken.de \
    --cc=tzimmermann@suse.de \
    --cc=vgupta@kernel.org \
    --cc=visitorckw@gmail.com \
    --cc=will@kernel.org \
    --cc=ynorov@nvidia.com \
    --cc=yury.norov@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.