* [RFC PATCH 0/2] Helper to isolate least-significant bit
@ 2026-01-09 16:37 Petr Tesarik
2026-01-09 19:11 ` Geert Uytterhoeven
0 siblings, 1 reply; 6+ messages in thread
From: Petr Tesarik @ 2026-01-09 16:37 UTC (permalink / raw)
To: Yury Norov, Rasmus Villemoes
Cc: Richard Henderson, Matt Turner, Magnus Lindholm, Vineet Gupta,
Geert Uytterhoeven, Maciej W. Rozycki, Thomas Bogendoerfer,
Madhavan Srinivasan, Michael Ellerman, Heiko Carstens,
Vasily Gorbik, Alexander Gordeev, Chris Zankel, Max Filippov,
Patrik Jakobsson, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Robin Murphy,
Joerg Roedel, Will Deacon, Jakub Kicinski, Andrew Lunn,
David S. Miller, Eric Dumazet, Paolo Abeni, Oliver Neukum,
Arnd Bergmann, Kuan-Wei Chiu, Andrew Morton, Marcel Holtmann,
Johan Hedberg, Luiz Augusto von Dentz, Pablo Neira Ayuso,
Florian Westphal, linux-kernel, Petr Tesarik
Isolation of the least significant bit can be achieved with 3 basic
ALU operations which are already open-coded in various places in the
kernel.
However, since other places less efficient constructs, for example
`1UL << ffs(x)`, I assume the trick is known only to some authors, and
it's worth adding a helper to promote its use.
Petr Tesarik (2):
bits: introduce ffs_val()
treewide, bits: use ffs_val() where it is open-coded
arch/alpha/include/asm/bitops.h | 2 +-
arch/alpha/kernel/core_cia.c | 2 +-
arch/alpha/kernel/smp.c | 2 +-
arch/arc/include/asm/bitops.h | 2 +-
arch/m68k/include/asm/bitops.h | 12 +++++------
arch/mips/dec/ecc-berr.c | 2 +-
arch/mips/include/asm/bitops.h | 4 ++--
arch/mips/pci/pci-malta.c | 4 ++--
arch/powerpc/include/asm/bitops.h | 4 ++--
arch/powerpc/kvm/e500_mmu_host.c | 2 +-
arch/powerpc/lib/sstep.c | 2 +-
arch/powerpc/xmon/ppc-dis.c | 3 ++-
arch/powerpc/xmon/ppc-opc.c | 6 +++---
arch/s390/include/asm/bitops.h | 2 +-
arch/xtensa/include/asm/bitops.h | 6 +++---
arch/xtensa/kernel/traps.c | 2 +-
drivers/gpu/drm/gma500/psb_intel_sdvo.c | 2 +-
drivers/iommu/dma-iommu.c | 2 +-
drivers/net/ethernet/netronome/nfp/bpf/jit.c | 2 +-
drivers/net/usb/cdc_ncm.c | 4 ++--
include/asm-generic/div64.h | 4 ++--
include/linux/bitfield.h | 3 ++-
include/linux/bitops.h | 1 +
include/linux/ffs_val.h | 21 ++++++++++++++++++++
include/linux/log2.h | 2 +-
include/linux/min_heap.h | 5 +++--
lib/math/gcd.c | 4 ++--
lib/sort.c | 3 ++-
net/bluetooth/mgmt.c | 2 +-
net/netfilter/nft_set_pipapo.c | 2 +-
30 files changed, 70 insertions(+), 44 deletions(-)
create mode 100644 include/linux/ffs_val.h
--
2.52.0
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [RFC PATCH 0/2] Helper to isolate least-significant bit
2026-01-09 16:37 [RFC PATCH 0/2] Helper to isolate least-significant bit Petr Tesarik
@ 2026-01-09 19:11 ` Geert Uytterhoeven
0 siblings, 0 replies; 6+ messages in thread
From: Geert Uytterhoeven @ 2026-01-09 19:11 UTC (permalink / raw)
To: Petr Tesarik
Cc: Yury Norov, Rasmus Villemoes, Richard Henderson, Matt Turner,
Magnus Lindholm, Vineet Gupta, Maciej W. Rozycki,
Thomas Bogendoerfer, Madhavan Srinivasan, Michael Ellerman,
Heiko Carstens, Vasily Gorbik, Alexander Gordeev, Chris Zankel,
Max Filippov, Patrik Jakobsson, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Robin Murphy,
Joerg Roedel, Will Deacon, Jakub Kicinski, Andrew Lunn,
David S. Miller, Eric Dumazet, Paolo Abeni, Oliver Neukum,
Arnd Bergmann, Kuan-Wei Chiu, Andrew Morton, Marcel Holtmann,
Johan Hedberg, Luiz Augusto von Dentz, Pablo Neira Ayuso,
Florian Westphal, linux-kernel
Hi Petr,
On Fri, 9 Jan 2026 at 17:38, Petr Tesarik <ptesarik@suse.com> wrote:
> Isolation of the least significant bit can be achieved with 3 basic
least significant _set_ bit
> ALU operations which are already open-coded in various places in the
> kernel.
Gr{oetje,eeting}s,
Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
^ permalink raw reply [flat|nested] 6+ messages in thread
* [RFC PATCH 0/2] Helper to isolate least-significant bit
@ 2026-01-09 16:41 Petr Tesarik
2026-01-09 17:20 ` Kuan-Wei Chiu
2026-01-09 19:26 ` Andrew Cooper
0 siblings, 2 replies; 6+ messages in thread
From: Petr Tesarik @ 2026-01-09 16:41 UTC (permalink / raw)
To: Yury Norov, Rasmus Villemoes
Cc: Richard Henderson, Matt Turner, Magnus Lindholm, Vineet Gupta,
Geert Uytterhoeven, Maciej W. Rozycki, Thomas Bogendoerfer,
Madhavan Srinivasan, Michael Ellerman, Heiko Carstens,
Vasily Gorbik, Alexander Gordeev, Chris Zankel, Max Filippov,
Patrik Jakobsson, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Robin Murphy,
Joerg Roedel, Will Deacon, Jakub Kicinski, Andrew Lunn,
David S. Miller, Eric Dumazet, Paolo Abeni, Oliver Neukum,
Arnd Bergmann, Kuan-Wei Chiu, Andrew Morton, Marcel Holtmann,
Johan Hedberg, Luiz Augusto von Dentz, Pablo Neira Ayuso,
Florian Westphal, linux-kernel, Petr Tesarik
Isolation of the least significant bit can be achieved with 3 basic
ALU operations which are already open-coded in various places in the
kernel.
However, since other places less efficient constructs, for example
`1UL << ffs(x)`, I assume the trick is known only to some authors, and
it's worth adding a helper to promote its use.
Petr Tesarik (2):
bits: introduce ffs_val()
treewide, bits: use ffs_val() where it is open-coded
arch/alpha/include/asm/bitops.h | 2 +-
arch/alpha/kernel/core_cia.c | 2 +-
arch/alpha/kernel/smp.c | 2 +-
arch/arc/include/asm/bitops.h | 2 +-
arch/m68k/include/asm/bitops.h | 12 +++++------
arch/mips/dec/ecc-berr.c | 2 +-
arch/mips/include/asm/bitops.h | 4 ++--
arch/mips/pci/pci-malta.c | 4 ++--
arch/powerpc/include/asm/bitops.h | 4 ++--
arch/powerpc/kvm/e500_mmu_host.c | 2 +-
arch/powerpc/lib/sstep.c | 2 +-
arch/powerpc/xmon/ppc-dis.c | 3 ++-
arch/powerpc/xmon/ppc-opc.c | 6 +++---
arch/s390/include/asm/bitops.h | 2 +-
arch/xtensa/include/asm/bitops.h | 6 +++---
arch/xtensa/kernel/traps.c | 2 +-
drivers/gpu/drm/gma500/psb_intel_sdvo.c | 2 +-
drivers/iommu/dma-iommu.c | 2 +-
drivers/net/ethernet/netronome/nfp/bpf/jit.c | 2 +-
drivers/net/usb/cdc_ncm.c | 4 ++--
include/asm-generic/div64.h | 4 ++--
include/linux/bitfield.h | 3 ++-
include/linux/bitops.h | 1 +
include/linux/ffs_val.h | 21 ++++++++++++++++++++
include/linux/log2.h | 2 +-
include/linux/min_heap.h | 5 +++--
lib/math/gcd.c | 4 ++--
lib/sort.c | 3 ++-
net/bluetooth/mgmt.c | 2 +-
net/netfilter/nft_set_pipapo.c | 2 +-
30 files changed, 70 insertions(+), 44 deletions(-)
create mode 100644 include/linux/ffs_val.h
--
2.52.0
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [RFC PATCH 0/2] Helper to isolate least-significant bit
2026-01-09 16:41 Petr Tesarik
@ 2026-01-09 17:20 ` Kuan-Wei Chiu
2026-01-09 18:59 ` Petr Tesarik
2026-01-09 19:26 ` Andrew Cooper
1 sibling, 1 reply; 6+ messages in thread
From: Kuan-Wei Chiu @ 2026-01-09 17:20 UTC (permalink / raw)
To: Petr Tesarik
Cc: Yury Norov, Rasmus Villemoes, Richard Henderson, Matt Turner,
Magnus Lindholm, Vineet Gupta, Geert Uytterhoeven,
Maciej W. Rozycki, Thomas Bogendoerfer, Madhavan Srinivasan,
Michael Ellerman, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Chris Zankel, Max Filippov, Patrik Jakobsson,
Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
Simona Vetter, Robin Murphy, Joerg Roedel, Will Deacon,
Jakub Kicinski, Andrew Lunn, David S. Miller, Eric Dumazet,
Paolo Abeni, Oliver Neukum, Arnd Bergmann, Andrew Morton,
Marcel Holtmann, Johan Hedberg, Luiz Augusto von Dentz,
Pablo Neira Ayuso, Florian Westphal, linux-kernel
Hi Petr,
On Fri, Jan 09, 2026 at 05:41:34PM +0100, Petr Tesarik wrote:
> Isolation of the least significant bit can be achieved with 3 basic
> ALU operations which are already open-coded in various places in the
> kernel.
>
> However, since other places less efficient constructs, for example
> `1UL << ffs(x)`, I assume the trick is known only to some authors, and
> it's worth adding a helper to promote its use.
Just out of curiosity, are there any existing users employing 1 <<
ffs(x) (or other inefficient variants) in performance-critical
hotpaths?
From a quick grep, I only found one instance in drivers/clk/ti/mux.c
matching the 1 << ffs(x) pattern. However, this doesn't appear to be a
bottleneck since it is followed by ti_clk_ll_ops->clk_writel(...). The
latency of the MMIO write would likely overshadow the savings of a few
ALU cycles.
Additionally, it seems that patch #2 focuses on replacing the x & -x
implementation with the new API, rather than converting inefficient
constructs like 1 << ffs(x) to use ffs_val().
Regards,
Kuan-Wei
>
> Petr Tesarik (2):
> bits: introduce ffs_val()
> treewide, bits: use ffs_val() where it is open-coded
>
> arch/alpha/include/asm/bitops.h | 2 +-
> arch/alpha/kernel/core_cia.c | 2 +-
> arch/alpha/kernel/smp.c | 2 +-
> arch/arc/include/asm/bitops.h | 2 +-
> arch/m68k/include/asm/bitops.h | 12 +++++------
> arch/mips/dec/ecc-berr.c | 2 +-
> arch/mips/include/asm/bitops.h | 4 ++--
> arch/mips/pci/pci-malta.c | 4 ++--
> arch/powerpc/include/asm/bitops.h | 4 ++--
> arch/powerpc/kvm/e500_mmu_host.c | 2 +-
> arch/powerpc/lib/sstep.c | 2 +-
> arch/powerpc/xmon/ppc-dis.c | 3 ++-
> arch/powerpc/xmon/ppc-opc.c | 6 +++---
> arch/s390/include/asm/bitops.h | 2 +-
> arch/xtensa/include/asm/bitops.h | 6 +++---
> arch/xtensa/kernel/traps.c | 2 +-
> drivers/gpu/drm/gma500/psb_intel_sdvo.c | 2 +-
> drivers/iommu/dma-iommu.c | 2 +-
> drivers/net/ethernet/netronome/nfp/bpf/jit.c | 2 +-
> drivers/net/usb/cdc_ncm.c | 4 ++--
> include/asm-generic/div64.h | 4 ++--
> include/linux/bitfield.h | 3 ++-
> include/linux/bitops.h | 1 +
> include/linux/ffs_val.h | 21 ++++++++++++++++++++
> include/linux/log2.h | 2 +-
> include/linux/min_heap.h | 5 +++--
> lib/math/gcd.c | 4 ++--
> lib/sort.c | 3 ++-
> net/bluetooth/mgmt.c | 2 +-
> net/netfilter/nft_set_pipapo.c | 2 +-
> 30 files changed, 70 insertions(+), 44 deletions(-)
> create mode 100644 include/linux/ffs_val.h
>
> --
> 2.52.0
>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [RFC PATCH 0/2] Helper to isolate least-significant bit
2026-01-09 17:20 ` Kuan-Wei Chiu
@ 2026-01-09 18:59 ` Petr Tesarik
0 siblings, 0 replies; 6+ messages in thread
From: Petr Tesarik @ 2026-01-09 18:59 UTC (permalink / raw)
To: Kuan-Wei Chiu
Cc: Yury Norov, Rasmus Villemoes, Richard Henderson, Matt Turner,
Magnus Lindholm, Vineet Gupta, Geert Uytterhoeven,
Maciej W. Rozycki, Thomas Bogendoerfer, Madhavan Srinivasan,
Michael Ellerman, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Chris Zankel, Max Filippov, Patrik Jakobsson,
Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
Simona Vetter, Robin Murphy, Joerg Roedel, Will Deacon,
Jakub Kicinski, Andrew Lunn, David S. Miller, Eric Dumazet,
Paolo Abeni, Oliver Neukum, Arnd Bergmann, Andrew Morton,
Marcel Holtmann, Johan Hedberg, Luiz Augusto von Dentz,
Pablo Neira Ayuso, Florian Westphal, linux-kernel
On Sat, 10 Jan 2026 01:20:59 +0800
Kuan-Wei Chiu <visitorckw@gmail.com> wrote:
> Hi Petr,
>
> On Fri, Jan 09, 2026 at 05:41:34PM +0100, Petr Tesarik wrote:
> > Isolation of the least significant bit can be achieved with 3 basic
> > ALU operations which are already open-coded in various places in the
> > kernel.
> >
> > However, since other places less efficient constructs, for example
> > `1UL << ffs(x)`, I assume the trick is known only to some authors, and
> > it's worth adding a helper to promote its use.
>
> Just out of curiosity, are there any existing users employing 1 <<
> ffs(x) (or other inefficient variants) in performance-critical
> hotpaths?
>
> From a quick grep, I only found one instance in drivers/clk/ti/mux.c
> matching the 1 << ffs(x) pattern. However, this doesn't appear to be a
> bottleneck since it is followed by ti_clk_ll_ops->clk_writel(...). The
> latency of the MMIO write would likely overshadow the savings of a few
> ALU cycles.
Most expressions are a bit more complex, like this one in
page_cache_async_ra():
align = 1UL << min(ra->order, ffs(max_pages) - 1);
Or split across multiple lines, like this one in sata_down_spd_limit():
bit = ffs(mask) - 1;
mask = 1 << bit;
I agree that there is most likely no measurable performance win. The
resulting machine code merely looks quite silly on architectures
without an instruction to do ffs() and a little bit silly on
architectures where the instruction has a slightly different semantics
(bit position numbering and/or handling of zero value).
> Additionally, it seems that patch #2 focuses on replacing the x & -x
> implementation with the new API, rather than converting inefficient
> constructs like 1 << ffs(x) to use ffs_val().
This is correct. This patch series merely introduces the helper. Patch
2/2 was created mechanically. I haven't sent any follow-up yet to
replace inefficient use, because that would require more effort, and I'm
trying to get some early feedback first (that's why the series is
tagged RFC).
If we agree that it's worth the effort, and I split the patches by
subsystem, it may become a rather long series.
Petr T
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [RFC PATCH 0/2] Helper to isolate least-significant bit
2026-01-09 16:41 Petr Tesarik
2026-01-09 17:20 ` Kuan-Wei Chiu
@ 2026-01-09 19:26 ` Andrew Cooper
1 sibling, 0 replies; 6+ messages in thread
From: Andrew Cooper @ 2026-01-09 19:26 UTC (permalink / raw)
To: ptesarik
Cc: Andrew Cooper, agordeev, airlied, akpm, andrew+netdev, arnd,
chris, davem, edumazet, fw, geert, gor, hca, jcmvbkbc,
johan.hedberg, joro, kuba, linmag7, linux-kernel, linux,
luiz.dentz, maarten.lankhorst, macro, maddy, marcel, mattst88,
mpe, mripard, oliver, pabeni, pablo, patrik.r.jakobsson,
richard.henderson, robin.murphy, simona, tsbogend, tzimmermann,
vgupta, visitorckw, will, yury.norov
> diff --git a/arch/alpha/kernel/core_cia.c b/arch/alpha/kernel/core_cia.c
> index 6e577228e175d..b3c273c13c104 100644
> --- a/arch/alpha/kernel/core_cia.c
> +++ b/arch/alpha/kernel/core_cia.c
> @@ -1035,7 +1035,7 @@ cia_decode_ecc_error(struct el_CIA_sysdata_mcheck *cia, const char *msg)
> cia_decode_mem_error(cia, msg);
>
> syn = cia->cia_syn & 0xff;
> - if (syn == (syn & -syn)) {
> + if (syn == ffs_val(syn)) {
> fmt = KERN_CRIT " ECC syndrome %#x -- check bit %d\n";
> i = ffs(syn) - 1;
> } else {
This is a check for multiple bits set, which has a simpler expression to
calculate. (the ffs() in context shows that syn isn't expected to be 0.)
In Xen I added this helper while tiding things up, and it's applicable
to the more common pattern of checking hweight() of a bitmap against 1.
/*
* Calculate if a value has two or more bits set. Always use this in
* preference to an expression of the form 'hweight(x) > 1'.
*/
#define multiple_bits_set(x) \
({ \
typeof(x) _v = (x); \
(_v & (_v - 1)) != 0; \
})
~Andrew
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2026-01-09 19:26 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-01-09 16:37 [RFC PATCH 0/2] Helper to isolate least-significant bit Petr Tesarik
2026-01-09 19:11 ` Geert Uytterhoeven
-- strict thread matches above, loose matches on Subject: below --
2026-01-09 16:41 Petr Tesarik
2026-01-09 17:20 ` Kuan-Wei Chiu
2026-01-09 18:59 ` Petr Tesarik
2026-01-09 19:26 ` Andrew Cooper
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox