* [PATCH v4 00/40] lib/find: add atomic find_bit() primitives
@ 2024-06-20 17:56 Yury Norov
From: Yury Norov @ 2024-06-20 17:56 UTC
To: linux-kernel, David S. Miller, H. Peter Anvin,
James E.J. Bottomley, K. Y. Srinivasan, Md. Haris Iqbal,
Akinobu Mita, Andrew Morton, Bjorn Andersson, Borislav Petkov,
Chaitanya Kulkarni, Christian Brauner, Damien Le Moal,
Dave Hansen, David Disseldorp, Edward Cree, Eric Dumazet,
Fenghua Yu, Geert Uytterhoeven, Greg Kroah-Hartman,
Gregory Greenman, Hans Verkuil, Hans de Goede, Hugh Dickins,
Ingo Molnar, Jakub Kicinski, Jaroslav Kysela, Jason Gunthorpe,
Jens Axboe, Jiri Pirko, Jiri Slaby, Kalle Valo, Karsten Graul,
Karsten Keil, Kees Cook, Leon Romanovsky, Mark Rutland,
Martin Habets, Mauro Carvalho Chehab, Michael Ellerman,
Michal Simek, Nicholas Piggin, Oliver Neukum, Paolo Abeni,
Paolo Bonzini, Peter Zijlstra, Ping-Ke Shih, Rich Felker,
Rob Herring, Robin Murphy, Sean Christopherson, Shuai Xue,
Stanislaw Gruszka, Steven Rostedt, Thomas Bogendoerfer,
Thomas Gleixner, Valentin Schneider, Vitaly Kuznetsov,
Wenjia Zhang, Will Deacon, Yoshinori Sato,
GR-QLogic-Storage-Upstream, alsa-devel, ath10k, dmaengine, iommu,
kvm, linux-arm-kernel, linux-arm-msm, linux-block,
linux-bluetooth, linux-hyperv, linux-m68k, linux-media,
linux-mips, linux-net-drivers, linux-pci, linux-rdma, linux-s390,
linux-scsi, linux-serial, linux-sh, linux-sound, linux-usb,
linux-wireless, linuxppc-dev, mpi3mr-linuxdrv.pdl, netdev,
sparclinux, x86
Cc: Yury Norov, Alexey Klimov, Bart Van Assche, Jan Kara,
Linus Torvalds, Matthew Wilcox, Mirsad Todorovac,
Rasmus Villemoes, Sergey Shtylyov
---
This v4 moves the new API to separate headers, as adding stuff to find.h
concerns people, particularly Linus. It also adds a few more conversions
alongside other cosmetic changes. See the full changelog below.
---
Add helpers around test_and_{set,clear}_bit() to allow searching for
clear or set bits and flipping them atomically.
Using atomic search primitives allows implementing lockless bitmap
handling where only individual bits are touched by concurrent processes,
and where people currently have to lock their bitmaps to search for a
free or set bit because atomic searching routines are lacking.
The typical lock-protected bit allocation may look like this:
	unsigned long alloc_bit()
	{
		unsigned long bit;
		spin_lock(bitmap_lock);
		bit = find_first_zero_bit(bitmap, nbits);
		if (bit < nbits)
			__set_bit(bit, bitmap);
		spin_unlock(bitmap_lock);
		return bit;
	}

	void free_bit(unsigned long bit)
	{
		spin_lock(bitmap_lock);
		__clear_bit(bit, bitmap);
		spin_unlock(bitmap_lock);
	}
Now with atomic find_and_set_bit(), the above can be implemented
locklessly, directly by using it together with atomic clear_bit().
Patches 36-40 do this in a few places in the kernel where the
transition is clear. There are likely more candidates for
refactoring.
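As an illustration only, the lockless variant can be modelled in userspace
with C11 atomics. This is a sketch, not the kernel implementation: id_map,
NBITS and the model_*() helpers are hypothetical stand-ins for a driver's
bitmap and for test_and_set_bit() / find_and_set_bit().

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>

#define WORD_BITS (sizeof(unsigned long) * CHAR_BIT)
#define NBITS 128UL
#define NWORDS ((NBITS + WORD_BITS - 1) / WORD_BITS)

/* Hypothetical ID pool; zero-initialized, so every bit starts out free. */
static _Atomic unsigned long id_map[NWORDS];

/* Userspace stand-in for test_and_set_bit(): returns the old bit value. */
static int model_test_and_set_bit(unsigned long bit, _Atomic unsigned long *map)
{
	unsigned long mask = 1UL << (bit % WORD_BITS);

	return (atomic_fetch_or(&map[bit / WORD_BITS], mask) & mask) != 0;
}

/* Stand-in for find_and_set_bit(): claim some zero bit, or return nbits. */
static unsigned long model_find_and_set_bit(_Atomic unsigned long *map,
					    unsigned long nbits)
{
	unsigned long bit;

	for (bit = 0; bit < nbits; bit++) {
		unsigned long word = atomic_load(&map[bit / WORD_BITS]);

		/* Skip bits that look taken; only an atomic set claims one. */
		if (word & (1UL << (bit % WORD_BITS)))
			continue;
		if (!model_test_and_set_bit(bit, map))
			return bit;
	}
	return nbits;
}

/* No lock: the atomic read-modify-write decides who owns the bit. */
unsigned long alloc_bit(void)
{
	return model_find_and_set_bit(id_map, NBITS);
}

void free_bit(unsigned long bit)
{
	assert(bit < NBITS);
	/* Counterpart of the kernel's atomic clear_bit(). */
	atomic_fetch_and(&id_map[bit / WORD_BITS], ~(1UL << (bit % WORD_BITS)));
}
```

A caller treats any return value >= NBITS as "no free bit", exactly as in
the locked version.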
The other important case is when people open-code an atomic search
or an atomic traversal of a bitmap, with patterns looking like:
	for (idx = 0; idx < nbits; idx++)
		if (test_and_clear_bit(idx, bitmap))
			do_something(idx);
Or like this:
	do {
		bit = find_first_bit(bitmap, nbits);
		if (bit >= nbits)
			return nbits;
	} while (!test_and_clear_bit(bit, bitmap));
	return bit;
In both cases, the open-coded loop may be converted to a single function
or iterator call. Correspondingly:
	for_each_test_and_clear_bit(idx, bitmap, nbits)
		do_something(idx);
Or:
	return find_and_clear_bit(bitmap, nbits);
Obviously, the less routine code people have to write themselves, the
lower the probability of making a mistake. Patch #33 fixes one such
mistake.
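A minimal userspace sketch of the second pattern, assuming C11 atomics in
place of the kernel bitops. model_find_and_clear_bit() and drain_pending()
are hypothetical names used only for illustration; the real helper is
implemented differently in lib/find_bit.c.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>

#define WORD_BITS (sizeof(unsigned long) * CHAR_BIT)

/* Userspace stand-in for test_and_clear_bit(): returns the old bit value. */
static int model_test_and_clear_bit(unsigned long bit, _Atomic unsigned long *map)
{
	unsigned long mask = 1UL << (bit % WORD_BITS);

	return (atomic_fetch_and(&map[bit / WORD_BITS], ~mask) & mask) != 0;
}

/* Stand-in for find_and_clear_bit(): claim some set bit, or return nbits. */
unsigned long model_find_and_clear_bit(_Atomic unsigned long *map,
				       unsigned long nbits)
{
	unsigned long bit;

	assert(map);
	for (bit = 0; bit < nbits; bit++) {
		unsigned long word = atomic_load(&map[bit / WORD_BITS]);

		if (!(word & (1UL << (bit % WORD_BITS))))
			continue;
		/* Another thread may win the race for this bit; keep scanning. */
		if (model_test_and_clear_bit(bit, map))
			return bit;
	}
	return nbits;
}

/* Hypothetical consumer: record every pending bit it managed to claim. */
unsigned long drain_pending(_Atomic unsigned long *pending, unsigned long nbits,
			    unsigned long *out)
{
	unsigned long bit, n = 0;

	while ((bit = model_find_and_clear_bit(pending, nbits)) < nbits)
		out[n++] = bit;
	return n;
}
```

The caller no longer writes the retry loop; it only checks the returned
bit number against nbits.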
The new API is not only a set of handy helpers; it also resolves a
non-trivial issue of using non-atomic find_bit() together with atomic
test_and_{set,clear}_bit().
The trick is that find_bit() assumes that the bitmap is a regular
non-volatile piece of memory, so the compiler is allowed to use
optimization techniques such as re-fetching memory instead of caching it.
For example, find_first_bit() is implemented like:
	for (idx = 0; idx * BITS_PER_LONG < sz; idx++) {
		val = addr[idx];
		if (val) {
			sz = min(idx * BITS_PER_LONG + __ffs(val), sz);
			break;
		}
	}
On register-memory architectures, like x86, the compiler may decide to
access memory twice: the first time to compare against 0, and the second
time to fetch its value to pass it to __ffs().
When running find_first_bit() on volatile memory, the memory may get
changed in between, and, for instance, this may lead to passing 0 to
__ffs(), which is undefined behaviour. This is a potentially
dangerous call.
find_and_clear_bit(), as a wrapper around test_and_clear_bit(),
naturally treats the underlying bitmap as volatile memory and prevents
the compiler from such optimizations.
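The hazard goes away once each word is loaded exactly once and only that
snapshot is tested and passed on. The following userspace sketch assumes a
relaxed C11 atomic load as a rough analogue of the kernel's READ_ONCE();
model_ffs() and snapshot_find_first_bit() are hypothetical names, not part
of this series.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>

#define WORD_BITS (sizeof(unsigned long) * CHAR_BIT)

/* Portable find-first-set; like __ffs(), it must never be called with 0. */
static unsigned long model_ffs(unsigned long val)
{
	unsigned long bit = 0;

	assert(val != 0);
	while (!(val & 1UL)) {
		val >>= 1;
		bit++;
	}
	return bit;
}

/*
 * Find the first set bit with a single load per word. The value compared
 * against 0 is the same local copy that reaches model_ffs(), so a
 * concurrent writer cannot turn it into 0 in between.
 */
unsigned long snapshot_find_first_bit(_Atomic unsigned long *map,
				      unsigned long nbits)
{
	unsigned long idx;

	for (idx = 0; idx * WORD_BITS < nbits; idx++) {
		unsigned long val = atomic_load_explicit(&map[idx],
							 memory_order_relaxed);

		if (val) {
			unsigned long bit = idx * WORD_BITS + model_ffs(val);

			/* Bits beyond nbits in the last word do not count. */
			return bit < nbits ? bit : nbits;
		}
	}
	return nbits;
}
```

The result may still be stale by the time the caller looks at it; the
snapshot only guarantees that the search itself never operates on 0.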
KCSAN now catches exactly this type of situation and warns about
undercover memory modifications. We can use it to reveal improper usage
of find_bit(), and convert it to atomic find_and_*_bit() as appropriate.
In some cases concurrent operations with plain find_bit() are acceptable.
For example:
- two threads running find_*_bit(): safe wrt ffs(0) and returns the correct
  value, because the underlying bitmap is unchanged;
- find_next_bit() in parallel with set or clear_bit(), when modifying
  a bit prior to the start bit to search: safe and correct;
- find_first_bit() in parallel with set_bit(): safe, but may return the
  wrong bit number;
- find_first_zero_bit() in parallel with clear_bit(): same as above.
In the last two cases find_bit() may not return the correct bit number, but
that may be OK if the caller requires any (not exactly the first) set or
clear bit, correspondingly.
In such cases, KCSAN may be safely silenced with data_race(). But in most
cases where KCSAN detects concurrency we should carefully review the code
and likely protect critical sections or switch to atomic find_and_*_bit(),
as appropriate.
This patch adds the following atomic primitives:
	find_and_set_bit(addr, nbits);
	find_and_set_next_bit(addr, nbits, start);
	...
Here the find_and_{set,clear} part refers to the corresponding
test_and_{set,clear}_bit function. Suffixes like _wrap or _lock
derive their semantics from the corresponding find() or test() functions.
For brevity, the naming omits the fact that we search for a zero bit in
the find_and_set functions, and correspondingly for a set bit in the
find_and_clear functions.
The patch also adds iterators with atomic semantics, like
for_each_test_and_set_bit(). Here, the naming rule is to simply prefix
the corresponding atomic operation with 'for_each'.
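To show how such an iterator composes with the search primitive, here is a
userspace sketch using C11 atomics. The model_ prefix marks hypothetical
stand-ins; the macro mirrors the shape of for_each_test_and_clear_bit()
but is not the kernel definition.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>

#define WORD_BITS (sizeof(unsigned long) * CHAR_BIT)

/* Stand-in for find_and_clear_next_bit(): claim a set bit at or after start. */
unsigned long model_find_and_clear_next_bit(_Atomic unsigned long *map,
					    unsigned long nbits,
					    unsigned long start)
{
	unsigned long bit;

	assert(start <= nbits);
	for (bit = start; bit < nbits; bit++) {
		unsigned long mask = 1UL << (bit % WORD_BITS);

		if (!(atomic_load(&map[bit / WORD_BITS]) & mask))
			continue;
		/* fetch_and returns the old word; the bit is ours if it was set. */
		if (atomic_fetch_and(&map[bit / WORD_BITS], ~mask) & mask)
			return bit;
	}
	return nbits;
}

/* Each step claims the next set bit; the loop ends when none is left. */
#define model_for_each_test_and_clear_bit(bit, map, nbits)			\
	for ((bit) = 0;								\
	     (bit) = model_find_and_clear_next_bit((map), (nbits), (bit)),	\
	     (bit) < (nbits);							\
	     (bit)++)

/* Hypothetical user: sum the numbers of all bits it managed to claim. */
unsigned long claim_all(_Atomic unsigned long *map, unsigned long nbits)
{
	unsigned long bit, sum = 0;

	model_for_each_test_and_clear_bit(bit, map, nbits)
		sum += bit;
	return sum;
}
```

Inside the loop body, bit is always a bit the current thread has cleared
itself, so two concurrent walkers never process the same bit.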
This series is not aimed at performance, but some performance
implications are considered.
In [1] Jan reported a 2% slowdown in a single-thread search test when
switching the find_bit() functions to treat bitmaps as volatile arrays. On
the other hand, the kernel robot in the same thread reported +3.7% to the
performance of the will-it-scale.per_thread_ops test.
Assuming that our compilers are sane and generate better code for
properly annotated data, the above discrepancy doesn't look weird. When
running on non-volatile bitmaps, plain find_bit() outperforms atomic
find_and_*_bit(), and vice versa.
So, all users of the find_bit() API where heavy concurrency is expected
are encouraged to switch to atomic find_and_*_bit() as appropriate.
The 1st patch of this series adds the atomic find_and_*_bit() API, the 2nd
adds a basic test for the new API, and all the following patches spread it
over the kernel.
[1] https://lore.kernel.org/lkml/634f5fdf-e236-42cf-be8d-48a581c21660@alu.unizg.hr/T/#m3e7341eb3571753f3acf8fe166f3fb5b2c12e615
---
v1: https://lore.kernel.org/netdev/20231118155105.25678-29-yury.norov@gmail.com/T/
v2: https://lore.kernel.org/all/20231204185101.ddmkvsr2xxsmoh2u@quack3/T/
v3: https://lore.kernel.org/linux-pci/ZX4bIisLzpW8c4WM@yury-ThinkPad/T/
v4:
- drop patch v3-24: not needed after null_blk refactoring;
- add patch 34: "MIPS: sgi-ip27: optimize alloc_level()";
- add patch 35: "uprobes: optimize xol_take_insn_slot()";
- add patches 36-40: get rid of locking scheme around bitmaps;
- move new API to separate headers, to not bloat bitmap.h @ Linus;
- patch #1: adjust comments to allow returning >= @size;
- rebase the series on top of current master.
Yury Norov (40):
lib/find: add atomic find_bit() primitives
lib/find: add test for atomic find_bit() ops
lib/sbitmap; optimize __sbitmap_get_word() by using find_and_set_bit()
watch_queue: optimize post_one_notification() by using
find_and_clear_bit()
sched: add cpumask_find_and_set() and use it in __mm_cid_get()
mips: sgi-ip30: optimize heart_alloc_int() by using find_and_set_bit()
sparc: optimize alloc_msi() by using find_and_set_bit()
perf/arm: use atomic find_bit() API
drivers/perf: optimize ali_drw_get_counter_idx() by using
find_and_set_bit()
dmaengine: idxd: optimize perfmon_assign_event()
ath10k: optimize ath10k_snoc_napi_poll()
wifi: rtw88: optimize the driver by using atomic iterator
KVM: x86: hyper-v: optimize and cleanup kvm_hv_process_stimers()
PCI: hv: Optimize hv_get_dom_num() by using find_and_set_bit()
scsi: core: optimize scsi_evt_emit() by using an atomic iterator
scsi: mpi3mr: optimize the driver by using find_and_set_bit()
scsi: qedi: optimize qedi_get_task_idx() by using find_and_set_bit()
powerpc: optimize arch code by using atomic find_bit() API
iommu: optimize subsystem by using atomic find_bit() API
media: radio-shark: optimize the driver by using atomic find_bit() API
sfc: optimize the driver by using atomic find_bit() API
tty: nozomi: optimize interrupt_handler()
usb: cdc-acm: optimize acm_softint()
RDMA/rtrs: optimize __rtrs_get_permit() by using
find_and_set_bit_lock()
mISDN: optimize get_free_devid()
media: em28xx: cx231xx: optimize drivers by using find_and_set_bit()
ethernet: rocker: optimize ofdpa_port_internal_vlan_id_get()
bluetooth: optimize cmtp_alloc_block_id()
net: smc: optimize smc_wr_tx_get_free_slot_index()
ALSA: use atomic find_bit() functions where applicable
m68k: optimize get_mmu_context()
microblaze: optimize get_mmu_context()
sh: mach-x3proto: optimize ilsel_enable()
MIPS: sgi-ip27: optimize alloc_level()
uprobes: optimize xol_take_insn_slot()
scsi: sr: drop locking around SR index bitmap
KVM: PPC: Book3s HV: drop locking around kvmppc_uvmem_bitmap
wifi: mac80211: drop locking around ntp_fltr_bmap
mailbox: bcm-flexrm: simplify locking scheme
powerpc/xive: drop locking around IRQ map
MAINTAINERS | 2 +
arch/m68k/include/asm/mmu_context.h | 12 +-
arch/microblaze/include/asm/mmu_context_mm.h | 12 +-
arch/mips/sgi-ip27/ip27-irq.c | 13 +-
arch/mips/sgi-ip30/ip30-irq.c | 13 +-
arch/powerpc/kvm/book3s_hv_uvmem.c | 33 +-
arch/powerpc/mm/book3s32/mmu_context.c | 11 +-
arch/powerpc/platforms/pasemi/dma_lib.c | 46 +--
arch/powerpc/platforms/powernv/pci-sriov.c | 13 +-
arch/powerpc/sysdev/xive/spapr.c | 34 +-
arch/sh/boards/mach-x3proto/ilsel.c | 5 +-
arch/sparc/kernel/pci_msi.c | 10 +-
arch/x86/kvm/hyperv.c | 41 +--
drivers/dma/idxd/perfmon.c | 9 +-
drivers/infiniband/ulp/rtrs/rtrs-clt.c | 16 +-
drivers/iommu/arm/arm-smmu/arm-smmu.h | 11 +-
drivers/iommu/msm_iommu.c | 19 +-
drivers/isdn/mISDN/core.c | 10 +-
drivers/mailbox/bcm-flexrm-mailbox.c | 21 +-
drivers/media/radio/radio-shark.c | 6 +-
drivers/media/radio/radio-shark2.c | 6 +-
drivers/media/usb/cx231xx/cx231xx-cards.c | 17 +-
drivers/media/usb/em28xx/em28xx-cards.c | 38 +--
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 18 +-
drivers/net/ethernet/rocker/rocker_ofdpa.c | 12 +-
drivers/net/ethernet/sfc/rx_common.c | 5 +-
drivers/net/ethernet/sfc/siena/rx_common.c | 5 +-
drivers/net/ethernet/sfc/siena/siena_sriov.c | 15 +-
drivers/net/wireless/ath/ath10k/snoc.c | 10 +-
drivers/net/wireless/realtek/rtw88/pci.c | 6 +-
drivers/net/wireless/realtek/rtw89/pci.c | 6 +-
drivers/pci/controller/pci-hyperv.c | 8 +-
drivers/perf/alibaba_uncore_drw_pmu.c | 11 +-
drivers/perf/arm-cci.c | 25 +-
drivers/perf/arm-ccn.c | 11 +-
drivers/perf/arm_dmc620_pmu.c | 10 +-
drivers/perf/arm_pmuv3.c | 9 +-
drivers/scsi/mpi3mr/mpi3mr_os.c | 22 +-
drivers/scsi/qedi/qedi_main.c | 10 +-
drivers/scsi/scsi_lib.c | 8 +-
drivers/scsi/sr.c | 15 +-
drivers/tty/nozomi.c | 6 +-
drivers/usb/class/cdc-acm.c | 6 +-
include/linux/cpumask_atomic.h | 20 ++
include/linux/find.h | 4 -
include/linux/find_atomic.h | 324 +++++++++++++++++++
kernel/events/uprobes.c | 15 +-
kernel/sched/sched.h | 15 +-
kernel/watch_queue.c | 7 +-
lib/find_bit.c | 86 +++++
lib/sbitmap.c | 47 +--
lib/test_bitmap.c | 62 ++++
net/bluetooth/cmtp/core.c | 11 +-
net/smc/smc_wr.c | 11 +-
sound/pci/hda/hda_codec.c | 8 +-
sound/usb/caiaq/audio.c | 14 +-
56 files changed, 747 insertions(+), 493 deletions(-)
create mode 100644 include/linux/cpumask_atomic.h
create mode 100644 include/linux/find_atomic.h
--
2.43.0
* [PATCH v4 01/40] lib/find: add atomic find_bit() primitives
2024-06-20 17:56 [PATCH v4 00/40] lib/find: add atomic find_bit() primitives Yury Norov
@ 2024-06-20 17:56 ` Yury Norov
From: Yury Norov @ 2024-06-20 17:56 UTC
Add helpers around test_and_{set,clear}_bit() to allow searching for
clear or set bits and flipping them atomically.
Using atomic search primitives allows implementing lockless bitmap
handling where only individual bits are touched by concurrent processes,
and where people currently have to lock their bitmaps to search for a
free or set bit because atomic searching routines are lacking.
The typical locking routines may look like this:
	unsigned long alloc_bit()
	{
		unsigned long bit;
		spin_lock(bitmap_lock);
		bit = find_first_zero_bit(bitmap, nbits);
		if (bit < nbits)
			__set_bit(bit, bitmap);
		spin_unlock(bitmap_lock);
		return bit;
	}

	void free_bit(unsigned long bit)
	{
		spin_lock(bitmap_lock);
		__clear_bit(bit, bitmap);
		spin_unlock(bitmap_lock);
	}
Now with atomic find_and_set_bit(), the above can be implemented
locklessly, directly by using it together with atomic clear_bit().
Patches 36-40 do this in a few places in the kernel where the
transition is clear. There are likely more candidates for
refactoring.
The other important case is when people open-code an atomic search
or an atomic traversal of a bitmap, with patterns looking like:
	for (idx = 0; idx < nbits; idx++)
		if (test_and_clear_bit(idx, bitmap))
			do_something(idx);
Or like this:
	do {
		bit = find_first_bit(bitmap, nbits);
		if (bit >= nbits)
			return nbits;
	} while (!test_and_clear_bit(bit, bitmap));
	return bit;
In both cases, the open-coded loop may be converted to a single function
or iterator call. Correspondingly:
	for_each_test_and_clear_bit(idx, bitmap, nbits)
		do_something(idx);
Or:
	return find_and_clear_bit(bitmap, nbits);
Obviously, the less routine code people have to write themselves, the
lower the probability of making a mistake.
The new API is not only a set of handy helpers; it also resolves a
non-trivial issue of using non-atomic find_bit() together with atomic
test_and_{set,clear}_bit().
The trick is that find_bit() assumes that the bitmap is a regular
non-volatile piece of memory, so the compiler is allowed to use
optimization techniques such as re-fetching memory instead of caching it.
For example, find_first_bit() is implemented like:
	for (idx = 0; idx * BITS_PER_LONG < sz; idx++) {
		val = addr[idx];
		if (val) {
			sz = min(idx * BITS_PER_LONG + __ffs(val), sz);
			break;
		}
	}
On register-memory architectures, like x86, the compiler may decide to
access memory twice: the first time to compare against 0, and the second
time to fetch its value to pass it to __ffs().
When running find_first_bit() on volatile memory, the memory may get
changed in between, and, for instance, this may lead to passing 0 to
__ffs(), which is undefined. This is a potentially dangerous call.
find_and_clear_bit(), as a wrapper around test_and_clear_bit(),
naturally treats the underlying bitmap as volatile memory and prevents
the compiler from such optimizations.
KCSAN now catches exactly this type of situation and warns about
undercover memory modifications. We can use it to reveal improper usage
of find_bit(), and convert it to atomic find_and_*_bit() as appropriate.
In some cases concurrent operations with plain find_bit() are acceptable.
For example:
- two threads running find_*_bit(): safe wrt ffs(0) and returns the correct
  value, because the underlying bitmap is unchanged;
- find_next_bit() in parallel with set or clear_bit(), when modifying
  a bit prior to the start bit to search: safe and correct;
- find_first_bit() in parallel with set_bit(): safe, but may return the
  wrong bit number;
- find_first_zero_bit() in parallel with clear_bit(): same as above.
In the last two cases find_bit() may not return the correct bit number, but
that may be OK if the caller requires any (not exactly the first) set or
clear bit, correspondingly.
In such cases, KCSAN may be safely silenced with data_race(). But in most
cases where KCSAN detects concurrency we should carefully review the
code and likely protect critical sections or switch to atomic
find_and_*_bit(), as appropriate.
This patch adds the following atomic primitives:
	find_and_set_bit(addr, nbits);
	find_and_set_next_bit(addr, nbits, start);
	...
Here the find_and_{set,clear} part refers to the corresponding
test_and_{set,clear}_bit function. Suffixes like _wrap or _lock
derive their semantics from the corresponding find() or test() functions.
For brevity, the naming omits the fact that we search for a zero bit in
the find_and_set functions, and correspondingly for a set bit in the
find_and_clear functions.
The patch also adds iterators with atomic semantics, like
for_each_test_and_set_bit(). Here, the naming rule is to simply prefix
the corresponding atomic operation with 'for_each'.
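As an example of how a suffix carries its semantics over, the _wrap
behaviour can be modelled in userspace as below. This is a sketch under
the assumption that C11 atomics stand in for the kernel bitops; the
model_*() names are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>

#define WORD_BITS (sizeof(unsigned long) * CHAR_BIT)

/* Stand-in for find_and_set_next_bit(): claim a zero bit in [start, nbits). */
static unsigned long model_find_and_set_next_bit(_Atomic unsigned long *map,
						 unsigned long nbits,
						 unsigned long start)
{
	unsigned long bit;

	for (bit = start; bit < nbits; bit++) {
		unsigned long mask = 1UL << (bit % WORD_BITS);

		if (atomic_load(&map[bit / WORD_BITS]) & mask)
			continue;
		/* The bit is ours only if the old value had it clear. */
		if (!(atomic_fetch_or(&map[bit / WORD_BITS], mask) & mask))
			return bit;
	}
	return nbits;
}

/* Stand-in for find_and_set_bit_wrap(): search from offset, then wrap. */
unsigned long model_find_and_set_bit_wrap(_Atomic unsigned long *map,
					  unsigned long nbits,
					  unsigned long offset)
{
	unsigned long bit;

	assert(offset <= nbits);
	bit = model_find_and_set_next_bit(map, nbits, offset);
	if (bit < nbits || offset == 0)
		return bit;

	/* Nothing free in [offset, nbits): retry in [0, offset). */
	bit = model_find_and_set_next_bit(map, offset, 0);
	return bit < offset ? bit : nbits;
}
```

A round-robin allocator would pass the last allocated position plus one as
offset, so that consecutive allocations spread over the whole map.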
CC: Bart Van Assche <bvanassche@acm.org>
CC: Sergey Shtylyov <s.shtylyov@omp.ru>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
MAINTAINERS | 1 +
include/linux/find.h | 4 -
include/linux/find_atomic.h | 324 ++++++++++++++++++++++++++++++++++++
lib/find_bit.c | 86 ++++++++++
4 files changed, 411 insertions(+), 4 deletions(-)
create mode 100644 include/linux/find_atomic.h
diff --git a/MAINTAINERS b/MAINTAINERS
index b68c8b25bb93..54f37d4f33dd 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3730,6 +3730,7 @@ F: include/linux/bitmap-str.h
F: include/linux/bitmap.h
F: include/linux/bits.h
F: include/linux/cpumask.h
+F: include/linux/find_atomic.h
F: include/linux/find.h
F: include/linux/nodemask.h
F: include/vdso/bits.h
diff --git a/include/linux/find.h b/include/linux/find.h
index 5dfca4225fef..a855f82ab9ad 100644
--- a/include/linux/find.h
+++ b/include/linux/find.h
@@ -2,10 +2,6 @@
#ifndef __LINUX_FIND_H_
#define __LINUX_FIND_H_
-#ifndef __LINUX_BITMAP_H
-#error only <linux/bitmap.h> can be included directly
-#endif
-
#include <linux/bitops.h>
unsigned long _find_next_bit(const unsigned long *addr1, unsigned long nbits,
diff --git a/include/linux/find_atomic.h b/include/linux/find_atomic.h
new file mode 100644
index 000000000000..a9e238f88d0b
--- /dev/null
+++ b/include/linux/find_atomic.h
@@ -0,0 +1,324 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __LINUX_FIND_ATOMIC_H_
+#define __LINUX_FIND_ATOMIC_H_
+
+#include <linux/bitops.h>
+#include <linux/find.h>
+
+unsigned long _find_and_set_bit(volatile unsigned long *addr, unsigned long nbits);
+unsigned long _find_and_set_next_bit(volatile unsigned long *addr, unsigned long nbits,
+ unsigned long start);
+unsigned long _find_and_set_bit_lock(volatile unsigned long *addr, unsigned long nbits);
+unsigned long _find_and_set_next_bit_lock(volatile unsigned long *addr, unsigned long nbits,
+ unsigned long start);
+unsigned long _find_and_clear_bit(volatile unsigned long *addr, unsigned long nbits);
+unsigned long _find_and_clear_next_bit(volatile unsigned long *addr, unsigned long nbits,
+ unsigned long start);
+
+/**
+ * find_and_set_bit - Find a zero bit and set it atomically
+ * @addr: The address to base the search on
+ * @nbits: The bitmap size in bits
+ *
+ * This function is designed to operate in concurrent access environment.
+ *
+ * Because of concurrency and volatile nature of underlying bitmap, it's not
+ * guaranteed that the found bit is the 1st bit in the bitmap. It's also not
+ * guaranteed that if >= @nbits is returned, the bitmap is empty.
+ *
+ * The function does guarantee that if returned value is in range [0 .. @nbits),
+ * the acquired bit belongs to the caller exclusively.
+ *
+ * Returns: found and set bit, or >= @nbits if no bits found
+ */
+static inline
+unsigned long find_and_set_bit(volatile unsigned long *addr, unsigned long nbits)
+{
+ if (small_const_nbits(nbits)) {
+ unsigned long val, ret;
+
+ do {
+ val = *addr | ~GENMASK(nbits - 1, 0);
+ if (val == ~0UL)
+ return nbits;
+ ret = ffz(val);
+ } while (test_and_set_bit(ret, addr));
+
+ return ret;
+ }
+
+ return _find_and_set_bit(addr, nbits);
+}
+
+
+/**
+ * find_and_set_next_bit - Find a zero bit and set it, starting from @offset
+ * @addr: The address to base the search on
+ * @nbits: The bitmap size in bits
+ * @offset: The bitnumber to start searching at
+ *
+ * This function is designed to operate in concurrent access environment.
+ *
+ * Because of concurrency and volatile nature of underlying bitmap, it's not
+ * guaranteed that the found bit is the 1st bit in the bitmap, starting from
+ * @offset. It's also not guaranteed that if >= @nbits is returned, the bitmap
+ * is empty.
+ *
+ * The function does guarantee that if returned value is in range [@offset .. @nbits),
+ * the acquired bit belongs to the caller exclusively.
+ *
+ * Returns: found and set bit, or >= @nbits if no bits found
+ */
+static inline
+unsigned long find_and_set_next_bit(volatile unsigned long *addr,
+ unsigned long nbits, unsigned long offset)
+{
+ if (small_const_nbits(nbits)) {
+ unsigned long val, ret;
+
+ do {
+ val = *addr | ~GENMASK(nbits - 1, offset);
+ if (val == ~0UL)
+ return nbits;
+ ret = ffz(val);
+ } while (test_and_set_bit(ret, addr));
+
+ return ret;
+ }
+
+ return _find_and_set_next_bit(addr, nbits, offset);
+}
+
+/**
+ * find_and_set_bit_wrap - find and set bit starting at @offset, wrapping around zero
+ * @addr: The first address to base the search on
+ * @nbits: The bitmap size in bits
+ * @offset: The bitnumber to start searching at
+ *
+ * Returns: the bit number for the next clear bit, or first clear bit up to @offset,
+ * while atomically setting it. If no bits are found, returns >= @nbits.
+ */
+static inline
+unsigned long find_and_set_bit_wrap(volatile unsigned long *addr,
+ unsigned long nbits, unsigned long offset)
+{
+ unsigned long bit = find_and_set_next_bit(addr, nbits, offset);
+
+ if (bit < nbits || offset == 0)
+ return bit;
+
+ bit = find_and_set_bit(addr, offset);
+ return bit < offset ? bit : nbits;
+}
+
+/**
+ * find_and_set_bit_lock - find a zero bit, then set it atomically with lock
+ * @addr: The address to base the search on
+ * @nbits: The bitmap size in bits
+ *
+ * This function is designed to operate in concurrent access environment.
+ *
+ * Because of concurrency and volatile nature of underlying bitmap, it's not
+ * guaranteed that the found bit is the 1st bit in the bitmap. It's also not
+ * guaranteed that if >= @nbits is returned, the bitmap is empty.
+ *
+ * The function does guarantee that if returned value is in range [0 .. @nbits),
+ * the acquired bit belongs to the caller exclusively.
+ *
+ * Returns: found and set bit, or >= @nbits if no bits found
+ */
+static inline
+unsigned long find_and_set_bit_lock(volatile unsigned long *addr, unsigned long nbits)
+{
+ if (small_const_nbits(nbits)) {
+ unsigned long val, ret;
+
+ do {
+ val = *addr | ~GENMASK(nbits - 1, 0);
+ if (val == ~0UL)
+ return nbits;
+ ret = ffz(val);
+ } while (test_and_set_bit_lock(ret, addr));
+
+ return ret;
+ }
+
+ return _find_and_set_bit_lock(addr, nbits);
+}
+
+/**
+ * find_and_set_next_bit_lock - find a zero bit and set it atomically with lock
+ * @addr: The address to base the search on
+ * @nbits: The bitmap size in bits
+ * @offset: The bitnumber to start searching at
+ *
+ * This function is designed to operate in concurrent access environment.
+ *
+ * Because of concurrency and volatile nature of underlying bitmap, it's not
+ * guaranteed that the found bit is the 1st bit in the range. It's also not
+ * guaranteed that if >= @nbits is returned, the bitmap is empty.
+ *
+ * The function does guarantee that if returned value is in range [@offset .. @nbits),
+ * the acquired bit belongs to the caller exclusively.
+ *
+ * Returns: found and set bit, or >= @nbits if no bits found
+ */
+static inline
+unsigned long find_and_set_next_bit_lock(volatile unsigned long *addr,
+ unsigned long nbits, unsigned long offset)
+{
+ if (small_const_nbits(nbits)) {
+ unsigned long val, ret;
+
+ do {
+ val = *addr | ~GENMASK(nbits - 1, offset);
+ if (val == ~0UL)
+ return nbits;
+ ret = ffz(val);
+ } while (test_and_set_bit_lock(ret, addr));
+
+ return ret;
+ }
+
+ return _find_and_set_next_bit_lock(addr, nbits, offset);
+}
+
+/**
+ * find_and_set_bit_wrap_lock - find zero bit starting at @offset and set it
+ * with lock, and wrap around zero if nothing found
+ * @addr: The first address to base the search on
+ * @nbits: The bitmap size in bits
+ * @offset: The bitnumber to start searching at
+ *
+ * Returns: the bit number for the next clear bit, or first clear bit up to @offset,
+ * while atomically setting it with lock. If no bits are found, returns >= @nbits.
+ */
+static inline
+unsigned long find_and_set_bit_wrap_lock(volatile unsigned long *addr,
+ unsigned long nbits, unsigned long offset)
+{
+ unsigned long bit = find_and_set_next_bit_lock(addr, nbits, offset);
+
+ if (bit < nbits || offset == 0)
+ return bit;
+
+ bit = find_and_set_bit_lock(addr, offset);
+ return bit < offset ? bit : nbits;
+}
+
+/**
+ * find_and_clear_bit - Find a set bit and clear it atomically
+ * @addr: The address to base the search on
+ * @nbits: The bitmap size in bits
+ *
+ * This function is designed to operate in concurrent access environment.
+ *
+ * Because of concurrency and volatile nature of underlying bitmap, it's not
+ * guaranteed that the found bit is the 1st bit in the bitmap. It's also not
+ * guaranteed that if >= @nbits is returned, the bitmap is empty.
+ *
+ * The function does guarantee that if returned value is in range [0 .. @nbits),
+ * the acquired bit belongs to the caller exclusively.
+ *
+ * Returns: found and cleared bit, or >= @nbits if no bits found
+ */
+static inline unsigned long find_and_clear_bit(volatile unsigned long *addr, unsigned long nbits)
+{
+ if (small_const_nbits(nbits)) {
+ unsigned long val, ret;
+
+ do {
+ val = *addr & GENMASK(nbits - 1, 0);
+ if (val == 0)
+ return nbits;
+ ret = __ffs(val);
+ } while (!test_and_clear_bit(ret, addr));
+
+ return ret;
+ }
+
+ return _find_and_clear_bit(addr, nbits);
+}
+
+/**
+ * find_and_clear_next_bit - Find a set bit next after @offset, and clear it atomically
+ * @addr: The address to base the search on
+ * @nbits: The bitmap size in bits
+ * @offset: bit offset at which to start searching
+ *
+ * This function is designed to operate in concurrent access environment.
+ *
+ * Because of concurrency and volatile nature of underlying bitmap, it's not
+ * guaranteed that the found bit is the 1st bit in the range. It's also not
+ * guaranteed that if >= @nbits is returned, there's no set bits after @offset.
+ *
+ * The function does guarantee that if returned value is in range [@offset .. @nbits),
+ * the acquired bit belongs to the caller exclusively.
+ *
+ * Returns: found and cleared bit, or >= @nbits if no bits found
+ */
+static inline
+unsigned long find_and_clear_next_bit(volatile unsigned long *addr,
+ unsigned long nbits, unsigned long offset)
+{
+ if (small_const_nbits(nbits)) {
+ unsigned long val, ret;
+
+ do {
+ val = *addr & GENMASK(nbits - 1, offset);
+ if (val == 0)
+ return nbits;
+ ret = __ffs(val);
+ } while (!test_and_clear_bit(ret, addr));
+
+ return ret;
+ }
+
+ return _find_and_clear_next_bit(addr, nbits, offset);
+}
+
+/**
+ * __find_and_set_bit - Find a zero bit and set it non-atomically
+ * @addr: The address to base the search on
+ * @nbits: The bitmap size in bits
+ *
+ * A non-atomic version of find_and_set_bit() needed to help writing
+ * common-looking code where atomicity is provided externally.
+ *
+ * Returns: found and set bit, or >= @nbits if no bits found
+ */
+static inline
+unsigned long __find_and_set_bit(unsigned long *addr, unsigned long nbits)
+{
+ unsigned long bit;
+
+ bit = find_first_zero_bit(addr, nbits);
+ if (bit < nbits)
+ __set_bit(bit, addr);
+
+ return bit;
+}
+
+/* same as for_each_set_bit() but atomically clears each found bit */
+#define for_each_test_and_clear_bit(bit, addr, size) \
+ for ((bit) = 0; \
+ (bit) = find_and_clear_next_bit((addr), (size), (bit)), (bit) < (size); \
+ (bit)++)
+
+/* same as for_each_set_bit_from() but atomically clears each found bit */
+#define for_each_test_and_clear_bit_from(bit, addr, size) \
+ for (; (bit) = find_and_clear_next_bit((addr), (size), (bit)), (bit) < (size); (bit)++)
+
+/* same as for_each_clear_bit() but atomically sets each found bit */
+#define for_each_test_and_set_bit(bit, addr, size) \
+ for ((bit) = 0; \
+ (bit) = find_and_set_next_bit((addr), (size), (bit)), (bit) < (size); \
+ (bit)++)
+
+/* same as for_each_clear_bit_from() but atomically sets each found bit */
+#define for_each_test_and_set_bit_from(bit, addr, size) \
+ for (; \
+ (bit) = find_and_set_next_bit((addr), (size), (bit)), (bit) < (size); \
+ (bit)++)
+
+#endif /* __LINUX_FIND_ATOMIC_H_ */
diff --git a/lib/find_bit.c b/lib/find_bit.c
index 0836bb3d76c5..a322abd1e540 100644
--- a/lib/find_bit.c
+++ b/lib/find_bit.c
@@ -14,6 +14,7 @@
#include <linux/bitops.h>
#include <linux/bitmap.h>
+#include <linux/find_atomic.h>
#include <linux/export.h>
#include <linux/math.h>
#include <linux/minmax.h>
@@ -128,6 +129,91 @@ unsigned long _find_first_and_and_bit(const unsigned long *addr1,
}
EXPORT_SYMBOL(_find_first_and_and_bit);
+unsigned long _find_and_set_bit(volatile unsigned long *addr, unsigned long nbits)
+{
+ unsigned long bit;
+
+ do {
+ bit = FIND_FIRST_BIT(~addr[idx], /* nop */, nbits);
+ if (bit >= nbits)
+ return nbits;
+ } while (test_and_set_bit(bit, addr));
+
+ return bit;
+}
+EXPORT_SYMBOL(_find_and_set_bit);
+
+unsigned long _find_and_set_next_bit(volatile unsigned long *addr,
+ unsigned long nbits, unsigned long start)
+{
+ unsigned long bit;
+
+ do {
+ bit = FIND_NEXT_BIT(~addr[idx], /* nop */, nbits, start);
+ if (bit >= nbits)
+ return nbits;
+ } while (test_and_set_bit(bit, addr));
+
+ return bit;
+}
+EXPORT_SYMBOL(_find_and_set_next_bit);
+
+unsigned long _find_and_set_bit_lock(volatile unsigned long *addr, unsigned long nbits)
+{
+ unsigned long bit;
+
+ do {
+ bit = FIND_FIRST_BIT(~addr[idx], /* nop */, nbits);
+ if (bit >= nbits)
+ return nbits;
+ } while (test_and_set_bit_lock(bit, addr));
+
+ return bit;
+}
+EXPORT_SYMBOL(_find_and_set_bit_lock);
+
+unsigned long _find_and_set_next_bit_lock(volatile unsigned long *addr,
+ unsigned long nbits, unsigned long start)
+{
+ unsigned long bit;
+
+ do {
+ bit = FIND_NEXT_BIT(~addr[idx], /* nop */, nbits, start);
+ if (bit >= nbits)
+ return nbits;
+ } while (test_and_set_bit_lock(bit, addr));
+
+ return bit;
+}
+EXPORT_SYMBOL(_find_and_set_next_bit_lock);
+
+unsigned long _find_and_clear_bit(volatile unsigned long *addr, unsigned long nbits)
+{
+ unsigned long bit;
+
+ do {
+ bit = FIND_FIRST_BIT(addr[idx], /* nop */, nbits);
+ if (bit >= nbits)
+ return nbits;
+ } while (!test_and_clear_bit(bit, addr));
+
+ return bit;
+}
+EXPORT_SYMBOL(_find_and_clear_bit);
+
+unsigned long _find_and_clear_next_bit(volatile unsigned long *addr,
+ unsigned long nbits, unsigned long start)
+{
+ do {
+ start = FIND_NEXT_BIT(addr[idx], /* nop */, nbits, start);
+ if (start >= nbits)
+ return nbits;
+ } while (!test_and_clear_bit(start, addr));
+
+ return start;
+}
+EXPORT_SYMBOL(_find_and_clear_next_bit);
+
#ifndef find_first_zero_bit
/*
* Find the first cleared bit in a memory region.
--
2.43.0
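For readers outside the kernel tree, the search/claim/retry shape of _find_and_set_bit() above can be modelled in userspace. The following is only a minimal sketch, assuming C11 atomics and the GCC/Clang __builtin_ctzl() builtin; every sketch_* name is hypothetical and none of this is the kernel implementation.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stddef.h>

#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Non-atomic scan for the first zero bit; returns nbits if none is found. */
static size_t sketch_find_first_zero(_Atomic unsigned long *map, size_t nbits)
{
	for (size_t base = 0; base < nbits; base += SKETCH_BITS_PER_LONG) {
		unsigned long inv = ~atomic_load(&map[base / SKETCH_BITS_PER_LONG]);

		if (inv) {
			size_t bit = base + (size_t)__builtin_ctzl(inv);

			/* A zero bit past the end of the bitmap does not count. */
			return bit < nbits ? bit : nbits;
		}
	}
	return nbits;
}

/* Atomically set the bit; returns non-zero if it was already set. */
static int sketch_test_and_set_bit(size_t bit, _Atomic unsigned long *map)
{
	unsigned long mask = 1UL << (bit % SKETCH_BITS_PER_LONG);
	unsigned long old = atomic_fetch_or(&map[bit / SKETCH_BITS_PER_LONG], mask);

	return (old & mask) != 0;
}

/*
 * Hypothetical userspace analogue of find_and_set_bit(): the scan only
 * proposes a candidate, the atomic test-and-set decides ownership, and a
 * lost race restarts the scan.
 */
static size_t sketch_find_and_set_bit(_Atomic unsigned long *map, size_t nbits)
{
	size_t bit;

	do {
		bit = sketch_find_first_zero(map, nbits);
		if (bit >= nbits)
			return nbits;
	} while (sketch_test_and_set_bit(bit, map));

	return bit;
}
```

As in the patch, a return value equal to nbits means that no free bit was found, so callers compare the result against the bitmap size rather than against a negative error code.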
* [PATCH v4 02/40] lib/find: add test for atomic find_bit() ops
2024-06-20 17:56 [PATCH v4 00/40] lib/find: add atomic find_bit() primitives Yury Norov
2024-06-20 17:56 ` [PATCH v4 01/40] " Yury Norov
@ 2024-06-20 17:56 ` Yury Norov
2024-06-20 17:56 ` [PATCH v4 21/40] sfc: optimize the driver by using atomic find_bit() API Yury Norov
` (6 subsequent siblings)
8 siblings, 0 replies; 14+ messages in thread
From: Yury Norov @ 2024-06-20 17:56 UTC (permalink / raw)
Add a basic functionality test for the new API.
Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
lib/test_bitmap.c | 62 +++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 62 insertions(+)
diff --git a/lib/test_bitmap.c b/lib/test_bitmap.c
index 65a75d58ed9e..405f79dd2266 100644
--- a/lib/test_bitmap.c
+++ b/lib/test_bitmap.c
@@ -6,6 +6,7 @@
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
#include <linux/bitmap.h>
+#include <linux/find_atomic.h>
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/module.h>
@@ -221,6 +222,65 @@ static void __init test_zero_clear(void)
expect_eq_pbl("", bmap, 1024);
}
+static void __init test_find_and_bit(void)
+{
+ unsigned long w, w_part, bit, cnt = 0;
+ DECLARE_BITMAP(bmap, EXP1_IN_BITS);
+
+ /*
+ * Test find_and_clear{_next}_bit() and corresponding
+ * iterators
+ */
+ bitmap_copy(bmap, exp1, EXP1_IN_BITS);
+ w = bitmap_weight(bmap, EXP1_IN_BITS);
+
+ for_each_test_and_clear_bit(bit, bmap, EXP1_IN_BITS)
+ cnt++;
+
+ expect_eq_uint(w, cnt);
+ expect_eq_uint(0, bitmap_weight(bmap, EXP1_IN_BITS));
+
+ bitmap_copy(bmap, exp1, EXP1_IN_BITS);
+ w = bitmap_weight(bmap, EXP1_IN_BITS);
+ w_part = bitmap_weight(bmap, EXP1_IN_BITS / 3);
+
+ cnt = 0;
+ bit = EXP1_IN_BITS / 3;
+ for_each_test_and_clear_bit_from(bit, bmap, EXP1_IN_BITS)
+ cnt++;
+
+ expect_eq_uint(bitmap_weight(bmap, EXP1_IN_BITS), bitmap_weight(bmap, EXP1_IN_BITS / 3));
+ expect_eq_uint(w_part, bitmap_weight(bmap, EXP1_IN_BITS));
+ expect_eq_uint(w - w_part, cnt);
+
+ /*
+ * Test find_and_set{_next}_bit() and corresponding
+ * iterators
+ */
+ bitmap_copy(bmap, exp1, EXP1_IN_BITS);
+ w = bitmap_weight(bmap, EXP1_IN_BITS);
+ cnt = 0;
+
+ for_each_test_and_set_bit(bit, bmap, EXP1_IN_BITS)
+ cnt++;
+
+ expect_eq_uint(EXP1_IN_BITS - w, cnt);
+ expect_eq_uint(EXP1_IN_BITS, bitmap_weight(bmap, EXP1_IN_BITS));
+
+ bitmap_copy(bmap, exp1, EXP1_IN_BITS);
+ w = bitmap_weight(bmap, EXP1_IN_BITS);
+ w_part = bitmap_weight(bmap, EXP1_IN_BITS / 3);
+ cnt = 0;
+
+ bit = EXP1_IN_BITS / 3;
+ for_each_test_and_set_bit_from(bit, bmap, EXP1_IN_BITS)
+ cnt++;
+
+ expect_eq_uint(EXP1_IN_BITS - bitmap_weight(bmap, EXP1_IN_BITS),
+ EXP1_IN_BITS / 3 - bitmap_weight(bmap, EXP1_IN_BITS / 3));
+ expect_eq_uint(EXP1_IN_BITS * 2 / 3 - (w - w_part), cnt);
+}
+
static void __init test_find_nth_bit(void)
{
unsigned long b, bit, cnt = 0;
@@ -1482,6 +1542,8 @@ static void __init selftest(void)
test_for_each_clear_bitrange_from();
test_for_each_set_clump8();
test_for_each_set_bit_wrap();
+
+ test_find_and_bit();
}
KSTM_MODULE_LOADERS(test_bitmap);
--
2.43.0
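The weight bookkeeping that test_find_and_bit() checks for the _from iterators can be reproduced on a single word. This is a plain single-threaded model of the accounting only, assuming the GCC/Clang __builtin_popcountl() builtin; the sketch_* helpers and the sample bit pattern are made up for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Count the set bits among the lowest n bits of word. */
static size_t sketch_weight(unsigned long word, size_t n)
{
	unsigned long mask = n >= sizeof(word) * CHAR_BIT ? ~0UL : (1UL << n) - 1;

	return (size_t)__builtin_popcountl(word & mask);
}

/* Set every clear bit in [start, nbits) and report how many were flipped. */
static size_t sketch_set_from(unsigned long *word, size_t nbits, size_t start)
{
	size_t cnt = 0;

	for (size_t bit = start; bit < nbits; bit++) {
		if (!(*word & (1UL << bit))) {
			*word |= 1UL << bit;
			cnt++;
		}
	}
	return cnt;
}
```

With w the weight of the whole bitmap and w_part the weight below the starting offset, the number of bits flipped must equal (nbits - start) - (w - w_part), and afterwards the only clear bits left are the ones below start.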
* [PATCH v4 21/40] sfc: optimize the driver by using atomic find_bit() API
2024-06-20 17:56 [PATCH v4 00/40] lib/find: add atomic find_bit() primitives Yury Norov
2024-06-20 17:56 ` [PATCH v4 01/40] " Yury Norov
2024-06-20 17:56 ` [PATCH v4 02/40] lib/find: add test for atomic find_bit() ops Yury Norov
@ 2024-06-20 17:56 ` Yury Norov
2024-06-20 17:56 ` [PATCH v4 25/40] mISDN: optimize get_free_devid() Yury Norov
` (5 subsequent siblings)
8 siblings, 0 replies; 14+ messages in thread
From: Yury Norov @ 2024-06-20 17:56 UTC (permalink / raw)
To: linux-kernel, Edward Cree, Martin Habets, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev,
linux-net-drivers
Cc: Yury Norov, Alexey Klimov, Bart Van Assche, Jan Kara,
Linus Torvalds, Matthew Wilcox, Mirsad Todorovac,
Rasmus Villemoes, Sergey Shtylyov
SFC code traverses rps_slot_map and rxq_retry_mask bit by bit. Simplify
it by using dedicated atomic find_bit() functions, as they skip already
clear bits.
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Reviewed-by: Edward Cree <ecree.xilinx@gmail.com>
---
drivers/net/ethernet/sfc/rx_common.c | 5 ++---
drivers/net/ethernet/sfc/siena/rx_common.c | 5 ++---
drivers/net/ethernet/sfc/siena/siena_sriov.c | 15 +++++++--------
3 files changed, 11 insertions(+), 14 deletions(-)
diff --git a/drivers/net/ethernet/sfc/rx_common.c b/drivers/net/ethernet/sfc/rx_common.c
index dcd901eccfc8..370a2d20ccfb 100644
--- a/drivers/net/ethernet/sfc/rx_common.c
+++ b/drivers/net/ethernet/sfc/rx_common.c
@@ -9,6 +9,7 @@
*/
#include "net_driver.h"
+#include <linux/find_atomic.h>
#include <linux/module.h>
#include <linux/iommu.h>
#include <net/rps.h>
@@ -953,9 +954,7 @@ int efx_filter_rfs(struct net_device *net_dev, const struct sk_buff *skb,
int rc;
/* find a free slot */
- for (slot_idx = 0; slot_idx < EFX_RPS_MAX_IN_FLIGHT; slot_idx++)
- if (!test_and_set_bit(slot_idx, &efx->rps_slot_map))
- break;
+ slot_idx = find_and_set_bit(&efx->rps_slot_map, EFX_RPS_MAX_IN_FLIGHT);
if (slot_idx >= EFX_RPS_MAX_IN_FLIGHT)
return -EBUSY;
diff --git a/drivers/net/ethernet/sfc/siena/rx_common.c b/drivers/net/ethernet/sfc/siena/rx_common.c
index 219fb358a646..fc1d4d02beb6 100644
--- a/drivers/net/ethernet/sfc/siena/rx_common.c
+++ b/drivers/net/ethernet/sfc/siena/rx_common.c
@@ -9,6 +9,7 @@
*/
#include "net_driver.h"
+#include <linux/find_atomic.h>
#include <linux/module.h>
#include <linux/iommu.h>
#include <net/rps.h>
@@ -959,9 +960,7 @@ int efx_siena_filter_rfs(struct net_device *net_dev, const struct sk_buff *skb,
int rc;
/* find a free slot */
- for (slot_idx = 0; slot_idx < EFX_RPS_MAX_IN_FLIGHT; slot_idx++)
- if (!test_and_set_bit(slot_idx, &efx->rps_slot_map))
- break;
+ slot_idx = find_and_set_bit(&efx->rps_slot_map, EFX_RPS_MAX_IN_FLIGHT);
if (slot_idx >= EFX_RPS_MAX_IN_FLIGHT)
return -EBUSY;
diff --git a/drivers/net/ethernet/sfc/siena/siena_sriov.c b/drivers/net/ethernet/sfc/siena/siena_sriov.c
index 8353c15dc233..f643413f9c20 100644
--- a/drivers/net/ethernet/sfc/siena/siena_sriov.c
+++ b/drivers/net/ethernet/sfc/siena/siena_sriov.c
@@ -3,6 +3,7 @@
* Driver for Solarflare network controllers and boards
* Copyright 2010-2012 Solarflare Communications Inc.
*/
+#include <linux/find_atomic.h>
#include <linux/pci.h>
#include <linux/module.h>
#include "net_driver.h"
@@ -722,14 +723,12 @@ static int efx_vfdi_fini_all_queues(struct siena_vf *vf)
efx_vfdi_flush_wake(vf),
timeout);
rxqs_count = 0;
- for (index = 0; index < count; ++index) {
- if (test_and_clear_bit(index, vf->rxq_retry_mask)) {
- atomic_dec(&vf->rxq_retry_count);
- MCDI_SET_ARRAY_DWORD(
- inbuf, FLUSH_RX_QUEUES_IN_QID_OFST,
- rxqs_count, vf_offset + index);
- rxqs_count++;
- }
+ for_each_test_and_clear_bit(index, vf->rxq_retry_mask, count) {
+ atomic_dec(&vf->rxq_retry_count);
+ MCDI_SET_ARRAY_DWORD(
+ inbuf, FLUSH_RX_QUEUES_IN_QID_OFST,
+ rxqs_count, vf_offset + index);
+ rxqs_count++;
}
}
--
2.43.0
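The drain pattern used in efx_vfdi_fini_all_queues() above (visit each set bit once and clear it while visiting) can be sketched in userspace as follows. This is an illustration under assumptions, not the kernel code: it relies on C11 atomics and the GCC/Clang __builtin_ctzl() builtin, and all sketch_* names are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stddef.h>

#define SKETCH_BPL (sizeof(unsigned long) * CHAR_BIT)

/* Non-atomic scan for the next set bit at or after start. */
static size_t sketch_find_next_set(_Atomic unsigned long *map, size_t nbits,
				   size_t start)
{
	for (size_t bit = start; bit < nbits; ) {
		size_t off = bit % SKETCH_BPL;
		unsigned long val = atomic_load(&map[bit / SKETCH_BPL]) >> off;

		if (val) {
			bit += (size_t)__builtin_ctzl(val);
			return bit < nbits ? bit : nbits;
		}
		bit += SKETCH_BPL - off;	/* jump to the next word */
	}
	return nbits;
}

/* Find the next set bit and atomically clear it; retry on a lost race. */
static size_t sketch_find_and_clear_next_bit(_Atomic unsigned long *map,
					      size_t nbits, size_t start)
{
	for (;;) {
		unsigned long mask, old;

		start = sketch_find_next_set(map, nbits, start);
		if (start >= nbits)
			return nbits;

		mask = 1UL << (start % SKETCH_BPL);
		old = atomic_fetch_and(&map[start / SKETCH_BPL], ~mask);
		if (old & mask)
			return start;	/* this caller cleared the bit */
	}
}

/* Same comma-operator loop shape as for_each_test_and_clear_bit(). */
#define sketch_for_each_test_and_clear_bit(bit, map, nbits)			\
	for ((bit) = 0;								\
	     (bit) = sketch_find_and_clear_next_bit((map), (nbits), (bit)),	\
	     (bit) < (nbits);							\
	     (bit)++)

/* Drain the bitmap, recording up to max_seen bit numbers; returns the count. */
static size_t sketch_drain(_Atomic unsigned long *map, size_t nbits,
			   size_t *seen, size_t max_seen)
{
	size_t bit, cnt = 0;

	sketch_for_each_test_and_clear_bit(bit, map, nbits) {
		if (cnt < max_seen)
			seen[cnt] = bit;
		cnt++;
	}
	return cnt;
}
```

The loop body runs only for bits that this caller actually cleared, which is why the converted driver code no longer needs an explicit test_and_clear_bit() check inside the loop.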
* [PATCH v4 25/40] mISDN: optimize get_free_devid()
2024-06-20 17:56 [PATCH v4 00/40] lib/find: add atomic find_bit() primitives Yury Norov
` (2 preceding siblings ...)
2024-06-20 17:56 ` [PATCH v4 21/40] sfc: optimize the driver by using atomic find_bit() API Yury Norov
@ 2024-06-20 17:56 ` Yury Norov
2024-06-20 17:56 ` [PATCH v4 27/40] ethernet: rocker: optimize ofdpa_port_internal_vlan_id_get() Yury Norov
` (4 subsequent siblings)
8 siblings, 0 replies; 14+ messages in thread
From: Yury Norov @ 2024-06-20 17:56 UTC (permalink / raw)
To: linux-kernel, Karsten Keil, netdev
Cc: Yury Norov, Alexey Klimov, Bart Van Assche, Jan Kara,
Linus Torvalds, Matthew Wilcox, Mirsad Todorovac,
Rasmus Villemoes, Sergey Shtylyov
get_free_devid() traverses each bit in device_ids in an open-coded loop.
Simplify it by using the dedicated find_and_set_bit().
This makes the whole function a nice one-liner. And because MAX_DEVICE_ID
is a small compile-time constant (63), on 64-bit platforms the
find_and_set_bit() call will be optimized to:
test_and_set_bit(ffs());
Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
drivers/isdn/mISDN/core.c | 10 +++-------
1 file changed, 3 insertions(+), 7 deletions(-)
diff --git a/drivers/isdn/mISDN/core.c b/drivers/isdn/mISDN/core.c
index ab8513a7acd5..d499b193529a 100644
--- a/drivers/isdn/mISDN/core.c
+++ b/drivers/isdn/mISDN/core.c
@@ -3,6 +3,7 @@
* Copyright 2008 by Karsten Keil <kkeil@novell.com>
*/
+#include <linux/find_atomic.h>
#include <linux/slab.h>
#include <linux/types.h>
#include <linux/stddef.h>
@@ -197,14 +198,9 @@ get_mdevice_count(void)
static int
get_free_devid(void)
{
- u_int i;
+ int i = find_and_set_bit((u_long *)&device_ids, MAX_DEVICE_ID + 1);
- for (i = 0; i <= MAX_DEVICE_ID; i++)
- if (!test_and_set_bit(i, (u_long *)&device_ids))
- break;
- if (i > MAX_DEVICE_ID)
- return -EBUSY;
- return i;
+ return i <= MAX_DEVICE_ID ? i : -EBUSY;
}
int
--
2.43.0
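When the whole ID space fits into one machine word, as the commit message above describes, the search reduces to a bit trick on the inverted word followed by one atomic OR. The following is a rough userspace sketch of that single-word case, assuming C11 atomics, the GCC/Clang __builtin_ctzl() builtin, and max_id smaller than the number of bits in unsigned long; sketch_get_free_id() is a hypothetical name, not a kernel function.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdatomic.h>

#define SKETCH_BPL (sizeof(unsigned long) * CHAR_BIT)

/* Claim the lowest free ID in [0, max_id]; returns -EBUSY if all are taken. */
static int sketch_get_free_id(_Atomic unsigned long *ids, unsigned int max_id)
{
	unsigned int nbits = max_id + 1;
	unsigned long valid = nbits >= SKETCH_BPL ? ~0UL : (1UL << nbits) - 1;
	unsigned long old = atomic_load(ids);

	for (;;) {
		unsigned long free_bits = ~old & valid;
		unsigned long mask;

		if (!free_bits)
			return -EBUSY;

		mask = free_bits & -free_bits;	/* isolate the lowest free bit */
		old = atomic_fetch_or(ids, mask);
		if (!(old & mask))
			return __builtin_ctzl(mask);
		/* Lost the race: old now shows that bit as taken, so rescan. */
	}
}
```

Note that a free slot is a zero bit, so the scan runs on the inverted word; the error mapping mirrors the "i <= MAX_DEVICE_ID ? i : -EBUSY" line in the patch.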
* [PATCH v4 27/40] ethernet: rocker: optimize ofdpa_port_internal_vlan_id_get()
2024-06-20 17:56 [PATCH v4 00/40] lib/find: add atomic find_bit() primitives Yury Norov
` (3 preceding siblings ...)
2024-06-20 17:56 ` [PATCH v4 25/40] mISDN: optimize get_free_devid() Yury Norov
@ 2024-06-20 17:56 ` Yury Norov
2024-06-20 17:56 ` [PATCH v4 28/40] bluetooth: optimize cmtp_alloc_block_id() Yury Norov
` (3 subsequent siblings)
8 siblings, 0 replies; 14+ messages in thread
From: Yury Norov @ 2024-06-20 17:56 UTC (permalink / raw)
To: linux-kernel, Jiri Pirko, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, netdev
Cc: Yury Norov, Alexey Klimov, Bart Van Assche, Jan Kara,
Linus Torvalds, Matthew Wilcox, Mirsad Todorovac,
Rasmus Villemoes, Sergey Shtylyov
Optimize ofdpa_port_internal_vlan_id_get() by using find_and_set_bit(),
instead of polling every bit of the bitmap in a for-loop.
Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
drivers/net/ethernet/rocker/rocker_ofdpa.c | 12 +++++-------
1 file changed, 5 insertions(+), 7 deletions(-)
diff --git a/drivers/net/ethernet/rocker/rocker_ofdpa.c b/drivers/net/ethernet/rocker/rocker_ofdpa.c
index 826990459fa4..d8fe018001b9 100644
--- a/drivers/net/ethernet/rocker/rocker_ofdpa.c
+++ b/drivers/net/ethernet/rocker/rocker_ofdpa.c
@@ -6,6 +6,7 @@
* Copyright (c) 2014-2016 Jiri Pirko <jiri@mellanox.com>
*/
+#include <linux/find_atomic.h>
#include <linux/kernel.h>
#include <linux/types.h>
#include <linux/spinlock.h>
@@ -2249,14 +2250,11 @@ static __be16 ofdpa_port_internal_vlan_id_get(struct ofdpa_port *ofdpa_port,
found = entry;
hash_add(ofdpa->internal_vlan_tbl, &found->entry, found->ifindex);
- for (i = 0; i < OFDPA_N_INTERNAL_VLANS; i++) {
- if (test_and_set_bit(i, ofdpa->internal_vlan_bitmap))
- continue;
+ i = find_and_set_bit(ofdpa->internal_vlan_bitmap, OFDPA_N_INTERNAL_VLANS);
+ if (i < OFDPA_N_INTERNAL_VLANS)
found->vlan_id = htons(OFDPA_INTERNAL_VLAN_ID_BASE + i);
- goto found;
- }
-
- netdev_err(ofdpa_port->dev, "Out of internal VLAN IDs\n");
+ else
+ netdev_err(ofdpa_port->dev, "Out of internal VLAN IDs\n");
found:
found->ref_count++;
--
2.43.0
* [PATCH v4 28/40] bluetooth: optimize cmtp_alloc_block_id()
2024-06-20 17:56 [PATCH v4 00/40] lib/find: add atomic find_bit() primitives Yury Norov
` (4 preceding siblings ...)
2024-06-20 17:56 ` [PATCH v4 27/40] ethernet: rocker: optimize ofdpa_port_internal_vlan_id_get() Yury Norov
@ 2024-06-20 17:56 ` Yury Norov
2024-06-20 17:56 ` [PATCH v4 29/40] net: smc: optimize smc_wr_tx_get_free_slot_index() Yury Norov
` (2 subsequent siblings)
8 siblings, 0 replies; 14+ messages in thread
From: Yury Norov @ 2024-06-20 17:56 UTC (permalink / raw)
To: linux-kernel, Karsten Keil, Marcel Holtmann, Johan Hedberg,
Luiz Augusto von Dentz, Yury Norov, netdev, linux-bluetooth
Cc: Alexey Klimov, Bart Van Assche, Jan Kara, Linus Torvalds,
Matthew Wilcox, Mirsad Todorovac, Rasmus Villemoes,
Sergey Shtylyov
Instead of polling every bit in blockids, use a dedicated
find_and_set_bit(), and make the function a simple one-liner.
Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
net/bluetooth/cmtp/core.c | 11 +++--------
1 file changed, 3 insertions(+), 8 deletions(-)
diff --git a/net/bluetooth/cmtp/core.c b/net/bluetooth/cmtp/core.c
index 90d130588a3e..06732cf2661b 100644
--- a/net/bluetooth/cmtp/core.c
+++ b/net/bluetooth/cmtp/core.c
@@ -22,6 +22,7 @@
#include <linux/module.h>
+#include <linux/find_atomic.h>
#include <linux/types.h>
#include <linux/errno.h>
#include <linux/kernel.h>
@@ -88,15 +89,9 @@ static void __cmtp_copy_session(struct cmtp_session *session, struct cmtp_connin
static inline int cmtp_alloc_block_id(struct cmtp_session *session)
{
- int i, id = -1;
+ int id = find_and_set_bit(&session->blockids, 16);
- for (i = 0; i < 16; i++)
- if (!test_and_set_bit(i, &session->blockids)) {
- id = i;
- break;
- }
-
- return id;
+ return id < 16 ? id : -1;
}
static inline void cmtp_free_block_id(struct cmtp_session *session, int id)
--
2.43.0
* [PATCH v4 29/40] net: smc: optimize smc_wr_tx_get_free_slot_index()
2024-06-20 17:56 [PATCH v4 00/40] lib/find: add atomic find_bit() primitives Yury Norov
` (5 preceding siblings ...)
2024-06-20 17:56 ` [PATCH v4 28/40] bluetooth: optimize cmtp_alloc_block_id() Yury Norov
@ 2024-06-20 17:56 ` Yury Norov
2024-06-20 17:57 ` [PATCH v4 38/40] wifi: mac80211: drop locking around ntp_fltr_bmap Yury Norov
2024-06-20 18:00 ` [PATCH v4 00/40] lib/find: add atomic find_bit() primitives Linus Torvalds
8 siblings, 0 replies; 14+ messages in thread
From: Yury Norov @ 2024-06-20 17:56 UTC (permalink / raw)
To: linux-kernel, Karsten Graul, Wenjia Zhang, Jan Karcher, D. Wythe,
Tony Lu, Wen Gu, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, linux-s390, netdev
Cc: Yury Norov, Alexey Klimov, Bart Van Assche, Jan Kara,
Linus Torvalds, Matthew Wilcox, Mirsad Todorovac,
Rasmus Villemoes, Sergey Shtylyov, Alexandra Winter
Simplify the function by using find_and_set_bit(), which makes it almost
a one-liner.
While here, drop explicit initialization of *idx, because it's already
initialized by the caller in case of ENOLINK, or set properly with
->wr_tx_mask, if nothing is found, in case of EBUSY.
CC: Tony Lu <tonylu@linux.alibaba.com>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
Reviewed-by: Wen Gu <guwen@linux.alibaba.com>
---
net/smc/smc_wr.c | 11 ++++-------
1 file changed, 4 insertions(+), 7 deletions(-)
diff --git a/net/smc/smc_wr.c b/net/smc/smc_wr.c
index 0021065a600a..941c2434a021 100644
--- a/net/smc/smc_wr.c
+++ b/net/smc/smc_wr.c
@@ -23,6 +23,7 @@
*/
#include <linux/atomic.h>
+#include <linux/find_atomic.h>
#include <linux/hashtable.h>
#include <linux/wait.h>
#include <rdma/ib_verbs.h>
@@ -170,15 +171,11 @@ void smc_wr_tx_cq_handler(struct ib_cq *ib_cq, void *cq_context)
static inline int smc_wr_tx_get_free_slot_index(struct smc_link *link, u32 *idx)
{
- *idx = link->wr_tx_cnt;
if (!smc_link_sendable(link))
return -ENOLINK;
- for_each_clear_bit(*idx, link->wr_tx_mask, link->wr_tx_cnt) {
- if (!test_and_set_bit(*idx, link->wr_tx_mask))
- return 0;
- }
- *idx = link->wr_tx_cnt;
- return -EBUSY;
+
+ *idx = find_and_set_bit(link->wr_tx_mask, link->wr_tx_cnt);
+ return *idx < link->wr_tx_cnt ? 0 : -EBUSY;
}
/**
--
2.43.0
* [PATCH v4 38/40] wifi: mac80211: drop locking around ntp_fltr_bmap
2024-06-20 17:56 [PATCH v4 00/40] lib/find: add atomic find_bit() primitives Yury Norov
` (6 preceding siblings ...)
2024-06-20 17:56 ` [PATCH v4 29/40] net: smc: optimize smc_wr_tx_get_free_slot_index() Yury Norov
@ 2024-06-20 17:57 ` Yury Norov
2024-06-20 18:00 ` [PATCH v4 00/40] lib/find: add atomic find_bit() primitives Linus Torvalds
8 siblings, 0 replies; 14+ messages in thread
From: Yury Norov @ 2024-06-20 17:57 UTC (permalink / raw)
To: linux-kernel, Michael Chan, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, netdev
Cc: Yury Norov, Alexey Klimov, Bart Van Assche, Jan Kara,
Linus Torvalds, Matthew Wilcox, Mirsad Todorovac,
Rasmus Villemoes, Sergey Shtylyov
The driver operates on individual bits of the bitmap. Now that we have
the atomic find_and_set_bit() helper, we can move the map manipulation
out of the ntp_fltr_lock-protected area.
Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 18 ++++++++----------
1 file changed, 8 insertions(+), 10 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index c437ca1c0fd3..5f4c3449570d 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -51,6 +51,7 @@
#include <linux/bitmap.h>
#include <linux/cpu_rmap.h>
#include <linux/cpumask.h>
+#include <linux/find_atomic.h>
#include <net/pkt_cls.h>
#include <net/page_pool/helpers.h>
#include <linux/align.h>
@@ -5616,17 +5617,16 @@ static int bnxt_init_l2_filter(struct bnxt *bp, struct bnxt_l2_filter *fltr,
struct bnxt_l2_key *key, u32 idx)
{
struct hlist_head *head;
+ int bit_id;
ether_addr_copy(fltr->l2_key.dst_mac_addr, key->dst_mac_addr);
fltr->l2_key.vlan = key->vlan;
fltr->base.type = BNXT_FLTR_TYPE_L2;
if (fltr->base.flags) {
- int bit_id;
-
- bit_id = bitmap_find_free_region(bp->ntp_fltr_bmap,
- bp->max_fltr, 0);
- if (bit_id < 0)
+ bit_id = find_and_set_bit(bp->ntp_fltr_bmap, bp->max_fltr);
+ if (bit_id >= bp->max_fltr)
return -ENOMEM;
+
fltr->base.sw_id = (u16)bit_id;
bp->ntp_fltr_count++;
}
@@ -14396,13 +14396,11 @@ int bnxt_insert_ntp_filter(struct bnxt *bp, struct bnxt_ntuple_filter *fltr,
struct hlist_head *head;
int bit_id;
- spin_lock_bh(&bp->ntp_fltr_lock);
- bit_id = bitmap_find_free_region(bp->ntp_fltr_bmap, bp->max_fltr, 0);
- if (bit_id < 0) {
- spin_unlock_bh(&bp->ntp_fltr_lock);
+ bit_id = find_and_set_bit(bp->ntp_fltr_bmap, bp->max_fltr);
+ if (bit_id >= bp->max_fltr)
return -ENOMEM;
- }
+ spin_lock_bh(&bp->ntp_fltr_lock);
fltr->base.sw_id = (u16)bit_id;
fltr->base.type = BNXT_FLTR_TYPE_NTUPLE;
fltr->base.flags |= BNXT_ACT_RING_DST;
--
2.43.0
* Re: [PATCH v4 00/40] lib/find: add atomic find_bit() primitives
2024-06-20 17:56 [PATCH v4 00/40] lib/find: add atomic find_bit() primitives Yury Norov
` (7 preceding siblings ...)
2024-06-20 17:57 ` [PATCH v4 38/40] wifi: mac80211: drop locking around ntp_fltr_bmap Yury Norov
@ 2024-06-20 18:00 ` Linus Torvalds
2024-06-20 18:32 ` Yury Norov
8 siblings, 1 reply; 14+ messages in thread
From: Linus Torvalds @ 2024-06-20 18:00 UTC (permalink / raw)
To: Yury Norov
Cc: linux-kernel, David S. Miller, H. Peter Anvin,
James E.J. Bottomley, K. Y. Srinivasan, Md. Haris Iqbal,
Akinobu Mita, Andrew Morton, Bjorn Andersson, Borislav Petkov,
Chaitanya Kulkarni, Christian Brauner, Damien Le Moal,
Dave Hansen, David Disseldorp, Edward Cree, Eric Dumazet,
Fenghua Yu, Geert Uytterhoeven, Greg Kroah-Hartman,
Gregory Greenman, Hans Verkuil, Hans de Goede, Hugh Dickins,
Ingo Molnar, Jakub Kicinski, Jaroslav Kysela, Jason Gunthorpe,
Jens Axboe, Jiri Pirko, Jiri Slaby, Kalle Valo, Karsten Graul,
Karsten Keil, Kees Cook, Leon Romanovsky, Mark Rutland,
Martin Habets, Mauro Carvalho Chehab, Michael Ellerman,
Michal Simek, Nicholas Piggin, Oliver Neukum, Paolo Abeni,
Paolo Bonzini, Peter Zijlstra, Ping-Ke Shih, Rich Felker,
Rob Herring, Robin Murphy, Sean Christopherson, Shuai Xue,
Stanislaw Gruszka, Steven Rostedt, Thomas Bogendoerfer,
Thomas Gleixner, Valentin Schneider, Vitaly Kuznetsov,
Wenjia Zhang, Will Deacon, Yoshinori Sato,
GR-QLogic-Storage-Upstream, alsa-devel, ath10k, dmaengine, iommu,
kvm, linux-arm-kernel, linux-arm-msm, linux-block,
linux-bluetooth, linux-hyperv, linux-m68k, linux-media,
linux-mips, linux-net-drivers, linux-pci, linux-rdma, linux-s390,
linux-scsi, linux-serial, linux-sh, linux-sound, linux-usb,
linux-wireless, linuxppc-dev, mpi3mr-linuxdrv.pdl, netdev,
sparclinux, x86, Alexey Klimov, Bart Van Assche, Jan Kara,
Matthew Wilcox, Mirsad Todorovac, Rasmus Villemoes,
Sergey Shtylyov
On Thu, 20 Jun 2024 at 10:57, Yury Norov <yury.norov@gmail.com> wrote:
>
>
> The typical lock-protected bit allocation may look like this:
If it looks like this, then nobody cares. Clearly the user in question
never actually cared about performance, and you SHOULD NOT then say
"let's optimize this that nobody cares about".
Yury, I spend an inordinate amount of time just double-checking your
patches. I ended up having to basically undo one of them just days
ago.
New rule: before you send some optimization, you need to have NUMBERS.
Some kind of "look, this code is visible in profiles, so we actually care".
Because without numbers, I'm just not going to pull anything from you.
These insane inlines for things that don't matter need to stop.
And if they *DO* matter, you need to show that they matter.
Linus
* Re: [PATCH v4 00/40] lib/find: add atomic find_bit() primitives
2024-06-20 18:00 ` [PATCH v4 00/40] lib/find: add atomic find_bit() primitives Linus Torvalds
@ 2024-06-20 18:32 ` Yury Norov
2024-06-20 19:26 ` Linus Torvalds
0 siblings, 1 reply; 14+ messages in thread
From: Yury Norov @ 2024-06-20 18:32 UTC (permalink / raw)
To: Linus Torvalds
On Thu, Jun 20, 2024 at 11:00:38AM -0700, Linus Torvalds wrote:
> On Thu, 20 Jun 2024 at 10:57, Yury Norov <yury.norov@gmail.com> wrote:
> >
> >
> > The typical lock-protected bit allocation may look like this:
>
> If it looks like this, then nobody cares. Clearly the user in question
> never actually cared about performance, and you SHOULD NOT then say
> "let's optimize this that nobody cares about".
>
> Yury, I spend an inordinate amount of time just double-checking your
> patches. I ended up having to basically undo one of them just days
> ago.
Is that in master already? I didn't get any email, and I can't find
anything related in the master branch.
> New rule: before you send some optimization, you need to have NUMBERS.
I tried my best to underline that it's not a performance optimization.
People notice some performance differences, but it's ~3%, no more.
> Some kind of "look, this code is visible in profiles, so we actually care".
The original motivation comes from a KCSAN report, so it's already
visible in profiles. See [1] in cover letter. This series doesn't fix
that particular issue, but it adds tooling that allows people to search
and acquire bits in bitmaps without firing KCSAN warnings.
This series fixes one real bug in the codebase - see #33, and
simplifies bitmap usage in many other places. Many people like
it, and have acked the patches.
Again, this is NOT a performance series.
Thanks,
Yury
> Because without numbers, I'm just not going to pull anything from you.
> These insane inlines for things that don't matter need to stop.
>
> And if they *DO* matter, you need to show that they matter.
>
> Linus
* Re: [PATCH v4 00/40] lib/find: add atomic find_bit() primitives
2024-06-20 18:32 ` Yury Norov
@ 2024-06-20 19:26 ` Linus Torvalds
2024-06-20 20:20 ` Yury Norov
0 siblings, 1 reply; 14+ messages in thread
From: Linus Torvalds @ 2024-06-20 19:26 UTC (permalink / raw)
To: Yury Norov
On Thu, 20 Jun 2024 at 11:32, Yury Norov <yury.norov@gmail.com> wrote:
>
> Is that in master already? I didn't get any email, and I can't find
> anything related in the master branch.
It's 5d272dd1b343 ("cpumask: limit FORCE_NR_CPUS to just the UP case").
> > New rule: before you send some optimization, you need to have NUMBERS.
>
> I tried to underline that it's not a performance optimization at my
> best.
If it's not about performance, then it damn well shouldn't be 90%
inline functions in a header file.
If it's a helper function, it needs to be a real function elsewhere. Not this:
include/linux/find_atomic.h | 324 +++++++++++++++++++
because either performance really matters, in which case you need to
show profiles, or performance doesn't matter, in which case it damn
well shouldn't have special cases for small bitsets that double the
size of the code.
Linus
* Re: [PATCH v4 00/40] lib/find: add atomic find_bit() primitives
2024-06-20 19:26 ` Linus Torvalds
@ 2024-06-20 20:20 ` Yury Norov
2024-06-20 20:32 ` Linus Torvalds
0 siblings, 1 reply; 14+ messages in thread
From: Yury Norov @ 2024-06-20 20:20 UTC (permalink / raw)
To: Linus Torvalds
On Thu, Jun 20, 2024 at 12:26:18PM -0700, Linus Torvalds wrote:
> On Thu, 20 Jun 2024 at 11:32, Yury Norov <yury.norov@gmail.com> wrote:
> >
> > Is that in master already? I didn't get any email, and I can't find
> > anything related in the master branch.
>
> It's 5d272dd1b343 ("cpumask: limit FORCE_NR_CPUS to just the UP case").
FORCE_NR_CPUS helped generate better code for me back then. I'll
check again against the current kernel.
Commit 5d272dd1b343 is wrong. Limiting FORCE_NR_CPUS to the UP case makes
no sense, because in the UP case nr_cpu_ids is already a compile-time macro:
	#if (NR_CPUS == 1) || defined(CONFIG_FORCE_NR_CPUS)
	#define nr_cpu_ids ((unsigned int)NR_CPUS)
	#else
	extern unsigned int nr_cpu_ids;
	#endif
I use FORCE_NR_CPUS for my RPi (or rather used to, until I burnt it).
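For illustration only, the effect of that #if can be reproduced in a small userspace sketch. This is not kernel code: the default NR_CPUS value of 4, the runtime initializer, and demo_count_cpus_in_mask() are assumptions made up for the example. Built without CONFIG_FORCE_NR_CPUS and with NR_CPUS > 1, nr_cpu_ids is an ordinary variable and the loop bound is read at run time; with either condition true, it becomes a constant the compiler can fold.

```c
/* Hypothetical default; the kernel takes NR_CPUS from Kconfig. */
#ifndef NR_CPUS
#define NR_CPUS 4
#endif

#if (NR_CPUS == 1) || defined(CONFIG_FORCE_NR_CPUS)
/* Compile-time constant: loop bounds using it can be folded. */
#define nr_cpu_ids ((unsigned int)NR_CPUS)
#else
/* Runtime variable; in the kernel it is set during boot, here it is
 * simply initialized to NR_CPUS for the sake of the example. */
unsigned int nr_cpu_ids = NR_CPUS;
#endif

/* Hypothetical helper: count the bits set below nr_cpu_ids in a
 * single-word mask. */
static unsigned int demo_count_cpus_in_mask(unsigned long mask)
{
	unsigned int cpu, n = 0;

	for (cpu = 0; cpu < nr_cpu_ids; cpu++)
		n += (mask >> cpu) & 1;
	return n;
}
```

Either way the function returns the same value; only the generated code differs.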
> > > New rule: before you send some optimization, you need to have NUMBERS.
> >
> > I tried to underline that it's not a performance optimization at my
> > best.
>
> If it's not about performance, then it damn well shouldn't be 90%
> inline functions in a header file.
>
> If it's a helper function, it needs to be a real function elsewhere. Not this:
>
> include/linux/find_atomic.h | 324 +++++++++++++++++++
>
> because either performance really matters, in which case you need to
> show profiles, or performance doesn't matter, in which case it damn
> well shouldn't have special cases for small bitsets that double the
> size of the code.
This small_const_nbits() thing is a compile-time optimization for a
single-word bitmap whose length is known at compile time.
If the bitmap is longer, or nbits is not known at compile time, the
inline part goes away entirely at compile time.
In the opposite case, the out-of-line part goes away. So those converting
from find_bit() + test_and_set_bit() will see no new out-of-line function
calls.
This inline + outline implementation is traditional for bitmaps, and
for some people it's important. For example, Sean Christopherson
explicitly asked to add a notice that converting to the new API will
still generate inline code. See patch #13.
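To make the inline + outline split concrete, here is a minimal userspace sketch. It is not the code from the series: the demo_ names are hypothetical, small_const_nbits() is only modeled on the kernel macro of the same name, and GCC/Clang __atomic builtins stand in for the kernel's test_and_set_bit(). Both helpers claim the first clear bit and return its index, or nbits if every bit is already set.

```c
#include <limits.h>

#define BITS_PER_LONG ((unsigned int)(CHAR_BIT * sizeof(long)))

/* True only when nbits is a compile-time constant in 1..BITS_PER_LONG. */
#define small_const_nbits(nbits) \
	(__builtin_constant_p(nbits) && (nbits) > 0 && (nbits) <= BITS_PER_LONG)

/* Hypothetical out-of-line part: generic walk over a bitmap of any length. */
unsigned long demo_find_and_set_bit_outline(unsigned long *addr,
					    unsigned long nbits)
{
	for (unsigned long bit = 0; bit < nbits; bit++) {
		unsigned long *word = &addr[bit / BITS_PER_LONG];
		unsigned long mask = 1UL << (bit % BITS_PER_LONG);

		/* Atomically set the bit; we own it if it was clear before. */
		if (!(__atomic_fetch_or(word, mask, __ATOMIC_SEQ_CST) & mask))
			return bit;
	}
	return nbits;
}

/* Hypothetical inline wrapper: single-word fast path, otherwise a call. */
static inline unsigned long demo_find_and_set_bit(unsigned long *addr,
						  unsigned long nbits)
{
	if (small_const_nbits(nbits)) {
		unsigned long mask = ~0UL >> (BITS_PER_LONG - nbits);
		unsigned long old = __atomic_load_n(addr, __ATOMIC_RELAXED);

		for (;;) {
			unsigned long avail = ~old & mask;
			unsigned long bit, prev;

			if (!avail)
				return nbits;
			bit = (unsigned long)__builtin_ctzl(avail);
			prev = __atomic_fetch_or(addr, 1UL << bit,
						 __ATOMIC_SEQ_CST);
			if (!(prev & (1UL << bit)))
				return bit;
			/* Lost a race for this bit; retry with fresh state. */
			old = prev | (1UL << bit);
		}
	}
	return demo_find_and_set_bit_outline(addr, nbits);
}
```

With optimization enabled and a literal nbits such as 8, the compiler is expected to keep only the single-word loop and drop the call; with a runtime nbits, or a constant larger than BITS_PER_LONG, only the call remains. In an unoptimized build __builtin_constant_p() on a function argument evaluates to 0, so the out-of-line path is taken, with the same result.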
* Re: [PATCH v4 00/40] lib/find: add atomic find_bit() primitives
2024-06-20 20:20 ` Yury Norov
@ 2024-06-20 20:32 ` Linus Torvalds
0 siblings, 0 replies; 14+ messages in thread
From: Linus Torvalds @ 2024-06-20 20:32 UTC (permalink / raw)
To: Yury Norov
On Thu, 20 Jun 2024 at 13:20, Yury Norov <yury.norov@gmail.com> wrote:
>
> FORCE_NR_CPUS helped generate better code for me back then. I'll
> check again against the current kernel.
Of _course_ it generates better code.
But when "better code" is a source of bugs, and isn't actually useful
in general, it's not better, is it.
> Commit 5d272dd1b343 is wrong. Limiting FORCE_NR_CPUS to the UP case makes
> no sense, because in the UP case nr_cpu_ids is already a compile-time macro:
Yury, I'm very aware. That was obviously intentional. The whole point
of the commit is to just disable the whole thing as useless and
problematic.
I could have just ripped it out entirely. I ended up doing a one-liner instead.
Linus
Thread overview: 14+ messages
2024-06-20 17:56 [PATCH v4 00/40] lib/find: add atomic find_bit() primitives Yury Norov
2024-06-20 17:56 ` [PATCH v4 01/40] " Yury Norov
2024-06-20 17:56 ` [PATCH v4 02/40] lib/find: add test for atomic find_bit() ops Yury Norov
2024-06-20 17:56 ` [PATCH v4 21/40] sfc: optimize the driver by using atomic find_bit() API Yury Norov
2024-06-20 17:56 ` [PATCH v4 25/40] mISDN: optimize get_free_devid() Yury Norov
2024-06-20 17:56 ` [PATCH v4 27/40] ethernet: rocker: optimize ofdpa_port_internal_vlan_id_get() Yury Norov
2024-06-20 17:56 ` [PATCH v4 28/40] bluetooth: optimize cmtp_alloc_block_id() Yury Norov
2024-06-20 17:56 ` [PATCH v4 29/40] net: smc: optimize smc_wr_tx_get_free_slot_index() Yury Norov
2024-06-20 17:57 ` [PATCH v4 38/40] wifi: mac80211: drop locking around ntp_fltr_bmap Yury Norov
2024-06-20 18:00 ` [PATCH v4 00/40] lib/find: add atomic find_bit() primitives Linus Torvalds
2024-06-20 18:32 ` Yury Norov
2024-06-20 19:26 ` Linus Torvalds
2024-06-20 20:20 ` Yury Norov
2024-06-20 20:32 ` Linus Torvalds