public inbox for linux-kernel@vger.kernel.org
* [PATCH v4 00/40] lib/find: add atomic find_bit() primitives
@ 2024-06-20 17:56 Yury Norov
  2024-06-20 17:56 ` [PATCH v4 01/40] " Yury Norov
                   ` (40 more replies)
  0 siblings, 41 replies; 51+ messages in thread
From: Yury Norov @ 2024-06-20 17:56 UTC
  To: linux-kernel, David S. Miller, H. Peter Anvin,
	James E.J. Bottomley, K. Y. Srinivasan, Md. Haris Iqbal,
	Akinobu Mita, Andrew Morton, Bjorn Andersson, Borislav Petkov,
	Chaitanya Kulkarni, Christian Brauner, Damien Le Moal,
	Dave Hansen, David Disseldorp, Edward Cree, Eric Dumazet,
	Fenghua Yu, Geert Uytterhoeven, Greg Kroah-Hartman,
	Gregory Greenman, Hans Verkuil, Hans de Goede, Hugh Dickins,
	Ingo Molnar, Jakub Kicinski, Jaroslav Kysela, Jason Gunthorpe,
	Jens Axboe, Jiri Pirko, Jiri Slaby, Kalle Valo, Karsten Graul,
	Karsten Keil, Kees Cook, Leon Romanovsky, Mark Rutland,
	Martin Habets, Mauro Carvalho Chehab, Michael Ellerman,
	Michal Simek, Nicholas Piggin, Oliver Neukum, Paolo Abeni,
	Paolo Bonzini, Peter Zijlstra, Ping-Ke Shih, Rich Felker,
	Rob Herring, Robin Murphy, Sean Christopherson, Shuai Xue,
	Stanislaw Gruszka, Steven Rostedt, Thomas Bogendoerfer,
	Thomas Gleixner, Valentin Schneider, Vitaly Kuznetsov,
	Wenjia Zhang, Will Deacon, Yoshinori Sato,
	GR-QLogic-Storage-Upstream, alsa-devel, ath10k, dmaengine, iommu,
	kvm, linux-arm-kernel, linux-arm-msm, linux-block,
	linux-bluetooth, linux-hyperv, linux-m68k, linux-media,
	linux-mips, linux-net-drivers, linux-pci, linux-rdma, linux-s390,
	linux-scsi, linux-serial, linux-sh, linux-sound, linux-usb,
	linux-wireless, linuxppc-dev, mpi3mr-linuxdrv.pdl, netdev,
	sparclinux, x86
  Cc: Yury Norov, Alexey Klimov, Bart Van Assche, Jan Kara,
	Linus Torvalds, Matthew Wilcox, Mirsad Todorovac,
	Rasmus Villemoes, Sergey Shtylyov

--- 

This v4 moves the new API to separate headers, because adding more to
find.h concerns people, particularly Linus. It also adds a few more
conversions alongside other cosmetic changes. See the full changelog below.

---

Add helpers around test_and_{set,clear}_bit() to allow searching for
clear or set bits and flipping them atomically.

Atomic search primitives allow lockless bitmap handling where concurrent
processes touch only individual bits, and where people currently have to
lock their bitmaps just to search for a free or set bit because atomic
searching routines are missing.

The typical lock-protected bit allocation may look like this:

	unsigned long alloc_bit()
	{
		unsigned long bit;

		spin_lock(bitmap_lock);
		bit = find_first_zero_bit(bitmap, nbits);
		if (bit < nbits)
			__set_bit(bit, bitmap);
		spin_unlock(bitmap_lock);

		return bit;
	}

	void free_bit(unsigned long bit)
	{
		spin_lock(bitmap_lock);
		__clear_bit(bit, bitmap);
		spin_unlock(bitmap_lock);
	}

Now with atomic find_and_set_bit(), the above can be implemented
lockless, directly by using it and atomic clear_bit().
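
As a rough userspace model of that lockless version (a sketch only, not
the kernel implementation): C11 atomics stand in for the kernel bitops,
and model_test_and_set_bit(), model_find_and_set_bit(), NBITS and
BITS_PER_WORD are names made up for this illustration. __builtin_ctzl()
is a GCC/Clang builtin.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>

#define NBITS		128UL
#define BITS_PER_WORD	(sizeof(unsigned long) * CHAR_BIT)
#define NWORDS		((NBITS + BITS_PER_WORD - 1) / BITS_PER_WORD)

static _Atomic unsigned long bitmap[NWORDS];

/* Hypothetical stand-in for test_and_set_bit(): returns the old bit value. */
static int model_test_and_set_bit(unsigned long bit, _Atomic unsigned long *map)
{
	unsigned long mask = 1UL << (bit % BITS_PER_WORD);

	return (atomic_fetch_or(&map[bit / BITS_PER_WORD], mask) & mask) != 0;
}

/* Hypothetical stand-in for find_and_set_bit(): claim a zero bit or return nbits. */
static unsigned long model_find_and_set_bit(_Atomic unsigned long *map,
					    unsigned long nbits)
{
	unsigned long idx = 0;

	while (idx * BITS_PER_WORD < nbits) {
		/* One load per attempt: search a private snapshot of the word. */
		unsigned long val = atomic_load(&map[idx]);
		unsigned long bit;

		if (val == ~0UL) {
			idx++;		/* word looked full, move on */
			continue;
		}

		bit = idx * BITS_PER_WORD + (unsigned long)__builtin_ctzl(~val);
		if (bit >= nbits)
			return nbits;
		if (!model_test_and_set_bit(bit, map))
			return bit;	/* we flipped it 0 -> 1, so it is ours */
		/* Lost the race for this bit: re-read the same word. */
	}

	return nbits;
}

/* No lock: the atomic read-modify-write decides who owns the bit. */
static unsigned long alloc_bit(void)
{
	return model_find_and_set_bit(bitmap, NBITS);
}

static void free_bit(unsigned long bit)
{
	assert(bit < NBITS);
	atomic_fetch_and(&bitmap[bit / BITS_PER_WORD],
			 ~(1UL << (bit % BITS_PER_WORD)));
}
```

In this model alloc_bit() returns NBITS when no free bit could be
claimed, and a claimed bit stays owned by the caller until free_bit()
is called on it.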

Patches 36-40 do this in a few places in the kernel where the
transition is clear. There are likely more candidates for
refactoring.

The other important case is when people open-code an atomic search
or an atomic traversal of a bitmap, with patterns like:

	for (idx = 0; idx < nbits; idx++)
		if (test_and_clear_bit(idx, bitmap))
			do_something(idx);

Or like this:

	do {
		bit = find_first_bit(bitmap, nbits);
		if (bit >= nbits)
			return nbits;

	} while (!test_and_clear_bit(bit, bitmap));

	return bit;

In both cases, the open-coded loop may be converted to a single function
or iterator call, respectively:

	for_each_test_and_clear_bit(idx, bitmap, nbits)
		do_something(idx);

Or:
	return find_and_clear_bit(bitmap, nbits);
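
To make the iterator form concrete, here is a userspace sketch built on
C11 atomics. It only models the behaviour; every model_* name is invented
for this example, and __builtin_ctzl() is a GCC/Clang builtin.

```c
#include <limits.h>
#include <stdatomic.h>

#define BITS_PER_WORD	(sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical helper: atomically set one bit. */
static void model_set_bit(unsigned long bit, _Atomic unsigned long *map)
{
	atomic_fetch_or(&map[bit / BITS_PER_WORD], 1UL << (bit % BITS_PER_WORD));
}

/* Hypothetical stand-in for test_and_clear_bit(): returns the old bit value. */
static int model_test_and_clear_bit(unsigned long bit, _Atomic unsigned long *map)
{
	unsigned long mask = 1UL << (bit % BITS_PER_WORD);

	return (atomic_fetch_and(&map[bit / BITS_PER_WORD], ~mask) & mask) != 0;
}

/* Claim a set bit at or after @start by clearing it, or return nbits. */
static unsigned long model_find_and_clear_next_bit(_Atomic unsigned long *map,
						   unsigned long nbits,
						   unsigned long start)
{
	while (start < nbits) {
		unsigned long idx = start / BITS_PER_WORD;
		/* Snapshot the word once and mask off the bits below start. */
		unsigned long val = atomic_load(&map[idx]) &
				    (~0UL << (start % BITS_PER_WORD));
		unsigned long bit;

		if (!val) {
			start = (idx + 1) * BITS_PER_WORD;	/* next word */
			continue;
		}

		bit = idx * BITS_PER_WORD + (unsigned long)__builtin_ctzl(val);
		if (bit >= nbits)
			return nbits;
		if (model_test_and_clear_bit(bit, map))
			return bit;	/* we flipped it 1 -> 0 */
		/* Somebody else cleared it first: rescan from the same position. */
	}

	return nbits;
}

/* Visit each set bit once, atomically clearing it before the body runs. */
#define model_for_each_test_and_clear_bit(bit, map, nbits) \
	for ((bit) = 0; \
	     (bit) = model_find_and_clear_next_bit((map), (nbits), (bit)), \
	     (bit) < (nbits); \
	     (bit)++)
```

Each bit is handed to the loop body only by the thread that actually
cleared it, which is what the open-coded test_and_clear_bit() loops
above rely on.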

Obviously, the less routine code people have to write themselves, the
lower the probability of making a mistake. Patch #33 fixes one such
mistake.

The new API is not only a set of handy helpers - it also resolves a
non-trivial issue of using non-atomic find_bit() together with atomic
test_and_{set,clear}_bit().

The trick is that find_bit() assumes that the bitmap is a regular
non-volatile piece of memory, and the compiler is allowed to use
optimization techniques such as re-fetching memory instead of caching it.

For example, find_first_bit() is implemented like:

      for (idx = 0; idx * BITS_PER_LONG < sz; idx++) {
              val = addr[idx];
              if (val) {
                      sz = min(idx * BITS_PER_LONG + __ffs(val), sz);
                      break;
              }
      }

On register-memory architectures, like x86, the compiler may decide to
access memory twice - the first time to compare against 0, and the second
time to fetch the value to pass to __ffs().

When running find_first_bit() on volatile memory, the memory may get
changed in between, which may, for instance, lead to passing 0 to
__ffs(). The result of __ffs(0) is undefined, so this is a potentially
dangerous call.

find_and_clear_bit() as a wrapper around test_and_clear_bit()
naturally treats underlying bitmap as a volatile memory and prevents
compiler from such optimizations.
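
The double-fetch hazard can be avoided by reading each word exactly once
into a local and using that local for both the zero check and the bit
scan. A minimal userspace sketch, assuming C11 atomics as a stand-in for
the kernel's volatile accesses; snapshot_find_first_bit() is a made-up
name, and __builtin_ctzl() is a GCC/Clang builtin whose result is
likewise undefined for 0:

```c
#include <limits.h>
#include <stdatomic.h>

#define BITS_PER_WORD	(sizeof(unsigned long) * CHAR_BIT)

/* Find the first set bit, loading every word of the bitmap only once. */
static unsigned long snapshot_find_first_bit(_Atomic unsigned long *map,
					     unsigned long nbits)
{
	unsigned long idx;

	for (idx = 0; idx * BITS_PER_WORD < nbits; idx++) {
		/*
		 * A single atomic load: the test below and the bit scan both
		 * use this local copy, so ctz never sees a value that a
		 * concurrent writer zeroed after the test.
		 */
		unsigned long val = atomic_load_explicit(&map[idx],
							 memory_order_relaxed);

		if (val) {
			unsigned long bit = idx * BITS_PER_WORD +
					    (unsigned long)__builtin_ctzl(val);

			return bit < nbits ? bit : nbits;
		}
	}

	return nbits;
}
```

The returned bit number may still be stale by the time the caller looks
at it; the snapshot only guarantees that the scan itself never operates
on a zero word.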

KCSAN now catches exactly this type of situation and warns on concurrent
memory modifications. We can use it to reveal improper usage of
find_bit(), and convert it to atomic find_and_*_bit() as appropriate.

In some cases concurrent operations with plain find_bit() are acceptable.
For example:

 - two threads running find_*_bit(): safe wrt __ffs(0) and returns the
   correct value, because the underlying bitmap is unchanged;
 - find_next_bit() in parallel with set or clear_bit(), when modifying
   a bit prior to the start bit to search: safe and correct;
 - find_first_bit() in parallel with set_bit(): safe, but may return wrong
   bit number;
 - find_first_zero_bit() in parallel with clear_bit(): same as above.

In the last 2 cases find_bit() may not return a correct bit number, but
it may be OK if the caller requires any (not exactly the first) set or
clear bit, respectively.

In such cases, KCSAN may be safely silenced with data_race(). But in most
cases where KCSAN detects concurrency we should carefully review the code
and likely protect critical sections or switch to atomic find_and_*_bit(),
as appropriate.

This patch adds the following atomic primitives:

	find_and_set_bit(addr, nbits);
	find_and_set_next_bit(addr, nbits, start);
	...

Here the find_and_{set,clear} part refers to the corresponding
test_and_{set,clear}_bit() function. Suffixes like _wrap or _lock
derive their semantics from the corresponding find() or test() functions.

For brevity, the naming omits the fact that we search for zero bit in
find_and_set, and correspondingly search for set bit in find_and_clear
functions.
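
As an illustration of how a _wrap variant composes with the basic
search, here is a userspace sketch on C11 atomics. It models the
behaviour only; the model_* names are invented for this example, and
__builtin_ctzl() is a GCC/Clang builtin.

```c
#include <limits.h>
#include <stdatomic.h>

#define BITS_PER_WORD	(sizeof(unsigned long) * CHAR_BIT)

/* Claim a zero bit at or after @start by setting it, or return nbits. */
static unsigned long model_find_and_set_next_bit(_Atomic unsigned long *map,
						 unsigned long nbits,
						 unsigned long start)
{
	while (start < nbits) {
		unsigned long idx = start / BITS_PER_WORD;
		/* Zero bits at or above start show up as ones in val. */
		unsigned long val = ~atomic_load(&map[idx]) &
				    (~0UL << (start % BITS_PER_WORD));
		unsigned long bit, mask;

		if (!val) {
			start = (idx + 1) * BITS_PER_WORD;	/* next word */
			continue;
		}

		bit = idx * BITS_PER_WORD + (unsigned long)__builtin_ctzl(val);
		if (bit >= nbits)
			return nbits;

		mask = 1UL << (bit % BITS_PER_WORD);
		if (!(atomic_fetch_or(&map[idx], mask) & mask))
			return bit;	/* we flipped it 0 -> 1 */
		/* Lost the race for this bit: rescan from the same position. */
	}

	return nbits;
}

/* Search [offset, nbits) first, then wrap around and search [0, offset). */
static unsigned long model_find_and_set_bit_wrap(_Atomic unsigned long *map,
						 unsigned long nbits,
						 unsigned long offset)
{
	unsigned long bit = model_find_and_set_next_bit(map, nbits, offset);

	if (bit < nbits || offset == 0)
		return bit;

	bit = model_find_and_set_next_bit(map, offset, 0);
	return bit < offset ? bit : nbits;
}
```

The wrap step reuses the same search with @offset as the upper bound,
so a failure in both halves is reported as nbits.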

The patch also adds iterators with atomic semantics, like
for_each_test_and_set_bit(). Here, the naming rule is to simply prefix
corresponding atomic operation with 'for_each'.

This series is not aimed at performance, but some performance
implications are considered.

In [1] Jan reported a 2% slowdown in a single-thread search test when
switching the find_bit() functions to treat bitmaps as volatile arrays.
On the other hand, the kernel test robot in the same thread reported
+3.7% to the performance of the will-it-scale.per_thread_ops test.

Assuming that our compilers are sane and generate better code against
properly annotated data, the above discrepancy doesn't look weird. When
running on non-volatile bitmaps, plain find_bit() outperforms atomic
find_and_*_bit(), and vice versa.

So, all users of the find_bit() API where heavy concurrency is expected
are encouraged to switch to atomic find_and_*_bit() as appropriate.

The 1st patch of this series adds the atomic find_and_*_bit() API, the
2nd adds a basic test for the new API, and all the following patches
spread it over the kernel.

[1] https://lore.kernel.org/lkml/634f5fdf-e236-42cf-be8d-48a581c21660@alu.unizg.hr/T/#m3e7341eb3571753f3acf8fe166f3fb5b2c12e615

---
v1: https://lore.kernel.org/netdev/20231118155105.25678-29-yury.norov@gmail.com/T/
v2: https://lore.kernel.org/all/20231204185101.ddmkvsr2xxsmoh2u@quack3/T/
v3: https://lore.kernel.org/linux-pci/ZX4bIisLzpW8c4WM@yury-ThinkPad/T/
v4:
 - drop patch v3-24: not needed after null_blk refactoring;
 - add patch 34: "MIPS: sgi-ip27: optimize alloc_level()";
 - add patch 35: "uprobes: optimize xol_take_insn_slot()";
 - add patches 36-40: get rid of locking scheme around bitmaps;
 - move new API to separate headers, to not bloat bitmap.h @ Linus;
 - patch #1: adjust comments to allow returning >= @size;
 - rebase the series on top of current master.

Yury Norov (40):
  lib/find: add atomic find_bit() primitives
  lib/find: add test for atomic find_bit() ops
  lib/sbitmap; optimize __sbitmap_get_word() by using find_and_set_bit()
  watch_queue: optimize post_one_notification() by using
    find_and_clear_bit()
  sched: add cpumask_find_and_set() and use it in __mm_cid_get()
  mips: sgi-ip30: optimize heart_alloc_int() by using find_and_set_bit()
  sparc: optimize alloc_msi() by using find_and_set_bit()
  perf/arm: use atomic find_bit() API
  drivers/perf: optimize ali_drw_get_counter_idx() by using
    find_and_set_bit()
  dmaengine: idxd: optimize perfmon_assign_event()
  ath10k: optimize ath10k_snoc_napi_poll()
  wifi: rtw88: optimize the driver by using atomic iterator
  KVM: x86: hyper-v: optimize and cleanup kvm_hv_process_stimers()
  PCI: hv: Optimize hv_get_dom_num() by using find_and_set_bit()
  scsi: core: optimize scsi_evt_emit() by using an atomic iterator
  scsi: mpi3mr: optimize the driver by using find_and_set_bit()
  scsi: qedi: optimize qedi_get_task_idx() by using find_and_set_bit()
  powerpc: optimize arch code by using atomic find_bit() API
  iommu: optimize subsystem by using atomic find_bit() API
  media: radio-shark: optimize the driver by using atomic find_bit() API
  sfc: optimize the driver by using atomic find_bit() API
  tty: nozomi: optimize interrupt_handler()
  usb: cdc-acm: optimize acm_softint()
  RDMA/rtrs: optimize __rtrs_get_permit() by using
    find_and_set_bit_lock()
  mISDN: optimize get_free_devid()
  media: em28xx: cx231xx: optimize drivers by using find_and_set_bit()
  ethernet: rocker: optimize ofdpa_port_internal_vlan_id_get()
  bluetooth: optimize cmtp_alloc_block_id()
  net: smc: optimize smc_wr_tx_get_free_slot_index()
  ALSA: use atomic find_bit() functions where applicable
  m68k: optimize get_mmu_context()
  microblaze: optimize get_mmu_context()
  sh: mach-x3proto: optimize ilsel_enable()
  MIPS: sgi-ip27: optimize alloc_level()
  uprobes: optimize xol_take_insn_slot()
  scsi: sr: drop locking around SR index bitmap
  KVM: PPC: Book3s HV: drop locking around kvmppc_uvmem_bitmap
  wifi: mac80211: drop locking around ntp_fltr_bmap
  mailbox: bcm-flexrm: simplify locking scheme
  powerpc/xive: drop locking around IRQ map

 MAINTAINERS                                  |   2 +
 arch/m68k/include/asm/mmu_context.h          |  12 +-
 arch/microblaze/include/asm/mmu_context_mm.h |  12 +-
 arch/mips/sgi-ip27/ip27-irq.c                |  13 +-
 arch/mips/sgi-ip30/ip30-irq.c                |  13 +-
 arch/powerpc/kvm/book3s_hv_uvmem.c           |  33 +-
 arch/powerpc/mm/book3s32/mmu_context.c       |  11 +-
 arch/powerpc/platforms/pasemi/dma_lib.c      |  46 +--
 arch/powerpc/platforms/powernv/pci-sriov.c   |  13 +-
 arch/powerpc/sysdev/xive/spapr.c             |  34 +-
 arch/sh/boards/mach-x3proto/ilsel.c          |   5 +-
 arch/sparc/kernel/pci_msi.c                  |  10 +-
 arch/x86/kvm/hyperv.c                        |  41 +--
 drivers/dma/idxd/perfmon.c                   |   9 +-
 drivers/infiniband/ulp/rtrs/rtrs-clt.c       |  16 +-
 drivers/iommu/arm/arm-smmu/arm-smmu.h        |  11 +-
 drivers/iommu/msm_iommu.c                    |  19 +-
 drivers/isdn/mISDN/core.c                    |  10 +-
 drivers/mailbox/bcm-flexrm-mailbox.c         |  21 +-
 drivers/media/radio/radio-shark.c            |   6 +-
 drivers/media/radio/radio-shark2.c           |   6 +-
 drivers/media/usb/cx231xx/cx231xx-cards.c    |  17 +-
 drivers/media/usb/em28xx/em28xx-cards.c      |  38 +--
 drivers/net/ethernet/broadcom/bnxt/bnxt.c    |  18 +-
 drivers/net/ethernet/rocker/rocker_ofdpa.c   |  12 +-
 drivers/net/ethernet/sfc/rx_common.c         |   5 +-
 drivers/net/ethernet/sfc/siena/rx_common.c   |   5 +-
 drivers/net/ethernet/sfc/siena/siena_sriov.c |  15 +-
 drivers/net/wireless/ath/ath10k/snoc.c       |  10 +-
 drivers/net/wireless/realtek/rtw88/pci.c     |   6 +-
 drivers/net/wireless/realtek/rtw89/pci.c     |   6 +-
 drivers/pci/controller/pci-hyperv.c          |   8 +-
 drivers/perf/alibaba_uncore_drw_pmu.c        |  11 +-
 drivers/perf/arm-cci.c                       |  25 +-
 drivers/perf/arm-ccn.c                       |  11 +-
 drivers/perf/arm_dmc620_pmu.c                |  10 +-
 drivers/perf/arm_pmuv3.c                     |   9 +-
 drivers/scsi/mpi3mr/mpi3mr_os.c              |  22 +-
 drivers/scsi/qedi/qedi_main.c                |  10 +-
 drivers/scsi/scsi_lib.c                      |   8 +-
 drivers/scsi/sr.c                            |  15 +-
 drivers/tty/nozomi.c                         |   6 +-
 drivers/usb/class/cdc-acm.c                  |   6 +-
 include/linux/cpumask_atomic.h               |  20 ++
 include/linux/find.h                         |   4 -
 include/linux/find_atomic.h                  | 324 +++++++++++++++++++
 kernel/events/uprobes.c                      |  15 +-
 kernel/sched/sched.h                         |  15 +-
 kernel/watch_queue.c                         |   7 +-
 lib/find_bit.c                               |  86 +++++
 lib/sbitmap.c                                |  47 +--
 lib/test_bitmap.c                            |  62 ++++
 net/bluetooth/cmtp/core.c                    |  11 +-
 net/smc/smc_wr.c                             |  11 +-
 sound/pci/hda/hda_codec.c                    |   8 +-
 sound/usb/caiaq/audio.c                      |  14 +-
 56 files changed, 747 insertions(+), 493 deletions(-)
 create mode 100644 include/linux/cpumask_atomic.h
 create mode 100644 include/linux/find_atomic.h

-- 
2.43.0



* [PATCH v4 01/40] lib/find: add atomic find_bit() primitives
  2024-06-20 17:56 [PATCH v4 00/40] lib/find: add atomic find_bit() primitives Yury Norov
@ 2024-06-20 17:56 ` Yury Norov
  2024-06-20 17:56 ` [PATCH v4 02/40] lib/find: add test for atomic find_bit() ops Yury Norov
                   ` (39 subsequent siblings)
  40 siblings, 0 replies; 51+ messages in thread
From: Yury Norov @ 2024-06-20 17:56 UTC

Add helpers around test_and_{set,clear}_bit() to allow searching for
clear or set bits and flipping them atomically.

Atomic search primitives allow lockless bitmap handling where concurrent
processes touch only individual bits, and where people currently have to
lock their bitmaps just to search for a free or set bit because atomic
searching routines are missing.

The typical lock-protected bit allocation may look like this:

	unsigned long alloc_bit()
	{
		unsigned long bit;

		spin_lock(bitmap_lock);
		bit = find_first_zero_bit(bitmap, nbits);
		if (bit < nbits)
			__set_bit(bit, bitmap);
		spin_unlock(bitmap_lock);

		return bit;
	}

	void free_bit(unsigned long bit)
	{
		spin_lock(bitmap_lock);
		__clear_bit(bit, bitmap);
		spin_unlock(bitmap_lock);
	}

Now with atomic find_and_set_bit(), the above can be implemented
lockless, directly by using it and atomic clear_bit().

Patches 36-40 do this in a few places in the kernel where the
transition is clear. There are likely more candidates for
refactoring.

The other important case is when people open-code an atomic search
or an atomic traversal of a bitmap, with patterns like:

	for (idx = 0; idx < nbits; idx++)
		if (test_and_clear_bit(idx, bitmap))
			do_something(idx);

Or like this:

	do {
		bit = find_first_bit(bitmap, nbits);
		if (bit >= nbits)
			return nbits;

	} while (!test_and_clear_bit(bit, bitmap));

	return bit;

In both cases, the open-coded loop may be converted to a single function
or iterator call, respectively:

	for_each_test_and_clear_bit(idx, bitmap, nbits)
		do_something(idx);

Or:
	return find_and_clear_bit(bitmap, nbits);

Obviously, the less routine code people have to write themselves, the
lower the probability of making a mistake.

The new API is not only a set of handy helpers - it also resolves a
non-trivial issue of using non-atomic find_bit() together with atomic
test_and_{set,clear}_bit().

The trick is that find_bit() assumes that the bitmap is a regular
non-volatile piece of memory, and the compiler is allowed to use
optimization techniques such as re-fetching memory instead of caching it.

For example, find_first_bit() is implemented like:

      for (idx = 0; idx * BITS_PER_LONG < sz; idx++) {
              val = addr[idx];
              if (val) {
                      sz = min(idx * BITS_PER_LONG + __ffs(val), sz);
                      break;
              }
      }

On register-memory architectures, like x86, the compiler may decide to
access memory twice - the first time to compare against 0, and the second
time to fetch the value to pass to __ffs().

When running find_first_bit() on volatile memory, the memory may get
changed in between, which may, for instance, lead to passing 0 to
__ffs(). The result of __ffs(0) is undefined, so this is a potentially
dangerous call.

find_and_clear_bit() as a wrapper around test_and_clear_bit()
naturally treats underlying bitmap as a volatile memory and prevents
compiler from such optimizations.

KCSAN now catches exactly this type of situation and warns on concurrent
memory modifications. We can use it to reveal improper usage of
find_bit(), and convert it to atomic find_and_*_bit() as appropriate.

In some cases concurrent operations with plain find_bit() are acceptable.
For example:

 - two threads running find_*_bit(): safe wrt __ffs(0) and returns the
   correct value, because the underlying bitmap is unchanged;
 - find_next_bit() in parallel with set or clear_bit(), when modifying
   a bit prior to the start bit to search: safe and correct;
 - find_first_bit() in parallel with set_bit(): safe, but may return wrong
   bit number;
 - find_first_zero_bit() in parallel with clear_bit(): same as above.

In the last 2 cases find_bit() may not return a correct bit number, but
it may be OK if the caller requires any (not exactly the first) set or
clear bit, respectively.

In such cases, KCSAN may be safely silenced with data_race(). But in most
cases where KCSAN detects concurrency we should carefully review the code
and likely protect critical sections or switch to atomic find_and_*_bit(),
as appropriate.

This patch adds the following atomic primitives:

	find_and_set_bit(addr, nbits);
	find_and_set_next_bit(addr, nbits, start);
	...

Here the find_and_{set,clear} part refers to the corresponding
test_and_{set,clear}_bit() function. Suffixes like _wrap or _lock
derive their semantics from the corresponding find() or test() functions.

For brevity, the naming omits the fact that we search for zero bit in
find_and_set, and correspondingly search for set bit in find_and_clear
functions.

The patch also adds iterators with atomic semantics, like
for_each_test_and_set_bit(). Here, the naming rule is to simply prefix
corresponding atomic operation with 'for_each'.

CC: Bart Van Assche <bvanassche@acm.org>
CC: Sergey Shtylyov <s.shtylyov@omp.ru>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 MAINTAINERS                 |   1 +
 include/linux/find.h        |   4 -
 include/linux/find_atomic.h | 324 ++++++++++++++++++++++++++++++++++++
 lib/find_bit.c              |  86 ++++++++++
 4 files changed, 411 insertions(+), 4 deletions(-)
 create mode 100644 include/linux/find_atomic.h

diff --git a/MAINTAINERS b/MAINTAINERS
index b68c8b25bb93..54f37d4f33dd 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3730,6 +3730,7 @@ F:	include/linux/bitmap-str.h
 F:	include/linux/bitmap.h
 F:	include/linux/bits.h
 F:	include/linux/cpumask.h
+F:	include/linux/find_atomic.h
 F:	include/linux/find.h
 F:	include/linux/nodemask.h
 F:	include/vdso/bits.h
diff --git a/include/linux/find.h b/include/linux/find.h
index 5dfca4225fef..a855f82ab9ad 100644
--- a/include/linux/find.h
+++ b/include/linux/find.h
@@ -2,10 +2,6 @@
 #ifndef __LINUX_FIND_H_
 #define __LINUX_FIND_H_
 
-#ifndef __LINUX_BITMAP_H
-#error only <linux/bitmap.h> can be included directly
-#endif
-
 #include <linux/bitops.h>
 
 unsigned long _find_next_bit(const unsigned long *addr1, unsigned long nbits,
diff --git a/include/linux/find_atomic.h b/include/linux/find_atomic.h
new file mode 100644
index 000000000000..a9e238f88d0b
--- /dev/null
+++ b/include/linux/find_atomic.h
@@ -0,0 +1,324 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __LINUX_FIND_ATOMIC_H_
+#define __LINUX_FIND_ATOMIC_H_
+
+#include <linux/bitops.h>
+#include <linux/find.h>
+
+unsigned long _find_and_set_bit(volatile unsigned long *addr, unsigned long nbits);
+unsigned long _find_and_set_next_bit(volatile unsigned long *addr, unsigned long nbits,
+				unsigned long start);
+unsigned long _find_and_set_bit_lock(volatile unsigned long *addr, unsigned long nbits);
+unsigned long _find_and_set_next_bit_lock(volatile unsigned long *addr, unsigned long nbits,
+					  unsigned long start);
+unsigned long _find_and_clear_bit(volatile unsigned long *addr, unsigned long nbits);
+unsigned long _find_and_clear_next_bit(volatile unsigned long *addr, unsigned long nbits,
+				unsigned long start);
+
+/**
+ * find_and_set_bit - Find a zero bit and set it atomically
+ * @addr: The address to base the search on
+ * @nbits: The bitmap size in bits
+ *
+ * This function is designed to operate in concurrent access environment.
+ *
+ * Because of concurrency and volatile nature of underlying bitmap, it's not
+ * guaranteed that the found bit is the 1st bit in the bitmap. It's also not
+ * guaranteed that if >= @nbits is returned, the bitmap is full.
+ *
+ * The function does guarantee that if returned value is in range [0 .. @nbits),
+ * the acquired bit belongs to the caller exclusively.
+ *
+ * Returns: found and set bit, or >= @nbits if no bits found
+ */
+static inline
+unsigned long find_and_set_bit(volatile unsigned long *addr, unsigned long nbits)
+{
+	if (small_const_nbits(nbits)) {
+		unsigned long val, ret;
+
+		do {
+			val = *addr | ~GENMASK(nbits - 1, 0);
+			if (val == ~0UL)
+				return nbits;
+			ret = ffz(val);
+		} while (test_and_set_bit(ret, addr));
+
+		return ret;
+	}
+
+	return _find_and_set_bit(addr, nbits);
+}
+
+
+/**
+ * find_and_set_next_bit - Find a zero bit and set it, starting from @offset
+ * @addr: The address to base the search on
+ * @nbits: The bitmap size in bits
+ * @offset: The bitnumber to start searching at
+ *
+ * This function is designed to operate in concurrent access environment.
+ *
+ * Because of concurrency and volatile nature of underlying bitmap, it's not
+ * guaranteed that the found bit is the 1st bit in the bitmap, starting from
+ * @offset. It's also not guaranteed that if >= @nbits is returned, the bitmap
+ * is full.
+ *
+ * The function does guarantee that if returned value is in range [@offset .. @nbits),
+ * the acquired bit belongs to the caller exclusively.
+ *
+ * Returns: found and set bit, or >= @nbits if no bits found
+ */
+static inline
+unsigned long find_and_set_next_bit(volatile unsigned long *addr,
+				    unsigned long nbits, unsigned long offset)
+{
+	if (small_const_nbits(nbits)) {
+		unsigned long val, ret;
+
+		do {
+			val = *addr | ~GENMASK(nbits - 1, offset);
+			if (val == ~0UL)
+				return nbits;
+			ret = ffz(val);
+		} while (test_and_set_bit(ret, addr));
+
+		return ret;
+	}
+
+	return _find_and_set_next_bit(addr, nbits, offset);
+}
+
+/**
+ * find_and_set_bit_wrap - find and set bit starting at @offset, wrapping around zero
+ * @addr: The first address to base the search on
+ * @nbits: The bitmap size in bits
+ * @offset: The bitnumber to start searching at
+ *
+ * Returns: the bit number for the next clear bit, or first clear bit up to @offset,
+ * while atomically setting it. If no bits are found, returns >= @nbits.
+ */
+static inline
+unsigned long find_and_set_bit_wrap(volatile unsigned long *addr,
+					unsigned long nbits, unsigned long offset)
+{
+	unsigned long bit = find_and_set_next_bit(addr, nbits, offset);
+
+	if (bit < nbits || offset == 0)
+		return bit;
+
+	bit = find_and_set_bit(addr, offset);
+	return bit < offset ? bit : nbits;
+}
+
+/**
+ * find_and_set_bit_lock - find a zero bit, then set it atomically with lock
+ * @addr: The address to base the search on
+ * @nbits: The bitmap size in bits
+ *
+ * This function is designed to operate in concurrent access environment.
+ *
+ * Because of concurrency and volatile nature of underlying bitmap, it's not
+ * guaranteed that the found bit is the 1st bit in the bitmap. It's also not
+ * guaranteed that if >= @nbits is returned, the bitmap is full.
+ *
+ * The function does guarantee that if returned value is in range [0 .. @nbits),
+ * the acquired bit belongs to the caller exclusively.
+ *
+ * Returns: found and set bit, or >= @nbits if no bits found
+ */
+static inline
+unsigned long find_and_set_bit_lock(volatile unsigned long *addr, unsigned long nbits)
+{
+	if (small_const_nbits(nbits)) {
+		unsigned long val, ret;
+
+		do {
+			val = *addr | ~GENMASK(nbits - 1, 0);
+			if (val == ~0UL)
+				return nbits;
+			ret = ffz(val);
+		} while (test_and_set_bit_lock(ret, addr));
+
+		return ret;
+	}
+
+	return _find_and_set_bit_lock(addr, nbits);
+}
+
+/**
+ * find_and_set_next_bit_lock - find a zero bit and set it atomically with lock
+ * @addr: The address to base the search on
+ * @nbits: The bitmap size in bits
+ * @offset: The bitnumber to start searching at
+ *
+ * This function is designed to operate in concurrent access environment.
+ *
+ * Because of concurrency and volatile nature of underlying bitmap, it's not
+ * guaranteed that the found bit is the 1st bit in the range. It's also not
+ * guaranteed that if >= @nbits is returned, the bitmap is full.
+ *
+ * The function does guarantee that if returned value is in range [@offset .. @nbits),
+ * the acquired bit belongs to the caller exclusively.
+ *
+ * Returns: found and set bit, or >= @nbits if no bits found
+ */
+static inline
+unsigned long find_and_set_next_bit_lock(volatile unsigned long *addr,
+					 unsigned long nbits, unsigned long offset)
+{
+	if (small_const_nbits(nbits)) {
+		unsigned long val, ret;
+
+		do {
+			val = *addr | ~GENMASK(nbits - 1, offset);
+			if (val == ~0UL)
+				return nbits;
+			ret = ffz(val);
+		} while (test_and_set_bit_lock(ret, addr));
+
+		return ret;
+	}
+
+	return _find_and_set_next_bit_lock(addr, nbits, offset);
+}
+
+/**
+ * find_and_set_bit_wrap_lock - find zero bit starting at @offset and set it
+ *				with lock, and wrap around zero if nothing found
+ * @addr: The first address to base the search on
+ * @nbits: The bitmap size in bits
+ * @offset: The bitnumber to start searching at
+ *
+ * Returns: the bit number for the next clear bit, or first clear bit up to @offset,
+ * while atomically setting it with lock. If no bits are found, returns >= @nbits.
+ */
+static inline
+unsigned long find_and_set_bit_wrap_lock(volatile unsigned long *addr,
+					unsigned long nbits, unsigned long offset)
+{
+	unsigned long bit = find_and_set_next_bit_lock(addr, nbits, offset);
+
+	if (bit < nbits || offset == 0)
+		return bit;
+
+	bit = find_and_set_bit_lock(addr, offset);
+	return bit < offset ? bit : nbits;
+}
+
+/**
+ * find_and_clear_bit - Find a set bit and clear it atomically
+ * @addr: The address to base the search on
+ * @nbits: The bitmap size in bits
+ *
+ * This function is designed to operate in concurrent access environment.
+ *
+ * Because of concurrency and volatile nature of underlying bitmap, it's not
+ * guaranteed that the found bit is the 1st bit in the bitmap. It's also not
+ * guaranteed that if >= @nbits is returned, the bitmap is empty.
+ *
+ * The function does guarantee that if returned value is in range [0 .. @nbits),
+ * the acquired bit belongs to the caller exclusively.
+ *
+ * Returns: found and cleared bit, or >= @nbits if no bits found
+ */
+static inline unsigned long find_and_clear_bit(volatile unsigned long *addr, unsigned long nbits)
+{
+	if (small_const_nbits(nbits)) {
+		unsigned long val, ret;
+
+		do {
+			val = *addr & GENMASK(nbits - 1, 0);
+			if (val == 0)
+				return nbits;
+			ret = __ffs(val);
+		} while (!test_and_clear_bit(ret, addr));
+
+		return ret;
+	}
+
+	return _find_and_clear_bit(addr, nbits);
+}
+
+/**
+ * find_and_clear_next_bit - Find a set bit next after @offset, and clear it atomically
+ * @addr: The address to base the search on
+ * @nbits: The bitmap size in bits
+ * @offset: bit offset at which to start searching
+ *
+ * This function is designed to operate in concurrent access environment.
+ *
+ * Because of concurrency and volatile nature of underlying bitmap, it's not
+ * guaranteed that the found bit is the 1st bit in the range. It's also not
+ * guaranteed that if >= @nbits is returned, there's no set bits after @offset.
+ *
+ * The function does guarantee that if returned value is in range [@offset .. @nbits),
+ * the acquired bit belongs to the caller exclusively.
+ *
+ * Returns: found and cleared bit, or >= @nbits if no bits found
+ */
+static inline
+unsigned long find_and_clear_next_bit(volatile unsigned long *addr,
+					unsigned long nbits, unsigned long offset)
+{
+	if (small_const_nbits(nbits)) {
+		unsigned long val, ret;
+
+		do {
+			val = *addr & GENMASK(nbits - 1, offset);
+			if (val == 0)
+				return nbits;
+			ret = __ffs(val);
+		} while (!test_and_clear_bit(ret, addr));
+
+		return ret;
+	}
+
+	return _find_and_clear_next_bit(addr, nbits, offset);
+}
+
+/**
+ * __find_and_set_bit - Find a zero bit and set it non-atomically
+ * @addr: The address to base the search on
+ * @nbits: The bitmap size in bits
+ *
+ * A non-atomic version of find_and_set_bit() needed to help writing
+ * common-looking code where atomicity is provided externally.
+ *
+ * Returns: found and set bit, or >= @nbits if no bits found
+ */
+static inline
+unsigned long __find_and_set_bit(unsigned long *addr, unsigned long nbits)
+{
+	unsigned long bit;
+
+	bit = find_first_zero_bit(addr, nbits);
+	if (bit < nbits)
+		__set_bit(bit, addr);
+
+	return bit;
+}
+
+/* same as for_each_set_bit() but atomically clears each found bit */
+#define for_each_test_and_clear_bit(bit, addr, size) \
+	for ((bit) = 0; \
+	     (bit) = find_and_clear_next_bit((addr), (size), (bit)), (bit) < (size); \
+	     (bit)++)
+
+/* same as for_each_set_bit_from() but atomically clears each found bit */
+#define for_each_test_and_clear_bit_from(bit, addr, size) \
+	for (; (bit) = find_and_clear_next_bit((addr), (size), (bit)), (bit) < (size); (bit)++)
+
+/* same as for_each_clear_bit() but atomically sets each found bit */
+#define for_each_test_and_set_bit(bit, addr, size) \
+	for ((bit) = 0; \
+	     (bit) = find_and_set_next_bit((addr), (size), (bit)), (bit) < (size); \
+	     (bit)++)
+
+/* same as for_each_clear_bit_from() but atomically sets each found bit */
+#define for_each_test_and_set_bit_from(bit, addr, size) \
+	for (; \
+	     (bit) = find_and_set_next_bit((addr), (size), (bit)), (bit) < (size); \
+	     (bit)++)
+
+#endif /* __LINUX_FIND_ATOMIC_H_ */
diff --git a/lib/find_bit.c b/lib/find_bit.c
index 0836bb3d76c5..a322abd1e540 100644
--- a/lib/find_bit.c
+++ b/lib/find_bit.c
@@ -14,6 +14,7 @@
 
 #include <linux/bitops.h>
 #include <linux/bitmap.h>
+#include <linux/find_atomic.h>
 #include <linux/export.h>
 #include <linux/math.h>
 #include <linux/minmax.h>
@@ -128,6 +129,91 @@ unsigned long _find_first_and_and_bit(const unsigned long *addr1,
 }
 EXPORT_SYMBOL(_find_first_and_and_bit);
 
+unsigned long _find_and_set_bit(volatile unsigned long *addr, unsigned long nbits)
+{
+	unsigned long bit;
+
+	do {
+		bit = FIND_FIRST_BIT(~addr[idx], /* nop */, nbits);
+		if (bit >= nbits)
+			return nbits;
+	} while (test_and_set_bit(bit, addr));
+
+	return bit;
+}
+EXPORT_SYMBOL(_find_and_set_bit);
+
+unsigned long _find_and_set_next_bit(volatile unsigned long *addr,
+				     unsigned long nbits, unsigned long start)
+{
+	unsigned long bit;
+
+	do {
+		bit = FIND_NEXT_BIT(~addr[idx], /* nop */, nbits, start);
+		if (bit >= nbits)
+			return nbits;
+	} while (test_and_set_bit(bit, addr));
+
+	return bit;
+}
+EXPORT_SYMBOL(_find_and_set_next_bit);
+
+unsigned long _find_and_set_bit_lock(volatile unsigned long *addr, unsigned long nbits)
+{
+	unsigned long bit;
+
+	do {
+		bit = FIND_FIRST_BIT(~addr[idx], /* nop */, nbits);
+		if (bit >= nbits)
+			return nbits;
+	} while (test_and_set_bit_lock(bit, addr));
+
+	return bit;
+}
+EXPORT_SYMBOL(_find_and_set_bit_lock);
+
+unsigned long _find_and_set_next_bit_lock(volatile unsigned long *addr,
+					  unsigned long nbits, unsigned long start)
+{
+	unsigned long bit;
+
+	do {
+		bit = FIND_NEXT_BIT(~addr[idx], /* nop */, nbits, start);
+		if (bit >= nbits)
+			return nbits;
+	} while (test_and_set_bit_lock(bit, addr));
+
+	return bit;
+}
+EXPORT_SYMBOL(_find_and_set_next_bit_lock);
+
+unsigned long _find_and_clear_bit(volatile unsigned long *addr, unsigned long nbits)
+{
+	unsigned long bit;
+
+	do {
+		bit = FIND_FIRST_BIT(addr[idx], /* nop */, nbits);
+		if (bit >= nbits)
+			return nbits;
+	} while (!test_and_clear_bit(bit, addr));
+
+	return bit;
+}
+EXPORT_SYMBOL(_find_and_clear_bit);
+
+unsigned long _find_and_clear_next_bit(volatile unsigned long *addr,
+					unsigned long nbits, unsigned long start)
+{
+	do {
+		start = FIND_NEXT_BIT(addr[idx], /* nop */, nbits, start);
+		if (start >= nbits)
+			return nbits;
+	} while (!test_and_clear_bit(start, addr));
+
+	return start;
+}
+EXPORT_SYMBOL(_find_and_clear_next_bit);
+
 #ifndef find_first_zero_bit
 /*
  * Find the first cleared bit in a memory region.
-- 
2.43.0



* [PATCH v4 02/40] lib/find: add test for atomic find_bit() ops
  2024-06-20 17:56 [PATCH v4 00/40] lib/find: add atomic find_bit() primitives Yury Norov
  2024-06-20 17:56 ` [PATCH v4 01/40] " Yury Norov
@ 2024-06-20 17:56 ` Yury Norov
  2024-06-20 17:56 ` [PATCH v4 03/40] lib/sbitmap; optimize __sbitmap_get_word() by using find_and_set_bit() Yury Norov
                   ` (38 subsequent siblings)
  40 siblings, 0 replies; 51+ messages in thread
From: Yury Norov @ 2024-06-20 17:56 UTC (permalink / raw)
  To: linux-kernel, David S. Miller, H. Peter Anvin,
	James E.J. Bottomley, K. Y. Srinivasan, Md. Haris Iqbal,
	Akinobu Mita, Andrew Morton, Bjorn Andersson, Borislav Petkov,
	Chaitanya Kulkarni, Christian Brauner, Damien Le Moal,
	Dave Hansen, David Disseldorp, Edward Cree, Eric Dumazet,
	Fenghua Yu, Geert Uytterhoeven, Greg Kroah-Hartman,
	Gregory Greenman, Hans Verkuil, Hans de Goede, Hugh Dickins,
	Ingo Molnar, Jakub Kicinski, Jaroslav Kysela, Jason Gunthorpe,
	Jens Axboe, Jiri Pirko, Jiri Slaby, Kalle Valo, Karsten Graul,
	Karsten Keil, Kees Cook, Leon Romanovsky, Mark Rutland,
	Martin Habets, Mauro Carvalho Chehab, Michael Ellerman,
	Michal Simek, Nicholas Piggin, Oliver Neukum, Paolo Abeni,
	Paolo Bonzini, Peter Zijlstra, Ping-Ke Shih, Rich Felker,
	Rob Herring, Robin Murphy, Sean Christopherson, Shuai Xue,
	Stanislaw Gruszka, Steven Rostedt, Thomas Bogendoerfer,
	Thomas Gleixner, Valentin Schneider, Vitaly Kuznetsov,
	Wenjia Zhang, Will Deacon, Yoshinori Sato,
	GR-QLogic-Storage-Upstream, alsa-devel, ath10k, dmaengine, iommu,
	kvm, linux-arm-kernel, linux-arm-msm, linux-block,
	linux-bluetooth, linux-hyperv, linux-m68k, linux-media,
	linux-mips, linux-net-drivers, linux-pci, linux-rdma, linux-s390,
	linux-scsi, linux-serial, linux-sh, linux-sound, linux-usb,
	linux-wireless, linuxppc-dev, mpi3mr-linuxdrv.pdl, netdev,
	sparclinux, x86
  Cc: Yury Norov, Alexey Klimov, Bart Van Assche, Jan Kara,
	Linus Torvalds, Matthew Wilcox, Mirsad Todorovac,
	Rasmus Villemoes, Sergey Shtylyov

Add a basic functionality test for the new API.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 lib/test_bitmap.c | 62 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 62 insertions(+)

diff --git a/lib/test_bitmap.c b/lib/test_bitmap.c
index 65a75d58ed9e..405f79dd2266 100644
--- a/lib/test_bitmap.c
+++ b/lib/test_bitmap.c
@@ -6,6 +6,7 @@
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 
 #include <linux/bitmap.h>
+#include <linux/find_atomic.h>
 #include <linux/init.h>
 #include <linux/kernel.h>
 #include <linux/module.h>
@@ -221,6 +222,65 @@ static void __init test_zero_clear(void)
 	expect_eq_pbl("", bmap, 1024);
 }
 
+static void __init test_find_and_bit(void)
+{
+	unsigned long w, w_part, bit, cnt = 0;
+	DECLARE_BITMAP(bmap, EXP1_IN_BITS);
+
+	/*
+	 * Test find_and_clear{_next}_bit() and corresponding
+	 * iterators
+	 */
+	bitmap_copy(bmap, exp1, EXP1_IN_BITS);
+	w = bitmap_weight(bmap, EXP1_IN_BITS);
+
+	for_each_test_and_clear_bit(bit, bmap, EXP1_IN_BITS)
+		cnt++;
+
+	expect_eq_uint(w, cnt);
+	expect_eq_uint(0, bitmap_weight(bmap, EXP1_IN_BITS));
+
+	bitmap_copy(bmap, exp1, EXP1_IN_BITS);
+	w = bitmap_weight(bmap, EXP1_IN_BITS);
+	w_part = bitmap_weight(bmap, EXP1_IN_BITS / 3);
+
+	cnt = 0;
+	bit = EXP1_IN_BITS / 3;
+	for_each_test_and_clear_bit_from(bit, bmap, EXP1_IN_BITS)
+		cnt++;
+
+	expect_eq_uint(bitmap_weight(bmap, EXP1_IN_BITS), bitmap_weight(bmap, EXP1_IN_BITS / 3));
+	expect_eq_uint(w_part, bitmap_weight(bmap, EXP1_IN_BITS));
+	expect_eq_uint(w - w_part, cnt);
+
+	/*
+	 * Test find_and_set{_next}_bit() and corresponding
+	 * iterators
+	 */
+	bitmap_copy(bmap, exp1, EXP1_IN_BITS);
+	w = bitmap_weight(bmap, EXP1_IN_BITS);
+	cnt = 0;
+
+	for_each_test_and_set_bit(bit, bmap, EXP1_IN_BITS)
+		cnt++;
+
+	expect_eq_uint(EXP1_IN_BITS - w, cnt);
+	expect_eq_uint(EXP1_IN_BITS, bitmap_weight(bmap, EXP1_IN_BITS));
+
+	bitmap_copy(bmap, exp1, EXP1_IN_BITS);
+	w = bitmap_weight(bmap, EXP1_IN_BITS);
+	w_part = bitmap_weight(bmap, EXP1_IN_BITS / 3);
+	cnt = 0;
+
+	bit = EXP1_IN_BITS / 3;
+	for_each_test_and_set_bit_from(bit, bmap, EXP1_IN_BITS)
+		cnt++;
+
+	expect_eq_uint(EXP1_IN_BITS - bitmap_weight(bmap, EXP1_IN_BITS),
+			EXP1_IN_BITS / 3 - bitmap_weight(bmap, EXP1_IN_BITS / 3));
+	expect_eq_uint(EXP1_IN_BITS * 2 / 3 - (w - w_part), cnt);
+}
+
 static void __init test_find_nth_bit(void)
 {
 	unsigned long b, bit, cnt = 0;
@@ -1482,6 +1542,8 @@ static void __init selftest(void)
 	test_for_each_clear_bitrange_from();
 	test_for_each_set_clump8();
 	test_for_each_set_bit_wrap();
+
+	test_find_and_bit();
 }
 
 KSTM_MODULE_LOADERS(test_bitmap);
-- 
2.43.0



* [PATCH v4 03/40] lib/sbitmap; optimize __sbitmap_get_word() by using find_and_set_bit()
  2024-06-20 17:56 [PATCH v4 00/40] lib/find: add atomic find_bit() primitives Yury Norov
  2024-06-20 17:56 ` [PATCH v4 01/40] " Yury Norov
  2024-06-20 17:56 ` [PATCH v4 02/40] lib/find: add test for atomic find_bit() ops Yury Norov
@ 2024-06-20 17:56 ` Yury Norov
  2024-06-20 17:56 ` [PATCH v4 04/40] watch_queue: optimize post_one_notification() by using find_and_clear_bit() Yury Norov
                   ` (37 subsequent siblings)
  40 siblings, 0 replies; 51+ messages in thread
From: Yury Norov @ 2024-06-20 17:56 UTC (permalink / raw)
  To: linux-kernel, Jens Axboe, linux-block
  Cc: Yury Norov, Alexey Klimov, Bart Van Assche, Jan Kara,
	Linus Torvalds, Matthew Wilcox, Mirsad Todorovac,
	Rasmus Villemoes, Sergey Shtylyov

__sbitmap_get_word() open-codes either find_and_set_bit_wrap() or
find_and_set_next_bit(), depending on the wrap parameter. Simplify it
by using the atomic find_bit() API.

While here, simplify sbitmap_find_bit_in_word(), which calls it.
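
For readers unfamiliar with the wrap semantics, here is a minimal
user-space sketch of the idea, not the kernel implementation. It assumes
a single-word bitmap and C11 atomics; the names
model_set_zero_in_range() and model_find_and_set_bit_wrap() are
hypothetical, and the default seq_cst ordering used here is stronger
than the acquire semantics of the kernel's _lock variants.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>

#define WRAP_WORD_BITS (sizeof(unsigned long) * CHAR_BIT)

/* Atomically set the first zero bit in [start, end); return end if none. */
static unsigned long model_set_zero_in_range(_Atomic unsigned long *word,
					     unsigned long start,
					     unsigned long end)
{
	unsigned long bit, old;

	assert(start <= end && end <= WRAP_WORD_BITS);

	for (bit = start; bit < end; bit++) {
		if (atomic_load(word) & (1UL << bit))
			continue;	/* already taken, skip without writing */
		old = atomic_fetch_or(word, 1UL << bit);
		if (!(old & (1UL << bit)))
			return bit;	/* we flipped 0 -> 1, the bit is ours */
	}
	return end;
}

/*
 * Search [hint, nbits) first; if nothing is free there and hint != 0,
 * wrap around and search [0, hint). Returns nbits on failure.
 */
static unsigned long model_find_and_set_bit_wrap(_Atomic unsigned long *word,
						 unsigned long nbits,
						 unsigned long hint)
{
	unsigned long bit;

	assert(hint <= nbits && nbits <= WRAP_WORD_BITS);

	bit = model_set_zero_in_range(word, hint, nbits);
	if (bit < nbits || hint == 0)
		return bit;

	bit = model_set_zero_in_range(word, 0, hint);
	return bit < hint ? bit : nbits;
}
```

With bits 4..7 already set in an 8-bit map and a hint of 5, the model
finds nothing in [5, 8), wraps, and acquires bit 0. A return value of
nbits means the whole map was busy at the time of the search.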

CC: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
---
 lib/sbitmap.c | 47 ++++++++++-------------------------------------
 1 file changed, 10 insertions(+), 37 deletions(-)

diff --git a/lib/sbitmap.c b/lib/sbitmap.c
index 1e453f825c05..3881996217c9 100644
--- a/lib/sbitmap.c
+++ b/lib/sbitmap.c
@@ -4,6 +4,7 @@
  * Copyright (C) 2013-2014 Jens Axboe
  */
 
+#include <linux/find_atomic.h>
 #include <linux/sched.h>
 #include <linux/random.h>
 #include <linux/sbitmap.h>
@@ -133,38 +134,13 @@ void sbitmap_resize(struct sbitmap *sb, unsigned int depth)
 }
 EXPORT_SYMBOL_GPL(sbitmap_resize);
 
-static int __sbitmap_get_word(unsigned long *word, unsigned long depth,
+static inline int __sbitmap_get_word(unsigned long *word, unsigned long depth,
 			      unsigned int hint, bool wrap)
 {
-	int nr;
-
-	/* don't wrap if starting from 0 */
-	wrap = wrap && hint;
-
-	while (1) {
-		nr = find_next_zero_bit(word, depth, hint);
-		if (unlikely(nr >= depth)) {
-			/*
-			 * We started with an offset, and we didn't reset the
-			 * offset to 0 in a failure case, so start from 0 to
-			 * exhaust the map.
-			 */
-			if (hint && wrap) {
-				hint = 0;
-				continue;
-			}
-			return -1;
-		}
+	if (wrap)
+		return find_and_set_bit_wrap_lock(word, depth, hint);
 
-		if (!test_and_set_bit_lock(nr, word))
-			break;
-
-		hint = nr + 1;
-		if (hint >= depth - 1)
-			hint = 0;
-	}
-
-	return nr;
+	return find_and_set_next_bit_lock(word, depth, hint);
 }
 
 static int sbitmap_find_bit_in_word(struct sbitmap_word *map,
@@ -175,15 +151,12 @@ static int sbitmap_find_bit_in_word(struct sbitmap_word *map,
 	int nr;
 
 	do {
-		nr = __sbitmap_get_word(&map->word, depth,
-					alloc_hint, wrap);
-		if (nr != -1)
-			break;
-		if (!sbitmap_deferred_clear(map))
-			break;
-	} while (1);
+		nr = __sbitmap_get_word(&map->word, depth, alloc_hint, wrap);
+		if (nr < depth)
+			return nr;
+	} while (sbitmap_deferred_clear(map));
 
-	return nr;
+	return -1;
 }
 
 static int sbitmap_find_bit(struct sbitmap *sb,
-- 
2.43.0



* [PATCH v4 04/40] watch_queue: optimize post_one_notification() by using find_and_clear_bit()
  2024-06-20 17:56 [PATCH v4 00/40] lib/find: add atomic find_bit() primitives Yury Norov
                   ` (2 preceding siblings ...)
  2024-06-20 17:56 ` [PATCH v4 03/40] lib/sbitmap; optimize __sbitmap_get_word() by using find_and_set_bit() Yury Norov
@ 2024-06-20 17:56 ` Yury Norov
  2024-06-20 17:56 ` [PATCH v4 05/40] sched: add cpumask_find_and_set() and use it in __mm_cid_get() Yury Norov
                   ` (36 subsequent siblings)
  40 siblings, 0 replies; 51+ messages in thread
From: Yury Norov @ 2024-06-20 17:56 UTC (permalink / raw)
  To: linux-kernel, Christian Brauner, David Howells, Siddh Raman Pant,
	Dave Airlie, David Disseldorp, Philipp Stanner, Nick Alcock
  Cc: Yury Norov, Alexey Klimov, Bart Van Assche, Jan Kara,
	Linus Torvalds, Matthew Wilcox, Mirsad Todorovac,
	Rasmus Villemoes, Sergey Shtylyov

post_one_notification() searches for a set bit in wqueue->notes_bitmap,
and after some housekeeping work clears it, firing a BUG() if someone
else cleared the bit in-between.

We can acquire the bit atomically with find_and_clear_bit(), and
remove the BUG() possibility entirely.
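
To illustrate why the late test_and_clear_bit() check becomes
unnecessary, here is a minimal user-space sketch of a find-and-clear
loop. It is not the kernel code: it assumes a single-word bitmap, C11
atomics and the GCC/Clang __builtin_ctzl() builtin, and the name
model_find_and_clear_bit() is hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>

#define NOTE_WORD_BITS (sizeof(unsigned long) * CHAR_BIT)

/*
 * Find a set bit below nbits and clear it atomically.
 * Returns the bit number, or nbits if no set bit was found.
 */
static unsigned long model_find_and_clear_bit(_Atomic unsigned long *word,
					      unsigned long nbits)
{
	unsigned long mask, val, old, bit;

	assert(nbits > 0 && nbits <= NOTE_WORD_BITS);
	mask = nbits == NOTE_WORD_BITS ? ~0UL : (1UL << nbits) - 1;

	for (;;) {
		val = atomic_load(word) & mask;
		if (val == 0)
			return nbits;

		bit = (unsigned long)__builtin_ctzl(val);
		old = atomic_fetch_and(word, ~(1UL << bit));
		if (old & (1UL << bit))
			return bit;	/* we cleared it, so we own it */
		/* somebody else cleared the bit first: search again */
	}
}
```

Because the search and the clear are retried together, a caller either
gets a bit that nobody else holds or gets nbits. There is no window in
which the bit is known but not yet owned.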

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 kernel/watch_queue.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/kernel/watch_queue.c b/kernel/watch_queue.c
index 03b90d7d2175..387ee88af71d 100644
--- a/kernel/watch_queue.c
+++ b/kernel/watch_queue.c
@@ -9,6 +9,7 @@
 
 #define pr_fmt(fmt) "watchq: " fmt
 #include <linux/module.h>
+#include <linux/find_atomic.h>
 #include <linux/init.h>
 #include <linux/sched.h>
 #include <linux/slab.h>
@@ -112,7 +113,7 @@ static bool post_one_notification(struct watch_queue *wqueue,
 	if (pipe_full(head, tail, pipe->ring_size))
 		goto lost;
 
-	note = find_first_bit(wqueue->notes_bitmap, wqueue->nr_notes);
+	note = find_and_clear_bit(wqueue->notes_bitmap, wqueue->nr_notes);
 	if (note >= wqueue->nr_notes)
 		goto lost;
 
@@ -133,10 +134,6 @@ static bool post_one_notification(struct watch_queue *wqueue,
 	buf->flags = PIPE_BUF_FLAG_WHOLE;
 	smp_store_release(&pipe->head, head + 1); /* vs pipe_read() */
 
-	if (!test_and_clear_bit(note, wqueue->notes_bitmap)) {
-		spin_unlock_irq(&pipe->rd_wait.lock);
-		BUG();
-	}
 	wake_up_interruptible_sync_poll_locked(&pipe->rd_wait, EPOLLIN | EPOLLRDNORM);
 	done = true;
 
-- 
2.43.0



* [PATCH v4 05/40] sched: add cpumask_find_and_set() and use it in __mm_cid_get()
  2024-06-20 17:56 [PATCH v4 00/40] lib/find: add atomic find_bit() primitives Yury Norov
                   ` (3 preceding siblings ...)
  2024-06-20 17:56 ` [PATCH v4 04/40] watch_queue: optimize post_one_notification() by using find_and_clear_bit() Yury Norov
@ 2024-06-20 17:56 ` Yury Norov
  2024-06-20 17:56 ` [PATCH v4 06/40] mips: sgi-ip30: optimize heart_alloc_int() by using find_and_set_bit() Yury Norov
                   ` (35 subsequent siblings)
  40 siblings, 0 replies; 51+ messages in thread
From: Yury Norov @ 2024-06-20 17:56 UTC (permalink / raw)
  To: linux-kernel, Yury Norov, Andy Shevchenko, Rasmus Villemoes,
	Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Valentin Schneider
  Cc: Alexey Klimov, Bart Van Assche, Jan Kara, Linus Torvalds,
	Matthew Wilcox, Mirsad Todorovac, Sergey Shtylyov,
	Mathieu Desnoyers

__mm_cid_get() uses the __mm_cid_try_get() helper to atomically acquire a
bit in the mm cid mask. Now that we have atomic find_and_set_bit(), we can
easily extend it to cpumasks and use it in the scheduler code.

cpumask_find_and_set() treats the cid mask as a volatile region of memory,
which it actually is in this case. So if the mask changes while the search
is in progress, KCSAN won't fire a warning.

CC: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
---
 MAINTAINERS                    |  1 +
 include/linux/cpumask_atomic.h | 20 ++++++++++++++++++++
 kernel/sched/sched.h           | 15 ++++++---------
 3 files changed, 27 insertions(+), 9 deletions(-)
 create mode 100644 include/linux/cpumask_atomic.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 54f37d4f33dd..7173c74896d8 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3730,6 +3730,7 @@ F:	include/linux/bitmap-str.h
 F:	include/linux/bitmap.h
 F:	include/linux/bits.h
 F:	include/linux/cpumask.h
+F:	include/linux/cpumask_atomic.h
 F:	include/linux/find_atomic.h
 F:	include/linux/find.h
 F:	include/linux/nodemask.h
diff --git a/include/linux/cpumask_atomic.h b/include/linux/cpumask_atomic.h
new file mode 100644
index 000000000000..1aaf9a63cbe6
--- /dev/null
+++ b/include/linux/cpumask_atomic.h
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __LINUX_CPUMASK_ATOMIC_H_
+#define __LINUX_CPUMASK_ATOMIC_H_
+
+#include <linux/cpumask.h>
+#include <linux/find_atomic.h>
+
+/*
+ * cpumask_find_and_set - find the first unset cpu in a cpumask and
+ *			  set it atomically
+ * @srcp: the cpumask pointer
+ *
+ * Return: >= nr_cpu_ids if nothing is found.
+ */
+static inline unsigned int cpumask_find_and_set(volatile struct cpumask *srcp)
+{
+	return find_and_set_bit(cpumask_bits(srcp), small_cpumask_bits);
+}
+
+#endif /* __LINUX_CPUMASK_ATOMIC_H_ */
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index a831af102070..557896f8ccd7 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -30,6 +30,7 @@
 #include <linux/context_tracking.h>
 #include <linux/cpufreq.h>
 #include <linux/cpumask_api.h>
+#include <linux/cpumask_atomic.h>
 #include <linux/ctype.h>
 #include <linux/file.h>
 #include <linux/fs_api.h>
@@ -3312,23 +3313,19 @@ static inline void mm_cid_put(struct mm_struct *mm)
 
 static inline int __mm_cid_try_get(struct mm_struct *mm)
 {
-	struct cpumask *cpumask;
-	int cid;
+	struct cpumask *cpumask = mm_cidmask(mm);
+	int cid = nr_cpu_ids;
 
-	cpumask = mm_cidmask(mm);
 	/*
 	 * Retry finding first zero bit if the mask is temporarily
 	 * filled. This only happens during concurrent remote-clear
 	 * which owns a cid without holding a rq lock.
 	 */
-	for (;;) {
-		cid = cpumask_first_zero(cpumask);
-		if (cid < nr_cpu_ids)
-			break;
+	while (cid >= nr_cpu_ids) {
+		cid = cpumask_find_and_set(cpumask);
 		cpu_relax();
 	}
-	if (cpumask_test_and_set_cpu(cid, cpumask))
-		return -1;
+
 	return cid;
 }
 
-- 
2.43.0



* [PATCH v4 06/40] mips: sgi-ip30: optimize heart_alloc_int() by using find_and_set_bit()
  2024-06-20 17:56 [PATCH v4 00/40] lib/find: add atomic find_bit() primitives Yury Norov
                   ` (4 preceding siblings ...)
  2024-06-20 17:56 ` [PATCH v4 05/40] sched: add cpumask_find_and_set() and use it in __mm_cid_get() Yury Norov
@ 2024-06-20 17:56 ` Yury Norov
  2024-06-20 17:56 ` [PATCH v4 07/40] sparc: optimize alloc_msi() " Yury Norov
                   ` (34 subsequent siblings)
  40 siblings, 0 replies; 51+ messages in thread
From: Yury Norov @ 2024-06-20 17:56 UTC (permalink / raw)
  To: linux-kernel, Thomas Bogendoerfer, linux-mips
  Cc: Yury Norov, Alexey Klimov, Bart Van Assche, Jan Kara,
	Linus Torvalds, Matthew Wilcox, Mirsad Todorovac,
	Rasmus Villemoes, Sergey Shtylyov

heart_alloc_int() opencodes find_and_set_bit(). Simplify it by using the
dedicated function, and make a nice one-liner.
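
The conversion pattern (search, retry on a lost race, map ">= nbits" to
an errno) can be sketched in user space as follows. This is only a
model under stated assumptions: a single-word bitmap, C11 atomics and
the GCC/Clang __builtin_ctzl() builtin. The names
model_find_and_set_bit() and model_alloc_int() are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdatomic.h>

#define IRQ_WORD_BITS (sizeof(unsigned long) * CHAR_BIT)

/* Find a zero bit below nbits and set it atomically; nbits if none. */
static unsigned long model_find_and_set_bit(_Atomic unsigned long *word,
					    unsigned long nbits)
{
	unsigned long mask, free_bits, old, bit;

	assert(nbits > 0 && nbits <= IRQ_WORD_BITS);
	mask = nbits == IRQ_WORD_BITS ? ~0UL : (1UL << nbits) - 1;

	for (;;) {
		free_bits = ~atomic_load(word) & mask;
		if (free_bits == 0)
			return nbits;

		bit = (unsigned long)__builtin_ctzl(free_bits);
		old = atomic_fetch_or(word, 1UL << bit);
		if (!(old & (1UL << bit)))
			return bit;	/* we set it first */
		/* lost the race for this bit: search again */
	}
}

/* Allocator wrapper: translate "nothing found" into -ENOSPC. */
static int model_alloc_int(_Atomic unsigned long *map, unsigned long nr)
{
	unsigned long bit = model_find_and_set_bit(map, nr);

	return bit < nr ? (int)bit : -ENOSPC;
}
```

The wrapper keeps the caller-visible contract of the open-coded version:
a non-negative index on success, a negative errno when the map is full.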

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 arch/mips/sgi-ip30/ip30-irq.c | 13 +++----------
 1 file changed, 3 insertions(+), 10 deletions(-)

diff --git a/arch/mips/sgi-ip30/ip30-irq.c b/arch/mips/sgi-ip30/ip30-irq.c
index 423c32cb66ed..a70e7af93643 100644
--- a/arch/mips/sgi-ip30/ip30-irq.c
+++ b/arch/mips/sgi-ip30/ip30-irq.c
@@ -2,6 +2,7 @@
 /*
  * ip30-irq.c: Highlevel interrupt handling for IP30 architecture.
  */
+#include <linux/find_atomic.h>
 #include <linux/errno.h>
 #include <linux/init.h>
 #include <linux/interrupt.h>
@@ -28,17 +29,9 @@ static DEFINE_PER_CPU(unsigned long, irq_enable_mask);
 
 static inline int heart_alloc_int(void)
 {
-	int bit;
+	int bit = find_and_set_bit(heart_irq_map, HEART_NUM_IRQS);
 
-again:
-	bit = find_first_zero_bit(heart_irq_map, HEART_NUM_IRQS);
-	if (bit >= HEART_NUM_IRQS)
-		return -ENOSPC;
-
-	if (test_and_set_bit(bit, heart_irq_map))
-		goto again;
-
-	return bit;
+	return bit < HEART_NUM_IRQS ? bit : -ENOSPC;
 }
 
 static void ip30_error_irq(struct irq_desc *desc)
-- 
2.43.0



* [PATCH v4 07/40] sparc: optimize alloc_msi() by using find_and_set_bit()
  2024-06-20 17:56 [PATCH v4 00/40] lib/find: add atomic find_bit() primitives Yury Norov
                   ` (5 preceding siblings ...)
  2024-06-20 17:56 ` [PATCH v4 06/40] mips: sgi-ip30: optimize heart_alloc_int() by using find_and_set_bit() Yury Norov
@ 2024-06-20 17:56 ` Yury Norov
  2024-06-20 17:56 ` [PATCH v4 08/40] perf/arm: use atomic find_bit() API Yury Norov
                   ` (33 subsequent siblings)
  40 siblings, 0 replies; 51+ messages in thread
From: Yury Norov @ 2024-06-20 17:56 UTC (permalink / raw)
  To: linux-kernel, David S. Miller, Rob Herring, Sam Ravnborg,
	Yury Norov, sparclinux
  Cc: Alexey Klimov, Bart Van Assche, Jan Kara, Linus Torvalds,
	Matthew Wilcox, Mirsad Todorovac, Rasmus Villemoes,
	Sergey Shtylyov

alloc_msi() opencodes find_and_set_bit(). Simplify it by using the
dedicated function, and make a nice one-liner.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 arch/sparc/kernel/pci_msi.c | 10 +++-------
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/arch/sparc/kernel/pci_msi.c b/arch/sparc/kernel/pci_msi.c
index acb2f83a1d5c..55ff78a8f37c 100644
--- a/arch/sparc/kernel/pci_msi.c
+++ b/arch/sparc/kernel/pci_msi.c
@@ -3,6 +3,7 @@
  *
  * Copyright (C) 2007 David S. Miller (davem@davemloft.net)
  */
+#include <linux/find_atomic.h>
 #include <linux/kernel.h>
 #include <linux/interrupt.h>
 #include <linux/of.h>
@@ -96,14 +97,9 @@ static u32 pick_msiq(struct pci_pbm_info *pbm)
 
 static int alloc_msi(struct pci_pbm_info *pbm)
 {
-	int i;
-
-	for (i = 0; i < pbm->msi_num; i++) {
-		if (!test_and_set_bit(i, pbm->msi_bitmap))
-			return i + pbm->msi_first;
-	}
+	int i = find_and_set_bit(pbm->msi_bitmap, pbm->msi_num);
 
-	return -ENOENT;
+	return i < pbm->msi_num ? i + pbm->msi_first : -ENOENT;
 }
 
 static void free_msi(struct pci_pbm_info *pbm, int msi_num)
-- 
2.43.0



* [PATCH v4 08/40] perf/arm: use atomic find_bit() API
  2024-06-20 17:56 [PATCH v4 00/40] lib/find: add atomic find_bit() primitives Yury Norov
                   ` (6 preceding siblings ...)
  2024-06-20 17:56 ` [PATCH v4 07/40] sparc: optimize alloc_msi() " Yury Norov
@ 2024-06-20 17:56 ` Yury Norov
  2024-06-20 17:56 ` [PATCH v4 09/40] drivers/perf: optimize ali_drw_get_counter_idx() by using find_and_set_bit() Yury Norov
                   ` (32 subsequent siblings)
  40 siblings, 0 replies; 51+ messages in thread
From: Yury Norov @ 2024-06-20 17:56 UTC (permalink / raw)
  To: linux-kernel, Will Deacon, Mark Rutland, linux-arm-kernel
  Cc: Yury Norov, Alexey Klimov, Bart Van Assche, Jan Kara,
	Linus Torvalds, Matthew Wilcox, Mirsad Todorovac,
	Rasmus Villemoes, Sergey Shtylyov

Simplify the subsystem by using the atomic find_bit() API or atomic
iterators where applicable.
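
Several of the converted loops start at a non-zero counter index, which
corresponds to the "next bit" variant with a start offset. Below is a
minimal user-space sketch of that variant, assuming a single-word
bitmap, C11 atomics and the GCC/Clang __builtin_ctzl() builtin. The
names model_find_and_set_next_bit() and model_get_event_idx() are
hypothetical and only illustrate the shape of the conversion.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdatomic.h>

#define PMU_WORD_BITS (sizeof(unsigned long) * CHAR_BIT)

/* Find a zero bit in [start, nbits) and set it atomically; nbits if none. */
static unsigned long model_find_and_set_next_bit(_Atomic unsigned long *word,
						 unsigned long nbits,
						 unsigned long start)
{
	unsigned long mask, free_bits, old, bit;

	assert(nbits <= PMU_WORD_BITS);
	if (start >= nbits)
		return nbits;

	/* Bits below nbits, minus the bits below start. */
	mask = nbits == PMU_WORD_BITS ? ~0UL : (1UL << nbits) - 1;
	mask &= ~((1UL << start) - 1);

	for (;;) {
		free_bits = ~atomic_load(word) & mask;
		if (free_bits == 0)
			return nbits;

		bit = (unsigned long)__builtin_ctzl(free_bits);
		old = atomic_fetch_or(word, 1UL << bit);
		if (!(old & (1UL << bit)))
			return bit;
	}
}

/* Pick a free counter in [first, last], or -EAGAIN if all are in use. */
static int model_get_event_idx(_Atomic unsigned long *used_mask,
			       unsigned long first, unsigned long last)
{
	unsigned long nbits = last + 1;
	unsigned long idx = model_find_and_set_next_bit(used_mask, nbits, first);

	return idx < nbits ? (int)idx : -EAGAIN;
}
```

Note that the upper bound passed to the search is "last + 1", since the
open-coded loops iterate up to and including the last counter while the
find API takes an exclusive size.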

CC: Will Deacon <will@kernel.org>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 drivers/perf/arm-cci.c        | 25 +++++++------------------
 drivers/perf/arm-ccn.c        | 11 +++--------
 drivers/perf/arm_dmc620_pmu.c | 10 +++-------
 drivers/perf/arm_pmuv3.c      |  9 +++------
 4 files changed, 16 insertions(+), 39 deletions(-)

diff --git a/drivers/perf/arm-cci.c b/drivers/perf/arm-cci.c
index c76bac668dea..4c5d23942352 100644
--- a/drivers/perf/arm-cci.c
+++ b/drivers/perf/arm-cci.c
@@ -4,6 +4,7 @@
 // Author: Punit Agrawal <punit.agrawal@arm.com>, Suzuki Poulose <suzuki.poulose@arm.com>
 
 #include <linux/arm-cci.h>
+#include <linux/find_atomic.h>
 #include <linux/io.h>
 #include <linux/interrupt.h>
 #include <linux/module.h>
@@ -318,12 +319,9 @@ static int cci400_get_event_idx(struct cci_pmu *cci_pmu,
 		return CCI400_PMU_CYCLE_CNTR_IDX;
 	}
 
-	for (idx = CCI400_PMU_CNTR0_IDX; idx <= CCI_PMU_CNTR_LAST(cci_pmu); ++idx)
-		if (!test_and_set_bit(idx, hw->used_mask))
-			return idx;
-
-	/* No counters available */
-	return -EAGAIN;
+	idx = find_and_set_next_bit(hw->used_mask, CCI_PMU_CNTR_LAST(cci_pmu) + 1,
+							CCI400_PMU_CNTR0_IDX);
+	return idx < CCI_PMU_CNTR_LAST(cci_pmu) + 1 ? idx : -EAGAIN;
 }
 
 static int cci400_validate_hw_event(struct cci_pmu *cci_pmu, unsigned long hw_event)
@@ -792,13 +790,8 @@ static int pmu_get_event_idx(struct cci_pmu_hw_events *hw, struct perf_event *ev
 	if (cci_pmu->model->get_event_idx)
 		return cci_pmu->model->get_event_idx(cci_pmu, hw, cci_event);
 
-	/* Generic code to find an unused idx from the mask */
-	for (idx = 0; idx <= CCI_PMU_CNTR_LAST(cci_pmu); idx++)
-		if (!test_and_set_bit(idx, hw->used_mask))
-			return idx;
-
-	/* No counters available */
-	return -EAGAIN;
+	idx = find_and_set_bit(hw->used_mask, CCI_PMU_CNTR_LAST(cci_pmu) + 1);
+	return idx < CCI_PMU_CNTR_LAST(cci_pmu) + 1 ? idx : -EAGAIN;
 }
 
 static int pmu_map_event(struct perf_event *event)
@@ -851,12 +844,8 @@ static void pmu_free_irq(struct cci_pmu *cci_pmu)
 {
 	int i;
 
-	for (i = 0; i < cci_pmu->nr_irqs; i++) {
-		if (!test_and_clear_bit(i, &cci_pmu->active_irqs))
-			continue;
-
+	for_each_test_and_clear_bit(i, &cci_pmu->active_irqs, cci_pmu->nr_irqs)
 		free_irq(cci_pmu->irqs[i], cci_pmu);
-	}
 }
 
 static u32 pmu_read_counter(struct perf_event *event)
diff --git a/drivers/perf/arm-ccn.c b/drivers/perf/arm-ccn.c
index 86ef31ac7503..bd66d90dfda6 100644
--- a/drivers/perf/arm-ccn.c
+++ b/drivers/perf/arm-ccn.c
@@ -5,6 +5,7 @@
  */
 
 #include <linux/ctype.h>
+#include <linux/find_atomic.h>
 #include <linux/hrtimer.h>
 #include <linux/idr.h>
 #include <linux/interrupt.h>
@@ -580,15 +581,9 @@ static const struct attribute_group *arm_ccn_pmu_attr_groups[] = {
 
 static int arm_ccn_pmu_alloc_bit(unsigned long *bitmap, unsigned long size)
 {
-	int bit;
-
-	do {
-		bit = find_first_zero_bit(bitmap, size);
-		if (bit >= size)
-			return -EAGAIN;
-	} while (test_and_set_bit(bit, bitmap));
+	int bit = find_and_set_bit(bitmap, size);
 
-	return bit;
+	return bit < size ? bit : -EAGAIN;
 }
 
 /* All RN-I and RN-D nodes have identical PMUs */
diff --git a/drivers/perf/arm_dmc620_pmu.c b/drivers/perf/arm_dmc620_pmu.c
index 7e5f1d4fca0f..f41cc2ee9564 100644
--- a/drivers/perf/arm_dmc620_pmu.c
+++ b/drivers/perf/arm_dmc620_pmu.c
@@ -16,6 +16,7 @@
 #include <linux/cpumask.h>
 #include <linux/device.h>
 #include <linux/errno.h>
+#include <linux/find_atomic.h>
 #include <linux/interrupt.h>
 #include <linux/irq.h>
 #include <linux/kernel.h>
@@ -303,13 +304,8 @@ static int dmc620_get_event_idx(struct perf_event *event)
 		end_idx = DMC620_PMU_MAX_COUNTERS;
 	}
 
-	for (idx = start_idx; idx < end_idx; ++idx) {
-		if (!test_and_set_bit(idx, dmc620_pmu->used_mask))
-			return idx;
-	}
-
-	/* The counters are all in use. */
-	return -EAGAIN;
+	idx = find_and_set_next_bit(dmc620_pmu->used_mask, end_idx, start_idx);
+	return idx < end_idx ? idx : -EAGAIN;
 }
 
 static inline
diff --git a/drivers/perf/arm_pmuv3.c b/drivers/perf/arm_pmuv3.c
index 23fa6c5da82c..f3b20a3b1d9c 100644
--- a/drivers/perf/arm_pmuv3.c
+++ b/drivers/perf/arm_pmuv3.c
@@ -17,6 +17,7 @@
 #include <linux/acpi.h>
 #include <linux/bitfield.h>
 #include <linux/clocksource.h>
+#include <linux/find_atomic.h>
 #include <linux/of.h>
 #include <linux/perf/arm_pmu.h>
 #include <linux/perf/arm_pmuv3.h>
@@ -903,13 +904,9 @@ static irqreturn_t armv8pmu_handle_irq(struct arm_pmu *cpu_pmu)
 static int armv8pmu_get_single_idx(struct pmu_hw_events *cpuc,
 				    struct arm_pmu *cpu_pmu)
 {
-	int idx;
+	int idx = find_and_set_next_bit(cpuc->used_mask, cpu_pmu->num_events, ARMV8_IDX_COUNTER0);
 
-	for (idx = ARMV8_IDX_COUNTER0; idx < cpu_pmu->num_events; idx++) {
-		if (!test_and_set_bit(idx, cpuc->used_mask))
-			return idx;
-	}
-	return -EAGAIN;
+	return idx < cpu_pmu->num_events ? idx : -EAGAIN;
 }
 
 static int armv8pmu_get_chain_idx(struct pmu_hw_events *cpuc,
-- 
2.43.0



* [PATCH v4 09/40] drivers/perf: optimize ali_drw_get_counter_idx() by using find_and_set_bit()
  2024-06-20 17:56 [PATCH v4 00/40] lib/find: add atomic find_bit() primitives Yury Norov
                   ` (7 preceding siblings ...)
  2024-06-20 17:56 ` [PATCH v4 08/40] perf/arm: use atomic find_bit() API Yury Norov
@ 2024-06-20 17:56 ` Yury Norov
  2024-06-20 17:56 ` [PATCH v4 10/40] dmaengine: idxd: optimize perfmon_assign_event() Yury Norov
                   ` (31 subsequent siblings)
  40 siblings, 0 replies; 51+ messages in thread
From: Yury Norov @ 2024-06-20 17:56 UTC (permalink / raw)
  To: linux-kernel, Shuai Xue, Will Deacon, Mark Rutland,
	linux-arm-kernel
  Cc: Yury Norov, Alexey Klimov, Bart Van Assche, Jan Kara,
	Linus Torvalds, Matthew Wilcox, Mirsad Todorovac,
	Rasmus Villemoes, Sergey Shtylyov

The function searches used_mask for a clear bit in a for-loop, bit by bit.
Simplify it by using atomic find_and_set_bit().

Signed-off-by: Yury Norov <yury.norov@gmail.com>
Acked-by: Will Deacon <will@kernel.org>
---
 drivers/perf/alibaba_uncore_drw_pmu.c | 11 +++--------
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/drivers/perf/alibaba_uncore_drw_pmu.c b/drivers/perf/alibaba_uncore_drw_pmu.c
index 38a2947ae813..1516f2c3d58f 100644
--- a/drivers/perf/alibaba_uncore_drw_pmu.c
+++ b/drivers/perf/alibaba_uncore_drw_pmu.c
@@ -17,6 +17,7 @@
 #include <linux/cpumask.h>
 #include <linux/device.h>
 #include <linux/errno.h>
+#include <linux/find_atomic.h>
 #include <linux/interrupt.h>
 #include <linux/irq.h>
 #include <linux/kernel.h>
@@ -266,15 +267,9 @@ static const struct attribute_group *ali_drw_pmu_attr_groups[] = {
 static int ali_drw_get_counter_idx(struct perf_event *event)
 {
 	struct ali_drw_pmu *drw_pmu = to_ali_drw_pmu(event->pmu);
-	int idx;
+	int idx = find_and_set_bit(drw_pmu->used_mask, ALI_DRW_PMU_COMMON_MAX_COUNTERS);
 
-	for (idx = 0; idx < ALI_DRW_PMU_COMMON_MAX_COUNTERS; ++idx) {
-		if (!test_and_set_bit(idx, drw_pmu->used_mask))
-			return idx;
-	}
-
-	/* The counters are all in use. */
-	return -EBUSY;
+	return idx < ALI_DRW_PMU_COMMON_MAX_COUNTERS ? idx : -EBUSY;
 }
 
 static u64 ali_drw_pmu_read_counter(struct perf_event *event)
-- 
2.43.0



* [PATCH v4 10/40] dmaengine: idxd: optimize perfmon_assign_event()
  2024-06-20 17:56 [PATCH v4 00/40] lib/find: add atomic find_bit() primitives Yury Norov
                   ` (8 preceding siblings ...)
  2024-06-20 17:56 ` [PATCH v4 09/40] drivers/perf: optimize ali_drw_get_counter_idx() by using find_and_set_bit() Yury Norov
@ 2024-06-20 17:56 ` Yury Norov
  2024-06-20 17:56 ` [PATCH v4 11/40] ath10k: optimize ath10k_snoc_napi_poll() Yury Norov
                   ` (30 subsequent siblings)
  40 siblings, 0 replies; 51+ messages in thread
From: Yury Norov @ 2024-06-20 17:56 UTC (permalink / raw)
  To: linux-kernel, Fenghua Yu, Dave Jiang, Vinod Koul, dmaengine
  Cc: Yury Norov, Alexey Klimov, Bart Van Assche, Jan Kara,
	Linus Torvalds, Matthew Wilcox, Mirsad Todorovac,
	Rasmus Villemoes, Sergey Shtylyov

The function searches used_mask for a clear bit in a for-loop, bit by
bit. Simplify it by using atomic find_and_set_bit(), and make a nice
one-liner.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Acked-by: Vinod Koul <vkoul@kernel.org>
Reviewed-by: Fenghua Yu <fenghua.yu@intel.com>
---
 drivers/dma/idxd/perfmon.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/drivers/dma/idxd/perfmon.c b/drivers/dma/idxd/perfmon.c
index 5e94247e1ea7..063ee78fb132 100644
--- a/drivers/dma/idxd/perfmon.c
+++ b/drivers/dma/idxd/perfmon.c
@@ -1,6 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0
 /* Copyright(c) 2020 Intel Corporation. All rights rsvd. */
 
+#include <linux/find_atomic.h>
 #include <linux/sched/task.h>
 #include <linux/io-64-nonatomic-lo-hi.h>
 #include "idxd.h"
@@ -134,13 +135,9 @@ static void perfmon_assign_hw_event(struct idxd_pmu *idxd_pmu,
 static int perfmon_assign_event(struct idxd_pmu *idxd_pmu,
 				struct perf_event *event)
 {
-	int i;
-
-	for (i = 0; i < IDXD_PMU_EVENT_MAX; i++)
-		if (!test_and_set_bit(i, idxd_pmu->used_mask))
-			return i;
+	int i = find_and_set_bit(idxd_pmu->used_mask, IDXD_PMU_EVENT_MAX);
 
-	return -EINVAL;
+	return i < IDXD_PMU_EVENT_MAX ? i : -EINVAL;
 }
 
 /*
-- 
2.43.0



* [PATCH v4 11/40] ath10k: optimize ath10k_snoc_napi_poll()
  2024-06-20 17:56 [PATCH v4 00/40] lib/find: add atomic find_bit() primitives Yury Norov
                   ` (9 preceding siblings ...)
  2024-06-20 17:56 ` [PATCH v4 10/40] dmaengine: idxd: optimize perfmon_assign_event() Yury Norov
@ 2024-06-20 17:56 ` Yury Norov
  2024-06-20 17:56 ` [PATCH v4 12/40] wifi: rtw88: optimize the driver by using atomic iterator Yury Norov
                   ` (29 subsequent siblings)
  40 siblings, 0 replies; 51+ messages in thread
From: Yury Norov @ 2024-06-20 17:56 UTC (permalink / raw)
  To: linux-kernel, Kalle Valo, Jeff Johnson, ath10k, linux-wireless
  Cc: Yury Norov, Alexey Klimov, Bart Van Assche, Jan Kara,
	Linus Torvalds, Matthew Wilcox, Mirsad Todorovac,
	Rasmus Villemoes, Sergey Shtylyov

ath10k_snoc_napi_poll() traverses the pending_ce_irqs bitmap bit by bit.
Simplify it by using the for_each_test_and_clear_bit() iterator.
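
As an illustration of what such an iterator does, here is a small userspace
sketch in C11: each step atomically clears the next set bit and hands its
index to the loop body, so clear bits are never visited. This is a model
under stated assumptions, not the kernel macro. All model_* names are
hypothetical and the map is limited to a single word.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>

#define MODEL_BITS (sizeof(unsigned long) * CHAR_BIT)

/*
 * Userspace model (an assumption, not the kernel implementation):
 * atomically clear the lowest set bit at or above start among the first
 * nbits of a one-word map. Returns its index, or nbits if none is set.
 */
static unsigned long model_find_and_clear_next_bit(_Atomic unsigned long *map,
                                                   unsigned long nbits,
                                                   unsigned long start)
{
    unsigned long bit;

    assert(nbits <= MODEL_BITS);

    for (bit = start; bit < nbits; bit++) {
        unsigned long mask = 1UL << bit;

        if (!(atomic_load(map) & mask))
            continue;
        /* fetch_and returns the old word; the bit is ours if it was set. */
        if (atomic_fetch_and(map, ~mask) & mask)
            return bit;
    }
    return nbits;
}

/* Hypothetical iterator in the spirit of for_each_test_and_clear_bit(). */
#define model_for_each_test_and_clear_bit(bit, map, nbits) \
    for ((bit) = 0; \
         ((bit) = model_find_and_clear_next_bit((map), (nbits), (bit))) < (nbits); \
         (bit)++)

/* Drain all pending bits and record the order in which they were serviced. */
static unsigned int model_service_pending(_Atomic unsigned long *pending,
                                          unsigned long nbits,
                                          unsigned long *serviced)
{
    unsigned long id;
    unsigned int n = 0;

    model_for_each_test_and_clear_bit(id, pending, nbits)
        serviced[n++] = id;

    return n;
}
```

With bits 1, 4 and 9 pending, the body runs three times, in ascending
order, and the map is empty afterwards.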

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 drivers/net/wireless/ath/ath10k/snoc.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/snoc.c b/drivers/net/wireless/ath/ath10k/snoc.c
index 8530550cf5df..d63608e34785 100644
--- a/drivers/net/wireless/ath/ath10k/snoc.c
+++ b/drivers/net/wireless/ath/ath10k/snoc.c
@@ -5,6 +5,7 @@
 
 #include <linux/bits.h>
 #include <linux/clk.h>
+#include <linux/find_atomic.h>
 #include <linux/kernel.h>
 #include <linux/module.h>
 #include <linux/of.h>
@@ -1237,11 +1238,10 @@ static int ath10k_snoc_napi_poll(struct napi_struct *ctx, int budget)
 		return done;
 	}
 
-	for (ce_id = 0; ce_id < CE_COUNT; ce_id++)
-		if (test_and_clear_bit(ce_id, ar_snoc->pending_ce_irqs)) {
-			ath10k_ce_per_engine_service(ar, ce_id);
-			ath10k_ce_enable_interrupt(ar, ce_id);
-		}
+	for_each_test_and_clear_bit(ce_id, ar_snoc->pending_ce_irqs, CE_COUNT) {
+		ath10k_ce_per_engine_service(ar, ce_id);
+		ath10k_ce_enable_interrupt(ar, ce_id);
+	}
 
 	done = ath10k_htt_txrx_compl_task(ar, budget);
 
-- 
2.43.0



* [PATCH v4 12/40] wifi: rtw88: optimize the driver by using atomic iterator
  2024-06-20 17:56 [PATCH v4 00/40] lib/find: add atomic find_bit() primitives Yury Norov
                   ` (10 preceding siblings ...)
  2024-06-20 17:56 ` [PATCH v4 11/40] ath10k: optimize ath10k_snoc_napi_poll() Yury Norov
@ 2024-06-20 17:56 ` Yury Norov
  2024-06-20 17:56 ` [PATCH v4 13/40] KVM: x86: hyper-v: optimize and cleanup kvm_hv_process_stimers() Yury Norov
                   ` (28 subsequent siblings)
  40 siblings, 0 replies; 51+ messages in thread
From: Yury Norov @ 2024-06-20 17:56 UTC (permalink / raw)
  To: linux-kernel, Ping-Ke Shih, Kalle Valo, linux-wireless
  Cc: Yury Norov, Alexey Klimov, Bart Van Assche, Jan Kara,
	Linus Torvalds, Matthew Wilcox, Mirsad Todorovac,
	Rasmus Villemoes, Sergey Shtylyov

rtw_pci_tx_kick_off() and rtw89_pci_tx_kick_off_pending() traverse bitmaps
bit by bit. Simplify them by using the atomic for_each_test_and_clear_bit()
iterator.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 drivers/net/wireless/realtek/rtw88/pci.c | 6 +++---
 drivers/net/wireless/realtek/rtw89/pci.c | 6 ++----
 2 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/drivers/net/wireless/realtek/rtw88/pci.c b/drivers/net/wireless/realtek/rtw88/pci.c
index 30232f7e3ec5..28c0f4c99cf8 100644
--- a/drivers/net/wireless/realtek/rtw88/pci.c
+++ b/drivers/net/wireless/realtek/rtw88/pci.c
@@ -2,6 +2,7 @@
 /* Copyright(c) 2018-2019  Realtek Corporation
  */
 
+#include <linux/find_atomic.h>
 #include <linux/module.h>
 #include <linux/pci.h>
 #include "main.h"
@@ -790,9 +791,8 @@ static void rtw_pci_tx_kick_off(struct rtw_dev *rtwdev)
 	struct rtw_pci *rtwpci = (struct rtw_pci *)rtwdev->priv;
 	enum rtw_tx_queue_type queue;
 
-	for (queue = 0; queue < RTK_MAX_TX_QUEUE_NUM; queue++)
-		if (test_and_clear_bit(queue, rtwpci->tx_queued))
-			rtw_pci_tx_kick_off_queue(rtwdev, queue);
+	for_each_test_and_clear_bit(queue, rtwpci->tx_queued, RTK_MAX_TX_QUEUE_NUM)
+		rtw_pci_tx_kick_off_queue(rtwdev, queue);
 }
 
 static int rtw_pci_tx_write_data(struct rtw_dev *rtwdev,
diff --git a/drivers/net/wireless/realtek/rtw89/pci.c b/drivers/net/wireless/realtek/rtw89/pci.c
index 03bbcf9b6737..deb06cab5974 100644
--- a/drivers/net/wireless/realtek/rtw89/pci.c
+++ b/drivers/net/wireless/realtek/rtw89/pci.c
@@ -2,6 +2,7 @@
 /* Copyright(c) 2020  Realtek Corporation
  */
 
+#include <linux/find_atomic.h>
 #include <linux/pci.h>
 
 #include "mac.h"
@@ -1234,10 +1235,7 @@ static void rtw89_pci_tx_kick_off_pending(struct rtw89_dev *rtwdev)
 	struct rtw89_pci_tx_ring *tx_ring;
 	int txch;
 
-	for (txch = 0; txch < RTW89_TXCH_NUM; txch++) {
-		if (!test_and_clear_bit(txch, rtwpci->kick_map))
-			continue;
-
+	for_each_test_and_clear_bit(txch, rtwpci->kick_map, RTW89_TXCH_NUM) {
 		tx_ring = &rtwpci->tx_rings[txch];
 		__rtw89_pci_tx_kick_off(rtwdev, tx_ring);
 	}
-- 
2.43.0



* [PATCH v4 13/40] KVM: x86: hyper-v: optimize and cleanup kvm_hv_process_stimers()
  2024-06-20 17:56 [PATCH v4 00/40] lib/find: add atomic find_bit() primitives Yury Norov
                   ` (11 preceding siblings ...)
  2024-06-20 17:56 ` [PATCH v4 12/40] wifi: rtw88: optimize the driver by using atomic iterator Yury Norov
@ 2024-06-20 17:56 ` Yury Norov
  2024-06-20 17:56 ` [PATCH v4 14/40] PCI: hv: Optimize hv_get_dom_num() by using find_and_set_bit() Yury Norov
                   ` (27 subsequent siblings)
  40 siblings, 0 replies; 51+ messages in thread
From: Yury Norov @ 2024-06-20 17:56 UTC (permalink / raw)
  To: linux-kernel, Vitaly Kuznetsov, Sean Christopherson,
	Paolo Bonzini, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H. Peter Anvin, kvm
  Cc: Yury Norov, Alexey Klimov, Bart Van Assche, Jan Kara,
	Linus Torvalds, Matthew Wilcox, Mirsad Todorovac,
	Rasmus Villemoes, Sergey Shtylyov

The function traverses stimer_pending_bitmap in a for-loop bit by bit.
Simplify it by using the atomic for_each_test_and_clear_bit().

Because there are only 4 bits, for_each_test_and_clear_bit() will
generate inline code, so the new API adds no excessive bloat.

While here, refactor the logic to decrease the indentation level.

CC: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Acked-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/hyperv.c | 41 +++++++++++++++++++++--------------------
 1 file changed, 21 insertions(+), 20 deletions(-)

diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index 8a47f8541eab..96acbcf603f5 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -28,6 +28,7 @@
 #include "xen.h"
 
 #include <linux/cpu.h>
+#include <linux/find_atomic.h>
 #include <linux/kvm_host.h>
 #include <linux/highmem.h>
 #include <linux/sched/cputime.h>
@@ -870,27 +871,27 @@ void kvm_hv_process_stimers(struct kvm_vcpu *vcpu)
 	if (!hv_vcpu)
 		return;
 
-	for (i = 0; i < ARRAY_SIZE(hv_vcpu->stimer); i++)
-		if (test_and_clear_bit(i, hv_vcpu->stimer_pending_bitmap)) {
-			stimer = &hv_vcpu->stimer[i];
-			if (stimer->config.enable) {
-				exp_time = stimer->exp_time;
-
-				if (exp_time) {
-					time_now =
-						get_time_ref_counter(vcpu->kvm);
-					if (time_now >= exp_time)
-						stimer_expiration(stimer);
-				}
-
-				if ((stimer->config.enable) &&
-				    stimer->count) {
-					if (!stimer->msg_pending)
-						stimer_start(stimer);
-				} else
-					stimer_cleanup(stimer);
-			}
+	for_each_test_and_clear_bit(i, hv_vcpu->stimer_pending_bitmap,
+				    ARRAY_SIZE(hv_vcpu->stimer)) {
+		stimer = &hv_vcpu->stimer[i];
+		if (!stimer->config.enable)
+			continue;
+
+		exp_time = stimer->exp_time;
+
+		if (exp_time) {
+			time_now = get_time_ref_counter(vcpu->kvm);
+			if (time_now >= exp_time)
+				stimer_expiration(stimer);
 		}
+
+		if (stimer->config.enable && stimer->count) {
+			if (!stimer->msg_pending)
+				stimer_start(stimer);
+		} else {
+			stimer_cleanup(stimer);
+		}
+	}
 }
 
 void kvm_hv_vcpu_uninit(struct kvm_vcpu *vcpu)
-- 
2.43.0



* [PATCH v4 14/40] PCI: hv: Optimize hv_get_dom_num() by using find_and_set_bit()
  2024-06-20 17:56 [PATCH v4 00/40] lib/find: add atomic find_bit() primitives Yury Norov
                   ` (12 preceding siblings ...)
  2024-06-20 17:56 ` [PATCH v4 13/40] KVM: x86: hyper-v: optimize and cleanup kvm_hv_process_stimers() Yury Norov
@ 2024-06-20 17:56 ` Yury Norov
  2024-06-20 17:56 ` [PATCH v4 15/40] scsi: core: optimize scsi_evt_emit() by using an atomic iterator Yury Norov
                   ` (26 subsequent siblings)
  40 siblings, 0 replies; 51+ messages in thread
From: Yury Norov @ 2024-06-20 17:56 UTC (permalink / raw)
  To: linux-kernel, K. Y. Srinivasan, Haiyang Zhang, Wei Liu,
	Dexuan Cui, Lorenzo Pieralisi, Krzysztof Wilczyński,
	Rob Herring, Bjorn Helgaas, linux-hyperv, linux-pci
  Cc: Yury Norov, Alexey Klimov, Bart Van Assche, Jan Kara,
	Linus Torvalds, Matthew Wilcox, Mirsad Todorovac,
	Rasmus Villemoes, Sergey Shtylyov, Michael Kelley

The function traverses the bitmap with for_each_clear_bit() just to allocate
a bit atomically. Simplify it by using the dedicated find_and_set_bit().

Signed-off-by: Yury Norov <yury.norov@gmail.com>
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Acked-by: Wei Liu <wei.liu@kernel.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
---
 drivers/pci/controller/pci-hyperv.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
index 5992280e8110..d8a3ca9a7378 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -37,6 +37,7 @@
  * the PCI back-end driver in Hyper-V.
  */
 
+#include <linux/find_atomic.h>
 #include <linux/kernel.h>
 #include <linux/module.h>
 #include <linux/pci.h>
@@ -3599,12 +3600,9 @@ static u16 hv_get_dom_num(u16 dom)
 	if (test_and_set_bit(dom, hvpci_dom_map) == 0)
 		return dom;
 
-	for_each_clear_bit(i, hvpci_dom_map, HVPCI_DOM_MAP_SIZE) {
-		if (test_and_set_bit(i, hvpci_dom_map) == 0)
-			return i;
-	}
+	i = find_and_set_bit(hvpci_dom_map, HVPCI_DOM_MAP_SIZE);
 
-	return HVPCI_DOM_INVALID;
+	return i < HVPCI_DOM_MAP_SIZE ? i : HVPCI_DOM_INVALID;
 }
 
 /**
-- 
2.43.0



* [PATCH v4 15/40] scsi: core: optimize scsi_evt_emit() by using an atomic iterator
  2024-06-20 17:56 [PATCH v4 00/40] lib/find: add atomic find_bit() primitives Yury Norov
                   ` (13 preceding siblings ...)
  2024-06-20 17:56 ` [PATCH v4 14/40] PCI: hv: Optimize hv_get_dom_num() by using find_and_set_bit() Yury Norov
@ 2024-06-20 17:56 ` Yury Norov
  2024-06-20 17:56 ` [PATCH v4 16/40] scsi: mpi3mr: optimize the driver by using find_and_set_bit() Yury Norov
                   ` (25 subsequent siblings)
  40 siblings, 0 replies; 51+ messages in thread
From: Yury Norov @ 2024-06-20 17:56 UTC (permalink / raw)
  To: linux-kernel, Sathya Prakash Veerichetty, Kashyap Desai,
	Sumit Saxena, Sreekanth Reddy, James E.J. Bottomley,
	Martin K. Petersen, Nilesh Javali, Manish Rangankar,
	GR-QLogic-Storage-Upstream, mpi3mr-linuxdrv.pdl, linux-scsi
  Cc: Yury Norov, Alexey Klimov, Bart Van Assche, Jan Kara,
	Linus Torvalds, Matthew Wilcox, Mirsad Todorovac,
	Rasmus Villemoes, Sergey Shtylyov

A plain loop in scsi_evt_thread() open-codes the optimized atomic
bit-traversal macro. Simplify it by using the dedicated iterator.

CC: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 drivers/scsi/scsi_lib.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index ec39acc986d6..72bebe5247e7 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -13,6 +13,7 @@
 #include <linux/bitops.h>
 #include <linux/blkdev.h>
 #include <linux/completion.h>
+#include <linux/find_atomic.h>
 #include <linux/kernel.h>
 #include <linux/export.h>
 #include <linux/init.h>
@@ -2588,14 +2589,13 @@ static void scsi_evt_emit(struct scsi_device *sdev, struct scsi_event *evt)
 void scsi_evt_thread(struct work_struct *work)
 {
 	struct scsi_device *sdev;
-	enum scsi_device_event evt_type;
+	enum scsi_device_event evt_type = SDEV_EVT_FIRST;
 	LIST_HEAD(event_list);
 
 	sdev = container_of(work, struct scsi_device, event_work);
 
-	for (evt_type = SDEV_EVT_FIRST; evt_type <= SDEV_EVT_LAST; evt_type++)
-		if (test_and_clear_bit(evt_type, sdev->pending_events))
-			sdev_evt_send_simple(sdev, evt_type, GFP_KERNEL);
+	for_each_test_and_clear_bit_from(evt_type, sdev->pending_events, SDEV_EVT_LAST + 1)
+		sdev_evt_send_simple(sdev, evt_type, GFP_KERNEL);
 
 	while (1) {
 		struct scsi_event *evt;
-- 
2.43.0



* [PATCH v4 16/40] scsi: mpi3mr: optimize the driver by using find_and_set_bit()
  2024-06-20 17:56 [PATCH v4 00/40] lib/find: add atomic find_bit() primitives Yury Norov
                   ` (14 preceding siblings ...)
  2024-06-20 17:56 ` [PATCH v4 15/40] scsi: core: optimize scsi_evt_emit() by using an atomic iterator Yury Norov
@ 2024-06-20 17:56 ` Yury Norov
  2024-06-20 17:56 ` [PATCH v4 17/40] scsi: qedi: optimize qedi_get_task_idx() " Yury Norov
                   ` (24 subsequent siblings)
  40 siblings, 0 replies; 51+ messages in thread
From: Yury Norov @ 2024-06-20 17:56 UTC (permalink / raw)
  To: linux-kernel, Sathya Prakash Veerichetty, Kashyap Desai,
	Sumit Saxena, Sreekanth Reddy, James E.J. Bottomley,
	Martin K. Petersen, Nilesh Javali, Manish Rangankar,
	GR-QLogic-Storage-Upstream, mpi3mr-linuxdrv.pdl, linux-scsi
  Cc: Yury Norov, Alexey Klimov, Bart Van Assche, Jan Kara,
	Linus Torvalds, Matthew Wilcox, Mirsad Todorovac,
	Rasmus Villemoes, Sergey Shtylyov

mpi3mr_dev_rmhs_send_tm() and mpi3mr_send_event_ack() open-code
find_and_set_bit(). Simplify them by using the dedicated function.

CC: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 drivers/scsi/mpi3mr/mpi3mr_os.c | 22 +++++++---------------
 1 file changed, 7 insertions(+), 15 deletions(-)

diff --git a/drivers/scsi/mpi3mr/mpi3mr_os.c b/drivers/scsi/mpi3mr/mpi3mr_os.c
index bce639a6cca1..8ad1521dd0b3 100644
--- a/drivers/scsi/mpi3mr/mpi3mr_os.c
+++ b/drivers/scsi/mpi3mr/mpi3mr_os.c
@@ -7,6 +7,7 @@
  *
  */
 
+#include <linux/find_atomic.h>
 #include "mpi3mr.h"
 #include <linux/idr.h>
 
@@ -2292,13 +2293,9 @@ static void mpi3mr_dev_rmhs_send_tm(struct mpi3mr_ioc *mrioc, u16 handle,
 	if (drv_cmd)
 		goto issue_cmd;
 	do {
-		cmd_idx = find_first_zero_bit(mrioc->devrem_bitmap,
-		    MPI3MR_NUM_DEVRMCMD);
-		if (cmd_idx < MPI3MR_NUM_DEVRMCMD) {
-			if (!test_and_set_bit(cmd_idx, mrioc->devrem_bitmap))
-				break;
-			cmd_idx = MPI3MR_NUM_DEVRMCMD;
-		}
+		cmd_idx = find_and_set_bit(mrioc->devrem_bitmap, MPI3MR_NUM_DEVRMCMD);
+		if (cmd_idx < MPI3MR_NUM_DEVRMCMD)
+			break;
 	} while (retrycount--);
 
 	if (cmd_idx >= MPI3MR_NUM_DEVRMCMD) {
@@ -2433,14 +2430,9 @@ static void mpi3mr_send_event_ack(struct mpi3mr_ioc *mrioc, u8 event,
 	    "sending event ack in the top half for event(0x%02x), event_ctx(0x%08x)\n",
 	    event, event_ctx);
 	do {
-		cmd_idx = find_first_zero_bit(mrioc->evtack_cmds_bitmap,
-		    MPI3MR_NUM_EVTACKCMD);
-		if (cmd_idx < MPI3MR_NUM_EVTACKCMD) {
-			if (!test_and_set_bit(cmd_idx,
-			    mrioc->evtack_cmds_bitmap))
-				break;
-			cmd_idx = MPI3MR_NUM_EVTACKCMD;
-		}
+		cmd_idx = find_and_set_bit(mrioc->evtack_cmds_bitmap, MPI3MR_NUM_EVTACKCMD);
+		if (cmd_idx < MPI3MR_NUM_EVTACKCMD)
+			break;
 	} while (retrycount--);
 
 	if (cmd_idx >= MPI3MR_NUM_EVTACKCMD) {
-- 
2.43.0



* [PATCH v4 17/40] scsi: qedi: optimize qedi_get_task_idx() by using find_and_set_bit()
  2024-06-20 17:56 [PATCH v4 00/40] lib/find: add atomic find_bit() primitives Yury Norov
                   ` (15 preceding siblings ...)
  2024-06-20 17:56 ` [PATCH v4 16/40] scsi: mpi3mr: optimize the driver by using find_and_set_bit() Yury Norov
@ 2024-06-20 17:56 ` Yury Norov
  2024-06-20 17:56 ` [PATCH v4 18/40] powerpc: optimize arch code by using atomic find_bit() API Yury Norov
                   ` (23 subsequent siblings)
  40 siblings, 0 replies; 51+ messages in thread
From: Yury Norov @ 2024-06-20 17:56 UTC (permalink / raw)
  To: linux-kernel, Sathya Prakash Veerichetty, Kashyap Desai,
	Sumit Saxena, Sreekanth Reddy, James E.J. Bottomley,
	Martin K. Petersen, Nilesh Javali, Manish Rangankar,
	GR-QLogic-Storage-Upstream, mpi3mr-linuxdrv.pdl, linux-scsi
  Cc: Yury Norov, Alexey Klimov, Bart Van Assche, Jan Kara,
	Linus Torvalds, Matthew Wilcox, Mirsad Todorovac,
	Rasmus Villemoes, Sergey Shtylyov

qedi_get_task_idx() open-codes find_and_set_bit(). Simplify it and make the
whole function a simple near one-liner.

CC: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 drivers/scsi/qedi/qedi_main.c | 10 ++--------
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/drivers/scsi/qedi/qedi_main.c b/drivers/scsi/qedi/qedi_main.c
index cd0180b1f5b9..a6e63a6c25fe 100644
--- a/drivers/scsi/qedi/qedi_main.c
+++ b/drivers/scsi/qedi/qedi_main.c
@@ -5,6 +5,7 @@
  */
 
 #include <linux/module.h>
+#include <linux/find_atomic.h>
 #include <linux/pci.h>
 #include <linux/kernel.h>
 #include <linux/if_arp.h>
@@ -1824,20 +1825,13 @@ int qedi_get_task_idx(struct qedi_ctx *qedi)
 {
 	s16 tmp_idx;
 
-again:
-	tmp_idx = find_first_zero_bit(qedi->task_idx_map,
-				      MAX_ISCSI_TASK_ENTRIES);
+	tmp_idx = find_and_set_bit(qedi->task_idx_map, MAX_ISCSI_TASK_ENTRIES);
 
 	if (tmp_idx >= MAX_ISCSI_TASK_ENTRIES) {
 		QEDI_ERR(&qedi->dbg_ctx, "FW task context pool is full.\n");
 		tmp_idx = -1;
-		goto err_idx;
 	}
 
-	if (test_and_set_bit(tmp_idx, qedi->task_idx_map))
-		goto again;
-
-err_idx:
 	return tmp_idx;
 }
 
-- 
2.43.0



* [PATCH v4 18/40] powerpc: optimize arch code by using atomic find_bit() API
  2024-06-20 17:56 [PATCH v4 00/40] lib/find: add atomic find_bit() primitives Yury Norov
                   ` (16 preceding siblings ...)
  2024-06-20 17:56 ` [PATCH v4 17/40] scsi: qedi: optimize qedi_get_task_idx() " Yury Norov
@ 2024-06-20 17:56 ` Yury Norov
  2024-06-20 17:56 ` [PATCH v4 19/40] iommu: optimize subsystem " Yury Norov
                   ` (22 subsequent siblings)
  40 siblings, 0 replies; 51+ messages in thread
From: Yury Norov @ 2024-06-20 17:56 UTC (permalink / raw)
  To: linux-kernel, Michael Ellerman, Nicholas Piggin, Christophe Leroy,
	Colin Ian King, linuxppc-dev
  Cc: Yury Norov, Alexey Klimov, Bart Van Assche, Jan Kara,
	Linus Torvalds, Matthew Wilcox, Mirsad Todorovac,
	Rasmus Villemoes, Sergey Shtylyov

Use find_and_{set,clear}_bit() where appropriate and simplify the logic.
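
The pasemi allocators touched here keep "free" maps, where a set bit means
the resource is available, so allocation is a find-and-clear. A minimal
userspace sketch of that pattern follows. It is a C11 model that uses a
compare-and-swap loop, which is an assumption and not a claim about how the
kernel helper is built. The model_* names and MODEL_MAX_FLAGS are
hypothetical, and the map is limited to a single word.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdatomic.h>

#define MODEL_BITS (sizeof(unsigned long) * CHAR_BIT)
#define MODEL_MAX_FLAGS 3 /* hypothetical stand-in for MAX_FLAGS */

/*
 * Userspace model (an assumption, not the kernel implementation):
 * atomically clear the lowest set bit among the first nbits of a
 * one-word map. Returns its index, or nbits if no bit is set.
 */
static unsigned long model_find_and_clear_bit(_Atomic unsigned long *map,
                                              unsigned long nbits)
{
    unsigned long old = atomic_load(map);

    assert(nbits <= MODEL_BITS);

    for (;;) {
        unsigned long bit = 0;

        /* Locate the lowest set bit in the snapshot. */
        while (bit < nbits && !(old & (1UL << bit)))
            bit++;
        if (bit == nbits)
            return nbits;
        /* On failure, old is refreshed and the search restarts. */
        if (atomic_compare_exchange_weak(map, &old, old & ~(1UL << bit)))
            return bit;
    }
}

/* A set bit means "free": allocation clears it, release sets it again. */
static int model_alloc_flag(_Atomic unsigned long *flags_free)
{
    unsigned long bit = model_find_and_clear_bit(flags_free, MODEL_MAX_FLAGS);

    return bit < MODEL_MAX_FLAGS ? (int)bit : -ENOSPC;
}

static void model_free_flag(_Atomic unsigned long *flags_free, int flag)
{
    atomic_fetch_or(flags_free, 1UL << flag);
}
```

The retry that the removed "retry:" labels spelled out by hand lives inside
the helper in this sketch, which is what lets each caller shrink to a call
plus a bounds check.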

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 arch/powerpc/mm/book3s32/mmu_context.c     | 11 +++---
 arch/powerpc/platforms/pasemi/dma_lib.c    | 46 ++++++----------------
 arch/powerpc/platforms/powernv/pci-sriov.c | 13 ++----
 3 files changed, 20 insertions(+), 50 deletions(-)

diff --git a/arch/powerpc/mm/book3s32/mmu_context.c b/arch/powerpc/mm/book3s32/mmu_context.c
index 1922f9a6b058..ece7b55b6cdb 100644
--- a/arch/powerpc/mm/book3s32/mmu_context.c
+++ b/arch/powerpc/mm/book3s32/mmu_context.c
@@ -17,6 +17,7 @@
  *    Copyright (C) 1991, 1992, 1993, 1994  Linus Torvalds
  */
 
+#include <linux/find_atomic.h>
 #include <linux/mm.h>
 #include <linux/init.h>
 #include <linux/export.h>
@@ -50,13 +51,11 @@ static unsigned long context_map[LAST_CONTEXT / BITS_PER_LONG + 1];
 
 unsigned long __init_new_context(void)
 {
-	unsigned long ctx = next_mmu_context;
+	unsigned long ctx;
 
-	while (test_and_set_bit(ctx, context_map)) {
-		ctx = find_next_zero_bit(context_map, LAST_CONTEXT+1, ctx);
-		if (ctx > LAST_CONTEXT)
-			ctx = 0;
-	}
+	ctx = find_and_set_next_bit(context_map, LAST_CONTEXT + 1, next_mmu_context);
+	if (ctx > LAST_CONTEXT)
+		ctx = 0;
 	next_mmu_context = (ctx + 1) & LAST_CONTEXT;
 
 	return ctx;
diff --git a/arch/powerpc/platforms/pasemi/dma_lib.c b/arch/powerpc/platforms/pasemi/dma_lib.c
index 1be1f18f6f09..db008902e5f3 100644
--- a/arch/powerpc/platforms/pasemi/dma_lib.c
+++ b/arch/powerpc/platforms/pasemi/dma_lib.c
@@ -5,6 +5,7 @@
  * Common functions for DMA access on PA Semi PWRficient
  */
 
+#include <linux/find_atomic.h>
 #include <linux/kernel.h>
 #include <linux/export.h>
 #include <linux/pci.h>
@@ -118,14 +119,9 @@ static int pasemi_alloc_tx_chan(enum pasemi_dmachan_type type)
 		limit = MAX_TXCH;
 		break;
 	}
-retry:
-	bit = find_next_bit(txch_free, MAX_TXCH, start);
-	if (bit >= limit)
-		return -ENOSPC;
-	if (!test_and_clear_bit(bit, txch_free))
-		goto retry;
-
-	return bit;
+
+	bit = find_and_clear_next_bit(txch_free, MAX_TXCH, start);
+	return bit < limit ? bit : -ENOSPC;
 }
 
 static void pasemi_free_tx_chan(int chan)
@@ -136,15 +132,9 @@ static void pasemi_free_tx_chan(int chan)
 
 static int pasemi_alloc_rx_chan(void)
 {
-	int bit;
-retry:
-	bit = find_first_bit(rxch_free, MAX_RXCH);
-	if (bit >= MAX_TXCH)
-		return -ENOSPC;
-	if (!test_and_clear_bit(bit, rxch_free))
-		goto retry;
-
-	return bit;
+	int bit = find_and_clear_bit(rxch_free, MAX_RXCH);
+
+	return bit < MAX_TXCH ? bit : -ENOSPC;
 }
 
 static void pasemi_free_rx_chan(int chan)
@@ -374,16 +364,9 @@ EXPORT_SYMBOL(pasemi_dma_free_buf);
  */
 int pasemi_dma_alloc_flag(void)
 {
-	int bit;
+	int bit = find_and_clear_bit(flags_free, MAX_FLAGS);
 
-retry:
-	bit = find_first_bit(flags_free, MAX_FLAGS);
-	if (bit >= MAX_FLAGS)
-		return -ENOSPC;
-	if (!test_and_clear_bit(bit, flags_free))
-		goto retry;
-
-	return bit;
+	return bit < MAX_FLAGS ? bit : -ENOSPC;
 }
 EXPORT_SYMBOL(pasemi_dma_alloc_flag);
 
@@ -439,16 +422,9 @@ EXPORT_SYMBOL(pasemi_dma_clear_flag);
  */
 int pasemi_dma_alloc_fun(void)
 {
-	int bit;
-
-retry:
-	bit = find_first_bit(fun_free, MAX_FLAGS);
-	if (bit >= MAX_FLAGS)
-		return -ENOSPC;
-	if (!test_and_clear_bit(bit, fun_free))
-		goto retry;
+	int bit = find_and_clear_bit(fun_free, MAX_FLAGS);
 
-	return bit;
+	return bit < MAX_FLAGS ? bit : -ENOSPC;
 }
 EXPORT_SYMBOL(pasemi_dma_alloc_fun);
 
diff --git a/arch/powerpc/platforms/powernv/pci-sriov.c b/arch/powerpc/platforms/powernv/pci-sriov.c
index cc7b1dd54ac6..e33e57c559f7 100644
--- a/arch/powerpc/platforms/powernv/pci-sriov.c
+++ b/arch/powerpc/platforms/powernv/pci-sriov.c
@@ -3,6 +3,7 @@
 #include <linux/kernel.h>
 #include <linux/ioport.h>
 #include <linux/bitmap.h>
+#include <linux/find_atomic.h>
 #include <linux/pci.h>
 
 #include <asm/opal.h>
@@ -397,18 +398,12 @@ static int64_t pnv_ioda_map_m64_single(struct pnv_phb *phb,
 
 static int pnv_pci_alloc_m64_bar(struct pnv_phb *phb, struct pnv_iov_data *iov)
 {
-	int win;
+	int win = find_and_set_bit(&phb->ioda.m64_bar_alloc, phb->ioda.m64_bar_idx + 1);
 
-	do {
-		win = find_next_zero_bit(&phb->ioda.m64_bar_alloc,
-				phb->ioda.m64_bar_idx + 1, 0);
-
-		if (win >= phb->ioda.m64_bar_idx + 1)
-			return -1;
-	} while (test_and_set_bit(win, &phb->ioda.m64_bar_alloc));
+	if (win >= phb->ioda.m64_bar_idx + 1)
+		return -1;
 
 	set_bit(win, iov->used_m64_bar_mask);
-
 	return win;
 }
 
-- 
2.43.0



* [PATCH v4 19/40] iommu: optimize subsystem by using atomic find_bit() API
  2024-06-20 17:56 [PATCH v4 00/40] lib/find: add atomic find_bit() primitives Yury Norov
                   ` (17 preceding siblings ...)
  2024-06-20 17:56 ` [PATCH v4 18/40] powerpc: optimize arch code by using atomic find_bit() API Yury Norov
@ 2024-06-20 17:56 ` Yury Norov
  2024-06-25 12:16   ` Joerg Roedel
  2024-06-20 17:56 ` [PATCH v4 20/40] media: radio-shark: optimize the driver " Yury Norov
                   ` (21 subsequent siblings)
  40 siblings, 1 reply; 51+ messages in thread
From: Yury Norov @ 2024-06-20 17:56 UTC (permalink / raw)
  To: linux-kernel, Will Deacon, Robin Murphy, Joerg Roedel, Andy Gross,
	Bjorn Andersson, Konrad Dybcio, linux-arm-kernel, iommu,
	linux-arm-msm
  Cc: Yury Norov, Alexey Klimov, Bart Van Assche, Jan Kara,
	Linus Torvalds, Matthew Wilcox, Mirsad Todorovac,
	Rasmus Villemoes, Sergey Shtylyov

Simplify __arm_smmu_alloc_bitmap() and msm_iommu_alloc_ctx() by using
a dedicated API, and make them nice one-liner wrappers.

While here, refactor msm_iommu_attach_dev() and msm_iommu_alloc_ctx()
so that the allocator no longer returns an error code (-ENOSPC) that the
caller then replaces with a different one (-ENODEV).
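
A minimal userspace sketch of the ranged variant may help when reviewing
the wrappers: the helper claims the lowest clear bit in [start, end) and
returns end when the range is full, leaving the choice of errno to the
wrapper. It is a C11 model under stated assumptions, not the kernel
implementation. The model_* names are hypothetical, the map is limited to a
single word, and start and end are assumed to be non-negative.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdatomic.h>

#define MODEL_BITS (sizeof(unsigned long) * CHAR_BIT)

/*
 * Userspace model (an assumption, not the kernel implementation):
 * atomically set the lowest clear bit in [start, nbits) of a one-word
 * map. Returns its index, or nbits if the range is full.
 */
static unsigned long model_find_and_set_next_bit(_Atomic unsigned long *map,
                                                 unsigned long nbits,
                                                 unsigned long start)
{
    unsigned long bit;

    assert(nbits <= MODEL_BITS);

    for (bit = start; bit < nbits; bit++) {
        unsigned long mask = 1UL << bit;

        /* fetch_or returns the old word; the bit is ours if it was clear. */
        if (!(atomic_fetch_or(map, mask) & mask))
            return bit;
    }
    return nbits;
}

/* Same shape as the converted __arm_smmu_alloc_bitmap(). */
static int model_alloc_bitmap(_Atomic unsigned long *map, int start, int end)
{
    int idx = (int)model_find_and_set_next_bit(map, end, start);

    return idx < end ? idx : -ENOSPC;
}
```

A caller that uses the helper directly, as the reworked
msm_iommu_alloc_ctx() does, has to compare the result against the size
instead of testing for a negative value.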

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 drivers/iommu/arm/arm-smmu/arm-smmu.h | 11 +++--------
 drivers/iommu/msm_iommu.c             | 19 +++++--------------
 2 files changed, 8 insertions(+), 22 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.h b/drivers/iommu/arm/arm-smmu/arm-smmu.h
index 4765c6945c34..c74d0300b64b 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.h
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.h
@@ -15,6 +15,7 @@
 #include <linux/bits.h>
 #include <linux/clk.h>
 #include <linux/device.h>
+#include <linux/find_atomic.h>
 #include <linux/io-64-nonatomic-hi-lo.h>
 #include <linux/io-pgtable.h>
 #include <linux/iommu.h>
@@ -455,15 +456,9 @@ struct arm_smmu_impl {
 
 static inline int __arm_smmu_alloc_bitmap(unsigned long *map, int start, int end)
 {
-	int idx;
+	int idx = find_and_set_next_bit(map, end, start);
 
-	do {
-		idx = find_next_zero_bit(map, end, start);
-		if (idx == end)
-			return -ENOSPC;
-	} while (test_and_set_bit(idx, map));
-
-	return idx;
+	return idx < end ? idx : -ENOSPC;
 }
 
 static inline void __iomem *arm_smmu_page(struct arm_smmu_device *smmu, int n)
diff --git a/drivers/iommu/msm_iommu.c b/drivers/iommu/msm_iommu.c
index 989e0869d805..4299e6a5b2ec 100644
--- a/drivers/iommu/msm_iommu.c
+++ b/drivers/iommu/msm_iommu.c
@@ -9,6 +9,7 @@
 #include <linux/init.h>
 #include <linux/platform_device.h>
 #include <linux/errno.h>
+#include <linux/find_atomic.h>
 #include <linux/io.h>
 #include <linux/io-pgtable.h>
 #include <linux/interrupt.h>
@@ -185,17 +186,9 @@ static const struct iommu_flush_ops msm_iommu_flush_ops = {
 	.tlb_add_page = __flush_iotlb_page,
 };
 
-static int msm_iommu_alloc_ctx(unsigned long *map, int start, int end)
+static int msm_iommu_alloc_ctx(struct msm_iommu_dev *iommu)
 {
-	int idx;
-
-	do {
-		idx = find_next_zero_bit(map, end, start);
-		if (idx == end)
-			return -ENOSPC;
-	} while (test_and_set_bit(idx, map));
-
-	return idx;
+	return find_and_set_bit(iommu->context_map, iommu->ncb);
 }
 
 static void msm_iommu_free_ctx(unsigned long *map, int idx)
@@ -418,10 +411,8 @@ static int msm_iommu_attach_dev(struct iommu_domain *domain, struct device *dev)
 					ret = -EEXIST;
 					goto fail;
 				}
-				master->num =
-					msm_iommu_alloc_ctx(iommu->context_map,
-							    0, iommu->ncb);
-				if (IS_ERR_VALUE(master->num)) {
+				master->num = msm_iommu_alloc_ctx(iommu);
+				if (master->num >= iommu->ncb) {
 					ret = -ENODEV;
 					goto fail;
 				}
-- 
2.43.0



* [PATCH v4 20/40] media: radio-shark: optimize the driver by using atomic find_bit() API
  2024-06-20 17:56 [PATCH v4 00/40] lib/find: add atomic find_bit() primitives Yury Norov
                   ` (18 preceding siblings ...)
  2024-06-20 17:56 ` [PATCH v4 19/40] iommu: optimize subsystem " Yury Norov
@ 2024-06-20 17:56 ` Yury Norov
  2024-06-20 17:56 ` [PATCH v4 21/40] sfc: " Yury Norov
                   ` (20 subsequent siblings)
  40 siblings, 0 replies; 51+ messages in thread
From: Yury Norov @ 2024-06-20 17:56 UTC (permalink / raw)
  To: linux-kernel, Hans Verkuil, Mauro Carvalho Chehab, linux-media
  Cc: Yury Norov, Alexey Klimov, Bart Van Assche, Jan Kara,
	Linus Torvalds, Matthew Wilcox, Mirsad Todorovac,
	Rasmus Villemoes, Sergey Shtylyov, Hans Verkuil

Although these are only 2- or 3-bit maps, convert the for-loop followed by
test_and_clear_bit() to for_each_test_and_clear_bit(), as it makes the code
cleaner.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
Acked-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
---
 drivers/media/radio/radio-shark.c  | 6 ++----
 drivers/media/radio/radio-shark2.c | 6 ++----
 2 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/drivers/media/radio/radio-shark.c b/drivers/media/radio/radio-shark.c
index 127a3be0e0f0..c7e5c08d034a 100644
--- a/drivers/media/radio/radio-shark.c
+++ b/drivers/media/radio/radio-shark.c
@@ -21,6 +21,7 @@
  * GNU General Public License for more details.
 */
 
+#include <linux/find_atomic.h>
 #include <linux/init.h>
 #include <linux/kernel.h>
 #include <linux/leds.h>
@@ -158,10 +159,7 @@ static void shark_led_work(struct work_struct *work)
 		container_of(work, struct shark_device, led_work);
 	int i, res, brightness, actual_len;
 
-	for (i = 0; i < 3; i++) {
-		if (!test_and_clear_bit(i, &shark->brightness_new))
-			continue;
-
+	for_each_test_and_clear_bit(i, &shark->brightness_new, 3) {
 		brightness = atomic_read(&shark->brightness[i]);
 		memset(shark->transfer_buffer, 0, TB_LEN);
 		if (i != RED_LED) {
diff --git a/drivers/media/radio/radio-shark2.c b/drivers/media/radio/radio-shark2.c
index e3e6aa87fe08..d897a3e6fcb0 100644
--- a/drivers/media/radio/radio-shark2.c
+++ b/drivers/media/radio/radio-shark2.c
@@ -21,6 +21,7 @@
  * GNU General Public License for more details.
  */
 
+#include <linux/find_atomic.h>
 #include <linux/init.h>
 #include <linux/kernel.h>
 #include <linux/leds.h>
@@ -145,10 +146,7 @@ static void shark_led_work(struct work_struct *work)
 		container_of(work, struct shark_device, led_work);
 	int i, res, brightness, actual_len;
 
-	for (i = 0; i < 2; i++) {
-		if (!test_and_clear_bit(i, &shark->brightness_new))
-			continue;
-
+	for_each_test_and_clear_bit(i, &shark->brightness_new, 2) {
 		brightness = atomic_read(&shark->brightness[i]);
 		memset(shark->transfer_buffer, 0, TB_LEN);
 		shark->transfer_buffer[0] = 0x83 + i;
-- 
2.43.0



* [PATCH v4 21/40] sfc: optimize the driver by using atomic find_bit() API
  2024-06-20 17:56 [PATCH v4 00/40] lib/find: add atomic find_bit() primitives Yury Norov
                   ` (19 preceding siblings ...)
  2024-06-20 17:56 ` [PATCH v4 20/40] media: radio-shark: optimize the driver " Yury Norov
@ 2024-06-20 17:56 ` Yury Norov
  2024-06-20 17:56 ` [PATCH v4 22/40] tty: nozomi: optimize interrupt_handler() Yury Norov
                   ` (19 subsequent siblings)
  40 siblings, 0 replies; 51+ messages in thread
From: Yury Norov @ 2024-06-20 17:56 UTC (permalink / raw)
  To: linux-kernel, Edward Cree, Martin Habets, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev,
	linux-net-drivers
  Cc: Yury Norov, Alexey Klimov, Bart Van Assche, Jan Kara,
	Linus Torvalds, Matthew Wilcox, Mirsad Todorovac,
	Rasmus Villemoes, Sergey Shtylyov

SFC code traverses rps_slot_map and rxq_retry_mask bit by bit. Simplify
it by using dedicated atomic find_bit() functions, as they skip already
clear bits.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
Reviewed-by: Edward Cree <ecree.xilinx@gmail.com>
---
 drivers/net/ethernet/sfc/rx_common.c         |  5 ++---
 drivers/net/ethernet/sfc/siena/rx_common.c   |  5 ++---
 drivers/net/ethernet/sfc/siena/siena_sriov.c | 15 +++++++--------
 3 files changed, 11 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/sfc/rx_common.c b/drivers/net/ethernet/sfc/rx_common.c
index dcd901eccfc8..370a2d20ccfb 100644
--- a/drivers/net/ethernet/sfc/rx_common.c
+++ b/drivers/net/ethernet/sfc/rx_common.c
@@ -9,6 +9,7 @@
  */
 
 #include "net_driver.h"
+#include <linux/find_atomic.h>
 #include <linux/module.h>
 #include <linux/iommu.h>
 #include <net/rps.h>
@@ -953,9 +954,7 @@ int efx_filter_rfs(struct net_device *net_dev, const struct sk_buff *skb,
 	int rc;
 
 	/* find a free slot */
-	for (slot_idx = 0; slot_idx < EFX_RPS_MAX_IN_FLIGHT; slot_idx++)
-		if (!test_and_set_bit(slot_idx, &efx->rps_slot_map))
-			break;
+	slot_idx = find_and_set_bit(&efx->rps_slot_map, EFX_RPS_MAX_IN_FLIGHT);
 	if (slot_idx >= EFX_RPS_MAX_IN_FLIGHT)
 		return -EBUSY;
 
diff --git a/drivers/net/ethernet/sfc/siena/rx_common.c b/drivers/net/ethernet/sfc/siena/rx_common.c
index 219fb358a646..fc1d4d02beb6 100644
--- a/drivers/net/ethernet/sfc/siena/rx_common.c
+++ b/drivers/net/ethernet/sfc/siena/rx_common.c
@@ -9,6 +9,7 @@
  */
 
 #include "net_driver.h"
+#include <linux/find_atomic.h>
 #include <linux/module.h>
 #include <linux/iommu.h>
 #include <net/rps.h>
@@ -959,9 +960,7 @@ int efx_siena_filter_rfs(struct net_device *net_dev, const struct sk_buff *skb,
 	int rc;
 
 	/* find a free slot */
-	for (slot_idx = 0; slot_idx < EFX_RPS_MAX_IN_FLIGHT; slot_idx++)
-		if (!test_and_set_bit(slot_idx, &efx->rps_slot_map))
-			break;
+	slot_idx = find_and_set_bit(&efx->rps_slot_map, EFX_RPS_MAX_IN_FLIGHT);
 	if (slot_idx >= EFX_RPS_MAX_IN_FLIGHT)
 		return -EBUSY;
 
diff --git a/drivers/net/ethernet/sfc/siena/siena_sriov.c b/drivers/net/ethernet/sfc/siena/siena_sriov.c
index 8353c15dc233..f643413f9c20 100644
--- a/drivers/net/ethernet/sfc/siena/siena_sriov.c
+++ b/drivers/net/ethernet/sfc/siena/siena_sriov.c
@@ -3,6 +3,7 @@
  * Driver for Solarflare network controllers and boards
  * Copyright 2010-2012 Solarflare Communications Inc.
  */
+#include <linux/find_atomic.h>
 #include <linux/pci.h>
 #include <linux/module.h>
 #include "net_driver.h"
@@ -722,14 +723,12 @@ static int efx_vfdi_fini_all_queues(struct siena_vf *vf)
 					     efx_vfdi_flush_wake(vf),
 					     timeout);
 		rxqs_count = 0;
-		for (index = 0; index < count; ++index) {
-			if (test_and_clear_bit(index, vf->rxq_retry_mask)) {
-				atomic_dec(&vf->rxq_retry_count);
-				MCDI_SET_ARRAY_DWORD(
-					inbuf, FLUSH_RX_QUEUES_IN_QID_OFST,
-					rxqs_count, vf_offset + index);
-				rxqs_count++;
-			}
+		for_each_test_and_clear_bit(index, vf->rxq_retry_mask, count) {
+			atomic_dec(&vf->rxq_retry_count);
+			MCDI_SET_ARRAY_DWORD(
+				inbuf, FLUSH_RX_QUEUES_IN_QID_OFST,
+				rxqs_count, vf_offset + index);
+			rxqs_count++;
 		}
 	}
 
-- 
2.43.0


* [PATCH v4 22/40] tty: nozomi: optimize interrupt_handler()
  2024-06-20 17:56 [PATCH v4 00/40] lib/find: add atomic find_bit() primitives Yury Norov
                   ` (20 preceding siblings ...)
  2024-06-20 17:56 ` [PATCH v4 21/40] sfc: " Yury Norov
@ 2024-06-20 17:56 ` Yury Norov
  2024-06-20 17:56 ` [PATCH v4 23/40] usb: cdc-acm: optimize acm_softint() Yury Norov
                   ` (18 subsequent siblings)
  40 siblings, 0 replies; 51+ messages in thread
From: Yury Norov @ 2024-06-20 17:56 UTC (permalink / raw)
  To: linux-kernel, Greg Kroah-Hartman, Jiri Slaby, linux-serial
  Cc: Yury Norov, Alexey Klimov, Bart Van Assche, Jan Kara,
	Linus Torvalds, Matthew Wilcox, Mirsad Todorovac,
	Rasmus Villemoes, Sergey Shtylyov

In the exit path of interrupt_handler(), dc->flip map is traversed bit
by bit to find and clear set bits and call tty_flip_buffer_push() for
corresponding ports.

Simplify it by using for_each_test_and_clear_bit(), as it skips already
clear bits.
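
For readers without the earlier patches at hand, the loop semantics can
be sketched in userspace roughly as below. This is a minimal C11 sketch
under stated assumptions, not the kernel implementation: every sk_*
name is hypothetical, C11 atomics stand in for the kernel bitops, and
__builtin_ctzl() assumes GCC or Clang.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>

#define SK_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Hypothetical stand-in for find_and_clear_next_bit(): atomically clear the
 * first set bit in [start, nbits) and return its index, or nbits if no bit
 * is set in that range.
 */
static unsigned long sk_find_and_clear_next_bit(_Atomic unsigned long *map,
						unsigned long nbits,
						unsigned long start)
{
	while (start < nbits) {
		unsigned long shift = start % SK_BITS_PER_LONG;
		unsigned long base = start - shift;
		_Atomic unsigned long *word = &map[start / SK_BITS_PER_LONG];
		/* Only look at bits at or above 'start' within this word. */
		unsigned long val = atomic_load(word) & (~0UL << shift);
		unsigned long bit, mask;

		if (!val) {
			/* Nothing set here: skip the word without any atomic RMW. */
			start = base + SK_BITS_PER_LONG;
			continue;
		}

		bit = base + (unsigned long)__builtin_ctzl(val);
		if (bit >= nbits)
			break;

		mask = 1UL << (bit % SK_BITS_PER_LONG);
		/* test_and_clear_bit(): we own the bit only if it was still set. */
		if (atomic_fetch_and(word, ~mask) & mask)
			return bit;

		/* Lost a race with another clearer; keep scanning after it. */
		start = bit + 1;
	}
	return nbits;
}

#define sk_for_each_test_and_clear_bit(bit, map, nbits)			\
	for ((bit) = 0;							\
	     ((bit) = sk_find_and_clear_next_bit((map), (nbits), (bit))) < (nbits); \
	     (bit)++)

/* Drain every pending bit, recording which indices were serviced. */
static unsigned long sk_drain(_Atomic unsigned long *map, unsigned long nbits,
			      unsigned long *serviced)
{
	unsigned long bit, n = 0;

	assert(serviced);
	sk_for_each_test_and_clear_bit(bit, map, nbits)
		serviced[n++] = bit;
	return n;
}
```

In the sketch, the loop body runs only for bits that were set and that
this caller cleared, which is the role tty_flip_buffer_push() plays in
the hunk below.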

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 drivers/tty/nozomi.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/tty/nozomi.c b/drivers/tty/nozomi.c
index e28a921c1637..2fe063190867 100644
--- a/drivers/tty/nozomi.c
+++ b/drivers/tty/nozomi.c
@@ -28,6 +28,7 @@
 /* Enable this to have a lot of debug printouts */
 #define DEBUG
 
+#include <linux/find_atomic.h>
 #include <linux/kernel.h>
 #include <linux/module.h>
 #include <linux/pci.h>
@@ -1201,9 +1202,8 @@ static irqreturn_t interrupt_handler(int irq, void *dev_id)
 exit_handler:
 	spin_unlock(&dc->spin_mutex);
 
-	for (a = 0; a < NOZOMI_MAX_PORTS; a++)
-		if (test_and_clear_bit(a, &dc->flip))
-			tty_flip_buffer_push(&dc->port[a].port);
+	for_each_test_and_clear_bit(a, &dc->flip, NOZOMI_MAX_PORTS)
+		tty_flip_buffer_push(&dc->port[a].port);
 
 	return IRQ_HANDLED;
 none:
-- 
2.43.0


* [PATCH v4 23/40] usb: cdc-acm: optimize acm_softint()
  2024-06-20 17:56 [PATCH v4 00/40] lib/find: add atomic find_bit() primitives Yury Norov
                   ` (21 preceding siblings ...)
  2024-06-20 17:56 ` [PATCH v4 22/40] tty: nozomi: optimize interrupt_handler() Yury Norov
@ 2024-06-20 17:56 ` Yury Norov
  2024-06-27 14:03   ` Greg Kroah-Hartman
  2024-06-20 17:56 ` [PATCH v4 24/40] RDMA/rtrs: optimize __rtrs_get_permit() by using find_and_set_bit_lock() Yury Norov
                   ` (17 subsequent siblings)
  40 siblings, 1 reply; 51+ messages in thread
From: Yury Norov @ 2024-06-20 17:56 UTC (permalink / raw)
  To: linux-kernel, Oliver Neukum, Greg Kroah-Hartman, linux-usb
  Cc: Yury Norov, Alexey Klimov, Bart Van Assche, Jan Kara,
	Linus Torvalds, Matthew Wilcox, Mirsad Todorovac,
	Rasmus Villemoes, Sergey Shtylyov

acm_softint() uses a for-loop to traverse the urbs_in_error_delay bitmap
bit by bit to find and clear set bits.

Simplify it by using for_each_test_and_clear_bit(), which does not
test bits that are already clear.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
Acked-by: Oliver Neukum <oneukum@suse.com>
---
 drivers/usb/class/cdc-acm.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/usb/class/cdc-acm.c b/drivers/usb/class/cdc-acm.c
index 0e7439dba8fe..f8940f0d7ad8 100644
--- a/drivers/usb/class/cdc-acm.c
+++ b/drivers/usb/class/cdc-acm.c
@@ -18,6 +18,7 @@
 #undef DEBUG
 #undef VERBOSE_DEBUG
 
+#include <linux/find_atomic.h>
 #include <linux/kernel.h>
 #include <linux/sched/signal.h>
 #include <linux/errno.h>
@@ -613,9 +614,8 @@ static void acm_softint(struct work_struct *work)
 	}
 
 	if (test_and_clear_bit(ACM_ERROR_DELAY, &acm->flags)) {
-		for (i = 0; i < acm->rx_buflimit; i++)
-			if (test_and_clear_bit(i, &acm->urbs_in_error_delay))
-				acm_submit_read_urb(acm, i, GFP_KERNEL);
+		for_each_test_and_clear_bit(i, &acm->urbs_in_error_delay, acm->rx_buflimit)
+			acm_submit_read_urb(acm, i, GFP_KERNEL);
 	}
 
 	if (test_and_clear_bit(EVENT_TTY_WAKEUP, &acm->flags))
-- 
2.43.0


* [PATCH v4 24/40] RDMA/rtrs: optimize __rtrs_get_permit() by using find_and_set_bit_lock()
  2024-06-20 17:56 [PATCH v4 00/40] lib/find: add atomic find_bit() primitives Yury Norov
                   ` (22 preceding siblings ...)
  2024-06-20 17:56 ` [PATCH v4 23/40] usb: cdc-acm: optimize acm_softint() Yury Norov
@ 2024-06-20 17:56 ` Yury Norov
  2024-06-27 12:59   ` Jinpu Wang
  2024-06-20 17:56 ` [PATCH v4 25/40] mISDN: optimize get_free_devid() Yury Norov
                   ` (16 subsequent siblings)
  40 siblings, 1 reply; 51+ messages in thread
From: Yury Norov @ 2024-06-20 17:56 UTC (permalink / raw)
  To: linux-kernel, Md. Haris Iqbal, Jack Wang, Jason Gunthorpe,
	Leon Romanovsky, linux-rdma
  Cc: Yury Norov, Alexey Klimov, Bart Van Assche, Jan Kara,
	Linus Torvalds, Matthew Wilcox, Mirsad Todorovac,
	Rasmus Villemoes, Sergey Shtylyov

The function opencodes find_and_set_bit_lock() with a while-loop polling
on test_and_set_bit_lock(). Use the dedicated function instead.
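
As an illustration of what is being folded into one call, the retry
pattern can be modelled in userspace as below. This is a hedged sketch
only: the sk_* helpers are hypothetical, C11 atomics stand in for the
kernel bitops, and the real helper is not expected to scan bit by bit
as this one does.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>

#define SK_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Plain scan for a zero bit; the answer may be stale once it is returned. */
static unsigned long sk_find_first_zero_bit(_Atomic unsigned long *map,
					    unsigned long nbits)
{
	for (unsigned long bit = 0; bit < nbits; bit++) {
		unsigned long word = atomic_load_explicit(
			&map[bit / SK_BITS_PER_LONG], memory_order_relaxed);

		if (!(word & (1UL << (bit % SK_BITS_PER_LONG))))
			return bit;
	}
	return nbits;
}

/* Stand-in for test_and_set_bit_lock(): acquire ordering, returns old bit. */
static int sk_test_and_set_bit_lock(unsigned long bit,
				    _Atomic unsigned long *map)
{
	unsigned long mask = 1UL << (bit % SK_BITS_PER_LONG);
	unsigned long old = atomic_fetch_or_explicit(
		&map[bit / SK_BITS_PER_LONG], mask, memory_order_acquire);

	return (old & mask) != 0;
}

/* Returns the bit this caller now owns, or nbits if every bit is taken. */
static unsigned long sk_find_and_set_bit_lock(_Atomic unsigned long *map,
					      unsigned long nbits)
{
	unsigned long bit;

	assert(map);
	do {
		bit = sk_find_first_zero_bit(map, nbits);
		if (bit >= nbits)
			return nbits;
		/* Another CPU may have taken 'bit' meanwhile; if so, rescan. */
	} while (sk_test_and_set_bit_lock(bit, map));

	return bit;
}
```

The caller keeps the same "bit >= max_depth means no permit" check that
the hunk below retains.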

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 drivers/infiniband/ulp/rtrs/rtrs-clt.c | 16 ++++------------
 1 file changed, 4 insertions(+), 12 deletions(-)

diff --git a/drivers/infiniband/ulp/rtrs/rtrs-clt.c b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
index 88106cf5ce55..52b7728f6c63 100644
--- a/drivers/infiniband/ulp/rtrs/rtrs-clt.c
+++ b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
@@ -10,6 +10,7 @@
 #undef pr_fmt
 #define pr_fmt(fmt) KBUILD_MODNAME " L" __stringify(__LINE__) ": " fmt
 
+#include <linux/find_atomic.h>
 #include <linux/module.h>
 #include <linux/rculist.h>
 #include <linux/random.h>
@@ -72,18 +73,9 @@ __rtrs_get_permit(struct rtrs_clt_sess *clt, enum rtrs_clt_con_type con_type)
 	struct rtrs_permit *permit;
 	int bit;
 
-	/*
-	 * Adapted from null_blk get_tag(). Callers from different cpus may
-	 * grab the same bit, since find_first_zero_bit is not atomic.
-	 * But then the test_and_set_bit_lock will fail for all the
-	 * callers but one, so that they will loop again.
-	 * This way an explicit spinlock is not required.
-	 */
-	do {
-		bit = find_first_zero_bit(clt->permits_map, max_depth);
-		if (bit >= max_depth)
-			return NULL;
-	} while (test_and_set_bit_lock(bit, clt->permits_map));
+	bit = find_and_set_bit_lock(clt->permits_map, max_depth);
+	if (bit >= max_depth)
+		return NULL;
 
 	permit = get_permit(clt, bit);
 	WARN_ON(permit->mem_id != bit);
-- 
2.43.0


* [PATCH v4 25/40] mISDN: optimize get_free_devid()
  2024-06-20 17:56 [PATCH v4 00/40] lib/find: add atomic find_bit() primitives Yury Norov
                   ` (23 preceding siblings ...)
  2024-06-20 17:56 ` [PATCH v4 24/40] RDMA/rtrs: optimize __rtrs_get_permit() by using find_and_set_bit_lock() Yury Norov
@ 2024-06-20 17:56 ` Yury Norov
  2024-06-20 17:56 ` [PATCH v4 26/40] media: em28xx: cx231xx: optimize drivers by using find_and_set_bit() Yury Norov
                   ` (15 subsequent siblings)
  40 siblings, 0 replies; 51+ messages in thread
From: Yury Norov @ 2024-06-20 17:56 UTC (permalink / raw)
  To: linux-kernel, Karsten Keil, netdev
  Cc: Yury Norov, Alexey Klimov, Bart Van Assche, Jan Kara,
	Linus Torvalds, Matthew Wilcox, Mirsad Todorovac,
	Rasmus Villemoes, Sergey Shtylyov

get_free_devid() traverses each bit in device_ids in an open-coded loop.
Simplify it by using the dedicated find_and_set_bit().

It makes the whole function a nice one-liner. And because MAX_DEVICE_ID
is a small compile-time constant (63), on 64-bit platforms the
find_and_set_bit() call will be optimized to:

	test_and_set_bit(ffs());
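
To make the single-word case concrete, here is a rough userspace model
of it. It is a sketch under assumptions rather than the actual inline:
sk_find_and_set_bit_small() is a hypothetical name, C11 atomics replace
the kernel bitops, and __builtin_ctzl() (GCC/Clang) on the inverted
word is used to locate the first zero bit.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>

/*
 * Single-word case: the whole map fits in one unsigned long, so finding a
 * free bit is one load plus one bit-scan, followed by an atomic set that is
 * retried only if another CPU took the same bit first.
 */
static unsigned long sk_find_and_set_bit_small(_Atomic unsigned long *word,
					       unsigned long nbits)
{
	const unsigned long bpl = sizeof(unsigned long) * CHAR_BIT;
	unsigned long val, bit;

	assert(nbits > 0 && nbits <= bpl);
	do {
		val = atomic_load(word);
		/* Treat bits at or above nbits as busy so they are never picked. */
		if (nbits < bpl)
			val |= ~0UL << nbits;
		if (val == ~0UL)
			return nbits;
		/* Lowest zero bit of val is the lowest set bit of ~val. */
		bit = (unsigned long)__builtin_ctzl(~val);
	} while (atomic_fetch_or(word, 1UL << bit) & (1UL << bit));

	return bit;
}
```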

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 drivers/isdn/mISDN/core.c | 10 +++-------
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/drivers/isdn/mISDN/core.c b/drivers/isdn/mISDN/core.c
index ab8513a7acd5..d499b193529a 100644
--- a/drivers/isdn/mISDN/core.c
+++ b/drivers/isdn/mISDN/core.c
@@ -3,6 +3,7 @@
  * Copyright 2008  by Karsten Keil <kkeil@novell.com>
  */
 
+#include <linux/find_atomic.h>
 #include <linux/slab.h>
 #include <linux/types.h>
 #include <linux/stddef.h>
@@ -197,14 +198,9 @@ get_mdevice_count(void)
 static int
 get_free_devid(void)
 {
-	u_int	i;
+	int i = find_and_set_bit((u_long *)&device_ids, MAX_DEVICE_ID + 1);
 
-	for (i = 0; i <= MAX_DEVICE_ID; i++)
-		if (!test_and_set_bit(i, (u_long *)&device_ids))
-			break;
-	if (i > MAX_DEVICE_ID)
-		return -EBUSY;
-	return i;
+	return i <= MAX_DEVICE_ID ? i : -EBUSY;
 }
 
 int
-- 
2.43.0


* [PATCH v4 26/40] media: em28xx: cx231xx: optimize drivers by using find_and_set_bit()
  2024-06-20 17:56 [PATCH v4 00/40] lib/find: add atomic find_bit() primitives Yury Norov
                   ` (24 preceding siblings ...)
  2024-06-20 17:56 ` [PATCH v4 25/40] mISDN: optimize get_free_devid() Yury Norov
@ 2024-06-20 17:56 ` Yury Norov
  2024-06-20 17:56 ` [PATCH v4 27/40] ethernet: rocker: optimize ofdpa_port_internal_vlan_id_get() Yury Norov
                   ` (14 subsequent siblings)
  40 siblings, 0 replies; 51+ messages in thread
From: Yury Norov @ 2024-06-20 17:56 UTC (permalink / raw)
  To: linux-kernel, Mauro Carvalho Chehab, Yury Norov, linux-media
  Cc: Alexey Klimov, Bart Van Assche, Jan Kara, Linus Torvalds,
	Matthew Wilcox, Mirsad Todorovac, Rasmus Villemoes,
	Sergey Shtylyov, Hans Verkuil

Functions in the media/usb drivers opencode find_and_set_bit(). Simplify
them by using the function.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
Acked-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
---
 drivers/media/usb/cx231xx/cx231xx-cards.c | 17 +++++-----
 drivers/media/usb/em28xx/em28xx-cards.c   | 38 ++++++++++-------------
 2 files changed, 23 insertions(+), 32 deletions(-)

diff --git a/drivers/media/usb/cx231xx/cx231xx-cards.c b/drivers/media/usb/cx231xx/cx231xx-cards.c
index 92efe6c1f47b..8bdfbc4454f1 100644
--- a/drivers/media/usb/cx231xx/cx231xx-cards.c
+++ b/drivers/media/usb/cx231xx/cx231xx-cards.c
@@ -9,6 +9,7 @@
  */
 
 #include "cx231xx.h"
+#include <linux/find_atomic.h>
 #include <linux/init.h>
 #include <linux/module.h>
 #include <linux/slab.h>
@@ -1708,16 +1709,12 @@ static int cx231xx_usb_probe(struct usb_interface *interface,
 		return -ENODEV;
 
 	/* Check to see next free device and mark as used */
-	do {
-		nr = find_first_zero_bit(&cx231xx_devused, CX231XX_MAXBOARDS);
-		if (nr >= CX231XX_MAXBOARDS) {
-			/* No free device slots */
-			dev_err(d,
-				"Supports only %i devices.\n",
-				CX231XX_MAXBOARDS);
-			return -ENOMEM;
-		}
-	} while (test_and_set_bit(nr, &cx231xx_devused));
+	nr = find_and_set_bit(&cx231xx_devused, CX231XX_MAXBOARDS);
+	if (nr >= CX231XX_MAXBOARDS) {
+		/* No free device slots */
+		dev_err(d, "Supports only %i devices.\n", CX231XX_MAXBOARDS);
+		return -ENOMEM;
+	}
 
 	udev = usb_get_dev(interface_to_usbdev(interface));
 
diff --git a/drivers/media/usb/em28xx/em28xx-cards.c b/drivers/media/usb/em28xx/em28xx-cards.c
index bae76023cf71..59e6d7f894ad 100644
--- a/drivers/media/usb/em28xx/em28xx-cards.c
+++ b/drivers/media/usb/em28xx/em28xx-cards.c
@@ -11,6 +11,7 @@
 
 #include "em28xx.h"
 
+#include <linux/find_atomic.h>
 #include <linux/init.h>
 #include <linux/module.h>
 #include <linux/slab.h>
@@ -3684,17 +3685,14 @@ static int em28xx_duplicate_dev(struct em28xx *dev)
 		return -ENOMEM;
 	}
 	/* Check to see next free device and mark as used */
-	do {
-		nr = find_first_zero_bit(em28xx_devused, EM28XX_MAXBOARDS);
-		if (nr >= EM28XX_MAXBOARDS) {
-			/* No free device slots */
-			dev_warn(&dev->intf->dev, ": Supports only %i em28xx boards.\n",
-				 EM28XX_MAXBOARDS);
-			kfree(sec_dev);
-			dev->dev_next = NULL;
-			return -ENOMEM;
-		}
-	} while (test_and_set_bit(nr, em28xx_devused));
+	nr = find_and_set_bit(em28xx_devused, EM28XX_MAXBOARDS);
+	if (nr >= EM28XX_MAXBOARDS) {
+		/* No free device slots */
+		dev_warn(&dev->intf->dev, ": Supports only %i em28xx boards.\n", EM28XX_MAXBOARDS);
+		kfree(sec_dev);
+		dev->dev_next = NULL;
+		return -ENOMEM;
+	}
 	sec_dev->devno = nr;
 	snprintf(sec_dev->name, 28, "em28xx #%d", nr);
 	sec_dev->dev_next = NULL;
@@ -3827,17 +3825,13 @@ static int em28xx_usb_probe(struct usb_interface *intf,
 	udev = usb_get_dev(interface_to_usbdev(intf));
 
 	/* Check to see next free device and mark as used */
-	do {
-		nr = find_first_zero_bit(em28xx_devused, EM28XX_MAXBOARDS);
-		if (nr >= EM28XX_MAXBOARDS) {
-			/* No free device slots */
-			dev_err(&intf->dev,
-				"Driver supports up to %i em28xx boards.\n",
-			       EM28XX_MAXBOARDS);
-			retval = -ENOMEM;
-			goto err_no_slot;
-		}
-	} while (test_and_set_bit(nr, em28xx_devused));
+	nr = find_and_set_bit(em28xx_devused, EM28XX_MAXBOARDS);
+	if (nr >= EM28XX_MAXBOARDS) {
+		/* No free device slots */
+		dev_err(&intf->dev, "Driver supports up to %i em28xx boards.\n", EM28XX_MAXBOARDS);
+		retval = -ENOMEM;
+		goto err_no_slot;
+	}
 
 	/* Don't register audio interfaces */
 	if (intf->altsetting[0].desc.bInterfaceClass == USB_CLASS_AUDIO) {
-- 
2.43.0


* [PATCH v4 27/40] ethernet: rocker: optimize ofdpa_port_internal_vlan_id_get()
  2024-06-20 17:56 [PATCH v4 00/40] lib/find: add atomic find_bit() primitives Yury Norov
                   ` (25 preceding siblings ...)
  2024-06-20 17:56 ` [PATCH v4 26/40] media: em28xx: cx231xx: optimize drivers by using find_and_set_bit() Yury Norov
@ 2024-06-20 17:56 ` Yury Norov
  2024-06-20 17:56 ` [PATCH v4 28/40] bluetooth: optimize cmtp_alloc_block_id() Yury Norov
                   ` (13 subsequent siblings)
  40 siblings, 0 replies; 51+ messages in thread
From: Yury Norov @ 2024-06-20 17:56 UTC (permalink / raw)
  To: linux-kernel, Jiri Pirko, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, netdev
  Cc: Yury Norov, Alexey Klimov, Bart Van Assche, Jan Kara,
	Linus Torvalds, Matthew Wilcox, Mirsad Todorovac,
	Rasmus Villemoes, Sergey Shtylyov

Optimize ofdpa_port_internal_vlan_id_get() by using find_and_set_bit(),
instead of polling every bit from bitmap in a for-loop.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 drivers/net/ethernet/rocker/rocker_ofdpa.c | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/rocker/rocker_ofdpa.c b/drivers/net/ethernet/rocker/rocker_ofdpa.c
index 826990459fa4..d8fe018001b9 100644
--- a/drivers/net/ethernet/rocker/rocker_ofdpa.c
+++ b/drivers/net/ethernet/rocker/rocker_ofdpa.c
@@ -6,6 +6,7 @@
  * Copyright (c) 2014-2016 Jiri Pirko <jiri@mellanox.com>
  */
 
+#include <linux/find_atomic.h>
 #include <linux/kernel.h>
 #include <linux/types.h>
 #include <linux/spinlock.h>
@@ -2249,14 +2250,11 @@ static __be16 ofdpa_port_internal_vlan_id_get(struct ofdpa_port *ofdpa_port,
 	found = entry;
 	hash_add(ofdpa->internal_vlan_tbl, &found->entry, found->ifindex);
 
-	for (i = 0; i < OFDPA_N_INTERNAL_VLANS; i++) {
-		if (test_and_set_bit(i, ofdpa->internal_vlan_bitmap))
-			continue;
+	i = find_and_set_bit(ofdpa->internal_vlan_bitmap, OFDPA_N_INTERNAL_VLANS);
+	if (i < OFDPA_N_INTERNAL_VLANS)
 		found->vlan_id = htons(OFDPA_INTERNAL_VLAN_ID_BASE + i);
-		goto found;
-	}
-
-	netdev_err(ofdpa_port->dev, "Out of internal VLAN IDs\n");
+	else
+		netdev_err(ofdpa_port->dev, "Out of internal VLAN IDs\n");
 
 found:
 	found->ref_count++;
-- 
2.43.0


* [PATCH v4 28/40] bluetooth: optimize cmtp_alloc_block_id()
  2024-06-20 17:56 [PATCH v4 00/40] lib/find: add atomic find_bit() primitives Yury Norov
                   ` (26 preceding siblings ...)
  2024-06-20 17:56 ` [PATCH v4 27/40] ethernet: rocker: optimize ofdpa_port_internal_vlan_id_get() Yury Norov
@ 2024-06-20 17:56 ` Yury Norov
  2024-06-20 17:56 ` [PATCH v4 29/40] net: smc: optimize smc_wr_tx_get_free_slot_index() Yury Norov
                   ` (12 subsequent siblings)
  40 siblings, 0 replies; 51+ messages in thread
From: Yury Norov @ 2024-06-20 17:56 UTC (permalink / raw)
  To: linux-kernel, Karsten Keil, Marcel Holtmann, Johan Hedberg,
	Luiz Augusto von Dentz, Yury Norov, netdev, linux-bluetooth
  Cc: Alexey Klimov, Bart Van Assche, Jan Kara, Linus Torvalds,
	Matthew Wilcox, Mirsad Todorovac, Rasmus Villemoes,
	Sergey Shtylyov

Instead of polling every bit in blockids, use a dedicated
find_and_set_bit(), and make the function a simple one-liner.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 net/bluetooth/cmtp/core.c | 11 +++--------
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/net/bluetooth/cmtp/core.c b/net/bluetooth/cmtp/core.c
index 90d130588a3e..06732cf2661b 100644
--- a/net/bluetooth/cmtp/core.c
+++ b/net/bluetooth/cmtp/core.c
@@ -22,6 +22,7 @@
 
 #include <linux/module.h>
 
+#include <linux/find_atomic.h>
 #include <linux/types.h>
 #include <linux/errno.h>
 #include <linux/kernel.h>
@@ -88,15 +89,9 @@ static void __cmtp_copy_session(struct cmtp_session *session, struct cmtp_connin
 
 static inline int cmtp_alloc_block_id(struct cmtp_session *session)
 {
-	int i, id = -1;
+	int id = find_and_set_bit(&session->blockids, 16);
 
-	for (i = 0; i < 16; i++)
-		if (!test_and_set_bit(i, &session->blockids)) {
-			id = i;
-			break;
-		}
-
-	return id;
+	return id < 16 ? id : -1;
 }
 
 static inline void cmtp_free_block_id(struct cmtp_session *session, int id)
-- 
2.43.0


* [PATCH v4 29/40] net: smc: optimize smc_wr_tx_get_free_slot_index()
  2024-06-20 17:56 [PATCH v4 00/40] lib/find: add atomic find_bit() primitives Yury Norov
                   ` (27 preceding siblings ...)
  2024-06-20 17:56 ` [PATCH v4 28/40] bluetooth: optimize cmtp_alloc_block_id() Yury Norov
@ 2024-06-20 17:56 ` Yury Norov
  2024-06-20 17:56 ` [PATCH v4 30/40] ALSA: use atomic find_bit() functions where applicable Yury Norov
                   ` (11 subsequent siblings)
  40 siblings, 0 replies; 51+ messages in thread
From: Yury Norov @ 2024-06-20 17:56 UTC (permalink / raw)
  To: linux-kernel, Karsten Graul, Wenjia Zhang, Jan Karcher, D. Wythe,
	Tony Lu, Wen Gu, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, linux-s390, netdev
  Cc: Yury Norov, Alexey Klimov, Bart Van Assche, Jan Kara,
	Linus Torvalds, Matthew Wilcox, Mirsad Todorovac,
	Rasmus Villemoes, Sergey Shtylyov, Alexandra Winter

Simplify the function by using find_and_set_bit(), making it almost a
one-liner.

While here, drop the explicit initialization of *idx: in the ENOLINK
case it is already initialized by the caller, and in the EBUSY case
find_and_set_bit() returns link->wr_tx_cnt when nothing is found, which
is the value the old code assigned.

CC: Tony Lu <tonylu@linux.alibaba.com>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
Reviewed-by: Wen Gu <guwen@linux.alibaba.com>
---
 net/smc/smc_wr.c | 11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/net/smc/smc_wr.c b/net/smc/smc_wr.c
index 0021065a600a..941c2434a021 100644
--- a/net/smc/smc_wr.c
+++ b/net/smc/smc_wr.c
@@ -23,6 +23,7 @@
  */
 
 #include <linux/atomic.h>
+#include <linux/find_atomic.h>
 #include <linux/hashtable.h>
 #include <linux/wait.h>
 #include <rdma/ib_verbs.h>
@@ -170,15 +171,11 @@ void smc_wr_tx_cq_handler(struct ib_cq *ib_cq, void *cq_context)
 
 static inline int smc_wr_tx_get_free_slot_index(struct smc_link *link, u32 *idx)
 {
-	*idx = link->wr_tx_cnt;
 	if (!smc_link_sendable(link))
 		return -ENOLINK;
-	for_each_clear_bit(*idx, link->wr_tx_mask, link->wr_tx_cnt) {
-		if (!test_and_set_bit(*idx, link->wr_tx_mask))
-			return 0;
-	}
-	*idx = link->wr_tx_cnt;
-	return -EBUSY;
+
+	*idx = find_and_set_bit(link->wr_tx_mask, link->wr_tx_cnt);
+	return *idx < link->wr_tx_cnt ? 0 : -EBUSY;
 }
 
 /**
-- 
2.43.0


* [PATCH v4 30/40] ALSA: use atomic find_bit() functions where applicable
  2024-06-20 17:56 [PATCH v4 00/40] lib/find: add atomic find_bit() primitives Yury Norov
                   ` (28 preceding siblings ...)
  2024-06-20 17:56 ` [PATCH v4 29/40] net: smc: optimize smc_wr_tx_get_free_slot_index() Yury Norov
@ 2024-06-20 17:56 ` Yury Norov
  2024-06-20 17:56 ` [PATCH v4 31/40] m68k: optimize get_mmu_context() Yury Norov
                   ` (10 subsequent siblings)
  40 siblings, 0 replies; 51+ messages in thread
From: Yury Norov @ 2024-06-20 17:56 UTC (permalink / raw)
  To: linux-kernel, Jaroslav Kysela, Takashi Iwai, Daniel Mack,
	Cezary Rojewski, Kai Vehmanen, Yury Norov, Kees Cook, linux-sound,
	alsa-devel
  Cc: Alexey Klimov, Bart Van Assche, Jan Kara, Linus Torvalds,
	Matthew Wilcox, Mirsad Todorovac, Rasmus Villemoes,
	Sergey Shtylyov, Takashi Iwai

ALSA code tests each bit in bitmaps in a for-loop. Switch it to
using the dedicated atomic find_bit() API.
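
The hda_codec hunk starts its search at bit 10 rather than 0. A minimal
userspace sketch of that "start from an offset" behaviour follows. It
assumes C11 atomics in place of the kernel bitops, and
sk_find_and_set_next_bit() is a hypothetical, deliberately naive
bit-by-bit model, not the kernel helper.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>

#define SK_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Atomically set the first clear bit in [start, nbits) and return its index,
 * or nbits if that range is full. Bits below 'start' are never touched.
 */
static unsigned long sk_find_and_set_next_bit(_Atomic unsigned long *map,
					      unsigned long nbits,
					      unsigned long start)
{
	assert(start <= nbits);
	for (unsigned long bit = start; bit < nbits; bit++) {
		_Atomic unsigned long *word = &map[bit / SK_BITS_PER_LONG];
		unsigned long mask = 1UL << (bit % SK_BITS_PER_LONG);

		/* Cheap read first: skip bits that are visibly taken. */
		if (atomic_load(word) & mask)
			continue;
		/* The bit is ours only if it was still clear when we set it. */
		if (!(atomic_fetch_or(word, mask) & mask))
			return bit;
	}
	return nbits;
}
```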

Signed-off-by: Yury Norov <yury.norov@gmail.com>
Acked-by: Takashi Iwai <tiwai@suse.de>
---
 sound/pci/hda/hda_codec.c |  8 ++++----
 sound/usb/caiaq/audio.c   | 14 ++++++--------
 2 files changed, 10 insertions(+), 12 deletions(-)

diff --git a/sound/pci/hda/hda_codec.c b/sound/pci/hda/hda_codec.c
index 325e8f0b99a8..7201afa82990 100644
--- a/sound/pci/hda/hda_codec.c
+++ b/sound/pci/hda/hda_codec.c
@@ -7,6 +7,7 @@
 
 #include <linux/init.h>
 #include <linux/delay.h>
+#include <linux/find_atomic.h>
 #include <linux/slab.h>
 #include <linux/mutex.h>
 #include <linux/module.h>
@@ -3263,10 +3264,9 @@ static int get_empty_pcm_device(struct hda_bus *bus, unsigned int type)
 
 #ifdef CONFIG_SND_DYNAMIC_MINORS
 	/* non-fixed slots starting from 10 */
-	for (i = 10; i < 32; i++) {
-		if (!test_and_set_bit(i, bus->pcm_dev_bits))
-			return i;
-	}
+	i = find_and_set_next_bit(bus->pcm_dev_bits, 32, 10);
+	if (i < 32)
+		return i;
 #endif
 
 	dev_warn(bus->card->dev, "Too many %s devices\n",
diff --git a/sound/usb/caiaq/audio.c b/sound/usb/caiaq/audio.c
index 4981753652a7..93ecd5cfcb7d 100644
--- a/sound/usb/caiaq/audio.c
+++ b/sound/usb/caiaq/audio.c
@@ -4,6 +4,7 @@
 */
 
 #include <linux/device.h>
+#include <linux/find_atomic.h>
 #include <linux/spinlock.h>
 #include <linux/slab.h>
 #include <linux/init.h>
@@ -610,7 +611,7 @@ static void read_completed(struct urb *urb)
 	struct snd_usb_caiaq_cb_info *info = urb->context;
 	struct snd_usb_caiaqdev *cdev;
 	struct device *dev;
-	struct urb *out = NULL;
+	struct urb *out;
 	int i, frame, len, send_it = 0, outframe = 0;
 	unsigned long flags;
 	size_t offset = 0;
@@ -625,17 +626,14 @@ static void read_completed(struct urb *urb)
 		return;
 
 	/* find an unused output urb that is unused */
-	for (i = 0; i < N_URBS; i++)
-		if (test_and_set_bit(i, &cdev->outurb_active_mask) == 0) {
-			out = cdev->data_urbs_out[i];
-			break;
-		}
-
-	if (!out) {
+	i = find_and_set_bit(&cdev->outurb_active_mask, N_URBS);
+	if (i >= N_URBS) {
 		dev_err(dev, "Unable to find an output urb to use\n");
 		goto requeue;
 	}
 
+	out = cdev->data_urbs_out[i];
+
 	/* read the recently received packet and send back one which has
 	 * the same layout */
 	for (frame = 0; frame < FRAMES_PER_URB; frame++) {
-- 
2.43.0


* [PATCH v4 31/40] m68k: optimize get_mmu_context()
  2024-06-20 17:56 [PATCH v4 00/40] lib/find: add atomic find_bit() primitives Yury Norov
                   ` (29 preceding siblings ...)
  2024-06-20 17:56 ` [PATCH v4 30/40] ALSA: use atomic find_bit() functions where applicable Yury Norov
@ 2024-06-20 17:56 ` Yury Norov
  2024-06-20 17:56 ` [PATCH v4 32/40] microblaze: " Yury Norov
                   ` (9 subsequent siblings)
  40 siblings, 0 replies; 51+ messages in thread
From: Yury Norov @ 2024-06-20 17:56 UTC (permalink / raw)
  To: linux-kernel, Geert Uytterhoeven, Hugh Dickins, Andrew Morton,
	Yury Norov, linux-m68k
  Cc: Alexey Klimov, Bart Van Assche, Jan Kara, Linus Torvalds,
	Matthew Wilcox, Mirsad Todorovac, Rasmus Villemoes,
	Sergey Shtylyov, Greg Ungerer

get_mmu_context() opencodes atomic find_and_set_bit_wrap(). Simplify
it by using find_and_set_bit_wrap().
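
The wrap-around search can be pictured with the userspace sketch below:
look in [start, nbits) first, then fall back to [0, start). It is only
an approximation under stated assumptions: the sk_* names are
hypothetical and C11 atomics stand in for the kernel bitops.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>

#define SK_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Atomically set the first clear bit in [from, to); returns 'to' if none. */
static unsigned long sk_find_and_set_in_range(_Atomic unsigned long *map,
					      unsigned long from,
					      unsigned long to)
{
	for (unsigned long bit = from; bit < to; bit++) {
		unsigned long mask = 1UL << (bit % SK_BITS_PER_LONG);

		if (!(atomic_fetch_or(&map[bit / SK_BITS_PER_LONG], mask) & mask))
			return bit;
	}
	return to;
}

/*
 * Search from 'start' to the end of the map, then wrap to the beginning.
 * Returns nbits only if no clear bit was found in either half.
 */
static unsigned long sk_find_and_set_bit_wrap(_Atomic unsigned long *map,
					      unsigned long nbits,
					      unsigned long start)
{
	unsigned long bit;

	assert(start <= nbits);
	bit = sk_find_and_set_in_range(map, start, nbits);
	if (bit < nbits || start == 0)
		return bit;

	bit = sk_find_and_set_in_range(map, 0, start);
	return bit < start ? bit : nbits;
}
```

A result of nbits means both halves looked full at the time of the
scan, which is why the hunk below retries while ctx > LAST_CONTEXT.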

CC: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Acked-by: Greg Ungerer <gerg@linux-m68k.org>
---
 arch/m68k/include/asm/mmu_context.h | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/m68k/include/asm/mmu_context.h b/arch/m68k/include/asm/mmu_context.h
index 141bbdfad960..2e61063aa621 100644
--- a/arch/m68k/include/asm/mmu_context.h
+++ b/arch/m68k/include/asm/mmu_context.h
@@ -3,6 +3,7 @@
 #define __M68K_MMU_CONTEXT_H
 
 #include <asm-generic/mm_hooks.h>
+#include <linux/find_atomic.h>
 #include <linux/mm_types.h>
 
 #ifdef CONFIG_MMU
@@ -35,12 +36,11 @@ static inline void get_mmu_context(struct mm_struct *mm)
 		atomic_inc(&nr_free_contexts);
 		steal_context();
 	}
-	ctx = next_mmu_context;
-	while (test_and_set_bit(ctx, context_map)) {
-		ctx = find_next_zero_bit(context_map, LAST_CONTEXT+1, ctx);
-		if (ctx > LAST_CONTEXT)
-			ctx = 0;
-	}
+
+	do {
+		ctx = find_and_set_bit_wrap(context_map, LAST_CONTEXT + 1, next_mmu_context);
+	} while (ctx > LAST_CONTEXT);
+
 	next_mmu_context = (ctx + 1) & LAST_CONTEXT;
 	mm->context = ctx;
 	context_mm[ctx] = mm;
-- 
2.43.0


* [PATCH v4 32/40] microblaze: optimize get_mmu_context()
  2024-06-20 17:56 [PATCH v4 00/40] lib/find: add atomic find_bit() primitives Yury Norov
                   ` (30 preceding siblings ...)
  2024-06-20 17:56 ` [PATCH v4 31/40] m68k: optimize get_mmu_context() Yury Norov
@ 2024-06-20 17:56 ` Yury Norov
  2024-06-20 17:56 ` [PATCH v4 33/40] sh: mach-x3proto: optimize ilsel_enable() Yury Norov
                   ` (8 subsequent siblings)
  40 siblings, 0 replies; 51+ messages in thread
From: Yury Norov @ 2024-06-20 17:56 UTC (permalink / raw)
  To: linux-kernel, Michal Simek, Yury Norov
  Cc: Alexey Klimov, Bart Van Assche, Jan Kara, Linus Torvalds,
	Matthew Wilcox, Mirsad Todorovac, Rasmus Villemoes,
	Sergey Shtylyov

Simplify get_mmu_context() by using find_and_set_bit_wrap().

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 arch/microblaze/include/asm/mmu_context_mm.h | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/microblaze/include/asm/mmu_context_mm.h b/arch/microblaze/include/asm/mmu_context_mm.h
index c2c77f708455..d4d1e80b3b66 100644
--- a/arch/microblaze/include/asm/mmu_context_mm.h
+++ b/arch/microblaze/include/asm/mmu_context_mm.h
@@ -9,6 +9,7 @@
 #define _ASM_MICROBLAZE_MMU_CONTEXT_H
 
 #include <linux/atomic.h>
+#include <linux/find_atomic.h>
 #include <linux/mm_types.h>
 #include <linux/sched.h>
 
@@ -82,12 +83,11 @@ static inline void get_mmu_context(struct mm_struct *mm)
 		return;
 	while (atomic_dec_if_positive(&nr_free_contexts) < 0)
 		steal_context();
-	ctx = next_mmu_context;
-	while (test_and_set_bit(ctx, context_map)) {
-		ctx = find_next_zero_bit(context_map, LAST_CONTEXT+1, ctx);
-		if (ctx > LAST_CONTEXT)
-			ctx = 0;
-	}
+
+	do {
+		ctx = find_and_set_bit_wrap(context_map, LAST_CONTEXT + 1, next_mmu_context);
+	} while (ctx > LAST_CONTEXT);
+
 	next_mmu_context = (ctx + 1) & LAST_CONTEXT;
 	mm->context = ctx;
 	context_mm[ctx] = mm;
-- 
2.43.0


* [PATCH v4 33/40] sh: mach-x3proto: optimize ilsel_enable()
  2024-06-20 17:56 [PATCH v4 00/40] lib/find: add atomic find_bit() primitives Yury Norov
                   ` (31 preceding siblings ...)
  2024-06-20 17:56 ` [PATCH v4 32/40] microblaze: " Yury Norov
@ 2024-06-20 17:56 ` Yury Norov
  2024-06-21  8:48   ` John Paul Adrian Glaubitz
  2024-06-20 17:56 ` [PATCH v4 34/40] MIPS: sgi-ip27: optimize alloc_level() Yury Norov
                   ` (7 subsequent siblings)
  40 siblings, 1 reply; 51+ messages in thread
From: Yury Norov @ 2024-06-20 17:56 UTC (permalink / raw)
  To: linux-kernel, Yoshinori Sato, Rich Felker,
	John Paul Adrian Glaubitz, Geert Uytterhoeven, Yury Norov,
	linux-sh
  Cc: Alexey Klimov, Bart Van Assche, Jan Kara, Linus Torvalds,
	Matthew Wilcox, Mirsad Todorovac, Rasmus Villemoes,
	Sergey Shtylyov

Simplify ilsel_enable() by using find_and_set_bit().

Geert also pointed out a bug in the old implementation:

	I don't think the old code worked as intended: the first time
	no free bit is found, bit would have been ILSEL_LEVELS, and
	test_and_set_bit() would have returned false, thus terminating
	the loop, and continuing with an out-of-range bit value? Hence
	to work correctly, bit ILSEL_LEVELS of ilsel_level_map should
	have been initialized to one?  Or am I missing something?

The new code does not have that issue.
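
The scenario Geert describes can be reproduced with a small userspace
model. This is a sketch, not the driver code: the sk_* helpers are
hypothetical, the map is a plain unsigned long because the problem is
in the logic rather than in atomicity, and nbits stands in for
ILSEL_LEVELS.

```c
#include <assert.h>
#include <limits.h>

#define SK_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Returns the first clear bit below nbits, or nbits when the map is full. */
static unsigned long sk_first_zero(unsigned long map, unsigned long nbits)
{
	for (unsigned long bit = 0; bit < nbits; bit++)
		if (!(map & (1UL << bit)))
			return bit;
	return nbits;
}

/* Sets the bit and returns its previous value, with no range check. */
static int sk_test_and_set(unsigned long bit, unsigned long *map)
{
	unsigned long mask = 1UL << bit;
	int old = (*map & mask) != 0;

	*map |= mask;
	return old;
}

/*
 * Shape of the old loop. With a full map, sk_first_zero() yields nbits,
 * bit nbits is still clear, so the loop exits with an out-of-range value.
 * (If bit nbits were already set, this model would spin forever.)
 */
static unsigned long sk_old_pick(unsigned long *map, unsigned long nbits)
{
	unsigned long bit;

	assert(nbits < SK_BITS_PER_LONG);
	do {
		bit = sk_first_zero(*map, nbits);
	} while (sk_test_and_set(bit, map));
	return bit;
}

/* find_and_set_bit()-style: never touches a bit at or beyond nbits. */
static unsigned long sk_find_and_set(unsigned long *map, unsigned long nbits)
{
	unsigned long bit = sk_first_zero(*map, nbits);

	if (bit < nbits)
		*map |= 1UL << bit;
	return bit;
}
```

In the patched ilsel_enable() the out-of-range result is not used; the
loop retries while bit >= ILSEL_LEVELS.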

CC: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
---
 arch/sh/boards/mach-x3proto/ilsel.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/sh/boards/mach-x3proto/ilsel.c b/arch/sh/boards/mach-x3proto/ilsel.c
index f0d5eb41521a..35b585e154f0 100644
--- a/arch/sh/boards/mach-x3proto/ilsel.c
+++ b/arch/sh/boards/mach-x3proto/ilsel.c
@@ -8,6 +8,7 @@
  */
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 
+#include <linux/find_atomic.h>
 #include <linux/init.h>
 #include <linux/kernel.h>
 #include <linux/module.h>
@@ -99,8 +100,8 @@ int ilsel_enable(ilsel_source_t set)
 	}
 
 	do {
-		bit = find_first_zero_bit(&ilsel_level_map, ILSEL_LEVELS);
-	} while (test_and_set_bit(bit, &ilsel_level_map));
+		bit = find_and_set_bit(&ilsel_level_map, ILSEL_LEVELS);
+	} while (bit >= ILSEL_LEVELS);
 
 	__ilsel_enable(set, bit);
 
-- 
2.43.0


* [PATCH v4 34/40] MIPS: sgi-ip27: optimize alloc_level()
  2024-06-20 17:56 [PATCH v4 00/40] lib/find: add atomic find_bit() primitives Yury Norov
                   ` (32 preceding siblings ...)
  2024-06-20 17:56 ` [PATCH v4 33/40] sh: mach-x3proto: optimize ilsel_enable() Yury Norov
@ 2024-06-20 17:56 ` Yury Norov
  2024-06-20 17:56 ` [PATCH v4 35/40] uprobes: optimize xol_take_insn_slot() Yury Norov
                   ` (6 subsequent siblings)
  40 siblings, 0 replies; 51+ messages in thread
From: Yury Norov @ 2024-06-20 17:56 UTC (permalink / raw)
  To: linux-kernel, Thomas Bogendoerfer, Yury Norov, Florian Fainelli,
	linux-mips
  Cc: Alexey Klimov, Bart Van Assche, Jan Kara, Linus Torvalds,
	Matthew Wilcox, Mirsad Todorovac, Rasmus Villemoes,
	Sergey Shtylyov

Simplify alloc_level() by using the dedicated atomic find API, turning
it into a simple wrapper around find_and_set_bit().
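
As a rough user-space sketch of the idea (not the kernel
implementation), an atomic find-and-set can be modelled with C11
atomics. model_find_and_set_bit() and model_alloc_level() are
hypothetical names; the only behaviour assumed from the series is that
a return value >= nbits means no free bit was found.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdatomic.h>

#define MODEL_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Claim the first clear bit below nbits. A bit counts as claimed only
 * if the atomic OR observed it clear, so two racing callers can never
 * both get the same index.
 */
static unsigned long model_find_and_set_bit(_Atomic unsigned long *map,
					    unsigned long nbits)
{
	for (unsigned long bit = 0; bit < nbits; bit++) {
		_Atomic unsigned long *word = &map[bit / MODEL_BITS_PER_LONG];
		unsigned long mask = 1UL << (bit % MODEL_BITS_PER_LONG);

		if (atomic_load(word) & mask)
			continue;	/* already taken, skip the RMW */
		if (!(atomic_fetch_or(word, mask) & mask))
			return bit;	/* we flipped it from 0 to 1 */
	}
	return nbits;			/* nothing free */
}

/* Wrapper in the style of alloc_level(): index or -ENOSPC. */
static int model_alloc_level(_Atomic unsigned long *map, unsigned long nbits)
{
	unsigned long level;

	assert(nbits <= (unsigned long)INT_MAX);
	level = model_find_and_set_bit(map, nbits);
	return level < nbits ? (int)level : -ENOSPC;
}
```

The caller needs no retry loop: losing a race on one bit simply moves
the search on to the next candidate.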

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 arch/mips/sgi-ip27/ip27-irq.c | 13 +++----------
 1 file changed, 3 insertions(+), 10 deletions(-)

diff --git a/arch/mips/sgi-ip27/ip27-irq.c b/arch/mips/sgi-ip27/ip27-irq.c
index 00e63e9ef61d..fc29252860a3 100644
--- a/arch/mips/sgi-ip27/ip27-irq.c
+++ b/arch/mips/sgi-ip27/ip27-irq.c
@@ -13,6 +13,7 @@
 #include <linux/ioport.h>
 #include <linux/kernel.h>
 #include <linux/bitops.h>
+#include <linux/find_atomic.h>
 #include <linux/sched.h>
 
 #include <asm/io.h>
@@ -36,17 +37,9 @@ static DEFINE_PER_CPU(unsigned long [2], irq_enable_mask);
 
 static inline int alloc_level(void)
 {
-	int level;
+	int level = find_and_set_bit(hub_irq_map, IP27_HUB_IRQ_COUNT);
 
-again:
-	level = find_first_zero_bit(hub_irq_map, IP27_HUB_IRQ_COUNT);
-	if (level >= IP27_HUB_IRQ_COUNT)
-		return -ENOSPC;
-
-	if (test_and_set_bit(level, hub_irq_map))
-		goto again;
-
-	return level;
+	return level < IP27_HUB_IRQ_COUNT ? level : -ENOSPC;
 }
 
 static void enable_hub_irq(struct irq_data *d)
-- 
2.43.0


* [PATCH v4 35/40] uprobes: optimize xol_take_insn_slot()
  2024-06-20 17:56 [PATCH v4 00/40] lib/find: add atomic find_bit() primitives Yury Norov
                   ` (33 preceding siblings ...)
  2024-06-20 17:56 ` [PATCH v4 34/40] MIPS: sgi-ip27: optimize alloc_level() Yury Norov
@ 2024-06-20 17:56 ` Yury Norov
  2024-06-20 17:56 ` [PATCH v4 36/40] scsi: sr: drop locking around SR index bitmap Yury Norov
                   ` (5 subsequent siblings)
  40 siblings, 0 replies; 51+ messages in thread
From: Yury Norov @ 2024-06-20 17:56 UTC (permalink / raw)
  To: linux-kernel, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
	Liang, Kan, linux-perf-users
  Cc: Yury Norov, Alexey Klimov, Bart Van Assche, Jan Kara,
	Linus Torvalds, Matthew Wilcox, Mirsad Todorovac,
	Rasmus Villemoes, Sergey Shtylyov

The function open-codes an atomic find_bit() operation. Switch to
using the dedicated function.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 kernel/events/uprobes.c | 15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 2c83ba776fc7..30654c41f0b2 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -9,6 +9,7 @@
  * Copyright (C) 2011-2012 Red Hat, Inc., Peter Zijlstra
  */
 
+#include <linux/find_atomic.h>
 #include <linux/kernel.h>
 #include <linux/highmem.h>
 #include <linux/pagemap.h>	/* read_mapping_page */
@@ -1581,17 +1582,13 @@ static unsigned long xol_take_insn_slot(struct xol_area *area)
 	unsigned long slot_addr;
 	int slot_nr;
 
-	do {
-		slot_nr = find_first_zero_bit(area->bitmap, UINSNS_PER_PAGE);
-		if (slot_nr < UINSNS_PER_PAGE) {
-			if (!test_and_set_bit(slot_nr, area->bitmap))
-				break;
+	while (1) {
+		slot_nr = find_and_set_bit(area->bitmap, UINSNS_PER_PAGE);
+		if (slot_nr < UINSNS_PER_PAGE)
+			break;
 
-			slot_nr = UINSNS_PER_PAGE;
-			continue;
-		}
 		wait_event(area->wq, (atomic_read(&area->slot_count) < UINSNS_PER_PAGE));
-	} while (slot_nr >= UINSNS_PER_PAGE);
+	}
 
 	slot_addr = area->vaddr + (slot_nr * UPROBE_XOL_SLOT_BYTES);
 	atomic_inc(&area->slot_count);
-- 
2.43.0


* [PATCH v4 36/40] scsi: sr: drop locking around SR index bitmap
  2024-06-20 17:56 [PATCH v4 00/40] lib/find: add atomic find_bit() primitives Yury Norov
                   ` (34 preceding siblings ...)
  2024-06-20 17:56 ` [PATCH v4 35/40] uprobes: optimize xol_take_insn_slot() Yury Norov
@ 2024-06-20 17:56 ` Yury Norov
  2024-06-20 17:57 ` [PATCH v4 37/40] KVM: PPC: Book3s HV: drop locking around kvmppc_uvmem_bitmap Yury Norov
                   ` (4 subsequent siblings)
  40 siblings, 0 replies; 51+ messages in thread
From: Yury Norov @ 2024-06-20 17:56 UTC (permalink / raw)
  To: linux-kernel, James E.J. Bottomley, Martin K. Petersen,
	linux-scsi
  Cc: Yury Norov, Alexey Klimov, Bart Van Assche, Jan Kara,
	Linus Torvalds, Matthew Wilcox, Mirsad Todorovac,
	Rasmus Villemoes, Sergey Shtylyov

The driver accesses the sr_index_bits bitmap only to set and clear
individual bits. Now that we have an atomic bit search helper, we can
drop the sr_index_lock that protects sr_index_bits and make the whole
routine lockless.

While there, use DECLARE_BITMAP() to declare sr_index_bits.
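
A small sketch of why the declaration style matters, assuming the
usual round-up definition of BITS_TO_LONGS(). The MODEL_ macros are
local stand-ins written for this example, and the value 256 for the
disk count is an assumption, not taken from this patch.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define MODEL_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
/* Round up, so a partial trailing word is still allocated. */
#define MODEL_BITS_TO_LONGS(n) \
	(((n) + MODEL_BITS_PER_LONG - 1) / MODEL_BITS_PER_LONG)
#define MODEL_DECLARE_BITMAP(name, bits) \
	unsigned long name[MODEL_BITS_TO_LONGS(bits)]

#define MODEL_DISKS 256		/* assumed example size */

/* Word count as computed by the old open-coded declaration. */
static size_t words_truncating(size_t nbits)
{
	return nbits / MODEL_BITS_PER_LONG;
}

/* Word count as computed by the DECLARE_BITMAP()-style macro. */
static size_t words_rounded_up(size_t nbits)
{
	return MODEL_BITS_TO_LONGS(nbits);
}

/* Number of bits the macro-declared array can actually hold. */
static size_t model_bitmap_capacity(void)
{
	MODEL_DECLARE_BITMAP(index_bits, MODEL_DISKS);

	assert(sizeof(index_bits) * CHAR_BIT >= MODEL_DISKS);
	return sizeof(index_bits) * CHAR_BIT;
}
```

Both forms agree when the size is a multiple of the word width, but
only the rounded-up form stays correct if the size ever stops being
one.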

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 drivers/scsi/sr.c | 15 +++------------
 1 file changed, 3 insertions(+), 12 deletions(-)

diff --git a/drivers/scsi/sr.c b/drivers/scsi/sr.c
index 7ab000942b97..3b4e04ed8b4a 100644
--- a/drivers/scsi/sr.c
+++ b/drivers/scsi/sr.c
@@ -33,6 +33,7 @@
  *	check resource allocation in sr_init and some cleanups
  */
 
+#include <linux/find_atomic.h>
 #include <linux/module.h>
 #include <linux/fs.h>
 #include <linux/kernel.h>
@@ -103,8 +104,7 @@ static struct scsi_driver sr_template = {
 	.done			= sr_done,
 };
 
-static unsigned long sr_index_bits[SR_DISKS / BITS_PER_LONG];
-static DEFINE_SPINLOCK(sr_index_lock);
+static DECLARE_BITMAP(sr_index_bits, SR_DISKS);
 
 static struct lock_class_key sr_bio_compl_lkclass;
 
@@ -566,10 +566,7 @@ static void sr_free_disk(struct gendisk *disk)
 {
 	struct scsi_cd *cd = disk->private_data;
 
-	spin_lock(&sr_index_lock);
 	clear_bit(MINOR(disk_devt(disk)), sr_index_bits);
-	spin_unlock(&sr_index_lock);
-
 	unregister_cdrom(&cd->cdi);
 	mutex_destroy(&cd->lock);
 	kfree(cd);
@@ -628,15 +625,11 @@ static int sr_probe(struct device *dev)
 		goto fail_free;
 	mutex_init(&cd->lock);
 
-	spin_lock(&sr_index_lock);
-	minor = find_first_zero_bit(sr_index_bits, SR_DISKS);
+	minor = find_and_set_bit(sr_index_bits, SR_DISKS);
 	if (minor == SR_DISKS) {
-		spin_unlock(&sr_index_lock);
 		error = -EBUSY;
 		goto fail_put;
 	}
-	__set_bit(minor, sr_index_bits);
-	spin_unlock(&sr_index_lock);
 
 	disk->major = SCSI_CDROM_MAJOR;
 	disk->first_minor = minor;
@@ -700,9 +693,7 @@ static int sr_probe(struct device *dev)
 unregister_cdrom:
 	unregister_cdrom(&cd->cdi);
 fail_minor:
-	spin_lock(&sr_index_lock);
 	clear_bit(minor, sr_index_bits);
-	spin_unlock(&sr_index_lock);
 fail_put:
 	put_disk(disk);
 	mutex_destroy(&cd->lock);
-- 
2.43.0


* [PATCH v4 37/40] KVM: PPC: Book3s HV: drop locking around kvmppc_uvmem_bitmap
  2024-06-20 17:56 [PATCH v4 00/40] lib/find: add atomic find_bit() primitives Yury Norov
                   ` (35 preceding siblings ...)
  2024-06-20 17:56 ` [PATCH v4 36/40] scsi: sr: drop locking around SR index bitmap Yury Norov
@ 2024-06-20 17:57 ` Yury Norov
  2024-06-20 17:57 ` [PATCH v4 38/40] wifi: mac80211: drop locking around ntp_fltr_bmap Yury Norov
                   ` (3 subsequent siblings)
  40 siblings, 0 replies; 51+ messages in thread
From: Yury Norov @ 2024-06-20 17:57 UTC (permalink / raw)
  To: linux-kernel, Michael Ellerman, Nicholas Piggin, Christophe Leroy,
	Naveen N. Rao, linuxppc-dev, kvm
  Cc: Yury Norov, Alexey Klimov, Bart Van Assche, Jan Kara,
	Linus Torvalds, Matthew Wilcox, Mirsad Todorovac,
	Rasmus Villemoes, Sergey Shtylyov

The driver operates on individual bits of the kvmppc_uvmem_bitmap.
Now that we have an atomic search API for bitmaps, we can rely on
it and drop locking around the bitmap entirely.
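
The resulting claim-then-roll-back shape can be sketched in user space
as follows. All names are hypothetical and the simulate_oom flag exists
only so the failure path can be exercised; the sketch assumes a
single-word bitmap for brevity.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdlib.h>

struct model_pvt {
	unsigned int slot;
};

/* Atomically claim the first clear bit below nbits, or return nbits. */
static unsigned int model_take_bit(atomic_ulong *map, unsigned int nbits)
{
	assert(nbits <= sizeof(unsigned long) * CHAR_BIT);
	for (unsigned int bit = 0; bit < nbits; bit++) {
		unsigned long mask = 1UL << bit;

		if (!(atomic_fetch_or(map, mask) & mask))
			return bit;
	}
	return nbits;
}

/* Atomically give a previously claimed bit back. */
static void model_release_bit(atomic_ulong *map, unsigned int bit)
{
	atomic_fetch_and(map, ~(1UL << bit));
}

/* Claim a slot first, then allocate; undo the claim if that fails. */
static struct model_pvt *model_get_page(atomic_ulong *map,
					unsigned int nbits, int simulate_oom)
{
	unsigned int bit = model_take_bit(map, nbits);
	struct model_pvt *pvt;

	if (bit >= nbits)
		return NULL;

	pvt = simulate_oom ? NULL : calloc(1, sizeof(*pvt));
	if (!pvt) {
		model_release_bit(map, bit);	/* roll back the claim */
		return NULL;
	}

	pvt->slot = bit;
	return pvt;
}
```

Because both the claim and the release are single atomic operations on
one bit, the error path needs no lock and no goto labels.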

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 arch/powerpc/kvm/book3s_hv_uvmem.c | 33 ++++++++++--------------------
 1 file changed, 11 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c
index 92f33115144b..93d09137cb23 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -86,6 +86,7 @@
  * page-sizes, we need to break this assumption.
  */
 
+#include <linux/find_atomic.h>
 #include <linux/pagemap.h>
 #include <linux/migrate.h>
 #include <linux/kvm_host.h>
@@ -99,7 +100,6 @@
 
 static struct dev_pagemap kvmppc_uvmem_pgmap;
 static unsigned long *kvmppc_uvmem_bitmap;
-static DEFINE_SPINLOCK(kvmppc_uvmem_bitmap_lock);
 
 /*
  * States of a GFN
@@ -697,23 +697,20 @@ static struct page *kvmppc_uvmem_get_page(unsigned long gpa, struct kvm *kvm)
 	struct page *dpage = NULL;
 	unsigned long bit, uvmem_pfn;
 	struct kvmppc_uvmem_page_pvt *pvt;
-	unsigned long pfn_last, pfn_first;
+	unsigned long num_pfns, pfn_first;
 
 	pfn_first = kvmppc_uvmem_pgmap.range.start >> PAGE_SHIFT;
-	pfn_last = pfn_first +
-		   (range_len(&kvmppc_uvmem_pgmap.range) >> PAGE_SHIFT);
+	num_pfns = range_len(&kvmppc_uvmem_pgmap.range) >> PAGE_SHIFT;
 
-	spin_lock(&kvmppc_uvmem_bitmap_lock);
-	bit = find_first_zero_bit(kvmppc_uvmem_bitmap,
-				  pfn_last - pfn_first);
-	if (bit >= (pfn_last - pfn_first))
-		goto out;
-	bitmap_set(kvmppc_uvmem_bitmap, bit, 1);
-	spin_unlock(&kvmppc_uvmem_bitmap_lock);
+	bit = find_and_set_bit(kvmppc_uvmem_bitmap, num_pfns);
+	if (bit >= num_pfns)
+		return NULL;
 
 	pvt = kzalloc(sizeof(*pvt), GFP_KERNEL);
-	if (!pvt)
-		goto out_clear;
+	if (!pvt) {
+		clear_bit(bit, kvmppc_uvmem_bitmap);
+		return NULL;
+	}
 
 	uvmem_pfn = bit + pfn_first;
 	kvmppc_gfn_secure_uvmem_pfn(gpa >> PAGE_SHIFT, uvmem_pfn, kvm);
@@ -725,12 +722,6 @@ static struct page *kvmppc_uvmem_get_page(unsigned long gpa, struct kvm *kvm)
 	dpage->zone_device_data = pvt;
 	zone_device_page_init(dpage);
 	return dpage;
-out_clear:
-	spin_lock(&kvmppc_uvmem_bitmap_lock);
-	bitmap_clear(kvmppc_uvmem_bitmap, bit, 1);
-out:
-	spin_unlock(&kvmppc_uvmem_bitmap_lock);
-	return NULL;
 }
 
 /*
@@ -1021,9 +1012,7 @@ static void kvmppc_uvmem_page_free(struct page *page)
 			(kvmppc_uvmem_pgmap.range.start >> PAGE_SHIFT);
 	struct kvmppc_uvmem_page_pvt *pvt;
 
-	spin_lock(&kvmppc_uvmem_bitmap_lock);
-	bitmap_clear(kvmppc_uvmem_bitmap, pfn, 1);
-	spin_unlock(&kvmppc_uvmem_bitmap_lock);
+	clear_bit(pfn, kvmppc_uvmem_bitmap);
 
 	pvt = page->zone_device_data;
 	page->zone_device_data = NULL;
-- 
2.43.0


* [PATCH v4 38/40] wifi: mac80211: drop locking around ntp_fltr_bmap
  2024-06-20 17:56 [PATCH v4 00/40] lib/find: add atomic find_bit() primitives Yury Norov
                   ` (36 preceding siblings ...)
  2024-06-20 17:57 ` [PATCH v4 37/40] KVM: PPC: Book3s HV: drop locking around kvmppc_uvmem_bitmap Yury Norov
@ 2024-06-20 17:57 ` Yury Norov
  2024-06-20 17:57 ` [PATCH v4 39/40] mailbox: bcm-flexrm: simplify locking scheme Yury Norov
                   ` (2 subsequent siblings)
  40 siblings, 0 replies; 51+ messages in thread
From: Yury Norov @ 2024-06-20 17:57 UTC (permalink / raw)
  To: linux-kernel, Michael Chan, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, netdev
  Cc: Yury Norov, Alexey Klimov, Bart Van Assche, Jan Kara,
	Linus Torvalds, Matthew Wilcox, Mirsad Todorovac,
	Rasmus Villemoes, Sergey Shtylyov

The driver operates on individual bits of the bitmap. Now that we have
the atomic find_and_set_bit() helper, we can move the map manipulation
out of the ntp_fltr_lock-protected area.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 18 ++++++++----------
 1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index c437ca1c0fd3..5f4c3449570d 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -51,6 +51,7 @@
 #include <linux/bitmap.h>
 #include <linux/cpu_rmap.h>
 #include <linux/cpumask.h>
+#include <linux/find_atomic.h>
 #include <net/pkt_cls.h>
 #include <net/page_pool/helpers.h>
 #include <linux/align.h>
@@ -5616,17 +5617,16 @@ static int bnxt_init_l2_filter(struct bnxt *bp, struct bnxt_l2_filter *fltr,
 			       struct bnxt_l2_key *key, u32 idx)
 {
 	struct hlist_head *head;
+	int bit_id;
 
 	ether_addr_copy(fltr->l2_key.dst_mac_addr, key->dst_mac_addr);
 	fltr->l2_key.vlan = key->vlan;
 	fltr->base.type = BNXT_FLTR_TYPE_L2;
 	if (fltr->base.flags) {
-		int bit_id;
-
-		bit_id = bitmap_find_free_region(bp->ntp_fltr_bmap,
-						 bp->max_fltr, 0);
-		if (bit_id < 0)
+		bit_id = find_and_set_bit(bp->ntp_fltr_bmap, bp->max_fltr);
+		if (bit_id >= bp->max_fltr)
 			return -ENOMEM;
+
 		fltr->base.sw_id = (u16)bit_id;
 		bp->ntp_fltr_count++;
 	}
@@ -14396,13 +14396,11 @@ int bnxt_insert_ntp_filter(struct bnxt *bp, struct bnxt_ntuple_filter *fltr,
 	struct hlist_head *head;
 	int bit_id;
 
-	spin_lock_bh(&bp->ntp_fltr_lock);
-	bit_id = bitmap_find_free_region(bp->ntp_fltr_bmap, bp->max_fltr, 0);
-	if (bit_id < 0) {
-		spin_unlock_bh(&bp->ntp_fltr_lock);
+	bit_id = find_and_set_bit(bp->ntp_fltr_bmap, bp->max_fltr);
+	if (bit_id >= bp->max_fltr)
 		return -ENOMEM;
-	}
 
+	spin_lock_bh(&bp->ntp_fltr_lock);
 	fltr->base.sw_id = (u16)bit_id;
 	fltr->base.type = BNXT_FLTR_TYPE_NTUPLE;
 	fltr->base.flags |= BNXT_ACT_RING_DST;
-- 
2.43.0


* [PATCH v4 39/40] mailbox: bcm-flexrm: simplify locking scheme
  2024-06-20 17:56 [PATCH v4 00/40] lib/find: add atomic find_bit() primitives Yury Norov
                   ` (37 preceding siblings ...)
  2024-06-20 17:57 ` [PATCH v4 38/40] wifi: mac80211: drop locking around ntp_fltr_bmap Yury Norov
@ 2024-06-20 17:57 ` Yury Norov
  2024-06-20 17:57 ` [PATCH v4 40/40] powerpc/xive: drop locking around IRQ map Yury Norov
  2024-06-20 18:00 ` [PATCH v4 00/40] lib/find: add atomic find_bit() primitives Linus Torvalds
  40 siblings, 0 replies; 51+ messages in thread
From: Yury Norov @ 2024-06-20 17:57 UTC (permalink / raw)
  To: linux-kernel, Jassi Brar
  Cc: Yury Norov, Alexey Klimov, Bart Van Assche, Jan Kara,
	Linus Torvalds, Matthew Wilcox, Mirsad Todorovac,
	Rasmus Villemoes, Sergey Shtylyov

Use atomic find_and_set_bit() and drop locking around
ring->requests_bmap.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 drivers/mailbox/bcm-flexrm-mailbox.c | 21 +++++++--------------
 1 file changed, 7 insertions(+), 14 deletions(-)

diff --git a/drivers/mailbox/bcm-flexrm-mailbox.c b/drivers/mailbox/bcm-flexrm-mailbox.c
index b1abc2a0c971..7aca533a1068 100644
--- a/drivers/mailbox/bcm-flexrm-mailbox.c
+++ b/drivers/mailbox/bcm-flexrm-mailbox.c
@@ -23,6 +23,7 @@
 #include <linux/dma-mapping.h>
 #include <linux/dmapool.h>
 #include <linux/err.h>
+#include <linux/find_atomic.h>
 #include <linux/interrupt.h>
 #include <linux/kernel.h>
 #include <linux/mailbox_controller.h>
@@ -989,21 +990,17 @@ static int flexrm_new_request(struct flexrm_ring *ring,
 	msg->error = 0;
 
 	/* If no requests possible then save data pointer and goto done. */
-	spin_lock_irqsave(&ring->lock, flags);
-	reqid = bitmap_find_free_region(ring->requests_bmap,
-					RING_MAX_REQ_COUNT, 0);
-	spin_unlock_irqrestore(&ring->lock, flags);
-	if (reqid < 0)
+	reqid = find_and_set_bit(ring->requests_bmap, RING_MAX_REQ_COUNT);
+	if (reqid >= RING_MAX_REQ_COUNT)
 		return -ENOSPC;
+
 	ring->requests[reqid] = msg;
 
 	/* Do DMA mappings for the message */
 	ret = flexrm_dma_map(ring->mbox->dev, msg);
 	if (ret < 0) {
 		ring->requests[reqid] = NULL;
-		spin_lock_irqsave(&ring->lock, flags);
-		bitmap_release_region(ring->requests_bmap, reqid, 0);
-		spin_unlock_irqrestore(&ring->lock, flags);
+		clear_bit(reqid, ring->requests_bmap);
 		return ret;
 	}
 
@@ -1063,9 +1060,7 @@ static int flexrm_new_request(struct flexrm_ring *ring,
 	if (exit_cleanup) {
 		flexrm_dma_unmap(ring->mbox->dev, msg);
 		ring->requests[reqid] = NULL;
-		spin_lock_irqsave(&ring->lock, flags);
-		bitmap_release_region(ring->requests_bmap, reqid, 0);
-		spin_unlock_irqrestore(&ring->lock, flags);
+		clear_bit(reqid, ring->requests_bmap);
 	}
 
 	return ret;
@@ -1130,9 +1125,7 @@ static int flexrm_process_completions(struct flexrm_ring *ring)
 
 		/* Release reqid for recycling */
 		ring->requests[reqid] = NULL;
-		spin_lock_irqsave(&ring->lock, flags);
-		bitmap_release_region(ring->requests_bmap, reqid, 0);
-		spin_unlock_irqrestore(&ring->lock, flags);
+		clear_bit(reqid, ring->requests_bmap);
 
 		/* Unmap DMA mappings */
 		flexrm_dma_unmap(ring->mbox->dev, msg);
-- 
2.43.0


* [PATCH v4 40/40] powerpc/xive: drop locking around IRQ map
  2024-06-20 17:56 [PATCH v4 00/40] lib/find: add atomic find_bit() primitives Yury Norov
                   ` (38 preceding siblings ...)
  2024-06-20 17:57 ` [PATCH v4 39/40] mailbox: bcm-flexrm: simplify locking scheme Yury Norov
@ 2024-06-20 17:57 ` Yury Norov
  2024-06-20 18:00 ` [PATCH v4 00/40] lib/find: add atomic find_bit() primitives Linus Torvalds
  40 siblings, 0 replies; 51+ messages in thread
From: Yury Norov @ 2024-06-20 17:57 UTC (permalink / raw)
  To: linux-kernel, Michael Ellerman, Nicholas Piggin, Christophe Leroy,
	Naveen N. Rao, Yury Norov, linuxppc-dev
  Cc: Alexey Klimov, Bart Van Assche, Jan Kara, Linus Torvalds,
	Matthew Wilcox, Mirsad Todorovac, Rasmus Villemoes,
	Sergey Shtylyov

The code operates on individual bits of the bitmap, so by leveraging
atomic find ops we can drop the locking scheme around the map.
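
A user-space sketch of allocating from several ranges, each with its
own base offset, without a per-range lock. The struct and function
names are hypothetical, an array stands in for the kernel list, and
each range is assumed to fit in one word.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdatomic.h>
#include <stddef.h>

struct model_irq_range {
	atomic_ulong bitmap;	/* one bit per IRQ in this range */
	unsigned int base;	/* IRQ number of bit 0 */
	unsigned int count;	/* number of usable bits */
};

/* Return the first free IRQ number across all ranges, or -ENOENT. */
static int model_irq_alloc(struct model_irq_range *r, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		assert(r[i].count <= sizeof(unsigned long) * CHAR_BIT);
		for (unsigned int bit = 0; bit < r[i].count; bit++) {
			unsigned long mask = 1UL << bit;

			/* Claimed only if the bit was clear before the OR. */
			if (!(atomic_fetch_or(&r[i].bitmap, mask) & mask))
				return (int)(r[i].base + bit);
		}
	}
	return -ENOENT;
}

/* Map an IRQ number back to its range and clear the matching bit. */
static void model_irq_free(struct model_irq_range *r, size_t n, int irq)
{
	for (size_t i = 0; i < n; i++) {
		int base = (int)r[i].base;

		if (irq >= base && irq < base + (int)r[i].count) {
			atomic_fetch_and(&r[i].bitmap, ~(1UL << (irq - base)));
			break;
		}
	}
}
```

Only the bitmap words are touched atomically; walking the set of
ranges is assumed to be safe on its own, as it is not modified here.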

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 arch/powerpc/sysdev/xive/spapr.c | 34 ++++++--------------------------
 1 file changed, 6 insertions(+), 28 deletions(-)

diff --git a/arch/powerpc/sysdev/xive/spapr.c b/arch/powerpc/sysdev/xive/spapr.c
index e45419264391..2b3b8ad75b42 100644
--- a/arch/powerpc/sysdev/xive/spapr.c
+++ b/arch/powerpc/sysdev/xive/spapr.c
@@ -17,6 +17,7 @@
 #include <linux/spinlock.h>
 #include <linux/bitmap.h>
 #include <linux/cpumask.h>
+#include <linux/find_atomic.h>
 #include <linux/mm.h>
 #include <linux/delay.h>
 #include <linux/libfdt.h>
@@ -41,7 +42,6 @@ struct xive_irq_bitmap {
 	unsigned long		*bitmap;
 	unsigned int		base;
 	unsigned int		count;
-	spinlock_t		lock;
 	struct list_head	list;
 };
 
@@ -55,7 +55,6 @@ static int __init xive_irq_bitmap_add(int base, int count)
 	if (!xibm)
 		return -ENOMEM;
 
-	spin_lock_init(&xibm->lock);
 	xibm->base = base;
 	xibm->count = count;
 	xibm->bitmap = bitmap_zalloc(xibm->count, GFP_KERNEL);
@@ -81,47 +80,26 @@ static void xive_irq_bitmap_remove_all(void)
 	}
 }
 
-static int __xive_irq_bitmap_alloc(struct xive_irq_bitmap *xibm)
-{
-	int irq;
-
-	irq = find_first_zero_bit(xibm->bitmap, xibm->count);
-	if (irq != xibm->count) {
-		set_bit(irq, xibm->bitmap);
-		irq += xibm->base;
-	} else {
-		irq = -ENOMEM;
-	}
-
-	return irq;
-}
-
 static int xive_irq_bitmap_alloc(void)
 {
 	struct xive_irq_bitmap *xibm;
-	unsigned long flags;
-	int irq = -ENOENT;
 
 	list_for_each_entry(xibm, &xive_irq_bitmaps, list) {
-		spin_lock_irqsave(&xibm->lock, flags);
-		irq = __xive_irq_bitmap_alloc(xibm);
-		spin_unlock_irqrestore(&xibm->lock, flags);
-		if (irq >= 0)
-			break;
+		int irq = find_and_set_bit(xibm->bitmap, xibm->count);
+
+		if (irq < xibm->count)
+			return irq + xibm->base;
 	}
-	return irq;
+	return -ENOENT;
 }
 
 static void xive_irq_bitmap_free(int irq)
 {
-	unsigned long flags;
 	struct xive_irq_bitmap *xibm;
 
 	list_for_each_entry(xibm, &xive_irq_bitmaps, list) {
 		if ((irq >= xibm->base) && (irq < xibm->base + xibm->count)) {
-			spin_lock_irqsave(&xibm->lock, flags);
 			clear_bit(irq - xibm->base, xibm->bitmap);
-			spin_unlock_irqrestore(&xibm->lock, flags);
 			break;
 		}
 	}
-- 
2.43.0


* Re: [PATCH v4 00/40] lib/find: add atomic find_bit() primitives
  2024-06-20 17:56 [PATCH v4 00/40] lib/find: add atomic find_bit() primitives Yury Norov
                   ` (39 preceding siblings ...)
  2024-06-20 17:57 ` [PATCH v4 40/40] powerpc/xive: drop locking around IRQ map Yury Norov
@ 2024-06-20 18:00 ` Linus Torvalds
  2024-06-20 18:32   ` Yury Norov
  40 siblings, 1 reply; 51+ messages in thread
From: Linus Torvalds @ 2024-06-20 18:00 UTC (permalink / raw)
  To: Yury Norov
  Cc: linux-kernel, David S. Miller, H. Peter Anvin,
	James E.J. Bottomley, K. Y. Srinivasan, Md. Haris Iqbal,
	Akinobu Mita, Andrew Morton, Bjorn Andersson, Borislav Petkov,
	Chaitanya Kulkarni, Christian Brauner, Damien Le Moal,
	Dave Hansen, David Disseldorp, Edward Cree, Eric Dumazet,
	Fenghua Yu, Geert Uytterhoeven, Greg Kroah-Hartman,
	Gregory Greenman, Hans Verkuil, Hans de Goede, Hugh Dickins,
	Ingo Molnar, Jakub Kicinski, Jaroslav Kysela, Jason Gunthorpe,
	Jens Axboe, Jiri Pirko, Jiri Slaby, Kalle Valo, Karsten Graul,
	Karsten Keil, Kees Cook, Leon Romanovsky, Mark Rutland,
	Martin Habets, Mauro Carvalho Chehab, Michael Ellerman,
	Michal Simek, Nicholas Piggin, Oliver Neukum, Paolo Abeni,
	Paolo Bonzini, Peter Zijlstra, Ping-Ke Shih, Rich Felker,
	Rob Herring, Robin Murphy, Sean Christopherson, Shuai Xue,
	Stanislaw Gruszka, Steven Rostedt, Thomas Bogendoerfer,
	Thomas Gleixner, Valentin Schneider, Vitaly Kuznetsov,
	Wenjia Zhang, Will Deacon, Yoshinori Sato,
	GR-QLogic-Storage-Upstream, alsa-devel, ath10k, dmaengine, iommu,
	kvm, linux-arm-kernel, linux-arm-msm, linux-block,
	linux-bluetooth, linux-hyperv, linux-m68k, linux-media,
	linux-mips, linux-net-drivers, linux-pci, linux-rdma, linux-s390,
	linux-scsi, linux-serial, linux-sh, linux-sound, linux-usb,
	linux-wireless, linuxppc-dev, mpi3mr-linuxdrv.pdl, netdev,
	sparclinux, x86, Alexey Klimov, Bart Van Assche, Jan Kara,
	Matthew Wilcox, Mirsad Todorovac, Rasmus Villemoes,
	Sergey Shtylyov

On Thu, 20 Jun 2024 at 10:57, Yury Norov <yury.norov@gmail.com> wrote:
>
>
> The typical lock-protected bit allocation may look like this:

If it looks like this, then nobody cares. Clearly the user in question
never actually cared about performance, and you SHOULD NOT then say
"let's optimize this that nobody cares about".

Yury, I spend an inordinate amount of time just double-checking your
patches. I ended up having to basically undo one of them just days
ago.

New rule: before you send some optimization, you need to have NUMBERS.

Some kind of "look, this code is visible in profiles, so we actually care".

Because without numbers, I'm just not going to pull anything from you.
These insane inlines for things that don't matter need to stop.

And if they *DO* matter, you need to show that they matter.

               Linus

* Re: [PATCH v4 00/40] lib/find: add atomic find_bit() primitives
  2024-06-20 18:00 ` [PATCH v4 00/40] lib/find: add atomic find_bit() primitives Linus Torvalds
@ 2024-06-20 18:32   ` Yury Norov
  2024-06-20 19:26     ` Linus Torvalds
  0 siblings, 1 reply; 51+ messages in thread
From: Yury Norov @ 2024-06-20 18:32 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-kernel, David S. Miller, H. Peter Anvin,
	James E.J. Bottomley, K. Y. Srinivasan, Md. Haris Iqbal,
	Akinobu Mita, Andrew Morton, Bjorn Andersson, Borislav Petkov,
	Chaitanya Kulkarni, Christian Brauner, Damien Le Moal,
	Dave Hansen, David Disseldorp, Edward Cree, Eric Dumazet,
	Fenghua Yu, Geert Uytterhoeven, Greg Kroah-Hartman,
	Gregory Greenman, Hans Verkuil, Hans de Goede, Hugh Dickins,
	Ingo Molnar, Jakub Kicinski, Jaroslav Kysela, Jason Gunthorpe,
	Jens Axboe, Jiri Pirko, Jiri Slaby, Kalle Valo, Karsten Graul,
	Karsten Keil, Kees Cook, Leon Romanovsky, Mark Rutland,
	Martin Habets, Mauro Carvalho Chehab, Michael Ellerman,
	Michal Simek, Nicholas Piggin, Oliver Neukum, Paolo Abeni,
	Paolo Bonzini, Peter Zijlstra, Ping-Ke Shih, Rich Felker,
	Rob Herring, Robin Murphy, Sean Christopherson, Shuai Xue,
	Stanislaw Gruszka, Steven Rostedt, Thomas Bogendoerfer,
	Thomas Gleixner, Valentin Schneider, Vitaly Kuznetsov,
	Wenjia Zhang, Will Deacon, Yoshinori Sato,
	GR-QLogic-Storage-Upstream, alsa-devel, ath10k, dmaengine, iommu,
	kvm, linux-arm-kernel, linux-arm-msm, linux-block,
	linux-bluetooth, linux-hyperv, linux-m68k, linux-media,
	linux-mips, linux-net-drivers, linux-pci, linux-rdma, linux-s390,
	linux-scsi, linux-serial, linux-sh, linux-sound, linux-usb,
	linux-wireless, linuxppc-dev, mpi3mr-linuxdrv.pdl, netdev,
	sparclinux, x86, Alexey Klimov, Bart Van Assche, Jan Kara,
	Matthew Wilcox, Mirsad Todorovac, Rasmus Villemoes,
	Sergey Shtylyov

On Thu, Jun 20, 2024 at 11:00:38AM -0700, Linus Torvalds wrote:
> On Thu, 20 Jun 2024 at 10:57, Yury Norov <yury.norov@gmail.com> wrote:
> >
> >
> > The typical lock-protected bit allocation may look like this:
> 
> If it looks like this, then nobody cares. Clearly the user in question
> never actually cared about performance, and you SHOULD NOT then say
> "let's optimize this that nobody cares about".
> 
> Yury, I spend an inordinate amount of time just double-checking your
> patches. I ended up having to basically undo one of them just days
> ago.

Is that in master already? I didn't get any email, and I can't find
anything related in the master branch.

> New rule: before you send some optimization, you need to have NUMBERS.

I tried my best to underline that this is not a performance
optimization. People notice some performance differences, but it's
~3%, no more.

> Some kind of "look, this code is visible in profiles, so we actually care".

The original motivation comes from a KCSAN report, so it's already
visible in profiles. See [1] in the cover letter. This series doesn't
fix that particular issue, but it adds tooling that allows people to
search and acquire bits in bitmaps without firing KCSAN warnings.

This series fixes one real bug in the codebase (see #33) and
simplifies bitmap usage in many other places. Many people like
it and have acked the patches.

Again, this is NOT a performance series.

Thanks,
Yury

> Because without numbers, I'm just not going to pull anything from you.
> These insane inlines for things that don't matter need to stop.
> 
> And if they *DO* matter, you need to show that they matter.
> 
>                Linus

* Re: [PATCH v4 00/40] lib/find: add atomic find_bit() primitives
  2024-06-20 18:32   ` Yury Norov
@ 2024-06-20 19:26     ` Linus Torvalds
  2024-06-20 20:20       ` Yury Norov
  0 siblings, 1 reply; 51+ messages in thread
From: Linus Torvalds @ 2024-06-20 19:26 UTC (permalink / raw)
  To: Yury Norov
  Cc: linux-kernel, David S. Miller, H. Peter Anvin,
	James E.J. Bottomley, K. Y. Srinivasan, Md. Haris Iqbal,
	Akinobu Mita, Andrew Morton, Bjorn Andersson, Borislav Petkov,
	Chaitanya Kulkarni, Christian Brauner, Damien Le Moal,
	Dave Hansen, David Disseldorp, Edward Cree, Eric Dumazet,
	Fenghua Yu, Geert Uytterhoeven, Greg Kroah-Hartman,
	Gregory Greenman, Hans Verkuil, Hans de Goede, Hugh Dickins,
	Ingo Molnar, Jakub Kicinski, Jaroslav Kysela, Jason Gunthorpe,
	Jens Axboe, Jiri Pirko, Jiri Slaby, Kalle Valo, Karsten Graul,
	Karsten Keil, Kees Cook, Leon Romanovsky, Mark Rutland,
	Martin Habets, Mauro Carvalho Chehab, Michael Ellerman,
	Michal Simek, Nicholas Piggin, Oliver Neukum, Paolo Abeni,
	Paolo Bonzini, Peter Zijlstra, Ping-Ke Shih, Rich Felker,
	Rob Herring, Robin Murphy, Sean Christopherson, Shuai Xue,
	Stanislaw Gruszka, Steven Rostedt, Thomas Bogendoerfer,
	Thomas Gleixner, Valentin Schneider, Vitaly Kuznetsov,
	Wenjia Zhang, Will Deacon, Yoshinori Sato,
	GR-QLogic-Storage-Upstream, alsa-devel, ath10k, dmaengine, iommu,
	kvm, linux-arm-kernel, linux-arm-msm, linux-block,
	linux-bluetooth, linux-hyperv, linux-m68k, linux-media,
	linux-mips, linux-net-drivers, linux-pci, linux-rdma, linux-s390,
	linux-scsi, linux-serial, linux-sh, linux-sound, linux-usb,
	linux-wireless, linuxppc-dev, mpi3mr-linuxdrv.pdl, netdev,
	sparclinux, x86, Alexey Klimov, Bart Van Assche, Jan Kara,
	Matthew Wilcox, Mirsad Todorovac, Rasmus Villemoes,
	Sergey Shtylyov

On Thu, 20 Jun 2024 at 11:32, Yury Norov <yury.norov@gmail.com> wrote:
>
> Is that in master already? I didn't get any email, and I can't find
> anything related in the master branch.

It's 5d272dd1b343 ("cpumask: limit FORCE_NR_CPUS to just the UP case").

> > New rule: before you send some optimization, you need to have NUMBERS.
>
> I tried my best to underline that this is not a performance
> optimization.

If it's not about performance, then it damn well shouldn't be 90%
inline functions in a header file.

If it's a helper function, it needs to be a real function elsewhere. Not this:

 include/linux/find_atomic.h                  | 324 +++++++++++++++++++

because either performance really matters, in which case you need to
show profiles, or performance doesn't matter, in which case it damn
well shouldn't have special cases for small bitsets that double the
size of the code.

              Linus

* Re: [PATCH v4 00/40] lib/find: add atomic find_bit() primitives
  2024-06-20 19:26     ` Linus Torvalds
@ 2024-06-20 20:20       ` Yury Norov
  2024-06-20 20:32         ` Linus Torvalds
  0 siblings, 1 reply; 51+ messages in thread
From: Yury Norov @ 2024-06-20 20:20 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-kernel, David S. Miller, H. Peter Anvin,
	James E.J. Bottomley, K. Y. Srinivasan, Md. Haris Iqbal,
	Akinobu Mita, Andrew Morton, Bjorn Andersson, Borislav Petkov,
	Chaitanya Kulkarni, Christian Brauner, Damien Le Moal,
	Dave Hansen, David Disseldorp, Edward Cree, Eric Dumazet,
	Fenghua Yu, Geert Uytterhoeven, Greg Kroah-Hartman,
	Gregory Greenman, Hans Verkuil, Hans de Goede, Hugh Dickins,
	Ingo Molnar, Jakub Kicinski, Jaroslav Kysela, Jason Gunthorpe,
	Jens Axboe, Jiri Pirko, Jiri Slaby, Kalle Valo, Karsten Graul,
	Karsten Keil, Kees Cook, Leon Romanovsky, Mark Rutland,
	Martin Habets, Mauro Carvalho Chehab, Michael Ellerman,
	Michal Simek, Nicholas Piggin, Oliver Neukum, Paolo Abeni,
	Paolo Bonzini, Peter Zijlstra, Ping-Ke Shih, Rich Felker,
	Rob Herring, Robin Murphy, Sean Christopherson, Shuai Xue,
	Stanislaw Gruszka, Steven Rostedt, Thomas Bogendoerfer,
	Thomas Gleixner, Valentin Schneider, Vitaly Kuznetsov,
	Wenjia Zhang, Will Deacon, Yoshinori Sato,
	GR-QLogic-Storage-Upstream, alsa-devel, ath10k, dmaengine, iommu,
	kvm, linux-arm-kernel, linux-arm-msm, linux-block,
	linux-bluetooth, linux-hyperv, linux-m68k, linux-media,
	linux-mips, linux-net-drivers, linux-pci, linux-rdma, linux-s390,
	linux-scsi, linux-serial, linux-sh, linux-sound, linux-usb,
	linux-wireless, linuxppc-dev, mpi3mr-linuxdrv.pdl, netdev,
	sparclinux, x86, Alexey Klimov, Bart Van Assche, Jan Kara,
	Matthew Wilcox, Mirsad Todorovac, Rasmus Villemoes,
	Sergey Shtylyov

On Thu, Jun 20, 2024 at 12:26:18PM -0700, Linus Torvalds wrote:
> On Thu, 20 Jun 2024 at 11:32, Yury Norov <yury.norov@gmail.com> wrote:
> >
> > Is that in master already? I didn't get any email, and I can't find
> > anything related in the master branch.
> 
> It's 5d272dd1b343 ("cpumask: limit FORCE_NR_CPUS to just the UP case").

FORCE_NR_CPUS helped generate better code for me back then. I'll
check again against the current kernel.

5d272dd1b343 is wrong. Limiting FORCE_NR_CPUS to the UP case makes no
sense because in the UP case nr_cpu_ids is already a compile-time macro:

#if (NR_CPUS == 1) || defined(CONFIG_FORCE_NR_CPUS)
#define nr_cpu_ids ((unsigned int)NR_CPUS)
#else
extern unsigned int nr_cpu_ids;
#endif

I use FORCE_NR_CPUS for my Rpi. (used, until I burnt it)

> > > New rule: before you send some optimization, you need to have NUMBERS.
> >
> > I tried to underline that it's not a performance optimization at my
> > best.
> 
> If it's not about performance, then it damn well shouldn't be 90%
> inline functions in a header file.
> 
> If it's a helper function, it needs to be a real function elsewhere. Not this:
> 
>  include/linux/find_atomic.h                  | 324 +++++++++++++++++++
> 
> because either performance really matters, in which case you need to
> show profiles, or performance doesn't matter, in which case it damn
> well shouldn't have special cases for small bitsets that double the
> size of the code.

This small_const_nbits() thing is a compile-time optimization for a
single-word bitmap with a compile-time length.

If the bitmap is longer, or nbits is not known at compile time, the
inline part goes away entirely at compile time.

In the other case, the outline part goes away. So those converting from
find_bit() + test_and_set_bit() will see no new outline function
calls.

This inline + outline implementation is traditional for bitmaps, and
for some people it's important. For example, Sean Christopherson
explicitly asked to add a notice that converting to the new API will
still generate inline code. See patch #13.
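
To make the inline + outline split concrete, here is a minimal userspace
sketch. It is not the code from this series: the model_* names, low_mask()
and word_find_and_set() are made up for illustration, and C11 atomics plus
GCC/Clang builtins stand in for the kernel's bitops.

```c
#include <limits.h>
#include <stdatomic.h>

#define BITS_PER_LONG ((unsigned long)(sizeof(unsigned long) * CHAR_BIT))

/* True only for a compile-time constant length that fits in one word. */
#define small_const_nbits(nbits) \
	(__builtin_constant_p(nbits) && (nbits) > 0 && (nbits) <= BITS_PER_LONG)

/* Mask covering the low n bits of a word, for n >= 1. */
static inline unsigned long low_mask(unsigned long n)
{
	return n >= BITS_PER_LONG ? ~0UL : (1UL << n) - 1;
}

/* Atomically claim the lowest clear bit under mask; BITS_PER_LONG if none. */
static inline unsigned long word_find_and_set(_Atomic unsigned long *word,
					      unsigned long mask)
{
	for (;;) {
		unsigned long free = ~atomic_load(word) & mask;
		unsigned long bit;

		if (!free)
			return BITS_PER_LONG;
		bit = (unsigned long)__builtin_ctzl(free);
		/* fetch_or returns the old word: we own the bit only if it was clear. */
		if (!(atomic_fetch_or(word, 1UL << bit) & (1UL << bit)))
			return bit;
	}
}

/* "Outline" part: generic multi-word walk that would live in a .c file. */
unsigned long model_find_and_set_bit_outline(_Atomic unsigned long *addr,
					     unsigned long nbits)
{
	for (unsigned long base = 0; base < nbits; base += BITS_PER_LONG) {
		unsigned long bit = word_find_and_set(&addr[base / BITS_PER_LONG],
						      low_mask(nbits - base));
		if (bit < BITS_PER_LONG)
			return base + bit;
	}
	return nbits;
}

/* "Inline" part: header wrapper that picks one of the two paths. */
static inline unsigned long model_find_and_set_bit(_Atomic unsigned long *addr,
						   unsigned long nbits)
{
	if (small_const_nbits(nbits)) {
		unsigned long bit = word_find_and_set(addr, low_mask(nbits));

		return bit < BITS_PER_LONG ? bit : nbits;
	}
	return model_find_and_set_bit_outline(addr, nbits);
}
```

With optimization enabled and a constant nbits that fits in one word, the
compiler can drop the outline call; otherwise the single-word branch is
dropped and only the call remains. Both paths return nbits when no clear
bit is found, so callers check "bit >= nbits" either way.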

* Re: [PATCH v4 00/40] lib/find: add atomic find_bit() primitives
  2024-06-20 20:20       ` Yury Norov
@ 2024-06-20 20:32         ` Linus Torvalds
  0 siblings, 0 replies; 51+ messages in thread
From: Linus Torvalds @ 2024-06-20 20:32 UTC (permalink / raw)
  To: Yury Norov
  Cc: linux-kernel, David S. Miller, H. Peter Anvin,
	James E.J. Bottomley, K. Y. Srinivasan, Md. Haris Iqbal,
	Akinobu Mita, Andrew Morton, Bjorn Andersson, Borislav Petkov,
	Chaitanya Kulkarni, Christian Brauner, Damien Le Moal,
	Dave Hansen, David Disseldorp, Edward Cree, Eric Dumazet,
	Fenghua Yu, Geert Uytterhoeven, Greg Kroah-Hartman,
	Gregory Greenman, Hans Verkuil, Hans de Goede, Hugh Dickins,
	Ingo Molnar, Jakub Kicinski, Jaroslav Kysela, Jason Gunthorpe,
	Jens Axboe, Jiri Pirko, Jiri Slaby, Kalle Valo, Karsten Graul,
	Karsten Keil, Kees Cook, Leon Romanovsky, Mark Rutland,
	Martin Habets, Mauro Carvalho Chehab, Michael Ellerman,
	Michal Simek, Nicholas Piggin, Oliver Neukum, Paolo Abeni,
	Paolo Bonzini, Peter Zijlstra, Ping-Ke Shih, Rich Felker,
	Rob Herring, Robin Murphy, Sean Christopherson, Shuai Xue,
	Stanislaw Gruszka, Steven Rostedt, Thomas Bogendoerfer,
	Thomas Gleixner, Valentin Schneider, Vitaly Kuznetsov,
	Wenjia Zhang, Will Deacon, Yoshinori Sato,
	GR-QLogic-Storage-Upstream, alsa-devel, ath10k, dmaengine, iommu,
	kvm, linux-arm-kernel, linux-arm-msm, linux-block,
	linux-bluetooth, linux-hyperv, linux-m68k, linux-media,
	linux-mips, linux-net-drivers, linux-pci, linux-rdma, linux-s390,
	linux-scsi, linux-serial, linux-sh, linux-sound, linux-usb,
	linux-wireless, linuxppc-dev, mpi3mr-linuxdrv.pdl, netdev,
	sparclinux, x86, Alexey Klimov, Bart Van Assche, Jan Kara,
	Matthew Wilcox, Mirsad Todorovac, Rasmus Villemoes,
	Sergey Shtylyov

On Thu, 20 Jun 2024 at 13:20, Yury Norov <yury.norov@gmail.com> wrote:
>
> FORCE_NR_CPUS helped generate better code for me back then. I'll
> check again against the current kernel.

Of _course_ it generates better code.

But when "better code" is a source of bugs, and isn't actually useful
in general, it's not better, is it.

> 5d272dd1b343 is wrong. Limiting FORCE_NR_CPUS to the UP case makes no
> sense because in the UP case nr_cpu_ids is already a compile-time macro:

Yury, I'm very aware. That was obviously intentional. The whole point
of the commit is to just disable the whole thing as useless and
problematic.

I could have just ripped it out entirely. I ended up doing a one-liner instead.

                Linus

* Re: [PATCH v4 33/40] sh: mach-x3proto: optimize ilsel_enable()
  2024-06-20 17:56 ` [PATCH v4 33/40] sh: mach-x3proto: optimize ilsel_enable() Yury Norov
@ 2024-06-21  8:48   ` John Paul Adrian Glaubitz
  2024-06-21 14:30     ` Yury Norov
  0 siblings, 1 reply; 51+ messages in thread
From: John Paul Adrian Glaubitz @ 2024-06-21  8:48 UTC (permalink / raw)
  To: Yury Norov, linux-kernel, Yoshinori Sato, Rich Felker,
	Geert Uytterhoeven, linux-sh
  Cc: Alexey Klimov, Bart Van Assche, Jan Kara, Linus Torvalds,
	Matthew Wilcox, Mirsad Todorovac, Rasmus Villemoes,
	Sergey Shtylyov

Hi Yury,

thanks for your patch!

On Thu, 2024-06-20 at 10:56 -0700, Yury Norov wrote:
> Simplify ilsel_enable() by using find_and_set_bit().
> 
> Geert also pointed out the bug in the old implementation:
> 
> 	I don't think the old code worked as intended: the first time
> 	no free bit is found, bit would have been ILSEL_LEVELS, and
> 	test_and_set_bit() would have returned false, thus terminating
> 	the loop, and continuing with an out-of-range bit value? Hence
> 	to work correctly, bit ILSEL_LEVELS of ilsel_level_map should
> 	have been initialized to one?  Or am I missing something?
> 
> The new code does not have that issue.
> 
> CC: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
> Signed-off-by: Yury Norov <yury.norov@gmail.com>
> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
> ---
>  arch/sh/boards/mach-x3proto/ilsel.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/sh/boards/mach-x3proto/ilsel.c b/arch/sh/boards/mach-x3proto/ilsel.c
> index f0d5eb41521a..35b585e154f0 100644
> --- a/arch/sh/boards/mach-x3proto/ilsel.c
> +++ b/arch/sh/boards/mach-x3proto/ilsel.c
> @@ -8,6 +8,7 @@
>   */
>  #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
>  
> +#include <linux/find_atomic.h>
>  #include <linux/init.h>
>  #include <linux/kernel.h>
>  #include <linux/module.h>
> @@ -99,8 +100,8 @@ int ilsel_enable(ilsel_source_t set)
>  	}
>  
>  	do {
> -		bit = find_first_zero_bit(&ilsel_level_map, ILSEL_LEVELS);
> -	} while (test_and_set_bit(bit, &ilsel_level_map));
> +		bit = find_and_set_bit(&ilsel_level_map, ILSEL_LEVELS);
> +	} while (bit >= ILSEL_LEVELS);
>  
>  	__ilsel_enable(set, bit);

I will need to take a closer look at the whole code in ilsel_enable() to understand what's
happening here. If Geert's explanation is correct, it sounds more like you're fixing a bug,
and calling it an optimization in the patch subject would sound like a euphemism.

Also, I think we should add a Fixes tag if possible in case your patch fixes an actual bug.

I will have a closer look over the weekend.

Adrian

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer
`. `'   Physicist
  `-    GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913

* Re: [PATCH v4 33/40] sh: mach-x3proto: optimize ilsel_enable()
  2024-06-21  8:48   ` John Paul Adrian Glaubitz
@ 2024-06-21 14:30     ` Yury Norov
  0 siblings, 0 replies; 51+ messages in thread
From: Yury Norov @ 2024-06-21 14:30 UTC (permalink / raw)
  To: John Paul Adrian Glaubitz
  Cc: linux-kernel, Yoshinori Sato, Rich Felker, Geert Uytterhoeven,
	linux-sh, Alexey Klimov, Bart Van Assche, Jan Kara,
	Linus Torvalds, Matthew Wilcox, Mirsad Todorovac,
	Rasmus Villemoes, Sergey Shtylyov

On Fri, Jun 21, 2024 at 10:48:44AM +0200, John Paul Adrian Glaubitz wrote:
> Hi Yury,
> 
> thanks for your patch!
> 
> On Thu, 2024-06-20 at 10:56 -0700, Yury Norov wrote:
> > Simplify ilsel_enable() by using find_and_set_bit().
> > 
> > Geert also pointed out the bug in the old implementation:
> > 
> > 	I don't think the old code worked as intended: the first time
> > 	no free bit is found, bit would have been ILSEL_LEVELS, and
> > 	test_and_set_bit() would have returned false, thus terminating
> > 	the loop, and continuing with an out-of-range bit value? Hence
> > 	to work correctly, bit ILSEL_LEVELS of ilsel_level_map should
> > 	have been initialized to one?  Or am I missing something?
> > 
> > The new code does not have that issue.
> > 
> > CC: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
> > Signed-off-by: Yury Norov <yury.norov@gmail.com>
> > Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
> > ---
> >  arch/sh/boards/mach-x3proto/ilsel.c | 5 +++--
> >  1 file changed, 3 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/sh/boards/mach-x3proto/ilsel.c b/arch/sh/boards/mach-x3proto/ilsel.c
> > index f0d5eb41521a..35b585e154f0 100644
> > --- a/arch/sh/boards/mach-x3proto/ilsel.c
> > +++ b/arch/sh/boards/mach-x3proto/ilsel.c
> > @@ -8,6 +8,7 @@
> >   */
> >  #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> >  
> > +#include <linux/find_atomic.h>
> >  #include <linux/init.h>
> >  #include <linux/kernel.h>
> >  #include <linux/module.h>
> > @@ -99,8 +100,8 @@ int ilsel_enable(ilsel_source_t set)
> >  	}
> >  
> >  	do {
> > -		bit = find_first_zero_bit(&ilsel_level_map, ILSEL_LEVELS);
> > -	} while (test_and_set_bit(bit, &ilsel_level_map));
> > +		bit = find_and_set_bit(&ilsel_level_map, ILSEL_LEVELS);
> > +	} while (bit >= ILSEL_LEVELS);
> >  
> >  	__ilsel_enable(set, bit);
> 
> I will need to take a closer look at the whole code in ilsel_enable() to understand what's
> happening here. If Geert's explanation is correct, it sounds more like you're fixing a bug,
> and calling it an optimization in the patch subject would sound like a euphemism.
> 
> Also, I think we should add a Fixes tag if possible in case your patch fixes an actual bug.
> 
> I will have a closer look over the weekend.

Hi John,

The problem is that if ilsel_level_map is dense (all bits set), @bit
will be set to ILSEL_LEVELS. The following test_and_set_bit()
will therefore access a bit beyond the end of the bitmap, which in
turn is undefined behavior.

I'm not familiar with the subsystem as a whole, so I can't say if it's
ever possible to have the ilsel_level_map all set. If you could take a
look, that would be great.

If this series does not move forward, the fix for this code would be:

  do {
          bit = find_first_zero_bit(&ilsel_level_map, ILSEL_LEVELS);
  } while (bit >= ILSEL_LEVELS || test_and_set_bit(bit, &ilsel_level_map));

It would work, but because find_first_zero_bit() is not designed to
work correctly in a concurrent environment, it may trigger KCSAN and/or
return something irrelevant. See the cover letter of this series for
details.
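
As a rough single-threaded illustration of why the range check has to
come before the test-and-set, here is a userspace model. MODEL_LEVELS and
the model_* helpers are hypothetical stand-ins, not the kernel's
ILSEL_LEVELS or bitops, and nothing here is atomic. Unlike the loop
above, the sketch bails out when the map is full instead of retrying, so
that it terminates without a second thread freeing a bit.

```c
#include <stdbool.h>

#define MODEL_LEVELS 15U	/* arbitrary stand-in for the bitmap length */

/* Like find_first_zero_bit(): returns nbits when every bit is set. */
static unsigned int model_find_first_zero(unsigned long map, unsigned int nbits)
{
	for (unsigned int bit = 0; bit < nbits; bit++)
		if (!(map & (1UL << bit)))
			return bit;
	return nbits;
}

/* Non-atomic model of test_and_set_bit(): returns the old bit value. */
static bool model_test_and_set(unsigned int bit, unsigned long *map)
{
	bool was_set = *map & (1UL << bit);

	*map |= 1UL << bit;
	return was_set;
}

/*
 * Guarded allocation: the range check runs before the test-and-set, so a
 * full map never causes a write to bit MODEL_LEVELS. Returns the claimed
 * bit, or MODEL_LEVELS if nothing was free.
 */
unsigned int model_alloc_level(unsigned long *map)
{
	unsigned int bit;

	do {
		bit = model_find_first_zero(*map, MODEL_LEVELS);
		if (bit >= MODEL_LEVELS)
			return MODEL_LEVELS;
	} while (model_test_and_set(bit, map));

	return bit;
}
```

Without the "bit >= MODEL_LEVELS" check, the full-map case would hand
MODEL_LEVELS to the test-and-set, which is the out-of-range access Geert
described.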

Thanks,
Yury

* Re: [PATCH v4 19/40] iommu: optimize subsystem by using atomic find_bit() API
  2024-06-20 17:56 ` [PATCH v4 19/40] iommu: optimize subsystem " Yury Norov
@ 2024-06-25 12:16   ` Joerg Roedel
  0 siblings, 0 replies; 51+ messages in thread
From: Joerg Roedel @ 2024-06-25 12:16 UTC (permalink / raw)
  To: Yury Norov
  Cc: linux-kernel, Will Deacon, Robin Murphy, Andy Gross,
	Bjorn Andersson, Konrad Dybcio, linux-arm-kernel, iommu,
	linux-arm-msm, Alexey Klimov, Bart Van Assche, Jan Kara,
	Linus Torvalds, Matthew Wilcox, Mirsad Todorovac,
	Rasmus Villemoes, Sergey Shtylyov

On Thu, Jun 20, 2024 at 10:56:42AM -0700, Yury Norov wrote:
>  drivers/iommu/arm/arm-smmu/arm-smmu.h | 11 +++--------
>  drivers/iommu/msm_iommu.c             | 19 +++++--------------

Please split that up into an arm-smmu and msm part, so that these can be
reviewed and merged via separate branches.

Thanks,

	Joerg

* Re: [PATCH v4 24/40] RDMA/rtrs: optimize __rtrs_get_permit() by using find_and_set_bit_lock()
  2024-06-20 17:56 ` [PATCH v4 24/40] RDMA/rtrs: optimize __rtrs_get_permit() by using find_and_set_bit_lock() Yury Norov
@ 2024-06-27 12:59   ` Jinpu Wang
  0 siblings, 0 replies; 51+ messages in thread
From: Jinpu Wang @ 2024-06-27 12:59 UTC (permalink / raw)
  To: Yury Norov
  Cc: linux-kernel, Md. Haris Iqbal, Jason Gunthorpe, Leon Romanovsky,
	linux-rdma, Alexey Klimov, Bart Van Assche, Jan Kara,
	Linus Torvalds, Matthew Wilcox, Mirsad Todorovac,
	Rasmus Villemoes, Sergey Shtylyov

On Thu, Jun 20, 2024 at 7:58 PM Yury Norov <yury.norov@gmail.com> wrote:
>
> The function opencodes find_and_set_bit_lock() with a while-loop polling
> on test_and_set_bit_lock(). Use the dedicated function instead.
>
> Signed-off-by: Yury Norov <yury.norov@gmail.com>
lgtm, thx!
Reviewed-by: Jack Wang <jinpu.wang@ionos.com>
> ---
>  drivers/infiniband/ulp/rtrs/rtrs-clt.c | 16 ++++------------
>  1 file changed, 4 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/infiniband/ulp/rtrs/rtrs-clt.c b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
> index 88106cf5ce55..52b7728f6c63 100644
> --- a/drivers/infiniband/ulp/rtrs/rtrs-clt.c
> +++ b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
> @@ -10,6 +10,7 @@
>  #undef pr_fmt
>  #define pr_fmt(fmt) KBUILD_MODNAME " L" __stringify(__LINE__) ": " fmt
>
> +#include <linux/find_atomic.h>
>  #include <linux/module.h>
>  #include <linux/rculist.h>
>  #include <linux/random.h>
> @@ -72,18 +73,9 @@ __rtrs_get_permit(struct rtrs_clt_sess *clt, enum rtrs_clt_con_type con_type)
>         struct rtrs_permit *permit;
>         int bit;
>
> -       /*
> -        * Adapted from null_blk get_tag(). Callers from different cpus may
> -        * grab the same bit, since find_first_zero_bit is not atomic.
> -        * But then the test_and_set_bit_lock will fail for all the
> -        * callers but one, so that they will loop again.
> -        * This way an explicit spinlock is not required.
> -        */
> -       do {
> -               bit = find_first_zero_bit(clt->permits_map, max_depth);
> -               if (bit >= max_depth)
> -                       return NULL;
> -       } while (test_and_set_bit_lock(bit, clt->permits_map));
> +       bit = find_and_set_bit_lock(clt->permits_map, max_depth);
> +       if (bit >= max_depth)
> +               return NULL;
>
>         permit = get_permit(clt, bit);
>         WARN_ON(permit->mem_id != bit);
> --
> 2.43.0
>

* Re: [PATCH v4 23/40] usb: cdc-acm: optimize acm_softint()
  2024-06-20 17:56 ` [PATCH v4 23/40] usb: cdc-acm: optimize acm_softint() Yury Norov
@ 2024-06-27 14:03   ` Greg Kroah-Hartman
  0 siblings, 0 replies; 51+ messages in thread
From: Greg Kroah-Hartman @ 2024-06-27 14:03 UTC (permalink / raw)
  To: Yury Norov
  Cc: linux-kernel, Oliver Neukum, linux-usb, Alexey Klimov,
	Bart Van Assche, Jan Kara, Linus Torvalds, Matthew Wilcox,
	Mirsad Todorovac, Rasmus Villemoes, Sergey Shtylyov

On Thu, Jun 20, 2024 at 10:56:46AM -0700, Yury Norov wrote:
> acm_softint() uses a for-loop to traverse the urbs_in_error_delay bitmap
> bit by bit to find and clear set bits.
> 
> Simplify it by using for_each_test_and_clear_bit(), because it doesn't
> test already clear bits.
> 
> Signed-off-by: Yury Norov <yury.norov@gmail.com>
> Acked-by: Oliver Neukum <oneukum@suse.com>
> ---
>  drivers/usb/class/cdc-acm.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/usb/class/cdc-acm.c b/drivers/usb/class/cdc-acm.c
> index 0e7439dba8fe..f8940f0d7ad8 100644
> --- a/drivers/usb/class/cdc-acm.c
> +++ b/drivers/usb/class/cdc-acm.c
> @@ -18,6 +18,7 @@
>  #undef DEBUG
>  #undef VERBOSE_DEBUG
>  
> +#include <linux/find_atomic.h>
>  #include <linux/kernel.h>
>  #include <linux/sched/signal.h>
>  #include <linux/errno.h>
> @@ -613,9 +614,8 @@ static void acm_softint(struct work_struct *work)
>  	}
>  
>  	if (test_and_clear_bit(ACM_ERROR_DELAY, &acm->flags)) {
> -		for (i = 0; i < acm->rx_buflimit; i++)
> -			if (test_and_clear_bit(i, &acm->urbs_in_error_delay))
> -				acm_submit_read_urb(acm, i, GFP_KERNEL);
> +		for_each_test_and_clear_bit(i, &acm->urbs_in_error_delay, acm->rx_buflimit)
> +			acm_submit_read_urb(acm, i, GFP_KERNEL);
>  	}
>  
>  	if (test_and_clear_bit(EVENT_TTY_WAKEUP, &acm->flags))
> -- 
> 2.43.0
> 

Hi,

This is the friendly patch-bot of Greg Kroah-Hartman.  You have sent him
a patch that has triggered this response.  He used to manually respond
to these common problems, but in order to save his sanity (he kept
writing the same thing over and over, yet to different people), I was
created.  Hopefully you will not take offence and will fix the problem
in your patch and resubmit it so that it can be accepted into the Linux
kernel tree.

You are receiving this message because of the following common error(s)
as indicated below:

- This looks like a new version of a previously submitted patch, but you
  did not list below the --- line any changes from the previous version.
  Please read the section entitled "The canonical patch format" in the
  kernel file, Documentation/process/submitting-patches.rst for what
  needs to be done here to properly describe this.

If you wish to discuss this problem further, or you have questions about
how to resolve this issue, please feel free to respond to this email and
Greg will reply once he has dug out from the pending patches received
from other developers.

thanks,

greg k-h's patch email bot

end of thread [~2024-06-27 14:03 UTC]

Thread overview: 51+ messages
2024-06-20 17:56 [PATCH v4 00/40] lib/find: add atomic find_bit() primitives Yury Norov
2024-06-20 17:56 ` [PATCH v4 01/40] " Yury Norov
2024-06-20 17:56 ` [PATCH v4 02/40] lib/find: add test for atomic find_bit() ops Yury Norov
2024-06-20 17:56 ` [PATCH v4 03/40] lib/sbitmap; optimize __sbitmap_get_word() by using find_and_set_bit() Yury Norov
2024-06-20 17:56 ` [PATCH v4 04/40] watch_queue: optimize post_one_notification() by using find_and_clear_bit() Yury Norov
2024-06-20 17:56 ` [PATCH v4 05/40] sched: add cpumask_find_and_set() and use it in __mm_cid_get() Yury Norov
2024-06-20 17:56 ` [PATCH v4 06/40] mips: sgi-ip30: optimize heart_alloc_int() by using find_and_set_bit() Yury Norov
2024-06-20 17:56 ` [PATCH v4 07/40] sparc: optimize alloc_msi() " Yury Norov
2024-06-20 17:56 ` [PATCH v4 08/40] perf/arm: use atomic find_bit() API Yury Norov
2024-06-20 17:56 ` [PATCH v4 09/40] drivers/perf: optimize ali_drw_get_counter_idx() by using find_and_set_bit() Yury Norov
2024-06-20 17:56 ` [PATCH v4 10/40] dmaengine: idxd: optimize perfmon_assign_event() Yury Norov
2024-06-20 17:56 ` [PATCH v4 11/40] ath10k: optimize ath10k_snoc_napi_poll() Yury Norov
2024-06-20 17:56 ` [PATCH v4 12/40] wifi: rtw88: optimize the driver by using atomic iterator Yury Norov
2024-06-20 17:56 ` [PATCH v4 13/40] KVM: x86: hyper-v: optimize and cleanup kvm_hv_process_stimers() Yury Norov
2024-06-20 17:56 ` [PATCH v4 14/40] PCI: hv: Optimize hv_get_dom_num() by using find_and_set_bit() Yury Norov
2024-06-20 17:56 ` [PATCH v4 15/40] scsi: core: optimize scsi_evt_emit() by using an atomic iterator Yury Norov
2024-06-20 17:56 ` [PATCH v4 16/40] scsi: mpi3mr: optimize the driver by using find_and_set_bit() Yury Norov
2024-06-20 17:56 ` [PATCH v4 17/40] scsi: qedi: optimize qedi_get_task_idx() " Yury Norov
2024-06-20 17:56 ` [PATCH v4 18/40] powerpc: optimize arch code by using atomic find_bit() API Yury Norov
2024-06-20 17:56 ` [PATCH v4 19/40] iommu: optimize subsystem " Yury Norov
2024-06-25 12:16   ` Joerg Roedel
2024-06-20 17:56 ` [PATCH v4 20/40] media: radio-shark: optimize the driver " Yury Norov
2024-06-20 17:56 ` [PATCH v4 21/40] sfc: " Yury Norov
2024-06-20 17:56 ` [PATCH v4 22/40] tty: nozomi: optimize interrupt_handler() Yury Norov
2024-06-20 17:56 ` [PATCH v4 23/40] usb: cdc-acm: optimize acm_softint() Yury Norov
2024-06-27 14:03   ` Greg Kroah-Hartman
2024-06-20 17:56 ` [PATCH v4 24/40] RDMA/rtrs: optimize __rtrs_get_permit() by using find_and_set_bit_lock() Yury Norov
2024-06-27 12:59   ` Jinpu Wang
2024-06-20 17:56 ` [PATCH v4 25/40] mISDN: optimize get_free_devid() Yury Norov
2024-06-20 17:56 ` [PATCH v4 26/40] media: em28xx: cx231xx: optimize drivers by using find_and_set_bit() Yury Norov
2024-06-20 17:56 ` [PATCH v4 27/40] ethernet: rocker: optimize ofdpa_port_internal_vlan_id_get() Yury Norov
2024-06-20 17:56 ` [PATCH v4 28/40] bluetooth: optimize cmtp_alloc_block_id() Yury Norov
2024-06-20 17:56 ` [PATCH v4 29/40] net: smc: optimize smc_wr_tx_get_free_slot_index() Yury Norov
2024-06-20 17:56 ` [PATCH v4 30/40] ALSA: use atomic find_bit() functions where applicable Yury Norov
2024-06-20 17:56 ` [PATCH v4 31/40] m68k: optimize get_mmu_context() Yury Norov
2024-06-20 17:56 ` [PATCH v4 32/40] microblaze: " Yury Norov
2024-06-20 17:56 ` [PATCH v4 33/40] sh: mach-x3proto: optimize ilsel_enable() Yury Norov
2024-06-21  8:48   ` John Paul Adrian Glaubitz
2024-06-21 14:30     ` Yury Norov
2024-06-20 17:56 ` [PATCH v4 34/40] MIPS: sgi-ip27: optimize alloc_level() Yury Norov
2024-06-20 17:56 ` [PATCH v4 35/40] uprobes: optimize xol_take_insn_slot() Yury Norov
2024-06-20 17:56 ` [PATCH v4 36/40] scsi: sr: drop locking around SR index bitmap Yury Norov
2024-06-20 17:57 ` [PATCH v4 37/40] KVM: PPC: Book3s HV: drop locking around kvmppc_uvmem_bitmap Yury Norov
2024-06-20 17:57 ` [PATCH v4 38/40] wifi: mac80211: drop locking around ntp_fltr_bmap Yury Norov
2024-06-20 17:57 ` [PATCH v4 39/40] mailbox: bcm-flexrm: simplify locking scheme Yury Norov
2024-06-20 17:57 ` [PATCH v4 40/40] powerpc/xive: drop locking around IRQ map Yury Norov
2024-06-20 18:00 ` [PATCH v4 00/40] lib/find: add atomic find_bit() primitives Linus Torvalds
2024-06-20 18:32   ` Yury Norov
2024-06-20 19:26     ` Linus Torvalds
2024-06-20 20:20       ` Yury Norov
2024-06-20 20:32         ` Linus Torvalds
