public inbox for linux-s390@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH 00/14] ip_tunnel: convert __be16 tunnel flags to bitmaps
@ 2023-10-09 15:10 Alexander Lobakin
  2023-10-09 15:10 ` [PATCH 01/14] bitops: add missing prototype check Alexander Lobakin
                   ` (12 more replies)
  0 siblings, 13 replies; 26+ messages in thread
From: Alexander Lobakin @ 2023-10-09 15:10 UTC (permalink / raw)
  To: Yury Norov
  Cc: Alexander Lobakin, Andy Shevchenko, Rasmus Villemoes,
	Alexander Potapenko, Jakub Kicinski, Eric Dumazet, David Ahern,
	Przemek Kitszel, Simon Horman, netdev, linux-btrfs, dm-devel,
	ntfs3, linux-s390, linux-kernel

Derived from the PFCP support series[0] as this grew bigger (2 -> 14
commits) and involved more core bitmap changes. Only commits 10 and 11
are from the mentioned tree, the rest is new. PFCP itself still depends
on this series.

IP tunnels have their flags defined as `__be16`, including UAPI, and
after GTP was accepted, there are no more free bits left. UAPI (incl.
direct usage of one of the user structs) and explicit Endianness only
complicate things.
Since it would either way end up with hundreds of locs due to all that,
pick bitmaps right from the start to store the flags in the most native
and scalable format with rich API. I don't think it's worth trying to
praise luck and pick smth like u32 only to redo everything in x years :)
More details regarding the IP tunnel flags is in 11 and 14.

The rest is just a good bunch of prereqs and tests: a couple of new
helpers and extensions to the old ones, a few optimizations to partially
mitigate IP tunnel object code growth due to __be16 -> long, and
decouping one UAPI struct used throughout the whole kernel into the
userspace and the kernel space counterparts to eliminate the dependency.

[0] https://lore.kernel.org/netdev/20230721071532.613888-1-marcin.szycik@linux.intel.com

Alexander Lobakin (14):
  bitops: add missing prototype check
  bitops: make BYTES_TO_BITS() treewide-available
  bitops: let the compiler optimize __assign_bit()
  linkmode: convert linkmode_{test,set,clear,mod}_bit() to macros
  s390/cio: rename bitmap_size() -> idset_bitmap_size()
  fs/ntfs3: rename bitmap_size() -> ntfs3_bitmap_size()
  btrfs: rename bitmap_set_bits() -> btrfs_bitmap_set_bits()
  bitmap: introduce generic optimized bitmap_size()
  bitmap: extend bitmap_{get,set}_value8() to bitmap_{get,set}_bits()
  ip_tunnel: use a separate struct to store tunnel params in the kernel
  ip_tunnel: convert __be16 tunnel flags to bitmaps
  lib/bitmap: add compile-time test for __assign_bit() optimization
  lib/bitmap: add tests for bitmap_{get,set}_bits()
  lib/bitmap: add tests for IP tunnel flags conversion helpers

 drivers/md/dm-clone-metadata.c                |   5 -
 drivers/net/bareudp.c                         |  19 +-
 .../ethernet/mellanox/mlx5/core/en/tc_tun.h   |   2 +-
 .../mellanox/mlx5/core/en/tc_tun_encap.c      |   6 +-
 .../mellanox/mlx5/core/en/tc_tun_geneve.c     |  12 +-
 .../mellanox/mlx5/core/en/tc_tun_gre.c        |   8 +-
 .../mellanox/mlx5/core/en/tc_tun_vxlan.c      |   9 +-
 .../net/ethernet/mellanox/mlx5/core/en_tc.c   |  16 +-
 .../ethernet/mellanox/mlxsw/spectrum_ipip.c   |  56 +++--
 .../ethernet/mellanox/mlxsw/spectrum_ipip.h   |   2 +-
 .../ethernet/mellanox/mlxsw/spectrum_span.c   |  10 +-
 .../ethernet/netronome/nfp/flower/action.c    |  27 ++-
 drivers/net/geneve.c                          |  44 ++--
 drivers/net/vxlan/vxlan_core.c                |  14 +-
 drivers/s390/cio/idset.c                      |  10 +-
 fs/btrfs/free-space-cache.c                   |   8 +-
 fs/ntfs3/bitmap.c                             |   4 +-
 fs/ntfs3/fsntfs.c                             |   2 +-
 fs/ntfs3/index.c                              |  11 +-
 fs/ntfs3/ntfs_fs.h                            |   2 +-
 fs/ntfs3/super.c                              |   2 +-
 include/linux/bitmap.h                        |  59 ++++--
 include/linux/bitops.h                        |  13 +-
 include/linux/cpumask.h                       |   2 +-
 include/linux/linkmode.h                      |  27 +--
 include/linux/netdevice.h                     |   7 +-
 include/net/dst_metadata.h                    |  10 +-
 include/net/flow_dissector.h                  |   2 +-
 include/net/gre.h                             |  70 +++---
 include/net/ip6_tunnel.h                      |   4 +-
 include/net/ip_tunnels.h                      | 136 ++++++++++--
 include/net/udp_tunnel.h                      |   4 +-
 include/uapi/linux/if_tunnel.h                |  33 +++
 kernel/trace/trace_probe.c                    |   2 -
 lib/math/prime_numbers.c                      |   2 -
 lib/test_bitmap.c                             | 200 +++++++++++++++++-
 net/bridge/br_vlan_tunnel.c                   |   9 +-
 net/core/filter.c                             |  26 +--
 net/core/flow_dissector.c                     |  20 +-
 net/ipv4/fou_bpf.c                            |   2 +-
 net/ipv4/gre_demux.c                          |   2 +-
 net/ipv4/ip_gre.c                             | 144 ++++++++-----
 net/ipv4/ip_tunnel.c                          | 109 +++++++---
 net/ipv4/ip_tunnel_core.c                     |  82 ++++---
 net/ipv4/ip_vti.c                             |  41 ++--
 net/ipv4/ipip.c                               |  33 +--
 net/ipv4/ipmr.c                               |   2 +-
 net/ipv4/udp_tunnel_core.c                    |   5 +-
 net/ipv6/addrconf.c                           |   3 +-
 net/ipv6/ip6_gre.c                            |  85 ++++----
 net/ipv6/ip6_tunnel.c                         |  14 +-
 net/ipv6/sit.c                                |  38 ++--
 net/netfilter/ipvs/ip_vs_core.c               |   6 +-
 net/netfilter/ipvs/ip_vs_xmit.c               |  20 +-
 net/netfilter/nft_tunnel.c                    |  44 ++--
 net/openvswitch/flow_netlink.c                |  61 +++---
 net/psample/psample.c                         |  26 +--
 net/sched/act_tunnel_key.c                    |  36 ++--
 net/sched/cls_flower.c                        |  27 +--
 tools/include/linux/bitmap.h                  |   8 +-
 tools/include/linux/bitops.h                  |   2 +
 tools/perf/util/probe-finder.c                |   2 -
 62 files changed, 1116 insertions(+), 571 deletions(-)

---
Not sure whether it's fine to have that all in one series, but OTOH
there's not much stuff I could split (like, 3 commits), it either
depends directly (new helpers etc.) or will just generate suboptimal
code w/o some of the commits.

I'm also thinking of which tree this would ideally be taken through.
The main subject is networking, but most of the commits are generic.
My idea is to push this via Yury / bitmaps and then ask the netdev
maintainers to pull his tree before they take PFCP (dependent on this
one).

Speaking of bitmap_{read,write}() from [1] vs bitmap_{get,set}_bits()
from #09: they don't really conflict, because the former are
generic-generic and support bound crossing, while the latter require
the width to be a pow-2 and the offset to be a multiple of the width
in order to preserve the optimization level as close to the current
bitmap_{get,set}_value8() as possible...

Old pfcp -> bitmap changelog:

As for former commits (now 10 and 11), almost all of the changes were
suggested by Andy, notably: stop violating bitmap API, use
__assign_bit() where appropriate, and add more tests to make sure
everything works as expected. Apart from that, add simple wrappers for
bitmap_*() used in the IP tunnel code to avoid manually specifying
``__IP_TUNNEL_FLAG_NUM`` each time.

[1] https://lore.kernel.org/lkml/20231006134529.2816540-2-glider@google.com
-- 
2.41.0


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 01/14] bitops: add missing prototype check
  2023-10-09 15:10 [PATCH 00/14] ip_tunnel: convert __be16 tunnel flags to bitmaps Alexander Lobakin
@ 2023-10-09 15:10 ` Alexander Lobakin
  2023-10-09 15:10 ` [PATCH 02/14] bitops: make BYTES_TO_BITS() treewide-available Alexander Lobakin
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 26+ messages in thread
From: Alexander Lobakin @ 2023-10-09 15:10 UTC (permalink / raw)
  To: Yury Norov
  Cc: Alexander Lobakin, Andy Shevchenko, Rasmus Villemoes,
	Alexander Potapenko, Jakub Kicinski, Eric Dumazet, David Ahern,
	Przemek Kitszel, Simon Horman, netdev, linux-btrfs, dm-devel,
	ntfs3, linux-s390, linux-kernel

Commit 8238b4579866 ("wait_on_bit: add an acquire memory barrier") added
a new bitop, test_bit_acquire(), with proper wrapping to try optimize it
at compile-time, but missed the list of bitops used for checking their
prototypes a bit below.
The functions added have consistent prototypes, so that no more changes
are required and no functional changes take place.

Fixes: 8238b4579866 ("wait_on_bit: add an acquire memory barrier")
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
---
 include/linux/bitops.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/linux/bitops.h b/include/linux/bitops.h
index 2ba557e067fe..f7f5a783da2a 100644
--- a/include/linux/bitops.h
+++ b/include/linux/bitops.h
@@ -80,6 +80,7 @@ __check_bitop_pr(__test_and_set_bit);
 __check_bitop_pr(__test_and_clear_bit);
 __check_bitop_pr(__test_and_change_bit);
 __check_bitop_pr(test_bit);
+__check_bitop_pr(test_bit_acquire);
 
 #undef __check_bitop_pr
 
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 02/14] bitops: make BYTES_TO_BITS() treewide-available
  2023-10-09 15:10 [PATCH 00/14] ip_tunnel: convert __be16 tunnel flags to bitmaps Alexander Lobakin
  2023-10-09 15:10 ` [PATCH 01/14] bitops: add missing prototype check Alexander Lobakin
@ 2023-10-09 15:10 ` Alexander Lobakin
  2023-10-09 15:10 ` [PATCH 03/14] bitops: let the compiler optimize __assign_bit() Alexander Lobakin
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 26+ messages in thread
From: Alexander Lobakin @ 2023-10-09 15:10 UTC (permalink / raw)
  To: Yury Norov
  Cc: Alexander Lobakin, Andy Shevchenko, Rasmus Villemoes,
	Alexander Potapenko, Jakub Kicinski, Eric Dumazet, David Ahern,
	Przemek Kitszel, Simon Horman, netdev, linux-btrfs, dm-devel,
	ntfs3, linux-s390, linux-kernel

Avoid open-coding that simple expression each time by moving
BYTES_TO_BITS() from the probes code to <linux/bitops.h> to export
it to the rest of the kernel.
Simplify the macro while at it. `BITS_PER_LONG / sizeof(long)` always
equals to %BITS_PER_BYTE, regardless of the target architecture.
Do the same for the tools ecosystem as well (incl. its version of
bitops.h).

Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
---
 include/linux/bitops.h         | 2 ++
 kernel/trace/trace_probe.c     | 2 --
 tools/include/linux/bitops.h   | 2 ++
 tools/perf/util/probe-finder.c | 2 --
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/linux/bitops.h b/include/linux/bitops.h
index f7f5a783da2a..e0cd09eb91cd 100644
--- a/include/linux/bitops.h
+++ b/include/linux/bitops.h
@@ -21,6 +21,8 @@
 #define BITS_TO_U32(nr)		__KERNEL_DIV_ROUND_UP(nr, BITS_PER_TYPE(u32))
 #define BITS_TO_BYTES(nr)	__KERNEL_DIV_ROUND_UP(nr, BITS_PER_TYPE(char))
 
+#define BYTES_TO_BITS(nb)	((nb) * BITS_PER_BYTE)
+
 extern unsigned int __sw_hweight8(unsigned int w);
 extern unsigned int __sw_hweight16(unsigned int w);
 extern unsigned int __sw_hweight32(unsigned int w);
diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index 4dc74d73fc1d..2b743c1e37db 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -1053,8 +1053,6 @@ parse_probe_arg(char *arg, const struct fetch_type *type,
 	return ret;
 }
 
-#define BYTES_TO_BITS(nb)	((BITS_PER_LONG * (nb)) / sizeof(long))
-
 /* Bitfield type needs to be parsed into a fetch function */
 static int __parse_bitfield_probe_arg(const char *bf,
 				      const struct fetch_type *t,
diff --git a/tools/include/linux/bitops.h b/tools/include/linux/bitops.h
index f18683b95ea6..bc6600466e7b 100644
--- a/tools/include/linux/bitops.h
+++ b/tools/include/linux/bitops.h
@@ -20,6 +20,8 @@
 #define BITS_TO_U32(nr)		DIV_ROUND_UP(nr, BITS_PER_TYPE(u32))
 #define BITS_TO_BYTES(nr)	DIV_ROUND_UP(nr, BITS_PER_TYPE(char))
 
+#define BYTES_TO_BITS(nb)	((nb) * BITS_PER_BYTE)
+
 extern unsigned int __sw_hweight8(unsigned int w);
 extern unsigned int __sw_hweight16(unsigned int w);
 extern unsigned int __sw_hweight32(unsigned int w);
diff --git a/tools/perf/util/probe-finder.c b/tools/perf/util/probe-finder.c
index f171360b0ef4..35f66c12ad8a 100644
--- a/tools/perf/util/probe-finder.c
+++ b/tools/perf/util/probe-finder.c
@@ -304,8 +304,6 @@ static int convert_variable_location(Dwarf_Die *vr_die, Dwarf_Addr addr,
 	return ret2;
 }
 
-#define BYTES_TO_BITS(nb)	((nb) * BITS_PER_LONG / sizeof(long))
-
 static int convert_variable_type(Dwarf_Die *vr_die,
 				 struct probe_trace_arg *tvar,
 				 const char *cast, bool user_access)
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 03/14] bitops: let the compiler optimize __assign_bit()
  2023-10-09 15:10 [PATCH 00/14] ip_tunnel: convert __be16 tunnel flags to bitmaps Alexander Lobakin
  2023-10-09 15:10 ` [PATCH 01/14] bitops: add missing prototype check Alexander Lobakin
  2023-10-09 15:10 ` [PATCH 02/14] bitops: make BYTES_TO_BITS() treewide-available Alexander Lobakin
@ 2023-10-09 15:10 ` Alexander Lobakin
  2023-10-09 16:18   ` Yury Norov
  2023-10-09 15:10 ` [PATCH 04/14] linkmode: convert linkmode_{test,set,clear,mod}_bit() to macros Alexander Lobakin
                   ` (9 subsequent siblings)
  12 siblings, 1 reply; 26+ messages in thread
From: Alexander Lobakin @ 2023-10-09 15:10 UTC (permalink / raw)
  To: Yury Norov
  Cc: Alexander Lobakin, Andy Shevchenko, Rasmus Villemoes,
	Alexander Potapenko, Jakub Kicinski, Eric Dumazet, David Ahern,
	Przemek Kitszel, Simon Horman, netdev, linux-btrfs, dm-devel,
	ntfs3, linux-s390, linux-kernel

Since commit b03fc1173c0c ("bitops: let optimize out non-atomic bitops
on compile-time constants"), the compilers are able to expand inline
bitmap operations to compile-time initializers when possible.
However, during the round of replacement if-__set-else-__clear with
__assign_bit() as per Andy's advice, bloat-o-meter showed +1024 bytes
difference in object code size for one module (even one function),
where the pattern:

	DECLARE_BITMAP(foo) = { }; // on the stack, zeroed

	if (a)
		__set_bit(const_bit_num, foo);
	if (b)
		__set_bit(another_const_bit_num, foo);
	...

is heavily used, although there should be no difference: the bitmap is
zeroed, so the second half of __assign_bit() should be compiled-out as
a no-op.
I either missed the fact that __assign_bit() has bitmap pointer marked
as `volatile` (as we usually do for bitmaps) or was hoping that the
compilers would at least try to look past the `volatile` for
__always_inline functions. Anyhow, due to that attribute, the compilers
were always compiling the whole expression and no mentioned compile-time
optimizations were working.

Convert __assign_bit() to a macro since it's a very simple if-else and
all of the checks are performed inside __set_bit() and __clear_bit(),
thus that wrapper has to be as transparent as possible. After that
change, despite it showing only -20 bytes change for vmlinux (due to
that it's still relatively unpopular), no drastic code size changes
happen when replacing if-set-else-clear for onstack bitmaps with
__assign_bit(), meaning the compiler now expands them to the actual
operations will all the expected optimizations.

Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
---
 include/linux/bitops.h | 10 ++--------
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/include/linux/bitops.h b/include/linux/bitops.h
index e0cd09eb91cd..f98f4fd1047f 100644
--- a/include/linux/bitops.h
+++ b/include/linux/bitops.h
@@ -284,14 +284,8 @@ static __always_inline void assign_bit(long nr, volatile unsigned long *addr,
 		clear_bit(nr, addr);
 }
 
-static __always_inline void __assign_bit(long nr, volatile unsigned long *addr,
-					 bool value)
-{
-	if (value)
-		__set_bit(nr, addr);
-	else
-		__clear_bit(nr, addr);
-}
+#define __assign_bit(nr, addr, value)				\
+	((value) ? __set_bit(nr, addr) : __clear_bit(nr, addr))
 
 /**
  * __ptr_set_bit - Set bit in a pointer's value
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 04/14] linkmode: convert linkmode_{test,set,clear,mod}_bit() to macros
  2023-10-09 15:10 [PATCH 00/14] ip_tunnel: convert __be16 tunnel flags to bitmaps Alexander Lobakin
                   ` (2 preceding siblings ...)
  2023-10-09 15:10 ` [PATCH 03/14] bitops: let the compiler optimize __assign_bit() Alexander Lobakin
@ 2023-10-09 15:10 ` Alexander Lobakin
  2023-10-09 15:10 ` [PATCH 05/14] s390/cio: rename bitmap_size() -> idset_bitmap_size() Alexander Lobakin
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 26+ messages in thread
From: Alexander Lobakin @ 2023-10-09 15:10 UTC (permalink / raw)
  To: Yury Norov
  Cc: Alexander Lobakin, Andy Shevchenko, Rasmus Villemoes,
	Alexander Potapenko, Jakub Kicinski, Eric Dumazet, David Ahern,
	Przemek Kitszel, Simon Horman, netdev, linux-btrfs, dm-devel,
	ntfs3, linux-s390, linux-kernel

Since commit b03fc1173c0c ("bitops: let optimize out non-atomic bitops
on compile-time constants"), the non-atomic bitops are macros which can
be expanded by the compilers into compile-time expressions, which will
result in better optimized object code. Unfortunately, turned out that
passing `volatile` to those macros discards any possibility of
optimization, as the compilers then don't even try to look whether
the passed bitmap is known at compilation time. In addition to that,
the mentioned linkmode helpers are marked with `inline`, not
`__always_inline`, meaning that it's not guaranteed some compiler won't
uninline them for no reason, which will also effectively prevent them
from being optimized (it's a well-known thing the compilers sometimes
uninline `2 + 2`).
Convert linkmode_*_bit() from inlines to macros. Their calling
convention are 1:1 with the corresponding bitops, so that it's not even
needed to enumerate and map the arguments, only the names. No changes in
vmlinux' object code (compiled by LLVM for x86_64) whatsoever, but that
doesn't necessarily means the change is meaningless.

Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
---
 include/linux/linkmode.h | 27 ++++-----------------------
 1 file changed, 4 insertions(+), 23 deletions(-)

diff --git a/include/linux/linkmode.h b/include/linux/linkmode.h
index 15e0e0209da4..f231e2edbfa5 100644
--- a/include/linux/linkmode.h
+++ b/include/linux/linkmode.h
@@ -38,10 +38,10 @@ static inline int linkmode_andnot(unsigned long *dst, const unsigned long *src1,
 	return bitmap_andnot(dst, src1, src2,  __ETHTOOL_LINK_MODE_MASK_NBITS);
 }
 
-static inline void linkmode_set_bit(int nr, volatile unsigned long *addr)
-{
-	__set_bit(nr, addr);
-}
+#define linkmode_test_bit	test_bit
+#define linkmode_set_bit	__set_bit
+#define linkmode_clear_bit	__clear_bit
+#define linkmode_mod_bit	__assign_bit
 
 static inline void linkmode_set_bit_array(const int *array, int array_size,
 					  unsigned long *addr)
@@ -52,25 +52,6 @@ static inline void linkmode_set_bit_array(const int *array, int array_size,
 		linkmode_set_bit(array[i], addr);
 }
 
-static inline void linkmode_clear_bit(int nr, volatile unsigned long *addr)
-{
-	__clear_bit(nr, addr);
-}
-
-static inline void linkmode_mod_bit(int nr, volatile unsigned long *addr,
-				    int set)
-{
-	if (set)
-		linkmode_set_bit(nr, addr);
-	else
-		linkmode_clear_bit(nr, addr);
-}
-
-static inline int linkmode_test_bit(int nr, const volatile unsigned long *addr)
-{
-	return test_bit(nr, addr);
-}
-
 static inline int linkmode_equal(const unsigned long *src1,
 				 const unsigned long *src2)
 {
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 05/14] s390/cio: rename bitmap_size() -> idset_bitmap_size()
  2023-10-09 15:10 [PATCH 00/14] ip_tunnel: convert __be16 tunnel flags to bitmaps Alexander Lobakin
                   ` (3 preceding siblings ...)
  2023-10-09 15:10 ` [PATCH 04/14] linkmode: convert linkmode_{test,set,clear,mod}_bit() to macros Alexander Lobakin
@ 2023-10-09 15:10 ` Alexander Lobakin
  2023-10-09 16:35   ` Yury Norov
  2023-10-09 15:10 ` [PATCH 06/14] fs/ntfs3: rename bitmap_size() -> ntfs3_bitmap_size() Alexander Lobakin
                   ` (7 subsequent siblings)
  12 siblings, 1 reply; 26+ messages in thread
From: Alexander Lobakin @ 2023-10-09 15:10 UTC (permalink / raw)
  To: Yury Norov
  Cc: Alexander Lobakin, Andy Shevchenko, Rasmus Villemoes,
	Alexander Potapenko, Jakub Kicinski, Eric Dumazet, David Ahern,
	Przemek Kitszel, Simon Horman, netdev, linux-btrfs, dm-devel,
	ntfs3, linux-s390, linux-kernel

bitmap_size() is a pretty generic name and one may want to use it for
a generic bitmap API function. At the same time, its logic is not
"generic", i.e. it's not just `nbits -> size of bitmap in bytes`
converter as it would be expected from its name.
Add the prefix 'idset_' used throughout the file where the function
resides.

Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
---
idset_new() really wants its vmalloc() + memset() pair to be replaced
with vzalloc().
---
 drivers/s390/cio/idset.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/s390/cio/idset.c b/drivers/s390/cio/idset.c
index 45f9c0736be4..0a1105a483bf 100644
--- a/drivers/s390/cio/idset.c
+++ b/drivers/s390/cio/idset.c
@@ -16,7 +16,7 @@ struct idset {
 	unsigned long bitmap[];
 };
 
-static inline unsigned long bitmap_size(int num_ssid, int num_id)
+static inline unsigned long idset_bitmap_size(int num_ssid, int num_id)
 {
 	return BITS_TO_LONGS(num_ssid * num_id) * sizeof(unsigned long);
 }
@@ -25,11 +25,12 @@ static struct idset *idset_new(int num_ssid, int num_id)
 {
 	struct idset *set;
 
-	set = vmalloc(sizeof(struct idset) + bitmap_size(num_ssid, num_id));
+	set = vmalloc(sizeof(struct idset) +
+		      idset_bitmap_size(num_ssid, num_id));
 	if (set) {
 		set->num_ssid = num_ssid;
 		set->num_id = num_id;
-		memset(set->bitmap, 0, bitmap_size(num_ssid, num_id));
+		memset(set->bitmap, 0, idset_bitmap_size(num_ssid, num_id));
 	}
 	return set;
 }
@@ -41,7 +42,8 @@ void idset_free(struct idset *set)
 
 void idset_fill(struct idset *set)
 {
-	memset(set->bitmap, 0xff, bitmap_size(set->num_ssid, set->num_id));
+	memset(set->bitmap, 0xff,
+	       idset_bitmap_size(set->num_ssid, set->num_id));
 }
 
 static inline void idset_add(struct idset *set, int ssid, int id)
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 06/14] fs/ntfs3: rename bitmap_size() -> ntfs3_bitmap_size()
  2023-10-09 15:10 [PATCH 00/14] ip_tunnel: convert __be16 tunnel flags to bitmaps Alexander Lobakin
                   ` (4 preceding siblings ...)
  2023-10-09 15:10 ` [PATCH 05/14] s390/cio: rename bitmap_size() -> idset_bitmap_size() Alexander Lobakin
@ 2023-10-09 15:10 ` Alexander Lobakin
  2023-10-09 16:50   ` Yury Norov
  2023-10-09 15:10 ` [PATCH 07/14] btrfs: rename bitmap_set_bits() -> btrfs_bitmap_set_bits() Alexander Lobakin
                   ` (6 subsequent siblings)
  12 siblings, 1 reply; 26+ messages in thread
From: Alexander Lobakin @ 2023-10-09 15:10 UTC (permalink / raw)
  To: Yury Norov
  Cc: Alexander Lobakin, Andy Shevchenko, Rasmus Villemoes,
	Alexander Potapenko, Jakub Kicinski, Eric Dumazet, David Ahern,
	Przemek Kitszel, Simon Horman, netdev, linux-btrfs, dm-devel,
	ntfs3, linux-s390, linux-kernel

bitmap_size() is a pretty generic name and one may want to use it for
a generic bitmap API function. At the same time, its logic is
NTFS-specific, as it aligns to the sizeof(u64), not the sizeof(long)
(although it uses ideologically right ALIGN() instead of division).
Add the prefix 'ntfs3_' used for that FS (not just 'ntfs_' to not mix
it with the legacy module).

Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
---
 fs/ntfs3/bitmap.c  |  4 ++--
 fs/ntfs3/fsntfs.c  |  2 +-
 fs/ntfs3/index.c   | 11 ++++++-----
 fs/ntfs3/ntfs_fs.h |  2 +-
 fs/ntfs3/super.c   |  2 +-
 5 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/fs/ntfs3/bitmap.c b/fs/ntfs3/bitmap.c
index 107e808e06ea..a2e18f13e93a 100644
--- a/fs/ntfs3/bitmap.c
+++ b/fs/ntfs3/bitmap.c
@@ -653,7 +653,7 @@ int wnd_init(struct wnd_bitmap *wnd, struct super_block *sb, size_t nbits)
 	wnd->total_zeroes = nbits;
 	wnd->extent_max = MINUS_ONE_T;
 	wnd->zone_bit = wnd->zone_end = 0;
-	wnd->nwnd = bytes_to_block(sb, bitmap_size(nbits));
+	wnd->nwnd = bytes_to_block(sb, ntfs3_bitmap_size(nbits));
 	wnd->bits_last = nbits & (wbits - 1);
 	if (!wnd->bits_last)
 		wnd->bits_last = wbits;
@@ -1345,7 +1345,7 @@ int wnd_extend(struct wnd_bitmap *wnd, size_t new_bits)
 		return -EINVAL;
 
 	/* Align to 8 byte boundary. */
-	new_wnd = bytes_to_block(sb, bitmap_size(new_bits));
+	new_wnd = bytes_to_block(sb, ntfs3_bitmap_size(new_bits));
 	new_last = new_bits & (wbits - 1);
 	if (!new_last)
 		new_last = wbits;
diff --git a/fs/ntfs3/fsntfs.c b/fs/ntfs3/fsntfs.c
index 33afee0f5559..7a14d2347f27 100644
--- a/fs/ntfs3/fsntfs.c
+++ b/fs/ntfs3/fsntfs.c
@@ -522,7 +522,7 @@ static int ntfs_extend_mft(struct ntfs_sb_info *sbi)
 	ni->mi.dirty = true;
 
 	/* Step 2: Resize $MFT::BITMAP. */
-	new_bitmap_bytes = bitmap_size(new_mft_total);
+	new_bitmap_bytes = ntfs3_bitmap_size(new_mft_total);
 
 	err = attr_set_size(ni, ATTR_BITMAP, NULL, 0, &sbi->mft.bitmap.run,
 			    new_bitmap_bytes, &new_bitmap_bytes, true, NULL);
diff --git a/fs/ntfs3/index.c b/fs/ntfs3/index.c
index 124c6e822623..ab53a4b6ddf8 100644
--- a/fs/ntfs3/index.c
+++ b/fs/ntfs3/index.c
@@ -1453,8 +1453,8 @@ static int indx_create_allocate(struct ntfs_index *indx, struct ntfs_inode *ni,
 
 	alloc->nres.valid_size = alloc->nres.data_size = cpu_to_le64(data_size);
 
-	err = ni_insert_resident(ni, bitmap_size(1), ATTR_BITMAP, in->name,
-				 in->name_len, &bitmap, NULL, NULL);
+	err = ni_insert_resident(ni, ntfs3_bitmap_size(1), ATTR_BITMAP,
+				 in->name, in->name_len, &bitmap, NULL, NULL);
 	if (err)
 		goto out2;
 
@@ -1515,8 +1515,9 @@ static int indx_add_allocate(struct ntfs_index *indx, struct ntfs_inode *ni,
 	if (bmp) {
 		/* Increase bitmap. */
 		err = attr_set_size(ni, ATTR_BITMAP, in->name, in->name_len,
-				    &indx->bitmap_run, bitmap_size(bit + 1),
-				    NULL, true, NULL);
+				    &indx->bitmap_run,
+				    ntfs3_bitmap_size(bit + 1), NULL, true,
+				    NULL);
 		if (err)
 			goto out1;
 	}
@@ -2089,7 +2090,7 @@ static int indx_shrink(struct ntfs_index *indx, struct ntfs_inode *ni,
 	if (in->name == I30_NAME)
 		ni->vfs_inode.i_size = new_data;
 
-	bpb = bitmap_size(bit);
+	bpb = ntfs3_bitmap_size(bit);
 	if (bpb * 8 == nbits)
 		return 0;
 
diff --git a/fs/ntfs3/ntfs_fs.h b/fs/ntfs3/ntfs_fs.h
index 629403ede6e5..93333156aac6 100644
--- a/fs/ntfs3/ntfs_fs.h
+++ b/fs/ntfs3/ntfs_fs.h
@@ -961,7 +961,7 @@ static inline bool run_is_empty(struct runs_tree *run)
 }
 
 /* NTFS uses quad aligned bitmaps. */
-static inline size_t bitmap_size(size_t bits)
+static inline size_t ntfs3_bitmap_size(size_t bits)
 {
 	return ALIGN((bits + 7) >> 3, 8);
 }
diff --git a/fs/ntfs3/super.c b/fs/ntfs3/super.c
index cfec5e0c7f66..b1fb6efe7084 100644
--- a/fs/ntfs3/super.c
+++ b/fs/ntfs3/super.c
@@ -1285,7 +1285,7 @@ static int ntfs_fill_super(struct super_block *sb, struct fs_context *fc)
 
 	/* Check bitmap boundary. */
 	tt = sbi->used.bitmap.nbits;
-	if (inode->i_size < bitmap_size(tt)) {
+	if (inode->i_size < ntfs3_bitmap_size(tt)) {
 		ntfs_err(sb, "$Bitmap is corrupted.");
 		err = -EINVAL;
 		goto put_inode_out;
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 07/14] btrfs: rename bitmap_set_bits() -> btrfs_bitmap_set_bits()
  2023-10-09 15:10 [PATCH 00/14] ip_tunnel: convert __be16 tunnel flags to bitmaps Alexander Lobakin
                   ` (5 preceding siblings ...)
  2023-10-09 15:10 ` [PATCH 06/14] fs/ntfs3: rename bitmap_size() -> ntfs3_bitmap_size() Alexander Lobakin
@ 2023-10-09 15:10 ` Alexander Lobakin
  2023-10-09 15:23   ` David Sterba
  2023-10-09 15:10 ` [PATCH 08/14] bitmap: introduce generic optimized bitmap_size() Alexander Lobakin
                   ` (5 subsequent siblings)
  12 siblings, 1 reply; 26+ messages in thread
From: Alexander Lobakin @ 2023-10-09 15:10 UTC (permalink / raw)
  To: Yury Norov
  Cc: Alexander Lobakin, Andy Shevchenko, Rasmus Villemoes,
	Alexander Potapenko, Jakub Kicinski, Eric Dumazet, David Ahern,
	Przemek Kitszel, Simon Horman, netdev, linux-btrfs, dm-devel,
	ntfs3, linux-s390, linux-kernel

bitmap_set_bits() does not start with the FS' prefix and may collide
with a new generic helper one day. It operates with the FS-specific
types, so there's no change those two could do the same thing.
Just add the prefix to exclude such possible conflict.

Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
---
 fs/btrfs/free-space-cache.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index 27fad70451aa..94249b5ee447 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -1909,9 +1909,9 @@ static inline void bitmap_clear_bits(struct btrfs_free_space_ctl *ctl,
 		ctl->free_space -= bytes;
 }
 
-static void bitmap_set_bits(struct btrfs_free_space_ctl *ctl,
-			    struct btrfs_free_space *info, u64 offset,
-			    u64 bytes)
+static void btrfs_bitmap_set_bits(struct btrfs_free_space_ctl *ctl,
+				  struct btrfs_free_space *info, u64 offset,
+				  u64 bytes)
 {
 	unsigned long start, count, end;
 	int extent_delta = 1;
@@ -2247,7 +2247,7 @@ static u64 add_bytes_to_bitmap(struct btrfs_free_space_ctl *ctl,
 
 	bytes_to_set = min(end - offset, bytes);
 
-	bitmap_set_bits(ctl, info, offset, bytes_to_set);
+	btrfs_bitmap_set_bits(ctl, info, offset, bytes_to_set);
 
 	return bytes_to_set;
 
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 08/14] bitmap: introduce generic optimized bitmap_size()
  2023-10-09 15:10 [PATCH 00/14] ip_tunnel: convert __be16 tunnel flags to bitmaps Alexander Lobakin
                   ` (6 preceding siblings ...)
  2023-10-09 15:10 ` [PATCH 07/14] btrfs: rename bitmap_set_bits() -> btrfs_bitmap_set_bits() Alexander Lobakin
@ 2023-10-09 15:10 ` Alexander Lobakin
  2023-10-09 17:04   ` Yury Norov
  2023-10-09 15:10 ` [PATCH 09/14] bitmap: extend bitmap_{get,set}_value8() to bitmap_{get,set}_bits() Alexander Lobakin
                   ` (4 subsequent siblings)
  12 siblings, 1 reply; 26+ messages in thread
From: Alexander Lobakin @ 2023-10-09 15:10 UTC (permalink / raw)
  To: Yury Norov
  Cc: Alexander Lobakin, Andy Shevchenko, Rasmus Villemoes,
	Alexander Potapenko, Jakub Kicinski, Eric Dumazet, David Ahern,
	Przemek Kitszel, Simon Horman, netdev, linux-btrfs, dm-devel,
	ntfs3, linux-s390, linux-kernel

The number of times yet another open coded
`BITS_TO_LONGS(nbits) * sizeof(long)` can be spotted is huge.
Some generic helper is long overdue.

Add one, bitmap_size(), but with one detail.
BITS_TO_LONGS() uses DIV_ROUND_UP(). The latter works well when both
divident and divisor are compile-time constants or when the divisor
is not a pow-of-2. When it is however, the compilers sometimes tend
to generate suboptimal code (GCC 13):

48 83 c0 3f          	add    $0x3f,%rax
48 c1 e8 06          	shr    $0x6,%rax
48 8d 14 c5 00 00 00 00	lea    0x0(,%rax,8),%rdx

%BITS_PER_LONG is always a pow-2 (either 32 or 64), but GCC still does
full division of `nbits + 63` by it and then multiplication by 8.
Instead of BITS_TO_LONGS(), use ALIGN() and then divide by 8. GCC:

8d 50 3f             	lea    0x3f(%rax),%edx
c1 ea 03             	shr    $0x3,%edx
81 e2 f8 ff ff 1f    	and    $0x1ffffff8,%edx

Now it divides `nbits + 63` by 8 and then masks bits[2:0].
bloat-o-meter:

add/remove: 0/0 grow/shrink: 20/133 up/down: 156/-773 (-617)

Clang does it better and generates the same code before/after starting
from -O1, except that with the ALIGN() approach it uses %edx and thus
still saves some bytes:

add/remove: 0/0 grow/shrink: 9/133 up/down: 18/-538 (-520)

Note that we can't expand DIV_ROUND_UP() by adding a check and using
this approach there, as it's used in array declarations where
expressions are not allowed.
Add this helper to tools/ as well.

Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
---
 drivers/md/dm-clone-metadata.c | 5 -----
 include/linux/bitmap.h         | 8 +++++---
 include/linux/cpumask.h        | 2 +-
 lib/math/prime_numbers.c       | 2 --
 tools/include/linux/bitmap.h   | 8 +++++---
 5 files changed, 11 insertions(+), 14 deletions(-)

diff --git a/drivers/md/dm-clone-metadata.c b/drivers/md/dm-clone-metadata.c
index c43d55672bce..47c1fa7aad8b 100644
--- a/drivers/md/dm-clone-metadata.c
+++ b/drivers/md/dm-clone-metadata.c
@@ -465,11 +465,6 @@ static void __destroy_persistent_data_structures(struct dm_clone_metadata *cmd)
 
 /*---------------------------------------------------------------------------*/
 
-static size_t bitmap_size(unsigned long nr_bits)
-{
-	return BITS_TO_LONGS(nr_bits) * sizeof(long);
-}
-
 static int __dirty_map_init(struct dirty_map *dmap, unsigned long nr_words,
 			    unsigned long nr_regions)
 {
diff --git a/include/linux/bitmap.h b/include/linux/bitmap.h
index 03644237e1ef..63e422f8ba3d 100644
--- a/include/linux/bitmap.h
+++ b/include/linux/bitmap.h
@@ -237,9 +237,11 @@ extern int bitmap_print_list_to_buf(char *buf, const unsigned long *maskp,
 #define BITMAP_FIRST_WORD_MASK(start) (~0UL << ((start) & (BITS_PER_LONG - 1)))
 #define BITMAP_LAST_WORD_MASK(nbits) (~0UL >> (-(nbits) & (BITS_PER_LONG - 1)))
 
+#define bitmap_size(nbits)	(ALIGN(nbits, BITS_PER_LONG) / BITS_PER_BYTE)
+
 static inline void bitmap_zero(unsigned long *dst, unsigned int nbits)
 {
-	unsigned int len = BITS_TO_LONGS(nbits) * sizeof(unsigned long);
+	unsigned int len = bitmap_size(nbits);
 
 	if (small_const_nbits(nbits))
 		*dst = 0;
@@ -249,7 +251,7 @@ static inline void bitmap_zero(unsigned long *dst, unsigned int nbits)
 
 static inline void bitmap_fill(unsigned long *dst, unsigned int nbits)
 {
-	unsigned int len = BITS_TO_LONGS(nbits) * sizeof(unsigned long);
+	unsigned int len = bitmap_size(nbits);
 
 	if (small_const_nbits(nbits))
 		*dst = ~0UL;
@@ -260,7 +262,7 @@ static inline void bitmap_fill(unsigned long *dst, unsigned int nbits)
 static inline void bitmap_copy(unsigned long *dst, const unsigned long *src,
 			unsigned int nbits)
 {
-	unsigned int len = BITS_TO_LONGS(nbits) * sizeof(unsigned long);
+	unsigned int len = bitmap_size(nbits);
 
 	if (small_const_nbits(nbits))
 		*dst = *src;
diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
index f10fb87d49db..dbdbf1451cad 100644
--- a/include/linux/cpumask.h
+++ b/include/linux/cpumask.h
@@ -821,7 +821,7 @@ static inline int cpulist_parse(const char *buf, struct cpumask *dstp)
  */
 static inline unsigned int cpumask_size(void)
 {
-	return BITS_TO_LONGS(large_cpumask_bits) * sizeof(long);
+	return bitmap_size(large_cpumask_bits);
 }
 
 /*
diff --git a/lib/math/prime_numbers.c b/lib/math/prime_numbers.c
index d42cebf7407f..d3b64b10da1c 100644
--- a/lib/math/prime_numbers.c
+++ b/lib/math/prime_numbers.c
@@ -6,8 +6,6 @@
 #include <linux/prime_numbers.h>
 #include <linux/slab.h>
 
-#define bitmap_size(nbits) (BITS_TO_LONGS(nbits) * sizeof(unsigned long))
-
 struct primes {
 	struct rcu_head rcu;
 	unsigned long last, sz;
diff --git a/tools/include/linux/bitmap.h b/tools/include/linux/bitmap.h
index f3566ea0f932..81a2299ace15 100644
--- a/tools/include/linux/bitmap.h
+++ b/tools/include/linux/bitmap.h
@@ -2,6 +2,7 @@
 #ifndef _TOOLS_LINUX_BITMAP_H
 #define _TOOLS_LINUX_BITMAP_H
 
+#include <linux/align.h>
 #include <string.h>
 #include <linux/bitops.h>
 #include <linux/find.h>
@@ -25,13 +26,14 @@ bool __bitmap_intersects(const unsigned long *bitmap1,
 #define BITMAP_FIRST_WORD_MASK(start) (~0UL << ((start) & (BITS_PER_LONG - 1)))
 #define BITMAP_LAST_WORD_MASK(nbits) (~0UL >> (-(nbits) & (BITS_PER_LONG - 1)))
 
+#define bitmap_size(nbits)	(ALIGN(nbits, BITS_PER_LONG) / BITS_PER_BYTE)
+
 static inline void bitmap_zero(unsigned long *dst, unsigned int nbits)
 {
 	if (small_const_nbits(nbits))
 		*dst = 0UL;
 	else {
-		int len = BITS_TO_LONGS(nbits) * sizeof(unsigned long);
-		memset(dst, 0, len);
+		memset(dst, 0, bitmap_size(nbits));
 	}
 }
 
@@ -83,7 +85,7 @@ static inline void bitmap_or(unsigned long *dst, const unsigned long *src1,
  */
 static inline unsigned long *bitmap_zalloc(int nbits)
 {
-	return calloc(1, BITS_TO_LONGS(nbits) * sizeof(unsigned long));
+	return calloc(1, bitmap_size(nbits));
 }
 
 /*
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 09/14] bitmap: extend bitmap_{get,set}_value8() to bitmap_{get,set}_bits()
  2023-10-09 15:10 [PATCH 00/14] ip_tunnel: convert __be16 tunnel flags to bitmaps Alexander Lobakin
                   ` (7 preceding siblings ...)
  2023-10-09 15:10 ` [PATCH 08/14] bitmap: introduce generic optimized bitmap_size() Alexander Lobakin
@ 2023-10-09 15:10 ` Alexander Lobakin
  2023-10-09 16:31   ` Yury Norov
  2023-10-09 15:10 ` [PATCH 10/14] ip_tunnel: use a separate struct to store tunnel params in the kernel Alexander Lobakin
                   ` (3 subsequent siblings)
  12 siblings, 1 reply; 26+ messages in thread
From: Alexander Lobakin @ 2023-10-09 15:10 UTC (permalink / raw)
  To: Yury Norov
  Cc: Alexander Lobakin, Andy Shevchenko, Rasmus Villemoes,
	Alexander Potapenko, Jakub Kicinski, Eric Dumazet, David Ahern,
	Przemek Kitszel, Simon Horman, netdev, linux-btrfs, dm-devel,
	ntfs3, linux-s390, linux-kernel

Sometimes there's need to get a 8/16/...-bit piece of a bitmap at a
particular offset. Currently, there are only bitmap_{get,set}_value8()
to do that for 8 bits and that's it.
Instead of introducing a separate pair for u16 and so on, which doesn't
scale well, extend the existing functions to be able to pass the wanted
value width. Make both offset and width arbitrary, but in order to not
over complicate the current logic and keep the helpers as optimized as
the current ones, require the width to be a pow-2 value and the offset
to be a multiple of the width, while the target piece should not cross
a %BITS_PER_LONG boundary and stay within one long.
Avoid adjusting all the already existing callsites by defining oneliner
wrapper macros named after the former functions. bloat-o-meter shows
almost no difference (+1-2 bytes in a couple of places), meaning the
new helpers get optimized just nicely.

Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
---
 include/linux/bitmap.h | 51 ++++++++++++++++++++++++++++++------------
 1 file changed, 37 insertions(+), 14 deletions(-)

diff --git a/include/linux/bitmap.h b/include/linux/bitmap.h
index 63e422f8ba3d..9c010a7fa331 100644
--- a/include/linux/bitmap.h
+++ b/include/linux/bitmap.h
@@ -6,8 +6,10 @@
 
 #include <linux/align.h>
 #include <linux/bitops.h>
+#include <linux/bug.h>
 #include <linux/find.h>
 #include <linux/limits.h>
+#include <linux/log2.h>
 #include <linux/string.h>
 #include <linux/types.h>
 
@@ -569,38 +571,59 @@ static inline void bitmap_from_u64(unsigned long *dst, u64 mask)
 }
 
 /**
- * bitmap_get_value8 - get an 8-bit value within a memory region
+ * bitmap_get_bits - get a 8/16/32/64-bit value within a memory region
  * @map: address to the bitmap memory region
- * @start: bit offset of the 8-bit value; must be a multiple of 8
+ * @start: bit offset of the value; must be a multiple of @len
+ * @len: bit width of the value; must be a power of two
  *
- * Returns the 8-bit value located at the @start bit offset within the @src
- * memory region.
+ * Return: the 8/16/32/64-bit value located at the @start bit offset within
+ * the @src memory region. Its position (@start + @len) can't cross
+ * a ``BITS_PER_LONG`` boundary.
  */
-static inline unsigned long bitmap_get_value8(const unsigned long *map,
-					      unsigned long start)
+static inline unsigned long bitmap_get_bits(const unsigned long *map,
+					    unsigned long start, size_t len)
 {
 	const size_t index = BIT_WORD(start);
 	const unsigned long offset = start % BITS_PER_LONG;
 
-	return (map[index] >> offset) & 0xFF;
+	if (WARN_ON_ONCE(!is_power_of_2(len) || offset % len ||
+			 offset + len > BITS_PER_LONG))
+		return 0;
+
+	return (map[index] >> offset) & GENMASK(len - 1, 0);
 }
 
 /**
- * bitmap_set_value8 - set an 8-bit value within a memory region
+ * bitmap_set_bits - set a 8/16/32/64-bit value within a memory region
  * @map: address to the bitmap memory region
- * @value: the 8-bit value; values wider than 8 bits may clobber bitmap
- * @start: bit offset of the 8-bit value; must be a multiple of 8
+ * @start: bit offset of the value; must be a multiple of @len
+ * @value: new value to set
+ * @len: bit width of the value; must be a power of two
+ *
+ * Replaces the 8/16/32/64-bit value located at the @start bit offset within
+ * the @src memory region with the new @value. Its position (@start + @len)
+ * can't cross a ``BITS_PER_LONG`` boundary.
  */
-static inline void bitmap_set_value8(unsigned long *map, unsigned long value,
-				     unsigned long start)
+static inline void bitmap_set_bits(unsigned long *map, unsigned long start,
+				   unsigned long value, size_t len)
 {
 	const size_t index = BIT_WORD(start);
 	const unsigned long offset = start % BITS_PER_LONG;
+	unsigned long mask = GENMASK(len - 1, 0);
 
-	map[index] &= ~(0xFFUL << offset);
-	map[index] |= value << offset;
+	if (WARN_ON_ONCE(!is_power_of_2(len) || offset % len ||
+			 offset + len > BITS_PER_LONG))
+		return;
+
+	map[index] &= ~(mask << offset);
+	map[index] |= (value & mask) << offset;
 }
 
+#define bitmap_get_value8(map, start)				\
+	bitmap_get_bits(map, start, BITS_PER_BYTE)
+#define bitmap_set_value8(map, value, start)			\
+	bitmap_set_bits(map, start, value, BITS_PER_BYTE)
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* __LINUX_BITMAP_H */
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 10/14] ip_tunnel: use a separate struct to store tunnel params in the kernel
  2023-10-09 15:10 [PATCH 00/14] ip_tunnel: convert __be16 tunnel flags to bitmaps Alexander Lobakin
                   ` (8 preceding siblings ...)
  2023-10-09 15:10 ` [PATCH 09/14] bitmap: extend bitmap_{get,set}_value8() to bitmap_{get,set}_bits() Alexander Lobakin
@ 2023-10-09 15:10 ` Alexander Lobakin
  2023-10-09 15:10 ` [PATCH 12/14] lib/bitmap: add compile-time test for __assign_bit() optimization Alexander Lobakin
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 26+ messages in thread
From: Alexander Lobakin @ 2023-10-09 15:10 UTC (permalink / raw)
  To: Yury Norov
  Cc: Alexander Lobakin, Andy Shevchenko, Rasmus Villemoes,
	Alexander Potapenko, Jakub Kicinski, Eric Dumazet, David Ahern,
	Przemek Kitszel, Simon Horman, netdev, linux-btrfs, dm-devel,
	ntfs3, linux-s390, linux-kernel

Unlike IPv6 tunnels which use purely-kernel __ip6_tnl_parm structure
to store params inside the kernel, IPv4 tunnel code uses the same
ip_tunnel_parm which is being used to talk with the userspace.
This makes it difficult to alter or add any fields or use a
different format for whatever data.
Define struct ip_tunnel_parm_kern, a 1:1 copy of ip_tunnel_parm for
now, and use it throughout the code. Define the pieces, where the copy
user <-> kernel happens, as standalone functions, and copy the data
there field-by-field, so that the kernel-side structure could be easily
modified later on and the users wouldn't have to care about this.

Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
---
 .../ethernet/mellanox/mlxsw/spectrum_ipip.c   | 30 +++++----
 .../ethernet/mellanox/mlxsw/spectrum_ipip.h   |  2 +-
 .../ethernet/mellanox/mlxsw/spectrum_span.c   |  4 +-
 include/linux/netdevice.h                     |  7 ++-
 include/net/ip_tunnels.h                      | 25 ++++++--
 net/ipv4/ip_gre.c                             | 17 ++---
 net/ipv4/ip_tunnel.c                          | 63 +++++++++++++++----
 net/ipv4/ip_tunnel_core.c                     |  2 +-
 net/ipv4/ip_vti.c                             | 12 ++--
 net/ipv4/ipip.c                               | 12 ++--
 net/ipv4/ipmr.c                               |  2 +-
 net/ipv6/addrconf.c                           |  3 +-
 net/ipv6/sit.c                                | 33 +++++-----
 13 files changed, 137 insertions(+), 75 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_ipip.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_ipip.c
index 3340b4a694c3..d67df358a52f 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_ipip.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_ipip.c
@@ -8,7 +8,7 @@
 #include "spectrum_ipip.h"
 #include "reg.h"
 
-struct ip_tunnel_parm
+struct ip_tunnel_parm_kern
 mlxsw_sp_ipip_netdev_parms4(const struct net_device *ol_dev)
 {
 	struct ip_tunnel *tun = netdev_priv(ol_dev);
@@ -24,7 +24,8 @@ mlxsw_sp_ipip_netdev_parms6(const struct net_device *ol_dev)
 	return tun->parms;
 }
 
-static bool mlxsw_sp_ipip_parms4_has_ikey(const struct ip_tunnel_parm *parms)
+static bool
+mlxsw_sp_ipip_parms4_has_ikey(const struct ip_tunnel_parm_kern *parms)
 {
 	return !!(parms->i_flags & TUNNEL_KEY);
 }
@@ -34,7 +35,8 @@ static bool mlxsw_sp_ipip_parms6_has_ikey(const struct __ip6_tnl_parm *parms)
 	return !!(parms->i_flags & TUNNEL_KEY);
 }
 
-static bool mlxsw_sp_ipip_parms4_has_okey(const struct ip_tunnel_parm *parms)
+static bool
+mlxsw_sp_ipip_parms4_has_okey(const struct ip_tunnel_parm_kern *parms)
 {
 	return !!(parms->o_flags & TUNNEL_KEY);
 }
@@ -44,7 +46,7 @@ static bool mlxsw_sp_ipip_parms6_has_okey(const struct __ip6_tnl_parm *parms)
 	return !!(parms->o_flags & TUNNEL_KEY);
 }
 
-static u32 mlxsw_sp_ipip_parms4_ikey(const struct ip_tunnel_parm *parms)
+static u32 mlxsw_sp_ipip_parms4_ikey(const struct ip_tunnel_parm_kern *parms)
 {
 	return mlxsw_sp_ipip_parms4_has_ikey(parms) ?
 		be32_to_cpu(parms->i_key) : 0;
@@ -56,7 +58,7 @@ static u32 mlxsw_sp_ipip_parms6_ikey(const struct __ip6_tnl_parm *parms)
 		be32_to_cpu(parms->i_key) : 0;
 }
 
-static u32 mlxsw_sp_ipip_parms4_okey(const struct ip_tunnel_parm *parms)
+static u32 mlxsw_sp_ipip_parms4_okey(const struct ip_tunnel_parm_kern *parms)
 {
 	return mlxsw_sp_ipip_parms4_has_okey(parms) ?
 		be32_to_cpu(parms->o_key) : 0;
@@ -69,7 +71,7 @@ static u32 mlxsw_sp_ipip_parms6_okey(const struct __ip6_tnl_parm *parms)
 }
 
 static union mlxsw_sp_l3addr
-mlxsw_sp_ipip_parms4_saddr(const struct ip_tunnel_parm *parms)
+mlxsw_sp_ipip_parms4_saddr(const struct ip_tunnel_parm_kern *parms)
 {
 	return (union mlxsw_sp_l3addr) { .addr4 = parms->iph.saddr };
 }
@@ -81,7 +83,7 @@ mlxsw_sp_ipip_parms6_saddr(const struct __ip6_tnl_parm *parms)
 }
 
 static union mlxsw_sp_l3addr
-mlxsw_sp_ipip_parms4_daddr(const struct ip_tunnel_parm *parms)
+mlxsw_sp_ipip_parms4_daddr(const struct ip_tunnel_parm_kern *parms)
 {
 	return (union mlxsw_sp_l3addr) { .addr4 = parms->iph.daddr };
 }
@@ -96,7 +98,7 @@ union mlxsw_sp_l3addr
 mlxsw_sp_ipip_netdev_saddr(enum mlxsw_sp_l3proto proto,
 			   const struct net_device *ol_dev)
 {
-	struct ip_tunnel_parm parms4;
+	struct ip_tunnel_parm_kern parms4;
 	struct __ip6_tnl_parm parms6;
 
 	switch (proto) {
@@ -115,7 +117,9 @@ mlxsw_sp_ipip_netdev_saddr(enum mlxsw_sp_l3proto proto,
 static __be32 mlxsw_sp_ipip_netdev_daddr4(const struct net_device *ol_dev)
 {
 
-	struct ip_tunnel_parm parms4 = mlxsw_sp_ipip_netdev_parms4(ol_dev);
+	struct ip_tunnel_parm_kern parms4;
+
+	parms4 = mlxsw_sp_ipip_netdev_parms4(ol_dev);
 
 	return mlxsw_sp_ipip_parms4_daddr(&parms4).addr4;
 }
@@ -124,7 +128,7 @@ static union mlxsw_sp_l3addr
 mlxsw_sp_ipip_netdev_daddr(enum mlxsw_sp_l3proto proto,
 			   const struct net_device *ol_dev)
 {
-	struct ip_tunnel_parm parms4;
+	struct ip_tunnel_parm_kern parms4;
 	struct __ip6_tnl_parm parms6;
 
 	switch (proto) {
@@ -150,7 +154,7 @@ bool mlxsw_sp_l3addr_is_zero(union mlxsw_sp_l3addr addr)
 static struct mlxsw_sp_ipip_parms
 mlxsw_sp_ipip_netdev_parms_init_gre4(const struct net_device *ol_dev)
 {
-	struct ip_tunnel_parm parms = mlxsw_sp_ipip_netdev_parms4(ol_dev);
+	struct ip_tunnel_parm_kern parms = mlxsw_sp_ipip_netdev_parms4(ol_dev);
 
 	return (struct mlxsw_sp_ipip_parms) {
 		.proto = MLXSW_SP_L3_PROTO_IPV4,
@@ -187,8 +191,8 @@ mlxsw_sp_ipip_decap_config_gre4(struct mlxsw_sp *mlxsw_sp,
 {
 	u16 rif_index = mlxsw_sp_ipip_lb_rif_index(ipip_entry->ol_lb);
 	u16 ul_rif_id = mlxsw_sp_ipip_lb_ul_rif_id(ipip_entry->ol_lb);
+	struct ip_tunnel_parm_kern parms;
 	char rtdp_pl[MLXSW_REG_RTDP_LEN];
-	struct ip_tunnel_parm parms;
 	unsigned int type_check;
 	bool has_ikey;
 	u32 daddr4;
@@ -252,7 +256,7 @@ static struct mlxsw_sp_rif_ipip_lb_config
 mlxsw_sp_ipip_ol_loopback_config_gre4(struct mlxsw_sp *mlxsw_sp,
 				      const struct net_device *ol_dev)
 {
-	struct ip_tunnel_parm parms = mlxsw_sp_ipip_netdev_parms4(ol_dev);
+	struct ip_tunnel_parm_kern parms = mlxsw_sp_ipip_netdev_parms4(ol_dev);
 	enum mlxsw_reg_ritr_loopback_ipip_type lb_ipipt;
 
 	lb_ipipt = mlxsw_sp_ipip_parms4_has_okey(&parms) ?
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_ipip.h b/drivers/net/ethernet/mellanox/mlxsw/spectrum_ipip.h
index a35f009da561..a66173779641 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_ipip.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_ipip.h
@@ -9,7 +9,7 @@
 #include <linux/if_tunnel.h>
 #include <net/ip6_tunnel.h>
 
-struct ip_tunnel_parm
+struct ip_tunnel_parm_kern
 mlxsw_sp_ipip_netdev_parms4(const struct net_device *ol_dev);
 struct __ip6_tnl_parm
 mlxsw_sp_ipip_netdev_parms6(const struct net_device *ol_dev);
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_span.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_span.c
index b3472fb94617..ee08184bd60f 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_span.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_span.c
@@ -413,8 +413,8 @@ mlxsw_sp_span_gretap4_route(const struct net_device *to_dev,
 			    __be32 *saddrp, __be32 *daddrp)
 {
 	struct ip_tunnel *tun = netdev_priv(to_dev);
+	struct ip_tunnel_parm_kern parms;
 	struct net_device *dev = NULL;
-	struct ip_tunnel_parm parms;
 	struct rtable *rt = NULL;
 	struct flowi4 fl4;
 
@@ -451,7 +451,7 @@ mlxsw_sp_span_entry_gretap4_parms(struct mlxsw_sp *mlxsw_sp,
 				  const struct net_device *to_dev,
 				  struct mlxsw_sp_span_parms *sparmsp)
 {
-	struct ip_tunnel_parm tparm = mlxsw_sp_ipip_netdev_parms4(to_dev);
+	struct ip_tunnel_parm_kern tparm = mlxsw_sp_ipip_netdev_parms4(to_dev);
 	union mlxsw_sp_l3addr saddr = { .addr4 = tparm.iph.saddr };
 	union mlxsw_sp_l3addr daddr = { .addr4 = tparm.iph.daddr };
 	bool inherit_tos = tparm.iph.tos & 0x1;
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 0896aaa91dd7..34e01d29b449 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -59,7 +59,7 @@ struct ethtool_ops;
 struct kernel_hwtstamp_config;
 struct phy_device;
 struct dsa_port;
-struct ip_tunnel_parm;
+struct ip_tunnel_parm_kern;
 struct macsec_context;
 struct macsec_ops;
 struct netdev_name_node;
@@ -1382,7 +1382,7 @@ struct netdev_net_notifier {
  *	queue id bound to an AF_XDP socket. The flags field specifies if
  *	only RX, only Tx, or both should be woken up using the flags
  *	XDP_WAKEUP_RX and XDP_WAKEUP_TX.
- * int (*ndo_tunnel_ctl)(struct net_device *dev, struct ip_tunnel_parm *p,
+ * int (*ndo_tunnel_ctl)(struct net_device *dev, struct ip_tunnel_parm_kern *p,
  *			 int cmd);
  *	Add, change, delete or get information on an IPv4 tunnel.
  * struct net_device *(*ndo_get_peer_dev)(struct net_device *dev);
@@ -1633,7 +1633,8 @@ struct net_device_ops {
 	int			(*ndo_xsk_wakeup)(struct net_device *dev,
 						  u32 queue_id, u32 flags);
 	int			(*ndo_tunnel_ctl)(struct net_device *dev,
-						  struct ip_tunnel_parm *p, int cmd);
+						  struct ip_tunnel_parm_kern *p,
+						  int cmd);
 	struct net_device *	(*ndo_get_peer_dev)(struct net_device *dev);
 	int                     (*ndo_fill_forward_path)(struct net_device_path_ctx *ctx,
                                                          struct net_device_path *path);
diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h
index f346b4efbc30..6da667b743e1 100644
--- a/include/net/ip_tunnels.h
+++ b/include/net/ip_tunnels.h
@@ -110,6 +110,17 @@ struct ip_tunnel_prl_entry {
 
 struct metadata_dst;
 
+/* Kernel-side copy of ip_tunnel_parm */
+struct ip_tunnel_parm_kern {
+	char			name[IFNAMSIZ];
+	int			link;
+	__be16			i_flags;
+	__be16			o_flags;
+	__be32			i_key;
+	__be32			o_key;
+	struct iphdr		iph;
+};
+
 struct ip_tunnel {
 	struct ip_tunnel __rcu	*next;
 	struct hlist_node hash_node;
@@ -136,7 +147,7 @@ struct ip_tunnel {
 
 	struct dst_cache dst_cache;
 
-	struct ip_tunnel_parm parms;
+	struct ip_tunnel_parm_kern parms;
 
 	int		mlink;
 	int		encap_hlen;	/* Encap header length (FOU,GUE) */
@@ -290,7 +301,11 @@ void ip_tunnel_xmit(struct sk_buff *skb, struct net_device *dev,
 		    const struct iphdr *tnl_params, const u8 protocol);
 void ip_md_tunnel_xmit(struct sk_buff *skb, struct net_device *dev,
 		       const u8 proto, int tunnel_hlen);
-int ip_tunnel_ctl(struct net_device *dev, struct ip_tunnel_parm *p, int cmd);
+int ip_tunnel_ctl(struct net_device *dev, struct ip_tunnel_parm_kern *p,
+		  int cmd);
+bool ip_tunnel_parm_from_user(struct ip_tunnel_parm_kern *kp,
+			      const void __user *data);
+bool ip_tunnel_parm_to_user(void __user *data, struct ip_tunnel_parm_kern *kp);
 int ip_tunnel_siocdevprivate(struct net_device *dev, struct ifreq *ifr,
 			     void __user *data, int cmd);
 int __ip_tunnel_change_mtu(struct net_device *dev, int new_mtu, bool strict);
@@ -306,16 +321,16 @@ int ip_tunnel_rcv(struct ip_tunnel *tunnel, struct sk_buff *skb,
 		  const struct tnl_ptk_info *tpi, struct metadata_dst *tun_dst,
 		  bool log_ecn_error);
 int ip_tunnel_changelink(struct net_device *dev, struct nlattr *tb[],
-			 struct ip_tunnel_parm *p, __u32 fwmark);
+			 struct ip_tunnel_parm_kern *p, __u32 fwmark);
 int ip_tunnel_newlink(struct net_device *dev, struct nlattr *tb[],
-		      struct ip_tunnel_parm *p, __u32 fwmark);
+		      struct ip_tunnel_parm_kern *p, __u32 fwmark);
 void ip_tunnel_setup(struct net_device *dev, unsigned int net_id);
 
 bool ip_tunnel_netlink_encap_parms(struct nlattr *data[],
 				   struct ip_tunnel_encap *encap);
 
 void ip_tunnel_netlink_parms(struct nlattr *data[],
-			     struct ip_tunnel_parm *parms);
+			     struct ip_tunnel_parm_kern *parms);
 
 extern const struct header_ops ip_tunnel_header_ops;
 __be16 ip_tunnel_parse_protocol(const struct sk_buff *skb);
diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index 22a26d1d29a0..b65318a55ae8 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -782,7 +782,8 @@ static void ipgre_link_update(struct net_device *dev, bool set_mtu)
 	}
 }
 
-static int ipgre_tunnel_ctl(struct net_device *dev, struct ip_tunnel_parm *p,
+static int ipgre_tunnel_ctl(struct net_device *dev,
+			    struct ip_tunnel_parm_kern *p,
 			    int cmd)
 {
 	int err;
@@ -1126,7 +1127,7 @@ static int erspan_validate(struct nlattr *tb[], struct nlattr *data[],
 static int ipgre_netlink_parms(struct net_device *dev,
 				struct nlattr *data[],
 				struct nlattr *tb[],
-				struct ip_tunnel_parm *parms,
+				struct ip_tunnel_parm_kern *parms,
 				__u32 *fwmark)
 {
 	struct ip_tunnel *t = netdev_priv(dev);
@@ -1193,7 +1194,7 @@ static int ipgre_netlink_parms(struct net_device *dev,
 static int erspan_netlink_parms(struct net_device *dev,
 				struct nlattr *data[],
 				struct nlattr *tb[],
-				struct ip_tunnel_parm *parms,
+				struct ip_tunnel_parm_kern *parms,
 				__u32 *fwmark)
 {
 	struct ip_tunnel *t = netdev_priv(dev);
@@ -1352,7 +1353,7 @@ static int ipgre_newlink(struct net *src_net, struct net_device *dev,
 			 struct nlattr *tb[], struct nlattr *data[],
 			 struct netlink_ext_ack *extack)
 {
-	struct ip_tunnel_parm p;
+	struct ip_tunnel_parm_kern p;
 	__u32 fwmark = 0;
 	int err;
 
@@ -1370,7 +1371,7 @@ static int erspan_newlink(struct net *src_net, struct net_device *dev,
 			  struct nlattr *tb[], struct nlattr *data[],
 			  struct netlink_ext_ack *extack)
 {
-	struct ip_tunnel_parm p;
+	struct ip_tunnel_parm_kern p;
 	__u32 fwmark = 0;
 	int err;
 
@@ -1389,8 +1390,8 @@ static int ipgre_changelink(struct net_device *dev, struct nlattr *tb[],
 			    struct netlink_ext_ack *extack)
 {
 	struct ip_tunnel *t = netdev_priv(dev);
+	struct ip_tunnel_parm_kern p;
 	__u32 fwmark = t->fwmark;
-	struct ip_tunnel_parm p;
 	int err;
 
 	err = ipgre_newlink_encap_setup(dev, data);
@@ -1418,8 +1419,8 @@ static int erspan_changelink(struct net_device *dev, struct nlattr *tb[],
 			     struct netlink_ext_ack *extack)
 {
 	struct ip_tunnel *t = netdev_priv(dev);
+	struct ip_tunnel_parm_kern p;
 	__u32 fwmark = t->fwmark;
-	struct ip_tunnel_parm p;
 	int err;
 
 	err = ipgre_newlink_encap_setup(dev, data);
@@ -1491,7 +1492,7 @@ static size_t ipgre_get_size(const struct net_device *dev)
 static int ipgre_fill_info(struct sk_buff *skb, const struct net_device *dev)
 {
 	struct ip_tunnel *t = netdev_priv(dev);
-	struct ip_tunnel_parm *p = &t->parms;
+	struct ip_tunnel_parm_kern *p = &t->parms;
 	__be16 o_flags = p->o_flags;
 
 	if (nla_put_u32(skb, IFLA_GRE_LINK, p->link) ||
diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
index beeae624c412..658f46208c8d 100644
--- a/net/ipv4/ip_tunnel.c
+++ b/net/ipv4/ip_tunnel.c
@@ -56,7 +56,7 @@ static unsigned int ip_tunnel_hash(__be32 key, __be32 remote)
 			 IP_TNL_HASH_BITS);
 }
 
-static bool ip_tunnel_key_match(const struct ip_tunnel_parm *p,
+static bool ip_tunnel_key_match(const struct ip_tunnel_parm_kern *p,
 				__be16 flags, __be32 key)
 {
 	if (p->i_flags & TUNNEL_KEY) {
@@ -172,7 +172,7 @@ struct ip_tunnel *ip_tunnel_lookup(struct ip_tunnel_net *itn,
 EXPORT_SYMBOL_GPL(ip_tunnel_lookup);
 
 static struct hlist_head *ip_bucket(struct ip_tunnel_net *itn,
-				    struct ip_tunnel_parm *parms)
+				    struct ip_tunnel_parm_kern *parms)
 {
 	unsigned int h;
 	__be32 remote;
@@ -207,7 +207,7 @@ static void ip_tunnel_del(struct ip_tunnel_net *itn, struct ip_tunnel *t)
 }
 
 static struct ip_tunnel *ip_tunnel_find(struct ip_tunnel_net *itn,
-					struct ip_tunnel_parm *parms,
+					struct ip_tunnel_parm_kern *parms,
 					int type)
 {
 	__be32 remote = parms->iph.daddr;
@@ -231,7 +231,7 @@ static struct ip_tunnel *ip_tunnel_find(struct ip_tunnel_net *itn,
 
 static struct net_device *__ip_tunnel_create(struct net *net,
 					     const struct rtnl_link_ops *ops,
-					     struct ip_tunnel_parm *parms)
+					     struct ip_tunnel_parm_kern *parms)
 {
 	int err;
 	struct ip_tunnel *tunnel;
@@ -327,7 +327,7 @@ static int ip_tunnel_bind_dev(struct net_device *dev)
 
 static struct ip_tunnel *ip_tunnel_create(struct net *net,
 					  struct ip_tunnel_net *itn,
-					  struct ip_tunnel_parm *parms)
+					  struct ip_tunnel_parm_kern *parms)
 {
 	struct ip_tunnel *nt;
 	struct net_device *dev;
@@ -845,7 +845,7 @@ EXPORT_SYMBOL_GPL(ip_tunnel_xmit);
 static void ip_tunnel_update(struct ip_tunnel_net *itn,
 			     struct ip_tunnel *t,
 			     struct net_device *dev,
-			     struct ip_tunnel_parm *p,
+			     struct ip_tunnel_parm_kern *p,
 			     bool set_mtu,
 			     __u32 fwmark)
 {
@@ -877,7 +877,8 @@ static void ip_tunnel_update(struct ip_tunnel_net *itn,
 	netdev_state_change(dev);
 }
 
-int ip_tunnel_ctl(struct net_device *dev, struct ip_tunnel_parm *p, int cmd)
+int ip_tunnel_ctl(struct net_device *dev, struct ip_tunnel_parm_kern *p,
+		  int cmd)
 {
 	int err = 0;
 	struct ip_tunnel *t = netdev_priv(dev);
@@ -979,16 +980,52 @@ int ip_tunnel_ctl(struct net_device *dev, struct ip_tunnel_parm *p, int cmd)
 }
 EXPORT_SYMBOL_GPL(ip_tunnel_ctl);
 
+bool ip_tunnel_parm_from_user(struct ip_tunnel_parm_kern *kp,
+			      const void __user *data)
+{
+	struct ip_tunnel_parm p;
+
+	if (copy_from_user(&p, data, sizeof(p)))
+		return false;
+
+	strscpy(kp->name, p.name, sizeof(kp->name));
+	kp->link = p.link;
+	kp->i_flags = p.i_flags;
+	kp->o_flags = p.o_flags;
+	kp->i_key = p.i_key;
+	kp->o_key = p.o_key;
+	memcpy(&kp->iph, &p.iph, min(sizeof(kp->iph), sizeof(p.iph)));
+
+	return true;
+}
+EXPORT_SYMBOL_GPL(ip_tunnel_parm_from_user);
+
+bool ip_tunnel_parm_to_user(void __user *data, struct ip_tunnel_parm_kern *kp)
+{
+	struct ip_tunnel_parm p;
+
+	strscpy(p.name, kp->name, sizeof(p.name));
+	p.link = kp->link;
+	p.i_flags = kp->i_flags;
+	p.o_flags = kp->o_flags;
+	p.i_key = kp->i_key;
+	p.o_key = kp->o_key;
+	memcpy(&p.iph, &kp->iph, min(sizeof(p.iph), sizeof(kp->iph)));
+
+	return !copy_to_user(data, &p, sizeof(p));
+}
+EXPORT_SYMBOL_GPL(ip_tunnel_parm_to_user);
+
 int ip_tunnel_siocdevprivate(struct net_device *dev, struct ifreq *ifr,
 			     void __user *data, int cmd)
 {
-	struct ip_tunnel_parm p;
+	struct ip_tunnel_parm_kern p;
 	int err;
 
-	if (copy_from_user(&p, data, sizeof(p)))
+	if (!ip_tunnel_parm_from_user(&p, data))
 		return -EFAULT;
 	err = dev->netdev_ops->ndo_tunnel_ctl(dev, &p, cmd);
-	if (!err && copy_to_user(data, &p, sizeof(p)))
+	if (!err && !ip_tunnel_parm_to_user(data, &p))
 		return -EFAULT;
 	return err;
 }
@@ -1067,7 +1104,7 @@ int ip_tunnel_init_net(struct net *net, unsigned int ip_tnl_net_id,
 				  struct rtnl_link_ops *ops, char *devname)
 {
 	struct ip_tunnel_net *itn = net_generic(net, ip_tnl_net_id);
-	struct ip_tunnel_parm parms;
+	struct ip_tunnel_parm_kern parms;
 	unsigned int i;
 
 	itn->rtnl_link_ops = ops;
@@ -1147,7 +1184,7 @@ void ip_tunnel_delete_nets(struct list_head *net_list, unsigned int id,
 EXPORT_SYMBOL_GPL(ip_tunnel_delete_nets);
 
 int ip_tunnel_newlink(struct net_device *dev, struct nlattr *tb[],
-		      struct ip_tunnel_parm *p, __u32 fwmark)
+		      struct ip_tunnel_parm_kern *p, __u32 fwmark)
 {
 	struct ip_tunnel *nt;
 	struct net *net = dev_net(dev);
@@ -1201,7 +1238,7 @@ int ip_tunnel_newlink(struct net_device *dev, struct nlattr *tb[],
 EXPORT_SYMBOL_GPL(ip_tunnel_newlink);
 
 int ip_tunnel_changelink(struct net_device *dev, struct nlattr *tb[],
-			 struct ip_tunnel_parm *p, __u32 fwmark)
+			 struct ip_tunnel_parm_kern *p, __u32 fwmark)
 {
 	struct ip_tunnel *t;
 	struct ip_tunnel *tunnel = netdev_priv(dev);
diff --git a/net/ipv4/ip_tunnel_core.c b/net/ipv4/ip_tunnel_core.c
index 586b1b3e35b8..f89e3bdef123 100644
--- a/net/ipv4/ip_tunnel_core.c
+++ b/net/ipv4/ip_tunnel_core.c
@@ -1116,7 +1116,7 @@ bool ip_tunnel_netlink_encap_parms(struct nlattr *data[],
 EXPORT_SYMBOL_GPL(ip_tunnel_netlink_encap_parms);
 
 void ip_tunnel_netlink_parms(struct nlattr *data[],
-			     struct ip_tunnel_parm *parms)
+			     struct ip_tunnel_parm_kern *parms)
 {
 	if (data[IFLA_IPTUN_LINK])
 		parms->link = nla_get_u32(data[IFLA_IPTUN_LINK]);
diff --git a/net/ipv4/ip_vti.c b/net/ipv4/ip_vti.c
index d1e7d0ceb7ed..e1718c088433 100644
--- a/net/ipv4/ip_vti.c
+++ b/net/ipv4/ip_vti.c
@@ -167,7 +167,7 @@ static netdev_tx_t vti_xmit(struct sk_buff *skb, struct net_device *dev,
 			    struct flowi *fl)
 {
 	struct ip_tunnel *tunnel = netdev_priv(dev);
-	struct ip_tunnel_parm *parms = &tunnel->parms;
+	struct ip_tunnel_parm_kern *parms = &tunnel->parms;
 	struct dst_entry *dst = skb_dst(skb);
 	struct net_device *tdev;	/* Device to other host */
 	int pkt_len = skb->len;
@@ -373,7 +373,7 @@ static int vti4_err(struct sk_buff *skb, u32 info)
 }
 
 static int
-vti_tunnel_ctl(struct net_device *dev, struct ip_tunnel_parm *p, int cmd)
+vti_tunnel_ctl(struct net_device *dev, struct ip_tunnel_parm_kern *p, int cmd)
 {
 	int err = 0;
 
@@ -529,7 +529,7 @@ static int vti_tunnel_validate(struct nlattr *tb[], struct nlattr *data[],
 }
 
 static void vti_netlink_parms(struct nlattr *data[],
-			      struct ip_tunnel_parm *parms,
+			      struct ip_tunnel_parm_kern *parms,
 			      __u32 *fwmark)
 {
 	memset(parms, 0, sizeof(*parms));
@@ -564,7 +564,7 @@ static int vti_newlink(struct net *src_net, struct net_device *dev,
 		       struct nlattr *tb[], struct nlattr *data[],
 		       struct netlink_ext_ack *extack)
 {
-	struct ip_tunnel_parm parms;
+	struct ip_tunnel_parm_kern parms;
 	__u32 fwmark = 0;
 
 	vti_netlink_parms(data, &parms, &fwmark);
@@ -576,8 +576,8 @@ static int vti_changelink(struct net_device *dev, struct nlattr *tb[],
 			  struct netlink_ext_ack *extack)
 {
 	struct ip_tunnel *t = netdev_priv(dev);
+	struct ip_tunnel_parm_kern p;
 	__u32 fwmark = t->fwmark;
-	struct ip_tunnel_parm p;
 
 	vti_netlink_parms(data, &p, &fwmark);
 	return ip_tunnel_changelink(dev, tb, &p, fwmark);
@@ -604,7 +604,7 @@ static size_t vti_get_size(const struct net_device *dev)
 static int vti_fill_info(struct sk_buff *skb, const struct net_device *dev)
 {
 	struct ip_tunnel *t = netdev_priv(dev);
-	struct ip_tunnel_parm *p = &t->parms;
+	struct ip_tunnel_parm_kern *p = &t->parms;
 
 	if (nla_put_u32(skb, IFLA_VTI_LINK, p->link) ||
 	    nla_put_be32(skb, IFLA_VTI_IKEY, p->i_key) ||
diff --git a/net/ipv4/ipip.c b/net/ipv4/ipip.c
index 27b8f83c6ea2..0dd2d3b55c75 100644
--- a/net/ipv4/ipip.c
+++ b/net/ipv4/ipip.c
@@ -330,7 +330,7 @@ static bool ipip_tunnel_ioctl_verify_protocol(u8 ipproto)
 }
 
 static int
-ipip_tunnel_ctl(struct net_device *dev, struct ip_tunnel_parm *p, int cmd)
+ipip_tunnel_ctl(struct net_device *dev, struct ip_tunnel_parm_kern *p, int cmd)
 {
 	if (cmd == SIOCADDTUNNEL || cmd == SIOCCHGTUNNEL) {
 		if (p->iph.version != 4 ||
@@ -405,8 +405,8 @@ static int ipip_tunnel_validate(struct nlattr *tb[], struct nlattr *data[],
 }
 
 static void ipip_netlink_parms(struct nlattr *data[],
-			       struct ip_tunnel_parm *parms, bool *collect_md,
-			       __u32 *fwmark)
+			       struct ip_tunnel_parm_kern *parms,
+			       bool *collect_md, __u32 *fwmark)
 {
 	memset(parms, 0, sizeof(*parms));
 
@@ -432,8 +432,8 @@ static int ipip_newlink(struct net *src_net, struct net_device *dev,
 			struct netlink_ext_ack *extack)
 {
 	struct ip_tunnel *t = netdev_priv(dev);
-	struct ip_tunnel_parm p;
 	struct ip_tunnel_encap ipencap;
+	struct ip_tunnel_parm_kern p;
 	__u32 fwmark = 0;
 
 	if (ip_tunnel_netlink_encap_parms(data, &ipencap)) {
@@ -452,8 +452,8 @@ static int ipip_changelink(struct net_device *dev, struct nlattr *tb[],
 			   struct netlink_ext_ack *extack)
 {
 	struct ip_tunnel *t = netdev_priv(dev);
-	struct ip_tunnel_parm p;
 	struct ip_tunnel_encap ipencap;
+	struct ip_tunnel_parm_kern p;
 	bool collect_md;
 	__u32 fwmark = t->fwmark;
 
@@ -510,7 +510,7 @@ static size_t ipip_get_size(const struct net_device *dev)
 static int ipip_fill_info(struct sk_buff *skb, const struct net_device *dev)
 {
 	struct ip_tunnel *tunnel = netdev_priv(dev);
-	struct ip_tunnel_parm *parm = &tunnel->parms;
+	struct ip_tunnel_parm_kern *parm = &tunnel->parms;
 
 	if (nla_put_u32(skb, IFLA_IPTUN_LINK, parm->link) ||
 	    nla_put_in_addr(skb, IFLA_IPTUN_LOCAL, parm->iph.saddr) ||
diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index 9e222a57bc2b..799d1bccf38d 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -441,7 +441,7 @@ static bool ipmr_init_vif_indev(const struct net_device *dev)
 static struct net_device *ipmr_new_tunnel(struct net *net, struct vifctl *v)
 {
 	struct net_device *tunnel_dev, *new_dev;
-	struct ip_tunnel_parm p = { };
+	struct ip_tunnel_parm_kern p = { };
 	int err;
 
 	tunnel_dev = __dev_get_by_name(net, "tunl0");
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 0b6ee962c84e..2bd0cf2b5a90 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -63,6 +63,7 @@
 #include <linux/string.h>
 #include <linux/hash.h>
 
+#include <net/ip_tunnels.h>
 #include <net/net_namespace.h>
 #include <net/sock.h>
 #include <net/snmp.h>
@@ -2857,7 +2858,7 @@ void addrconf_prefix_rcv(struct net_device *dev, u8 *opt, int len, bool sllao)
 static int addrconf_set_sit_dstaddr(struct net *net, struct net_device *dev,
 		struct in6_ifreq *ireq)
 {
-	struct ip_tunnel_parm p = { };
+	struct ip_tunnel_parm_kern p = { };
 	int err;
 
 	if (!(ipv6_addr_type(&ireq->ifr6_addr) & IPV6_ADDR_COMPATv4))
diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
index cc24cefdb85c..19af34e4c13c 100644
--- a/net/ipv6/sit.c
+++ b/net/ipv6/sit.c
@@ -132,8 +132,8 @@ static struct ip_tunnel *ipip6_tunnel_lookup(struct net *net,
 	return NULL;
 }
 
-static struct ip_tunnel __rcu **__ipip6_bucket(struct sit_net *sitn,
-		struct ip_tunnel_parm *parms)
+static struct ip_tunnel __rcu **
+__ipip6_bucket(struct sit_net *sitn, struct ip_tunnel_parm_kern *parms)
 {
 	__be32 remote = parms->iph.daddr;
 	__be32 local = parms->iph.saddr;
@@ -226,7 +226,8 @@ static int ipip6_tunnel_create(struct net_device *dev)
 }
 
 static struct ip_tunnel *ipip6_tunnel_locate(struct net *net,
-		struct ip_tunnel_parm *parms, int create)
+					     struct ip_tunnel_parm_kern *parms,
+					     int create)
 {
 	__be32 remote = parms->iph.daddr;
 	__be32 local = parms->iph.saddr;
@@ -1135,7 +1136,8 @@ static void ipip6_tunnel_bind_dev(struct net_device *dev)
 	dev->needed_headroom = t_hlen + hlen;
 }
 
-static void ipip6_tunnel_update(struct ip_tunnel *t, struct ip_tunnel_parm *p,
+static void ipip6_tunnel_update(struct ip_tunnel *t,
+				struct ip_tunnel_parm_kern *p,
 				__u32 fwmark)
 {
 	struct net *net = t->net;
@@ -1196,11 +1198,11 @@ static int
 ipip6_tunnel_get6rd(struct net_device *dev, struct ip_tunnel_parm __user *data)
 {
 	struct ip_tunnel *t = netdev_priv(dev);
+	struct ip_tunnel_parm_kern p;
 	struct ip_tunnel_6rd ip6rd;
-	struct ip_tunnel_parm p;
 
 	if (dev == dev_to_sit_net(dev)->fb_tunnel_dev) {
-		if (copy_from_user(&p, data, sizeof(p)))
+		if (!ip_tunnel_parm_from_user(&p, data))
 			return -EFAULT;
 		t = ipip6_tunnel_locate(t->net, &p, 0);
 	}
@@ -1251,7 +1253,7 @@ static bool ipip6_valid_ip_proto(u8 ipproto)
 }
 
 static int
-__ipip6_tunnel_ioctl_validate(struct net *net, struct ip_tunnel_parm *p)
+__ipip6_tunnel_ioctl_validate(struct net *net, struct ip_tunnel_parm_kern *p)
 {
 	if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
@@ -1268,7 +1270,7 @@ __ipip6_tunnel_ioctl_validate(struct net *net, struct ip_tunnel_parm *p)
 }
 
 static int
-ipip6_tunnel_get(struct net_device *dev, struct ip_tunnel_parm *p)
+ipip6_tunnel_get(struct net_device *dev, struct ip_tunnel_parm_kern *p)
 {
 	struct ip_tunnel *t = netdev_priv(dev);
 
@@ -1281,7 +1283,7 @@ ipip6_tunnel_get(struct net_device *dev, struct ip_tunnel_parm *p)
 }
 
 static int
-ipip6_tunnel_add(struct net_device *dev, struct ip_tunnel_parm *p)
+ipip6_tunnel_add(struct net_device *dev, struct ip_tunnel_parm_kern *p)
 {
 	struct ip_tunnel *t = netdev_priv(dev);
 	int err;
@@ -1297,7 +1299,7 @@ ipip6_tunnel_add(struct net_device *dev, struct ip_tunnel_parm *p)
 }
 
 static int
-ipip6_tunnel_change(struct net_device *dev, struct ip_tunnel_parm *p)
+ipip6_tunnel_change(struct net_device *dev, struct ip_tunnel_parm_kern *p)
 {
 	struct ip_tunnel *t = netdev_priv(dev);
 	int err;
@@ -1328,7 +1330,7 @@ ipip6_tunnel_change(struct net_device *dev, struct ip_tunnel_parm *p)
 }
 
 static int
-ipip6_tunnel_del(struct net_device *dev, struct ip_tunnel_parm *p)
+ipip6_tunnel_del(struct net_device *dev, struct ip_tunnel_parm_kern *p)
 {
 	struct ip_tunnel *t = netdev_priv(dev);
 
@@ -1348,7 +1350,8 @@ ipip6_tunnel_del(struct net_device *dev, struct ip_tunnel_parm *p)
 }
 
 static int
-ipip6_tunnel_ctl(struct net_device *dev, struct ip_tunnel_parm *p, int cmd)
+ipip6_tunnel_ctl(struct net_device *dev, struct ip_tunnel_parm_kern *p,
+		 int cmd)
 {
 	switch (cmd) {
 	case SIOCGETTUNNEL:
@@ -1494,7 +1497,7 @@ static int ipip6_validate(struct nlattr *tb[], struct nlattr *data[],
 }
 
 static void ipip6_netlink_parms(struct nlattr *data[],
-				struct ip_tunnel_parm *parms,
+				struct ip_tunnel_parm_kern *parms,
 				__u32 *fwmark)
 {
 	memset(parms, 0, sizeof(*parms));
@@ -1603,8 +1606,8 @@ static int ipip6_changelink(struct net_device *dev, struct nlattr *tb[],
 			    struct netlink_ext_ack *extack)
 {
 	struct ip_tunnel *t = netdev_priv(dev);
-	struct ip_tunnel_parm p;
 	struct ip_tunnel_encap ipencap;
+	struct ip_tunnel_parm_kern p;
 	struct net *net = t->net;
 	struct sit_net *sitn = net_generic(net, sit_net_id);
 #ifdef CONFIG_IPV6_SIT_6RD
@@ -1691,7 +1694,7 @@ static size_t ipip6_get_size(const struct net_device *dev)
 static int ipip6_fill_info(struct sk_buff *skb, const struct net_device *dev)
 {
 	struct ip_tunnel *tunnel = netdev_priv(dev);
-	struct ip_tunnel_parm *parm = &tunnel->parms;
+	struct ip_tunnel_parm_kern *parm = &tunnel->parms;
 
 	if (nla_put_u32(skb, IFLA_IPTUN_LINK, parm->link) ||
 	    nla_put_in_addr(skb, IFLA_IPTUN_LOCAL, parm->iph.saddr) ||
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 12/14] lib/bitmap: add compile-time test for __assign_bit() optimization
  2023-10-09 15:10 [PATCH 00/14] ip_tunnel: convert __be16 tunnel flags to bitmaps Alexander Lobakin
                   ` (9 preceding siblings ...)
  2023-10-09 15:10 ` [PATCH 10/14] ip_tunnel: use a separate struct to store tunnel params in the kernel Alexander Lobakin
@ 2023-10-09 15:10 ` Alexander Lobakin
  2023-10-09 15:10 ` [PATCH 13/14] lib/bitmap: add tests for bitmap_{get,set}_bits() Alexander Lobakin
  2023-10-09 15:10 ` [PATCH 14/14] lib/bitmap: add tests for IP tunnel flags conversion helpers Alexander Lobakin
  12 siblings, 0 replies; 26+ messages in thread
From: Alexander Lobakin @ 2023-10-09 15:10 UTC (permalink / raw)
  To: Yury Norov
  Cc: Alexander Lobakin, Andy Shevchenko, Rasmus Villemoes,
	Alexander Potapenko, Jakub Kicinski, Eric Dumazet, David Ahern,
	Przemek Kitszel, Simon Horman, netdev, linux-btrfs, dm-devel,
	ntfs3, linux-s390, linux-kernel

Commit dc34d5036692 ("lib: test_bitmap: add compile-time
optimization/evaluations assertions") initially missed __assign_bit(),
which led to that quite a time passed before I realized it doesn't get
optimized at compilation time. Now that it does, add test for that just
to make sure nothing will break one day.
To make things more interesting, use bitmap_complement() and
bitmap_full(), thus checking their compile-time evaluation as well. And
remove the misleading comment mentioning the workaround removed recently
in favor of adding the whole file to GCov exceptions.

Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
---
 lib/test_bitmap.c | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/lib/test_bitmap.c b/lib/test_bitmap.c
index f2ea9f30c7c5..cbf1b9611616 100644
--- a/lib/test_bitmap.c
+++ b/lib/test_bitmap.c
@@ -1181,14 +1181,7 @@ static void __init test_bitmap_const_eval(void)
 	 * in runtime.
 	 */
 
-	/*
-	 * Equals to `unsigned long bitmap[1] = { GENMASK(6, 5), }`.
-	 * Clang on s390 optimizes bitops at compile-time as intended, but at
-	 * the same time stops treating @bitmap and @bitopvar as compile-time
-	 * constants after regular test_bit() is executed, thus triggering the
-	 * build bugs below. So, call const_test_bit() there directly until
-	 * the compiler is fixed.
-	 */
+	/* Equals to `unsigned long bitmap[1] = { GENMASK(6, 5), }` */
 	bitmap_clear(bitmap, 0, BITS_PER_LONG);
 	if (!test_bit(7, bitmap))
 		bitmap_set(bitmap, 5, 2);
@@ -1220,6 +1213,15 @@ static void __init test_bitmap_const_eval(void)
 	/* ~BIT(25) */
 	BUILD_BUG_ON(!__builtin_constant_p(~var));
 	BUILD_BUG_ON(~var != ~BIT(25));
+
+	/* ~BIT(25) | BIT(25) == ~0UL */
+	bitmap_complement(&var, &var, BITS_PER_LONG);
+	__assign_bit(25, &var, true);
+
+	/* !(~(~0UL)) == 1 */
+	res = bitmap_full(&var, BITS_PER_LONG);
+	BUILD_BUG_ON(!__builtin_constant_p(res));
+	BUILD_BUG_ON(!res);
 }
 
 static void __init selftest(void)
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 13/14] lib/bitmap: add tests for bitmap_{get,set}_bits()
  2023-10-09 15:10 [PATCH 00/14] ip_tunnel: convert __be16 tunnel flags to bitmaps Alexander Lobakin
                   ` (10 preceding siblings ...)
  2023-10-09 15:10 ` [PATCH 12/14] lib/bitmap: add compile-time test for __assign_bit() optimization Alexander Lobakin
@ 2023-10-09 15:10 ` Alexander Lobakin
  2023-10-09 15:10 ` [PATCH 14/14] lib/bitmap: add tests for IP tunnel flags conversion helpers Alexander Lobakin
  12 siblings, 0 replies; 26+ messages in thread
From: Alexander Lobakin @ 2023-10-09 15:10 UTC (permalink / raw)
  To: Yury Norov
  Cc: Alexander Lobakin, Andy Shevchenko, Rasmus Villemoes,
	Alexander Potapenko, Jakub Kicinski, Eric Dumazet, David Ahern,
	Przemek Kitszel, Simon Horman, netdev, linux-btrfs, dm-devel,
	ntfs3, linux-s390, linux-kernel

bitmap_{get,set}_value8() is now bitmap_{get,set}_bits(). The former
didn't have any dedicated tests in the bitmap test suite.
Add a pack of test cases with variable offset, width, and value to set
(for _set_bits()), randomly generated with the only limitation:
``offset % 32 + width <= 32``, to make sure the tests won't fail or
trigger kernel warnings on 32-bit platforms. For _get_bits(), compare
the return values with the expected ones, calculated and saved manually.
For _set_bit(), do several modifications of the source bitmap and then
compare against the expected resulting one, also pre-calculated.

Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
---
 lib/test_bitmap.c | 77 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 77 insertions(+)

diff --git a/lib/test_bitmap.c b/lib/test_bitmap.c
index cbf1b9611616..6037b66fd30a 100644
--- a/lib/test_bitmap.c
+++ b/lib/test_bitmap.c
@@ -1161,6 +1161,82 @@ static void __init test_bitmap_print_buf(void)
 	}
 }
 
+struct getset_test {
+	u16	offset;
+	u16	width;
+	union {
+		u32	expect;
+		u32	value;
+	};
+};
+
+#define GETSET_TEST(o, w, v) {	\
+	.offset	= (o),		\
+	.width	= (w),		\
+	.value	= (v),		\
+}
+
+static const unsigned long getset_src[] __initconst = {
+	BITMAP_FROM_U64(0x4329c918b472468eULL),
+	BITMAP_FROM_U64(0xb2c20a622474a119ULL),
+	BITMAP_FROM_U64(0x3a08cb5591cea40dULL),
+	BITMAP_FROM_U64(0xc9a7550384e145f8ULL),
+};
+
+static const struct getset_test get_bits_test[] __initconst = {
+	GETSET_TEST(208, 16, 0x84e1),
+	GETSET_TEST(104, 8, 0xa),
+	GETSET_TEST(224, 32, 0xc9a75503),
+	GETSET_TEST(64, 16, 0xa119),
+	GETSET_TEST(169, 1, 0x1),
+	GETSET_TEST(144, 8, 0xce),
+	GETSET_TEST(80, 4, 0x4),
+	GETSET_TEST(24, 4, 0x4),
+};
+
+static const struct getset_test set_bits_test[] __initconst = {
+	GETSET_TEST(56, 4, 0xa),
+	GETSET_TEST(80, 16, 0xb17a),
+	GETSET_TEST(112, 8, 0x1b),
+	GETSET_TEST(0, 32, 0xe8a555f2),
+	GETSET_TEST(16, 2, 0),
+	GETSET_TEST(72, 8, 0x7d),
+	GETSET_TEST(47, 1, 0),
+	GETSET_TEST(160, 16, 0x1622),
+};
+
+static const unsigned long getset_out[] __initconst = {
+	BITMAP_FROM_U64(0x4a294918e8a455f2ULL),
+	BITMAP_FROM_U64(0xb21b0a62b17a7d19ULL),
+	BITMAP_FROM_U64(0x3a08162291cea40dULL),
+	BITMAP_FROM_U64(0xc9a7550384e145f8ULL),
+};
+
+#define GETSET_TEST_BITS	BYTES_TO_BITS(sizeof(getset_out))
+
+static void __init test_bitmap_getset_bits(void)
+{
+	DECLARE_BITMAP(out, GETSET_TEST_BITS);
+
+	for (u32 i = 0; i < ARRAY_SIZE(get_bits_test); i++) {
+		const struct getset_test *test = &get_bits_test[i];
+		u32 val;
+
+		val = bitmap_get_bits(getset_src, test->offset, test->width);
+		expect_eq_uint(test->expect, val);
+	}
+
+	bitmap_copy(out, getset_src, GETSET_TEST_BITS);
+
+	for (u32 i = 0; i < ARRAY_SIZE(set_bits_test); i++) {
+		const struct getset_test *test = &set_bits_test[i];
+
+		bitmap_set_bits(out, test->offset, test->value, test->width);
+	}
+
+	expect_eq_bitmap(getset_out, out, GETSET_TEST_BITS);
+}
+
 /*
  * FIXME: Clang breaks compile-time evaluations when KASAN and GCOV are enabled.
  * To workaround it, GCOV is force-disabled in Makefile for this configuration.
@@ -1238,6 +1314,7 @@ static void __init selftest(void)
 	test_mem_optimisations();
 	test_bitmap_cut();
 	test_bitmap_print_buf();
+	test_bitmap_getset_bits();
 	test_bitmap_const_eval();
 
 	test_find_nth_bit();
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 14/14] lib/bitmap: add tests for IP tunnel flags conversion helpers
  2023-10-09 15:10 [PATCH 00/14] ip_tunnel: convert __be16 tunnel flags to bitmaps Alexander Lobakin
                   ` (11 preceding siblings ...)
  2023-10-09 15:10 ` [PATCH 13/14] lib/bitmap: add tests for bitmap_{get,set}_bits() Alexander Lobakin
@ 2023-10-09 15:10 ` Alexander Lobakin
  12 siblings, 0 replies; 26+ messages in thread
From: Alexander Lobakin @ 2023-10-09 15:10 UTC (permalink / raw)
  To: Yury Norov
  Cc: Alexander Lobakin, Andy Shevchenko, Rasmus Villemoes,
	Alexander Potapenko, Jakub Kicinski, Eric Dumazet, David Ahern,
	Przemek Kitszel, Simon Horman, netdev, linux-btrfs, dm-devel,
	ntfs3, linux-s390, linux-kernel

Now that there are helpers for converting IP tunnel flags between the
old __be16 format and the bitmap format, make sure they work as expected
by adding a couple of tests to the bitmap testing suite. The helpers are
all inline, so no dependencies on the related CONFIG_* (or a standalone
module) are needed.

Cover three possible cases:

1. No bits past BIT(15) are set, VTI/SIT bits are not set. This
   conversion is almost a direct assignment.
2. No bits past BIT(15) are set, but VTI/SIT bit is set. During the
   conversion, it must be transformed into BIT(16) in the bitmap,
   but still compatible with the __be16 format.
3. The bitmap has bits past BIT(15) set (not the VTI/SIT one). The
   result will be truncated.
   Note that currently __IP_TUNNEL_FLAG_NUM is 17 (incl. special),
   which means that the result of this case is currently
   semi-false-positive. When BIT(17) is finally here, it will be
   adjusted accordingly.

Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
---
 lib/test_bitmap.c | 105 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 105 insertions(+)

diff --git a/lib/test_bitmap.c b/lib/test_bitmap.c
index 6037b66fd30a..23da19e1ff7e 100644
--- a/lib/test_bitmap.c
+++ b/lib/test_bitmap.c
@@ -14,6 +14,8 @@
 #include <linux/string.h>
 #include <linux/uaccess.h>
 
+#include <net/ip_tunnels.h>
+
 #include "../tools/testing/selftests/kselftest_module.h"
 
 #define EXP1_IN_BITS	(sizeof(exp1) * 8)
@@ -1300,6 +1302,108 @@ static void __init test_bitmap_const_eval(void)
 	BUILD_BUG_ON(!res);
 }
 
+struct ip_tunnel_flags_test {
+	const u16	*src_bits;
+	const u16	*exp_bits;
+	u8		src_num;
+	u8		exp_num;
+	__be16		exp_val;
+	bool		exp_comp:1;
+};
+
+#define IP_TUNNEL_FLAGS_TEST(src, comp, eval, exp) {	\
+	.src_bits	= (src),			\
+	.src_num	= ARRAY_SIZE(src),		\
+	.exp_comp	= (comp),			\
+	.exp_val	= (eval),			\
+	.exp_bits	= (exp),			\
+	.exp_num	= ARRAY_SIZE(exp),		\
+}
+
+/* These are __be16-compatible and can be compared as is */
+static const u16 ip_tunnel_flags_1[] __initconst = {
+	IP_TUNNEL_KEY_BIT,
+	IP_TUNNEL_STRICT_BIT,
+	IP_TUNNEL_ERSPAN_OPT_BIT,
+};
+
+/*
+ * Due to the previous flags design limitation, setting either
+ * ``IP_TUNNEL_CSUM_BIT`` (on Big Endian) or ``IP_TUNNEL_DONT_FRAGMENT_BIT``
+ * (on Little) also sets VTI/ISATAP bit. In the bitmap implementation, they
+ * correspond to ``BIT(16)``, which is bigger than ``U16_MAX``, but still is
+ * backward-compatible.
+ */
+#ifdef __BIG_ENDIAN
+#define IP_TUNNEL_CONFLICT_BIT	IP_TUNNEL_CSUM_BIT
+#else
+#define IP_TUNNEL_CONFLICT_BIT	IP_TUNNEL_DONT_FRAGMENT_BIT
+#endif
+
+static const u16 ip_tunnel_flags_2_src[] __initconst = {
+	IP_TUNNEL_CONFLICT_BIT,
+};
+
+static const u16 ip_tunnel_flags_2_exp[] __initconst = {
+	IP_TUNNEL_CONFLICT_BIT,
+	IP_TUNNEL_SIT_ISATAP_BIT,
+};
+
+/* Bits 17 and higher are not compatible with __be16 flags */
+static const u16 ip_tunnel_flags_3_src[] __initconst = {
+	IP_TUNNEL_VXLAN_OPT_BIT,
+	17,
+	18,
+	20,
+};
+
+static const u16 ip_tunnel_flags_3_exp[] __initconst = {
+	IP_TUNNEL_VXLAN_OPT_BIT,
+};
+
+static const struct ip_tunnel_flags_test ip_tunnel_flags_test[] __initconst = {
+	IP_TUNNEL_FLAGS_TEST(ip_tunnel_flags_1, true,
+			     cpu_to_be16(BIT(IP_TUNNEL_KEY_BIT) |
+					 BIT(IP_TUNNEL_STRICT_BIT) |
+					 BIT(IP_TUNNEL_ERSPAN_OPT_BIT)),
+			     ip_tunnel_flags_1),
+	IP_TUNNEL_FLAGS_TEST(ip_tunnel_flags_2_src, true, VTI_ISVTI,
+			     ip_tunnel_flags_2_exp),
+	IP_TUNNEL_FLAGS_TEST(ip_tunnel_flags_3_src,
+			     /*
+			      * This must be set to ``false`` once
+			      * ``__IP_TUNNEL_FLAG_NUM`` goes above 17.
+			      */
+			     true,
+			     cpu_to_be16(BIT(IP_TUNNEL_VXLAN_OPT_BIT)),
+			     ip_tunnel_flags_3_exp),
+};
+
+static void __init test_ip_tunnel_flags(void)
+{
+	for (u32 i = 0; i < ARRAY_SIZE(ip_tunnel_flags_test); i++) {
+		typeof(*ip_tunnel_flags_test) *test = &ip_tunnel_flags_test[i];
+		IP_TUNNEL_DECLARE_FLAGS(src) = { };
+		IP_TUNNEL_DECLARE_FLAGS(exp) = { };
+		IP_TUNNEL_DECLARE_FLAGS(out);
+
+		for (u32 j = 0; j < test->src_num; j++)
+			__set_bit(test->src_bits[j], src);
+
+		for (u32 j = 0; j < test->exp_num; j++)
+			__set_bit(test->exp_bits[j], exp);
+
+		ip_tunnel_flags_from_be16(out, test->exp_val);
+
+		expect_eq_uint(test->exp_comp,
+			       ip_tunnel_flags_is_be16_compat(src));
+		expect_eq_uint((__force u16)test->exp_val,
+			       (__force u16)ip_tunnel_flags_to_be16(src));
+
+		__ipt_flag_op(expect_eq_bitmap, exp, out);
+	}
+}
+
 static void __init selftest(void)
 {
 	test_zero_clear();
@@ -1316,6 +1420,7 @@ static void __init selftest(void)
 	test_bitmap_print_buf();
 	test_bitmap_getset_bits();
 	test_bitmap_const_eval();
+	test_ip_tunnel_flags();
 
 	test_find_nth_bit();
 	test_for_each_set_bit();
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH 07/14] btrfs: rename bitmap_set_bits() -> btrfs_bitmap_set_bits()
  2023-10-09 15:10 ` [PATCH 07/14] btrfs: rename bitmap_set_bits() -> btrfs_bitmap_set_bits() Alexander Lobakin
@ 2023-10-09 15:23   ` David Sterba
  0 siblings, 0 replies; 26+ messages in thread
From: David Sterba @ 2023-10-09 15:23 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Yury Norov, Andy Shevchenko, Rasmus Villemoes,
	Alexander Potapenko, Jakub Kicinski, Eric Dumazet, David Ahern,
	Przemek Kitszel, Simon Horman, netdev, linux-btrfs, dm-devel,
	ntfs3, linux-s390, linux-kernel

On Mon, Oct 09, 2023 at 05:10:19PM +0200, Alexander Lobakin wrote:
> bitmap_set_bits() does not start with the FS' prefix and may collide
> with a new generic helper one day. It operates with the FS-specific
> types, so there's no change those two could do the same thing.
> Just add the prefix to exclude such possible conflict.
> 
> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>

Acked-by: David Sterba <dsterba@suse.com>

We don't have any other code pending that would potentially collide with
this change so I don't care when and via which tree this gets merged. I
can take it by btrfs too.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 03/14] bitops: let the compiler optimize __assign_bit()
  2023-10-09 15:10 ` [PATCH 03/14] bitops: let the compiler optimize __assign_bit() Alexander Lobakin
@ 2023-10-09 16:18   ` Yury Norov
  2023-10-11  7:25     ` Alexander Lobakin
  0 siblings, 1 reply; 26+ messages in thread
From: Yury Norov @ 2023-10-09 16:18 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Andy Shevchenko, Rasmus Villemoes, Alexander Potapenko,
	Jakub Kicinski, Eric Dumazet, David Ahern, Przemek Kitszel,
	Simon Horman, netdev, linux-btrfs, dm-devel, ntfs3, linux-s390,
	linux-kernel

On Mon, Oct 09, 2023 at 05:10:15PM +0200, Alexander Lobakin wrote:
> Since commit b03fc1173c0c ("bitops: let optimize out non-atomic bitops
> on compile-time constants"), the compilers are able to expand inline
> bitmap operations to compile-time initializers when possible.
> However, during the round of replacement if-__set-else-__clear with
> __assign_bit() as per Andy's advice, bloat-o-meter showed +1024 bytes
> difference in object code size for one module (even one function),
> where the pattern:
> 
> 	DECLARE_BITMAP(foo) = { }; // on the stack, zeroed
> 
> 	if (a)
> 		__set_bit(const_bit_num, foo);
> 	if (b)
> 		__set_bit(another_const_bit_num, foo);
> 	...
> 
> is heavily used, although there should be no difference: the bitmap is
> zeroed, so the second half of __assign_bit() should be compiled-out as
> a no-op.
> I either missed the fact that __assign_bit() has bitmap pointer marked
> as `volatile` (as we usually do for bitmaps) or was hoping that the

No, we usually don't. Atomic ops on individual bits is a notable exception
for bitmaps, as the comment for generic_test_bit() says, for example:
         /*
          * Unlike the bitops with the '__' prefix above, this one *is* atomic,
          * so `volatile` must always stay here with no cast-aways. See
          * `Documentation/atomic_bitops.txt` for the details.
          */

For non-atomic single-bit operations and all multi-bit ops, volatile is
useless, and generic___test_and_set_bit() in the same file casts the 
*addr to a plain 'unsigned long *'.

> compilers would at least try to look past the `volatile` for
> __always_inline functions. Anyhow, due to that attribute, the compilers
> were always compiling the whole expression and no mentioned compile-time
> optimizations were working.
> 
> Convert __assign_bit() to a macro since it's a very simple if-else and
> all of the checks are performed inside __set_bit() and __clear_bit(),
> thus that wrapper has to be as transparent as possible. After that
> change, despite it showing only -20 bytes change for vmlinux (due to
> that it's still relatively unpopular), no drastic code size changes
> happen when replacing if-set-else-clear for onstack bitmaps with
> __assign_bit(), meaning the compiler now expands them to the actual
> operations will all the expected optimizations.
> 
> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
> ---
>  include/linux/bitops.h | 10 ++--------
>  1 file changed, 2 insertions(+), 8 deletions(-)
> 
> diff --git a/include/linux/bitops.h b/include/linux/bitops.h
> index e0cd09eb91cd..f98f4fd1047f 100644
> --- a/include/linux/bitops.h
> +++ b/include/linux/bitops.h
> @@ -284,14 +284,8 @@ static __always_inline void assign_bit(long nr, volatile unsigned long *addr,
>  		clear_bit(nr, addr);
>  }
>  
> -static __always_inline void __assign_bit(long nr, volatile unsigned long *addr,
> -					 bool value)
> -{
> -	if (value)
> -		__set_bit(nr, addr);
> -	else
> -		__clear_bit(nr, addr);
> -}
> +#define __assign_bit(nr, addr, value)				\
> +	((value) ? __set_bit(nr, addr) : __clear_bit(nr, addr))

Can you protect nr and addr with braces just as well?
Can you convert the atomic version too, to keep them synchronized ?

>  
>  /**
>   * __ptr_set_bit - Set bit in a pointer's value
> -- 
> 2.41.0

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 09/14] bitmap: extend bitmap_{get,set}_value8() to bitmap_{get,set}_bits()
  2023-10-09 15:10 ` [PATCH 09/14] bitmap: extend bitmap_{get,set}_value8() to bitmap_{get,set}_bits() Alexander Lobakin
@ 2023-10-09 16:31   ` Yury Norov
  2023-10-11  9:33     ` Alexander Lobakin
  0 siblings, 1 reply; 26+ messages in thread
From: Yury Norov @ 2023-10-09 16:31 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Andy Shevchenko, Rasmus Villemoes, Alexander Potapenko,
	Jakub Kicinski, Eric Dumazet, David Ahern, Przemek Kitszel,
	Simon Horman, netdev, linux-btrfs, dm-devel, ntfs3, linux-s390,
	linux-kernel

+ Alexander Potapenko <glider@google.com>

On Mon, Oct 09, 2023 at 05:10:21PM +0200, Alexander Lobakin wrote:
> Sometimes there's need to get a 8/16/...-bit piece of a bitmap at a
> particular offset. Currently, there are only bitmap_{get,set}_value8()
> to do that for 8 bits and that's it.

And also a series from Alexander Potapenko, which I really hope will
get into the -next really soon. It introduces bitmap_read/write which
can set up to BITS_PER_LONG at once, with no limitations on alignment
of position and length:

https://lore.kernel.org/linux-arm-kernel/ZRXbOoKHHafCWQCW@yury-ThinkPad/T/#mc311037494229647088b3a84b9f0d9b50bf227cb

Can you consider building your series on top of it?

> Instead of introducing a separate pair for u16 and so on, which doesn't
> scale well, extend the existing functions to be able to pass the wanted
> value width. Make both offset and width arbitrary, but in order to not
> over complicate the current logic and keep the helpers as optimized as
> the current ones, require the width to be a pow-2 value and the offset
> to be a multiple of the width, while the target piece should not cross
> a %BITS_PER_LONG boundary and stay within one long.
> Avoid adjusting all the already existing callsites by defining oneliner
> wrapper macros named after the former functions. bloat-o-meter shows
> almost no difference (+1-2 bytes in a couple of places), meaning the
> new helpers get optimized just nicely.
> 
> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
> ---
>  include/linux/bitmap.h | 51 ++++++++++++++++++++++++++++++------------
>  1 file changed, 37 insertions(+), 14 deletions(-)
> 
> diff --git a/include/linux/bitmap.h b/include/linux/bitmap.h
> index 63e422f8ba3d..9c010a7fa331 100644
> --- a/include/linux/bitmap.h
> +++ b/include/linux/bitmap.h
> @@ -6,8 +6,10 @@
>  
>  #include <linux/align.h>
>  #include <linux/bitops.h>
> +#include <linux/bug.h>
>  #include <linux/find.h>
>  #include <linux/limits.h>
> +#include <linux/log2.h>
>  #include <linux/string.h>
>  #include <linux/types.h>
>  
> @@ -569,38 +571,59 @@ static inline void bitmap_from_u64(unsigned long *dst, u64 mask)
>  }
>  
>  /**
> - * bitmap_get_value8 - get an 8-bit value within a memory region
> + * bitmap_get_bits - get a 8/16/32/64-bit value within a memory region
>   * @map: address to the bitmap memory region
> - * @start: bit offset of the 8-bit value; must be a multiple of 8
> + * @start: bit offset of the value; must be a multiple of @len
> + * @len: bit width of the value; must be a power of two
>   *
> - * Returns the 8-bit value located at the @start bit offset within the @src
> - * memory region.
> + * Return: the 8/16/32/64-bit value located at the @start bit offset within
> + * the @src memory region. Its position (@start + @len) can't cross
> + * a ``BITS_PER_LONG`` boundary.
>   */
> -static inline unsigned long bitmap_get_value8(const unsigned long *map,
> -					      unsigned long start)
> +static inline unsigned long bitmap_get_bits(const unsigned long *map,
> +					    unsigned long start, size_t len)
>  {
>  	const size_t index = BIT_WORD(start);
>  	const unsigned long offset = start % BITS_PER_LONG;
>  
> -	return (map[index] >> offset) & 0xFF;
> +	if (WARN_ON_ONCE(!is_power_of_2(len) || offset % len ||
> +			 offset + len > BITS_PER_LONG))
> +		return 0;
> +
> +	return (map[index] >> offset) & GENMASK(len - 1, 0);
>  }
>  
>  /**
> - * bitmap_set_value8 - set an 8-bit value within a memory region
> + * bitmap_set_bits - set a 8/16/32/64-bit value within a memory region
>   * @map: address to the bitmap memory region
> - * @value: the 8-bit value; values wider than 8 bits may clobber bitmap
> - * @start: bit offset of the 8-bit value; must be a multiple of 8
> + * @start: bit offset of the value; must be a multiple of @len
> + * @value: new value to set
> + * @len: bit width of the value; must be a power of two
> + *
> + * Replaces the 8/16/32/64-bit value located at the @start bit offset within
> + * the @src memory region with the new @value. Its position (@start + @len)
> + * can't cross a ``BITS_PER_LONG`` boundary.
>   */
> -static inline void bitmap_set_value8(unsigned long *map, unsigned long value,
> -				     unsigned long start)
> +static inline void bitmap_set_bits(unsigned long *map, unsigned long start,
> +				   unsigned long value, size_t len)
>  {
>  	const size_t index = BIT_WORD(start);
>  	const unsigned long offset = start % BITS_PER_LONG;
> +	unsigned long mask = GENMASK(len - 1, 0);
>  
> -	map[index] &= ~(0xFFUL << offset);
> -	map[index] |= value << offset;
> +	if (WARN_ON_ONCE(!is_power_of_2(len) || offset % len ||
> +			 offset + len > BITS_PER_LONG))
> +		return;
> +
> +	map[index] &= ~(mask << offset);
> +	map[index] |= (value & mask) << offset;
>  }
>  
> +#define bitmap_get_value8(map, start)				\
> +	bitmap_get_bits(map, start, BITS_PER_BYTE)
> +#define bitmap_set_value8(map, value, start)			\
> +	bitmap_set_bits(map, start, value, BITS_PER_BYTE)
> +
>  #endif /* __ASSEMBLY__ */
>  
>  #endif /* __LINUX_BITMAP_H */
> -- 
> 2.41.0

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 05/14] s390/cio: rename bitmap_size() -> idset_bitmap_size()
  2023-10-09 15:10 ` [PATCH 05/14] s390/cio: rename bitmap_size() -> idset_bitmap_size() Alexander Lobakin
@ 2023-10-09 16:35   ` Yury Norov
  2023-10-11  7:28     ` Alexander Lobakin
  0 siblings, 1 reply; 26+ messages in thread
From: Yury Norov @ 2023-10-09 16:35 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Andy Shevchenko, Rasmus Villemoes, Alexander Potapenko,
	Jakub Kicinski, Eric Dumazet, David Ahern, Przemek Kitszel,
	Simon Horman, netdev, linux-btrfs, dm-devel, ntfs3, linux-s390,
	linux-kernel

On Mon, Oct 09, 2023 at 05:10:17PM +0200, Alexander Lobakin wrote:
> bitmap_size() is a pretty generic name and one may want to use it for
> a generic bitmap API function. At the same time, its logic is not
> "generic", i.e. it's not just `nbits -> size of bitmap in bytes`
> converter as it would be expected from its name.
> Add the prefix 'idset_' used throughout the file where the function
> resides.

At the first glance, this custom implementation just duplicates the
generic one that you introduce in the following patch. If so, why
don't you switch idset to just use generic bitmap_size()?

> 
> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
> ---
> idset_new() really wants its vmalloc() + memset() pair to be replaced
> with vzalloc().
> ---
>  drivers/s390/cio/idset.c | 10 ++++++----
>  1 file changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/s390/cio/idset.c b/drivers/s390/cio/idset.c
> index 45f9c0736be4..0a1105a483bf 100644
> --- a/drivers/s390/cio/idset.c
> +++ b/drivers/s390/cio/idset.c
> @@ -16,7 +16,7 @@ struct idset {
>  	unsigned long bitmap[];
>  };
>  
> -static inline unsigned long bitmap_size(int num_ssid, int num_id)
> +static inline unsigned long idset_bitmap_size(int num_ssid, int num_id)
>  {
>  	return BITS_TO_LONGS(num_ssid * num_id) * sizeof(unsigned long);
>  }
> @@ -25,11 +25,12 @@ static struct idset *idset_new(int num_ssid, int num_id)
>  {
>  	struct idset *set;
>  
> -	set = vmalloc(sizeof(struct idset) + bitmap_size(num_ssid, num_id));
> +	set = vmalloc(sizeof(struct idset) +
> +		      idset_bitmap_size(num_ssid, num_id));
>  	if (set) {
>  		set->num_ssid = num_ssid;
>  		set->num_id = num_id;
> -		memset(set->bitmap, 0, bitmap_size(num_ssid, num_id));
> +		memset(set->bitmap, 0, idset_bitmap_size(num_ssid, num_id));
>  	}
>  	return set;
>  }
> @@ -41,7 +42,8 @@ void idset_free(struct idset *set)
>  
>  void idset_fill(struct idset *set)
>  {
> -	memset(set->bitmap, 0xff, bitmap_size(set->num_ssid, set->num_id));
> +	memset(set->bitmap, 0xff,
> +	       idset_bitmap_size(set->num_ssid, set->num_id));
>  }
>  
>  static inline void idset_add(struct idset *set, int ssid, int id)
> -- 
> 2.41.0

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 06/14] fs/ntfs3: rename bitmap_size() -> ntfs3_bitmap_size()
  2023-10-09 15:10 ` [PATCH 06/14] fs/ntfs3: rename bitmap_size() -> ntfs3_bitmap_size() Alexander Lobakin
@ 2023-10-09 16:50   ` Yury Norov
  2023-10-11  7:36     ` Alexander Lobakin
  0 siblings, 1 reply; 26+ messages in thread
From: Yury Norov @ 2023-10-09 16:50 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Andy Shevchenko, Rasmus Villemoes, Alexander Potapenko,
	Jakub Kicinski, Eric Dumazet, David Ahern, Przemek Kitszel,
	Simon Horman, netdev, linux-btrfs, dm-devel, ntfs3, linux-s390,
	linux-kernel

On Mon, Oct 09, 2023 at 05:10:18PM +0200, Alexander Lobakin wrote:
> bitmap_size() is a pretty generic name and one may want to use it for
> a generic bitmap API function. At the same time, its logic is
> NTFS-specific, as it aligns to the sizeof(u64), not the sizeof(long)
> (although it uses ideologically right ALIGN() instead of division).
> Add the prefix 'ntfs3_' used for that FS (not just 'ntfs_' to not mix
> it with the legacy module).
> 
> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
> ---
>  fs/ntfs3/bitmap.c  |  4 ++--
>  fs/ntfs3/fsntfs.c  |  2 +-
>  fs/ntfs3/index.c   | 11 ++++++-----
>  fs/ntfs3/ntfs_fs.h |  2 +-
>  fs/ntfs3/super.c   |  2 +-
>  5 files changed, 11 insertions(+), 10 deletions(-)
> 
> diff --git a/fs/ntfs3/bitmap.c b/fs/ntfs3/bitmap.c
> index 107e808e06ea..a2e18f13e93a 100644
> --- a/fs/ntfs3/bitmap.c
> +++ b/fs/ntfs3/bitmap.c
> @@ -653,7 +653,7 @@ int wnd_init(struct wnd_bitmap *wnd, struct super_block *sb, size_t nbits)
>  	wnd->total_zeroes = nbits;
>  	wnd->extent_max = MINUS_ONE_T;
>  	wnd->zone_bit = wnd->zone_end = 0;
> -	wnd->nwnd = bytes_to_block(sb, bitmap_size(nbits));
> +	wnd->nwnd = bytes_to_block(sb, ntfs3_bitmap_size(nbits));
>  	wnd->bits_last = nbits & (wbits - 1);
>  	if (!wnd->bits_last)
>  		wnd->bits_last = wbits;
> @@ -1345,7 +1345,7 @@ int wnd_extend(struct wnd_bitmap *wnd, size_t new_bits)
>  		return -EINVAL;
>  
>  	/* Align to 8 byte boundary. */
> -	new_wnd = bytes_to_block(sb, bitmap_size(new_bits));
> +	new_wnd = bytes_to_block(sb, ntfs3_bitmap_size(new_bits));
>  	new_last = new_bits & (wbits - 1);
>  	if (!new_last)
>  		new_last = wbits;
> diff --git a/fs/ntfs3/fsntfs.c b/fs/ntfs3/fsntfs.c
> index 33afee0f5559..7a14d2347f27 100644
> --- a/fs/ntfs3/fsntfs.c
> +++ b/fs/ntfs3/fsntfs.c
> @@ -522,7 +522,7 @@ static int ntfs_extend_mft(struct ntfs_sb_info *sbi)
>  	ni->mi.dirty = true;
>  
>  	/* Step 2: Resize $MFT::BITMAP. */
> -	new_bitmap_bytes = bitmap_size(new_mft_total);
> +	new_bitmap_bytes = ntfs3_bitmap_size(new_mft_total);
>  
>  	err = attr_set_size(ni, ATTR_BITMAP, NULL, 0, &sbi->mft.bitmap.run,
>  			    new_bitmap_bytes, &new_bitmap_bytes, true, NULL);
> diff --git a/fs/ntfs3/index.c b/fs/ntfs3/index.c
> index 124c6e822623..ab53a4b6ddf8 100644
> --- a/fs/ntfs3/index.c
> +++ b/fs/ntfs3/index.c
> @@ -1453,8 +1453,8 @@ static int indx_create_allocate(struct ntfs_index *indx, struct ntfs_inode *ni,
>  
>  	alloc->nres.valid_size = alloc->nres.data_size = cpu_to_le64(data_size);
>  
> -	err = ni_insert_resident(ni, bitmap_size(1), ATTR_BITMAP, in->name,
> -				 in->name_len, &bitmap, NULL, NULL);
> +	err = ni_insert_resident(ni, ntfs3_bitmap_size(1), ATTR_BITMAP,
> +				 in->name, in->name_len, &bitmap, NULL, NULL);
>  	if (err)
>  		goto out2;
>  
> @@ -1515,8 +1515,9 @@ static int indx_add_allocate(struct ntfs_index *indx, struct ntfs_inode *ni,
>  	if (bmp) {
>  		/* Increase bitmap. */
>  		err = attr_set_size(ni, ATTR_BITMAP, in->name, in->name_len,
> -				    &indx->bitmap_run, bitmap_size(bit + 1),
> -				    NULL, true, NULL);
> +				    &indx->bitmap_run,
> +				    ntfs3_bitmap_size(bit + 1), NULL, true,
> +				    NULL);
>  		if (err)
>  			goto out1;
>  	}
> @@ -2089,7 +2090,7 @@ static int indx_shrink(struct ntfs_index *indx, struct ntfs_inode *ni,
>  	if (in->name == I30_NAME)
>  		ni->vfs_inode.i_size = new_data;
>  
> -	bpb = bitmap_size(bit);
> +	bpb = ntfs3_bitmap_size(bit);
>  	if (bpb * 8 == nbits)
>  		return 0;
>  
> diff --git a/fs/ntfs3/ntfs_fs.h b/fs/ntfs3/ntfs_fs.h
> index 629403ede6e5..93333156aac6 100644
> --- a/fs/ntfs3/ntfs_fs.h
> +++ b/fs/ntfs3/ntfs_fs.h
> @@ -961,7 +961,7 @@ static inline bool run_is_empty(struct runs_tree *run)
>  }
>  
>  /* NTFS uses quad aligned bitmaps. */
> -static inline size_t bitmap_size(size_t bits)
> +static inline size_t ntfs3_bitmap_size(size_t bits)
>  {
>  	return ALIGN((bits + 7) >> 3, 8);
>  }

This looks like duplicating BITS_TO_U64(). If so, why not just switch
to using the macro while you're here?

> diff --git a/fs/ntfs3/super.c b/fs/ntfs3/super.c
> index cfec5e0c7f66..b1fb6efe7084 100644
> --- a/fs/ntfs3/super.c
> +++ b/fs/ntfs3/super.c
> @@ -1285,7 +1285,7 @@ static int ntfs_fill_super(struct super_block *sb, struct fs_context *fc)
>  
>  	/* Check bitmap boundary. */
>  	tt = sbi->used.bitmap.nbits;
> -	if (inode->i_size < bitmap_size(tt)) {
> +	if (inode->i_size < ntfs3_bitmap_size(tt)) {
>  		ntfs_err(sb, "$Bitmap is corrupted.");
>  		err = -EINVAL;
>  		goto put_inode_out;
> -- 
> 2.41.0

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 08/14] bitmap: introduce generic optimized bitmap_size()
  2023-10-09 15:10 ` [PATCH 08/14] bitmap: introduce generic optimized bitmap_size() Alexander Lobakin
@ 2023-10-09 17:04   ` Yury Norov
  0 siblings, 0 replies; 26+ messages in thread
From: Yury Norov @ 2023-10-09 17:04 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Andy Shevchenko, Rasmus Villemoes, Alexander Potapenko,
	Jakub Kicinski, Eric Dumazet, David Ahern, Przemek Kitszel,
	Simon Horman, netdev, linux-btrfs, dm-devel, ntfs3, linux-s390,
	linux-kernel

On Mon, Oct 09, 2023 at 05:10:20PM +0200, Alexander Lobakin wrote:
> The number of times yet another open coded
> `BITS_TO_LONGS(nbits) * sizeof(long)` can be spotted is huge.
> Some generic helper is long overdue.

OK, I see your point. Indeed, opencoding this again and again may be
annoying. 

Acked-by: Yury Norov <yury.norov@gmail.com>
 
> Add one, bitmap_size(), but with one detail.
> BITS_TO_LONGS() uses DIV_ROUND_UP(). The latter works well when both
> divident and divisor are compile-time constants or when the divisor
> is not a pow-of-2. When it is however, the compilers sometimes tend
> to generate suboptimal code (GCC 13):
> 
> 48 83 c0 3f          	add    $0x3f,%rax
> 48 c1 e8 06          	shr    $0x6,%rax
> 48 8d 14 c5 00 00 00 00	lea    0x0(,%rax,8),%rdx
> 
> %BITS_PER_LONG is always a pow-2 (either 32 or 64), but GCC still does
> full division of `nbits + 63` by it and then multiplication by 8.
> Instead of BITS_TO_LONGS(), use ALIGN() and then divide by 8. GCC:
> 
> 8d 50 3f             	lea    0x3f(%rax),%edx
> c1 ea 03             	shr    $0x3,%edx
> 81 e2 f8 ff ff 1f    	and    $0x1ffffff8,%edx
> 
> Now it divides `nbits + 63` by 8 and then masks bits[2:0].
> bloat-o-meter:
> 
> add/remove: 0/0 grow/shrink: 20/133 up/down: 156/-773 (-617)
> 
> Clang does it better and generates the same code before/after starting
> from -O1, except that with the ALIGN() approach it uses %edx and thus
> still saves some bytes:
> 
> add/remove: 0/0 grow/shrink: 9/133 up/down: 18/-538 (-520)
> 
> Note that we can't expand DIV_ROUND_UP() by adding a check and using
> this approach there, as it's used in array declarations where
> expressions are not allowed.
> Add this helper to tools/ as well.
> 
> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
> ---
>  drivers/md/dm-clone-metadata.c | 5 -----
>  include/linux/bitmap.h         | 8 +++++---
>  include/linux/cpumask.h        | 2 +-
>  lib/math/prime_numbers.c       | 2 --
>  tools/include/linux/bitmap.h   | 8 +++++---
>  5 files changed, 11 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/md/dm-clone-metadata.c b/drivers/md/dm-clone-metadata.c
> index c43d55672bce..47c1fa7aad8b 100644
> --- a/drivers/md/dm-clone-metadata.c
> +++ b/drivers/md/dm-clone-metadata.c
> @@ -465,11 +465,6 @@ static void __destroy_persistent_data_structures(struct dm_clone_metadata *cmd)
>  
>  /*---------------------------------------------------------------------------*/
>  
> -static size_t bitmap_size(unsigned long nr_bits)
> -{
> -	return BITS_TO_LONGS(nr_bits) * sizeof(long);
> -}
> -
>  static int __dirty_map_init(struct dirty_map *dmap, unsigned long nr_words,
>  			    unsigned long nr_regions)
>  {
> diff --git a/include/linux/bitmap.h b/include/linux/bitmap.h
> index 03644237e1ef..63e422f8ba3d 100644
> --- a/include/linux/bitmap.h
> +++ b/include/linux/bitmap.h
> @@ -237,9 +237,11 @@ extern int bitmap_print_list_to_buf(char *buf, const unsigned long *maskp,
>  #define BITMAP_FIRST_WORD_MASK(start) (~0UL << ((start) & (BITS_PER_LONG - 1)))
>  #define BITMAP_LAST_WORD_MASK(nbits) (~0UL >> (-(nbits) & (BITS_PER_LONG - 1)))
>  
> +#define bitmap_size(nbits)	(ALIGN(nbits, BITS_PER_LONG) / BITS_PER_BYTE)
> +
>  static inline void bitmap_zero(unsigned long *dst, unsigned int nbits)
>  {
> -	unsigned int len = BITS_TO_LONGS(nbits) * sizeof(unsigned long);
> +	unsigned int len = bitmap_size(nbits);
>  
>  	if (small_const_nbits(nbits))
>  		*dst = 0;
> @@ -249,7 +251,7 @@ static inline void bitmap_zero(unsigned long *dst, unsigned int nbits)
>  
>  static inline void bitmap_fill(unsigned long *dst, unsigned int nbits)
>  {
> -	unsigned int len = BITS_TO_LONGS(nbits) * sizeof(unsigned long);
> +	unsigned int len = bitmap_size(nbits);
>  
>  	if (small_const_nbits(nbits))
>  		*dst = ~0UL;
> @@ -260,7 +262,7 @@ static inline void bitmap_fill(unsigned long *dst, unsigned int nbits)
>  static inline void bitmap_copy(unsigned long *dst, const unsigned long *src,
>  			unsigned int nbits)
>  {
> -	unsigned int len = BITS_TO_LONGS(nbits) * sizeof(unsigned long);
> +	unsigned int len = bitmap_size(nbits);
>  
>  	if (small_const_nbits(nbits))
>  		*dst = *src;
> diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
> index f10fb87d49db..dbdbf1451cad 100644
> --- a/include/linux/cpumask.h
> +++ b/include/linux/cpumask.h
> @@ -821,7 +821,7 @@ static inline int cpulist_parse(const char *buf, struct cpumask *dstp)
>   */
>  static inline unsigned int cpumask_size(void)
>  {
> -	return BITS_TO_LONGS(large_cpumask_bits) * sizeof(long);
> +	return bitmap_size(large_cpumask_bits);
>  }
>  
>  /*
> diff --git a/lib/math/prime_numbers.c b/lib/math/prime_numbers.c
> index d42cebf7407f..d3b64b10da1c 100644
> --- a/lib/math/prime_numbers.c
> +++ b/lib/math/prime_numbers.c
> @@ -6,8 +6,6 @@
>  #include <linux/prime_numbers.h>
>  #include <linux/slab.h>
>  
> -#define bitmap_size(nbits) (BITS_TO_LONGS(nbits) * sizeof(unsigned long))
> -
>  struct primes {
>  	struct rcu_head rcu;
>  	unsigned long last, sz;
> diff --git a/tools/include/linux/bitmap.h b/tools/include/linux/bitmap.h
> index f3566ea0f932..81a2299ace15 100644
> --- a/tools/include/linux/bitmap.h
> +++ b/tools/include/linux/bitmap.h
> @@ -2,6 +2,7 @@
>  #ifndef _TOOLS_LINUX_BITMAP_H
>  #define _TOOLS_LINUX_BITMAP_H
>  
> +#include <linux/align.h>
>  #include <string.h>
>  #include <linux/bitops.h>
>  #include <linux/find.h>
> @@ -25,13 +26,14 @@ bool __bitmap_intersects(const unsigned long *bitmap1,
>  #define BITMAP_FIRST_WORD_MASK(start) (~0UL << ((start) & (BITS_PER_LONG - 1)))
>  #define BITMAP_LAST_WORD_MASK(nbits) (~0UL >> (-(nbits) & (BITS_PER_LONG - 1)))
>  
> +#define bitmap_size(nbits)	(ALIGN(nbits, BITS_PER_LONG) / BITS_PER_BYTE)
> +
>  static inline void bitmap_zero(unsigned long *dst, unsigned int nbits)
>  {
>  	if (small_const_nbits(nbits))
>  		*dst = 0UL;
>  	else {
> -		int len = BITS_TO_LONGS(nbits) * sizeof(unsigned long);
> -		memset(dst, 0, len);
> +		memset(dst, 0, bitmap_size(nbits));
>  	}
>  }
>  
> @@ -83,7 +85,7 @@ static inline void bitmap_or(unsigned long *dst, const unsigned long *src1,
>   */
>  static inline unsigned long *bitmap_zalloc(int nbits)
>  {
> -	return calloc(1, BITS_TO_LONGS(nbits) * sizeof(unsigned long));
> +	return calloc(1, bitmap_size(nbits));
>  }
>  
>  /*
> -- 
> 2.41.0

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 03/14] bitops: let the compiler optimize __assign_bit()
  2023-10-09 16:18   ` Yury Norov
@ 2023-10-11  7:25     ` Alexander Lobakin
  0 siblings, 0 replies; 26+ messages in thread
From: Alexander Lobakin @ 2023-10-11  7:25 UTC (permalink / raw)
  To: Yury Norov
  Cc: Andy Shevchenko, Rasmus Villemoes, Alexander Potapenko,
	Jakub Kicinski, Eric Dumazet, David Ahern, Przemek Kitszel,
	Simon Horman, netdev, linux-btrfs, dm-devel, ntfs3, linux-s390,
	linux-kernel

From: Yury Norov <yury.norov@gmail.com>
Date: Mon, 9 Oct 2023 09:18:40 -0700

> On Mon, Oct 09, 2023 at 05:10:15PM +0200, Alexander Lobakin wrote:

[...]

>> -static __always_inline void __assign_bit(long nr, volatile unsigned long *addr,
>> -					 bool value)
>> -{
>> -	if (value)
>> -		__set_bit(nr, addr);
>> -	else
>> -		__clear_bit(nr, addr);
>> -}
>> +#define __assign_bit(nr, addr, value)				\
>> +	((value) ? __set_bit(nr, addr) : __clear_bit(nr, addr))
> 
> Can you protect nr and addr with braces just as well?
> Can you convert the atomic version too, to keep them synchronized ?

+ for both. I didn't convert assign_bit() as I thought it wouldn't give
any optimization improvements, but yeah, let the compiler decide.

> 
>>  
>>  /**
>>   * __ptr_set_bit - Set bit in a pointer's value
>> -- 
>> 2.41.0

Thanks,
Olek

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 05/14] s390/cio: rename bitmap_size() -> idset_bitmap_size()
  2023-10-09 16:35   ` Yury Norov
@ 2023-10-11  7:28     ` Alexander Lobakin
  0 siblings, 0 replies; 26+ messages in thread
From: Alexander Lobakin @ 2023-10-11  7:28 UTC (permalink / raw)
  To: Yury Norov
  Cc: Andy Shevchenko, Rasmus Villemoes, Alexander Potapenko,
	Jakub Kicinski, Eric Dumazet, David Ahern, Przemek Kitszel,
	Simon Horman, netdev, linux-btrfs, dm-devel, ntfs3, linux-s390,
	linux-kernel

From: Yury Norov <yury.norov@gmail.com>
Date: Mon, 9 Oct 2023 09:35:20 -0700

> On Mon, Oct 09, 2023 at 05:10:17PM +0200, Alexander Lobakin wrote:
>> bitmap_size() is a pretty generic name and one may want to use it for
>> a generic bitmap API function. At the same time, its logic is not
>> "generic", i.e. it's not just `nbits -> size of bitmap in bytes`
>> converter as it would be expected from its name.
>> Add the prefix 'idset_' used throughout the file where the function
>> resides.
> 
> At the first glance, this custom implementation just duplicates the
> generic one that you introduce in the following patch. If so, why
> don't you switch idset to just use generic bitmap_size()?

I didn't want to introduce any semantic changes, but good point, why not.

> 
>>
>> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
>> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>

[...]

Thanks,
Olek

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 06/14] fs/ntfs3: rename bitmap_size() -> ntfs3_bitmap_size()
  2023-10-09 16:50   ` Yury Norov
@ 2023-10-11  7:36     ` Alexander Lobakin
  0 siblings, 0 replies; 26+ messages in thread
From: Alexander Lobakin @ 2023-10-11  7:36 UTC (permalink / raw)
  To: Yury Norov
  Cc: Andy Shevchenko, Rasmus Villemoes, Alexander Potapenko,
	Jakub Kicinski, Eric Dumazet, David Ahern, Przemek Kitszel,
	Simon Horman, netdev, linux-btrfs, dm-devel, ntfs3, linux-s390,
	linux-kernel

From: Yury Norov <yury.norov@gmail.com>
Date: Mon, 9 Oct 2023 09:50:15 -0700

> On Mon, Oct 09, 2023 at 05:10:18PM +0200, Alexander Lobakin wrote:
>> bitmap_size() is a pretty generic name and one may want to use it for

[...]

>> @@ -961,7 +961,7 @@ static inline bool run_is_empty(struct runs_tree *run)
>>  }
>>  
>>  /* NTFS uses quad aligned bitmaps. */
>> -static inline size_t bitmap_size(size_t bits)
>> +static inline size_t ntfs3_bitmap_size(size_t bits)
>>  {
>>  	return ALIGN((bits + 7) >> 3, 8);
>>  }
> 
> This looks like duplicating BITS_TO_U64(). If so, why not just switch
> to using the macro while you're here?

I thought that this

	return BITS_TO_U64(bits) * sizeof(u64);

would give worse optimization, but actually

add/remove: 0/0 grow/shrink: 0/6 up/down: 0/-32 (-32)

Nice, makes sense to do it.

> 
>> diff --git a/fs/ntfs3/super.c b/fs/ntfs3/super.c
>> index cfec5e0c7f66..b1fb6efe7084 100644
>> --- a/fs/ntfs3/super.c
>> +++ b/fs/ntfs3/super.c
>> @@ -1285,7 +1285,7 @@ static int ntfs_fill_super(struct super_block *sb, struct fs_context *fc)
>>  
>>  	/* Check bitmap boundary. */
>>  	tt = sbi->used.bitmap.nbits;
>> -	if (inode->i_size < bitmap_size(tt)) {
>> +	if (inode->i_size < ntfs3_bitmap_size(tt)) {
>>  		ntfs_err(sb, "$Bitmap is corrupted.");
>>  		err = -EINVAL;
>>  		goto put_inode_out;
>> -- 
>> 2.41.0

Thanks,
Olek

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 09/14] bitmap: extend bitmap_{get,set}_value8() to bitmap_{get,set}_bits()
  2023-10-09 16:31   ` Yury Norov
@ 2023-10-11  9:33     ` Alexander Lobakin
  2023-10-11 10:36       ` Andy Shevchenko
  2023-10-15  2:20       ` Yury Norov
  0 siblings, 2 replies; 26+ messages in thread
From: Alexander Lobakin @ 2023-10-11  9:33 UTC (permalink / raw)
  To: Yury Norov
  Cc: Andy Shevchenko, Rasmus Villemoes, Alexander Potapenko,
	Jakub Kicinski, Eric Dumazet, David Ahern, Przemek Kitszel,
	Simon Horman, netdev, linux-btrfs, dm-devel, ntfs3, linux-s390,
	linux-kernel

From: Yury Norov <yury.norov@gmail.com>
Date: Mon, 9 Oct 2023 09:31:15 -0700

> + Alexander Potapenko <glider@google.com>
> 
> On Mon, Oct 09, 2023 at 05:10:21PM +0200, Alexander Lobakin wrote:
>> Sometimes there's need to get a 8/16/...-bit piece of a bitmap at a
>> particular offset. Currently, there are only bitmap_{get,set}_value8()
>> to do that for 8 bits and that's it.
> 
> And also a series from Alexander Potapenko, which I really hope will
> get into the -next really soon. It introduces bitmap_read/write which
> can set up to BITS_PER_LONG at once, with no limitations on alignment
> of position and length:
> 
> https://lore.kernel.org/linux-arm-kernel/ZRXbOoKHHafCWQCW@yury-ThinkPad/T/#mc311037494229647088b3a84b9f0d9b50bf227cb
> 
> Can you consider building your series on top of it?

Yeah, I mentioned in the cover letter that I'm aware of it and in fact
it doesn't conflict much, as the functions I'm adding here get optimized
as much as the original bitmap_{get,set}_value8(), while Alexander's
generic helpers are heavier.
I realize lots of calls will be optimized as well due to the offset and
the width being compile-time constants, but not all of them. The idea of
keeping two pairs of helpers initially came from Andy if I understood
him correctly.
What do you think? I can provide some bloat-o-meter stats after
rebasing. And either way, I see no issue in basing this series on top of
Alex' one.

> 
>> Instead of introducing a separate pair for u16 and so on, which doesn't
>> scale well, extend the existing functions to be able to pass the wanted
>> value width. Make both offset and width arbitrary, but in order to not
>> over complicate the current logic and keep the helpers as optimized as
>> the current ones, require the width to be a pow-2 value and the offset
>> to be a multiple of the width, while the target piece should not cross
>> a %BITS_PER_LONG boundary and stay within one long.
>> Avoid adjusting all the already existing callsites by defining oneliner
>> wrapper macros named after the former functions. bloat-o-meter shows
>> almost no difference (+1-2 bytes in a couple of places), meaning the

[...]

Thanks,
Olek

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 09/14] bitmap: extend bitmap_{get,set}_value8() to bitmap_{get,set}_bits()
  2023-10-11  9:33     ` Alexander Lobakin
@ 2023-10-11 10:36       ` Andy Shevchenko
  2023-10-15  2:20       ` Yury Norov
  1 sibling, 0 replies; 26+ messages in thread
From: Andy Shevchenko @ 2023-10-11 10:36 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Yury Norov, Rasmus Villemoes, Alexander Potapenko, Jakub Kicinski,
	Eric Dumazet, David Ahern, Przemek Kitszel, Simon Horman, netdev,
	linux-btrfs, dm-devel, ntfs3, linux-s390, linux-kernel

On Wed, Oct 11, 2023 at 11:33:25AM +0200, Alexander Lobakin wrote:
> From: Yury Norov <yury.norov@gmail.com>
> Date: Mon, 9 Oct 2023 09:31:15 -0700
> 
> > + Alexander Potapenko <glider@google.com>
> > 
> > On Mon, Oct 09, 2023 at 05:10:21PM +0200, Alexander Lobakin wrote:
> >> Sometimes there's need to get a 8/16/...-bit piece of a bitmap at a
> >> particular offset. Currently, there are only bitmap_{get,set}_value8()
> >> to do that for 8 bits and that's it.
> > 
> > And also a series from Alexander Potapenko, which I really hope will
> > get into the -next really soon. It introduces bitmap_read/write which
> > can set up to BITS_PER_LONG at once, with no limitations on alignment
> > of position and length:
> > 
> > https://lore.kernel.org/linux-arm-kernel/ZRXbOoKHHafCWQCW@yury-ThinkPad/T/#mc311037494229647088b3a84b9f0d9b50bf227cb
> > 
> > Can you consider building your series on top of it?
> 
> Yeah, I mentioned in the cover letter that I'm aware of it and in fact
> it doesn't conflict much, as the functions I'm adding here get optimized
> as much as the original bitmap_{get,set}_value8(), while Alexander's
> generic helpers are heavier.
> I realize lots of calls will be optimized as well due to the offset and
> the width being compile-time constants, but not all of them. The idea of
> keeping two pairs of helpers initially came from Andy if I understood
> him correctly.

Just a disclaimer: The idea came before I saw the series by Alexander Potapenko.

> What do you think? I can provide some bloat-o-meter stats after
> rebasing. And either way, I see no issue in basing this series on top of
> Alex' one.

-- 
With Best Regards,
Andy Shevchenko



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 09/14] bitmap: extend bitmap_{get,set}_value8() to bitmap_{get,set}_bits()
  2023-10-11  9:33     ` Alexander Lobakin
  2023-10-11 10:36       ` Andy Shevchenko
@ 2023-10-15  2:20       ` Yury Norov
  1 sibling, 0 replies; 26+ messages in thread
From: Yury Norov @ 2023-10-15  2:20 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Andy Shevchenko, Rasmus Villemoes, Alexander Potapenko,
	Jakub Kicinski, Eric Dumazet, David Ahern, Przemek Kitszel,
	Simon Horman, netdev, linux-btrfs, dm-devel, ntfs3, linux-s390,
	linux-kernel

On Wed, Oct 11, 2023 at 11:33:25AM +0200, Alexander Lobakin wrote:
> From: Yury Norov <yury.norov@gmail.com>
> Date: Mon, 9 Oct 2023 09:31:15 -0700
> 
> > + Alexander Potapenko <glider@google.com>
> > 
> > On Mon, Oct 09, 2023 at 05:10:21PM +0200, Alexander Lobakin wrote:
> >> Sometimes there's need to get a 8/16/...-bit piece of a bitmap at a
> >> particular offset. Currently, there are only bitmap_{get,set}_value8()
> >> to do that for 8 bits and that's it.
> > 
> > And also a series from Alexander Potapenko, which I really hope will
> > get into the -next really soon. It introduces bitmap_read/write which
> > can set up to BITS_PER_LONG at once, with no limitations on alignment
> > of position and length:
> > 
> > https://lore.kernel.org/linux-arm-kernel/ZRXbOoKHHafCWQCW@yury-ThinkPad/T/#mc311037494229647088b3a84b9f0d9b50bf227cb
> > 
> > Can you consider building your series on top of it?
> 
> Yeah, I mentioned in the cover letter that I'm aware of it and in fact
> it doesn't conflict much, as the functions I'm adding here get optimized
> as much as the original bitmap_{get,set}_value8(), while Alexander's
> generic helpers are heavier.
> I realize lots of calls will be optimized as well due to the offset and
> the width being compile-time constants, but not all of them. The idea of
> keeping two pairs of helpers initially came from Andy if I understood
> him correctly.
> What do you think? I can provide some bloat-o-meter stats after
> rebasing. And either way, I see no issue in basing this series on top of
> Alex' one.

You're right, let's try both and see what how worse is one comparing
to another wrt bloat-o-meter and overall code generation. If the
difference is not that terrible, I'd stick to universal and simpler
for users version.

If the difference is significant, we'd have to keep both. Maybe it's
worth to try merge the aligned case into generic one, but it's not the
purpose of your series, of course.

Thanks,
Yury

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2023-10-15  2:20 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-10-09 15:10 [PATCH 00/14] ip_tunnel: convert __be16 tunnel flags to bitmaps Alexander Lobakin
2023-10-09 15:10 ` [PATCH 01/14] bitops: add missing prototype check Alexander Lobakin
2023-10-09 15:10 ` [PATCH 02/14] bitops: make BYTES_TO_BITS() treewide-available Alexander Lobakin
2023-10-09 15:10 ` [PATCH 03/14] bitops: let the compiler optimize __assign_bit() Alexander Lobakin
2023-10-09 16:18   ` Yury Norov
2023-10-11  7:25     ` Alexander Lobakin
2023-10-09 15:10 ` [PATCH 04/14] linkmode: convert linkmode_{test,set,clear,mod}_bit() to macros Alexander Lobakin
2023-10-09 15:10 ` [PATCH 05/14] s390/cio: rename bitmap_size() -> idset_bitmap_size() Alexander Lobakin
2023-10-09 16:35   ` Yury Norov
2023-10-11  7:28     ` Alexander Lobakin
2023-10-09 15:10 ` [PATCH 06/14] fs/ntfs3: rename bitmap_size() -> ntfs3_bitmap_size() Alexander Lobakin
2023-10-09 16:50   ` Yury Norov
2023-10-11  7:36     ` Alexander Lobakin
2023-10-09 15:10 ` [PATCH 07/14] btrfs: rename bitmap_set_bits() -> btrfs_bitmap_set_bits() Alexander Lobakin
2023-10-09 15:23   ` David Sterba
2023-10-09 15:10 ` [PATCH 08/14] bitmap: introduce generic optimized bitmap_size() Alexander Lobakin
2023-10-09 17:04   ` Yury Norov
2023-10-09 15:10 ` [PATCH 09/14] bitmap: extend bitmap_{get,set}_value8() to bitmap_{get,set}_bits() Alexander Lobakin
2023-10-09 16:31   ` Yury Norov
2023-10-11  9:33     ` Alexander Lobakin
2023-10-11 10:36       ` Andy Shevchenko
2023-10-15  2:20       ` Yury Norov
2023-10-09 15:10 ` [PATCH 10/14] ip_tunnel: use a separate struct to store tunnel params in the kernel Alexander Lobakin
2023-10-09 15:10 ` [PATCH 12/14] lib/bitmap: add compile-time test for __assign_bit() optimization Alexander Lobakin
2023-10-09 15:10 ` [PATCH 13/14] lib/bitmap: add tests for bitmap_{get,set}_bits() Alexander Lobakin
2023-10-09 15:10 ` [PATCH 14/14] lib/bitmap: add tests for IP tunnel flags conversion helpers Alexander Lobakin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox