* [rfc, rft, PATCH v1 0/2] bitmap: clean up and (micro?)optimization
@ 2023-08-17 16:54 Andy Shevchenko
2023-08-17 16:54 ` [PATCH v1 1/2] bitmap: Use constants and macros from bits.h Andy Shevchenko
2023-08-17 16:54 ` [PATCH v1 2/2] bitmap: Optimize memset() calls Andy Shevchenko
0 siblings, 2 replies; 8+ messages in thread
From: Andy Shevchenko @ 2023-08-17 16:54 UTC (permalink / raw)
To: Andy Shevchenko, Yury Norov, linux-kernel; +Cc: Rasmus Villemoes
Just wanted to discuss the idea represented by these two patches.
Comments, remarks, and test results are all welcome!
Andy Shevchenko (2):
bitmap: Use constants and macros from bits.h
bitmap: Optimize memset() calls
include/linux/bitmap.h | 26 +++++++++++++++++---------
lib/bitmap.c | 4 ++--
2 files changed, 19 insertions(+), 11 deletions(-)
--
2.40.0.1.gaa8946217a0b
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH v1 1/2] bitmap: Use constants and macros from bits.h
2023-08-17 16:54 [rfc, rft, PATCH v1 0/2] bitmap: clean up and (micro?)optimization Andy Shevchenko
@ 2023-08-17 16:54 ` Andy Shevchenko
2023-08-18 6:28 ` Rasmus Villemoes
2023-08-17 16:54 ` [PATCH v1 2/2] bitmap: Optimize memset() calls Andy Shevchenko
1 sibling, 1 reply; 8+ messages in thread
From: Andy Shevchenko @ 2023-08-17 16:54 UTC (permalink / raw)
To: Andy Shevchenko, Yury Norov, linux-kernel; +Cc: Rasmus Villemoes
Use BITS_PER_LONG and BITS_PER_BYTE for BITMAP_MEM_ALIGNMENT.
Calculate bytes from bits for memcmp() and memset() with BITS_TO_BYTES().
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
---
include/linux/bitmap.h | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/include/linux/bitmap.h b/include/linux/bitmap.h
index 1516ff979315..2d5042d1b501 100644
--- a/include/linux/bitmap.h
+++ b/include/linux/bitmap.h
@@ -354,9 +354,9 @@ static inline void bitmap_complement(unsigned long *dst, const unsigned long *sr
}
#ifdef __LITTLE_ENDIAN
-#define BITMAP_MEM_ALIGNMENT 8
+#define BITMAP_MEM_ALIGNMENT BITS_PER_BYTE
#else
-#define BITMAP_MEM_ALIGNMENT (8 * sizeof(unsigned long))
+#define BITMAP_MEM_ALIGNMENT BITS_PER_LONG
#endif
#define BITMAP_MEM_MASK (BITMAP_MEM_ALIGNMENT - 1)
@@ -367,7 +367,7 @@ static inline bool bitmap_equal(const unsigned long *src1,
return !((*src1 ^ *src2) & BITMAP_LAST_WORD_MASK(nbits));
if (__builtin_constant_p(nbits & BITMAP_MEM_MASK) &&
IS_ALIGNED(nbits, BITMAP_MEM_ALIGNMENT))
- return !memcmp(src1, src2, nbits / 8);
+ return !memcmp(src1, src2, BITS_TO_BYTES(nbits));
return __bitmap_equal(src1, src2, nbits);
}
@@ -454,7 +454,7 @@ static __always_inline void bitmap_set(unsigned long *map, unsigned int start,
IS_ALIGNED(start, BITMAP_MEM_ALIGNMENT) &&
__builtin_constant_p(nbits & BITMAP_MEM_MASK) &&
IS_ALIGNED(nbits, BITMAP_MEM_ALIGNMENT))
- memset((char *)map + start / 8, 0xff, nbits / 8);
+ memset((char *)map + BITS_TO_BYTES(start), 0xff, BITS_TO_BYTES(nbits));
else
__bitmap_set(map, start, nbits);
}
@@ -470,7 +470,7 @@ static __always_inline void bitmap_clear(unsigned long *map, unsigned int start,
IS_ALIGNED(start, BITMAP_MEM_ALIGNMENT) &&
__builtin_constant_p(nbits & BITMAP_MEM_MASK) &&
IS_ALIGNED(nbits, BITMAP_MEM_ALIGNMENT))
- memset((char *)map + start / 8, 0, nbits / 8);
+ memset((char *)map + BITS_TO_BYTES(start), 0x00, BITS_TO_BYTES(nbits));
else
__bitmap_clear(map, start, nbits);
}
--
2.40.0.1.gaa8946217a0b
* [PATCH v1 2/2] bitmap: Optimize memset() calls
2023-08-17 16:54 [rfc, rft, PATCH v1 0/2] bitmap: clean up and (micro?)optimization Andy Shevchenko
2023-08-17 16:54 ` [PATCH v1 1/2] bitmap: Use constants and macros from bits.h Andy Shevchenko
@ 2023-08-17 16:54 ` Andy Shevchenko
2023-08-18 6:53 ` Rasmus Villemoes
1 sibling, 1 reply; 8+ messages in thread
From: Andy Shevchenko @ 2023-08-17 16:54 UTC (permalink / raw)
To: Andy Shevchenko, Yury Norov, linux-kernel; +Cc: Rasmus Villemoes
Instead of byte-wise memset() calls, use the 32- or 64-bit version depending
on the compile-time BITS_PER_LONG value.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
---
include/linux/bitmap.h | 16 ++++++++++++----
lib/bitmap.c | 4 ++--
2 files changed, 14 insertions(+), 6 deletions(-)
diff --git a/include/linux/bitmap.h b/include/linux/bitmap.h
index 2d5042d1b501..6eec4d4fd623 100644
--- a/include/linux/bitmap.h
+++ b/include/linux/bitmap.h
@@ -234,22 +234,30 @@ extern int bitmap_print_list_to_buf(char *buf, const unsigned long *maskp,
static inline void bitmap_zero(unsigned long *dst, unsigned int nbits)
{
- unsigned int len = BITS_TO_LONGS(nbits) * sizeof(unsigned long);
+ unsigned int len = BITS_TO_LONGS(nbits);
if (small_const_nbits(nbits))
*dst = 0;
else
- memset(dst, 0, len);
+#if BITS_PER_LONG == 64
+ memset64((uint64_t *)dst, 0, len);
+#else
+ memset32((uint32_t *)dst, 0, len);
+#endif
}
static inline void bitmap_fill(unsigned long *dst, unsigned int nbits)
{
- unsigned int len = BITS_TO_LONGS(nbits) * sizeof(unsigned long);
+ unsigned int len = BITS_TO_LONGS(nbits);
if (small_const_nbits(nbits))
*dst = ~0UL;
else
- memset(dst, 0xff, len);
+#if BITS_PER_LONG == 64
+ memset64((uint64_t *)dst, GENMASK(63, 0), len);
+#else
+ memset32((uint32_t *)dst, GENMASK(31, 0), len);
+#endif
}
static inline void bitmap_copy(unsigned long *dst, const unsigned long *src,
diff --git a/lib/bitmap.c b/lib/bitmap.c
index 935e0f96e785..df0fb37a5732 100644
--- a/lib/bitmap.c
+++ b/lib/bitmap.c
@@ -128,7 +128,7 @@ void __bitmap_shift_right(unsigned long *dst, const unsigned long *src,
dst[k] = lower | upper;
}
if (off)
- memset(&dst[lim - off], 0, off*sizeof(unsigned long));
+ bitmap_zero(&dst[lim - off], off);
}
EXPORT_SYMBOL(__bitmap_shift_right);
@@ -166,7 +166,7 @@ void __bitmap_shift_left(unsigned long *dst, const unsigned long *src,
dst[k + off] = lower | upper;
}
if (off)
- memset(dst, 0, off*sizeof(unsigned long));
+ bitmap_zero(dst, off);
}
EXPORT_SYMBOL(__bitmap_shift_left);
--
2.40.0.1.gaa8946217a0b
* Re: [PATCH v1 1/2] bitmap: Use constants and macros from bits.h
2023-08-17 16:54 ` [PATCH v1 1/2] bitmap: Use constants and macros from bits.h Andy Shevchenko
@ 2023-08-18 6:28 ` Rasmus Villemoes
2023-08-18 8:50 ` Andy Shevchenko
0 siblings, 1 reply; 8+ messages in thread
From: Rasmus Villemoes @ 2023-08-18 6:28 UTC (permalink / raw)
To: Andy Shevchenko, Yury Norov, linux-kernel
On 17/08/2023 18.54, Andy Shevchenko wrote:
> Use BITS_PER_LONG and BITS_PER_BYTE for BITMAP_MEM_ALIGNMENT.
> Calculate bytes from bits for memcmp() and memset() with BITS_TO_BYTES().
>
> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> ---
> include/linux/bitmap.h | 10 +++++-----
> 1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/bitmap.h b/include/linux/bitmap.h
> index 1516ff979315..2d5042d1b501 100644
> --- a/include/linux/bitmap.h
> +++ b/include/linux/bitmap.h
> @@ -354,9 +354,9 @@ static inline void bitmap_complement(unsigned long *dst, const unsigned long *sr
> }
>
> #ifdef __LITTLE_ENDIAN
> -#define BITMAP_MEM_ALIGNMENT 8
> +#define BITMAP_MEM_ALIGNMENT BITS_PER_BYTE
> #else
> -#define BITMAP_MEM_ALIGNMENT (8 * sizeof(unsigned long))
> +#define BITMAP_MEM_ALIGNMENT BITS_PER_LONG
> #endif
> #define BITMAP_MEM_MASK (BITMAP_MEM_ALIGNMENT - 1)
>
> @@ -367,7 +367,7 @@ static inline bool bitmap_equal(const unsigned long *src1,
> return !((*src1 ^ *src2) & BITMAP_LAST_WORD_MASK(nbits));
> if (__builtin_constant_p(nbits & BITMAP_MEM_MASK) &&
> IS_ALIGNED(nbits, BITMAP_MEM_ALIGNMENT))
> - return !memcmp(src1, src2, nbits / 8);
> + return !memcmp(src1, src2, BITS_TO_BYTES(nbits));
Please no. Currently, I can verify the arithmetic directly. Using such a
"helper" I'd have to know whether it just does /8 or if it's more like
the bitmap_words() thing which rounds up to a whole number of words. And
BITS_PER_BYTE (and similarly CHAR_BIT) really is, IMO, much less
readable than 8.
Rasmus
* Re: [PATCH v1 2/2] bitmap: Optimize memset() calls
2023-08-17 16:54 ` [PATCH v1 2/2] bitmap: Optimize memset() calls Andy Shevchenko
@ 2023-08-18 6:53 ` Rasmus Villemoes
2023-08-18 8:55 ` Andy Shevchenko
0 siblings, 1 reply; 8+ messages in thread
From: Rasmus Villemoes @ 2023-08-18 6:53 UTC (permalink / raw)
To: Andy Shevchenko, Yury Norov, linux-kernel
On 17/08/2023 18.54, Andy Shevchenko wrote:
> Instead of byte-wise memset() calls, use the 32- or 64-bit version depending
> on the compile-time BITS_PER_LONG value.
>
> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> ---
> include/linux/bitmap.h | 16 ++++++++++++----
> lib/bitmap.c | 4 ++--
> 2 files changed, 14 insertions(+), 6 deletions(-)
>
> diff --git a/include/linux/bitmap.h b/include/linux/bitmap.h
> index 2d5042d1b501..6eec4d4fd623 100644
> --- a/include/linux/bitmap.h
> +++ b/include/linux/bitmap.h
> @@ -234,22 +234,30 @@ extern int bitmap_print_list_to_buf(char *buf, const unsigned long *maskp,
>
> static inline void bitmap_zero(unsigned long *dst, unsigned int nbits)
> {
> - unsigned int len = BITS_TO_LONGS(nbits) * sizeof(unsigned long);
> + unsigned int len = BITS_TO_LONGS(nbits);
>
> if (small_const_nbits(nbits))
> *dst = 0;
> else
> - memset(dst, 0, len);
> +#if BITS_PER_LONG == 64
> + memset64((uint64_t *)dst, 0, len);
> +#else
> + memset32((uint32_t *)dst, 0, len);
> +#endif
> }
>
So _if_ this is worth it at all, all those new '#if BITS_PER_LONG == 64'
suggests that we should instead have a new helper memset_long(), no?
In fact, string.h already has that:
static inline void *memset_l(unsigned long *p, unsigned long v,
__kernel_size_t n)
> static inline void bitmap_fill(unsigned long *dst, unsigned int nbits)
> {
> - unsigned int len = BITS_TO_LONGS(nbits) * sizeof(unsigned long);
> + unsigned int len = BITS_TO_LONGS(nbits);
>
> if (small_const_nbits(nbits))
> *dst = ~0UL;
> else
> - memset(dst, 0xff, len);
> +#if BITS_PER_LONG == 64
> + memset64((uint64_t *)dst, GENMASK(63, 0), len);
> +#else
> + memset32((uint32_t *)dst, GENMASK(31, 0), len);
> > +#endif
> > }
>
Please just spell an all-ones long "~0UL", that also matches the
small_const case.
> static inline void bitmap_copy(unsigned long *dst, const unsigned long *src,
> diff --git a/lib/bitmap.c b/lib/bitmap.c
> index 935e0f96e785..df0fb37a5732 100644
> --- a/lib/bitmap.c
> +++ b/lib/bitmap.c
> @@ -128,7 +128,7 @@ void __bitmap_shift_right(unsigned long *dst, const unsigned long *src,
> dst[k] = lower | upper;
> }
> if (off)
> - memset(&dst[lim - off], 0, off*sizeof(unsigned long));
> + bitmap_zero(&dst[lim - off], off);
This... can't be right. bitmap_zero() still takes an argument which is
the size in bits, while off is the whole number of words to shift. So if
called with say shift=128, we'd have off==2, and that bitmap_zero()
would, because bitmap_zero() rounds up to a whole number of words, end
up clearing just one word.
Perhaps a chance to add some more test cases? Maybe we're not exercising
any of the "shift more than BITS_PER_LONG" logic.
But rather than using bitmap_zero() here, forcing you to convert off to
a number of bits and then bitmap_zero to divide again, if you do this at
all, just change that memset() to memset_l().
> }
> EXPORT_SYMBOL(__bitmap_shift_right);
>
> @@ -166,7 +166,7 @@ void __bitmap_shift_left(unsigned long *dst, const unsigned long *src,
> dst[k + off] = lower | upper;
> }
> if (off)
> - memset(dst, 0, off*sizeof(unsigned long));
> + bitmap_zero(dst, off);
> }
Same here. Cannot possibly be correct, but will work by chance for off
<= 1, i.e. shift <= BITS_PER_LONG.
Rasmus
* Re: [PATCH v1 1/2] bitmap: Use constants and macros from bits.h
2023-08-18 6:28 ` Rasmus Villemoes
@ 2023-08-18 8:50 ` Andy Shevchenko
2023-08-18 9:13 ` Rasmus Villemoes
0 siblings, 1 reply; 8+ messages in thread
From: Andy Shevchenko @ 2023-08-18 8:50 UTC (permalink / raw)
To: Rasmus Villemoes; +Cc: Yury Norov, linux-kernel
On Fri, Aug 18, 2023 at 08:28:21AM +0200, Rasmus Villemoes wrote:
> On 17/08/2023 18.54, Andy Shevchenko wrote:
> > #ifdef __LITTLE_ENDIAN
> > -#define BITMAP_MEM_ALIGNMENT 8
> > +#define BITMAP_MEM_ALIGNMENT BITS_PER_BYTE
> > #else
> > -#define BITMAP_MEM_ALIGNMENT (8 * sizeof(unsigned long))
> > +#define BITMAP_MEM_ALIGNMENT BITS_PER_LONG
> > #endif
> > #define BITMAP_MEM_MASK (BITMAP_MEM_ALIGNMENT - 1)
What about this chunk? Is it worth updating?
...
> > - return !memcmp(src1, src2, nbits / 8);
> > + return !memcmp(src1, src2, BITS_TO_BYTES(nbits));
>
> Please no. Currently, I can verify the arithmetic directly. Using such a
> "helper" I'd have to know whether it just does /8 or if it's more like
> the bitmap_words() thing which rounds up to a whole number of words. And
> BITS_PER_BYTE (and similarly CHAR_BIT) really is, IMO, much less
> readable than 8.
Okay, thank you for the comment!
--
With Best Regards,
Andy Shevchenko
* Re: [PATCH v1 2/2] bitmap: Optimize memset() calls
2023-08-18 6:53 ` Rasmus Villemoes
@ 2023-08-18 8:55 ` Andy Shevchenko
0 siblings, 0 replies; 8+ messages in thread
From: Andy Shevchenko @ 2023-08-18 8:55 UTC (permalink / raw)
To: Rasmus Villemoes; +Cc: Yury Norov, linux-kernel
On Fri, Aug 18, 2023 at 08:53:38AM +0200, Rasmus Villemoes wrote:
> On 17/08/2023 18.54, Andy Shevchenko wrote:
...
> > +#if BITS_PER_LONG == 64
> > + memset64((uint64_t *)dst, 0, len);
> > +#else
> > + memset32((uint32_t *)dst, 0, len);
> > +#endif
> > }
>
> So _if_ this is worth it at all,
Yury, I believe you have a set of performance tests for bitmaps; perhaps you
can give it a try and see if we get a benefit here.
That said, not all architectures have custom, optimized implementations of
memset()/memcpy(), but at least ARM and x86 do.
> all those new '#if BITS_PER_LONG == 64'
> suggests that we should instead have a new helper memset_long(), no?
>
> In fact, string.h already has that:
>
> static inline void *memset_l(unsigned long *p, unsigned long v,
> __kernel_size_t n)
My grep skills suddenly degraded: my thoughts were exactly like yours,
but I grepped for _long and missed the _l variant.
...
> Please just spell an all-ones long "~0UL", that also matches the
> small_const case.
OK!
...
> > - memset(&dst[lim - off], 0, off*sizeof(unsigned long));
> > + bitmap_zero(&dst[lim - off], off);
>
> This... can't be right.
Indeed, I had memset_long() in mind and halfway through replaced it with
bitmap_zero().
> bitmap_zero() still takes an argument which is
> the size in bits, while off is the whole number of words to shift. So if
> called with say shift=128, we'd have off==2, and that bitmap_zero()
> would, because bitmap_zero() rounds up to a whole number of words, end
> up clearing just one word.
>
> Perhaps a chance to add some more test cases? Maybe we're not exercising
> any of the "shift more than BITS_PER_LONG" logic.
>
> But rather than using bitmap_zero() here, forcing you to convert off to
> a number of bits and then bitmap_zero to divide again, if you do this at
> all, just change that memset() to memset_l().
Agree.
--
With Best Regards,
Andy Shevchenko
* Re: [PATCH v1 1/2] bitmap: Use constants and macros from bits.h
2023-08-18 8:50 ` Andy Shevchenko
@ 2023-08-18 9:13 ` Rasmus Villemoes
0 siblings, 0 replies; 8+ messages in thread
From: Rasmus Villemoes @ 2023-08-18 9:13 UTC (permalink / raw)
To: Andy Shevchenko; +Cc: Yury Norov, linux-kernel
On 18/08/2023 10.50, Andy Shevchenko wrote:
> On Fri, Aug 18, 2023 at 08:28:21AM +0200, Rasmus Villemoes wrote:
>> On 17/08/2023 18.54, Andy Shevchenko wrote:
>
>>> #ifdef __LITTLE_ENDIAN
>>> -#define BITMAP_MEM_ALIGNMENT 8
>>> +#define BITMAP_MEM_ALIGNMENT BITS_PER_BYTE
>>> #else
>>> -#define BITMAP_MEM_ALIGNMENT (8 * sizeof(unsigned long))
>>> +#define BITMAP_MEM_ALIGNMENT BITS_PER_LONG
>>> #endif
>>> #define BITMAP_MEM_MASK (BITMAP_MEM_ALIGNMENT - 1)
>
> What about this chunk? Is it worth updating?
IMHO, no, not in this way anyway. But the macros could perhaps use a
comment saying that they decide whether certain bitmap operations can be
turned into plain memxxx calls, which is why LE vs BE matters, and LE
just needs the bitmap to consist of a whole number of bytes while for BE
it must be a whole number of longs.
Rasmus