* [PATCH v2] lib: optimize cpumask_next_and()
2017-10-26 12:58 [PATCH v3] lib: optimize cpumask_next_and() Alexey Dobriyan
@ 2017-10-26 15:52 ` Clement Courbet
2017-10-28 13:24 ` [PATCH v3] " Yury Norov
1 sibling, 0 replies; 12+ messages in thread
From: Clement Courbet @ 2017-10-26 15:52 UTC (permalink / raw)
To: Arnd Bergmann, Rasmus Villemoes, Andrew Morton, Matthew Wilcox,
Yury Norov
Cc: Clement Courbet, Ingo Molnar, linux-arch, linux-kernel
Hi Alexey,
> Gentoo ships 5.4.0 which doesn't inline this code on x86_64 defconfig
> (which has OPTIMIZE_INLINING).
I have not actually marked _find_next_bit() inline; it just turns out
that my compiler inlines it.
I've tried marking the function inline, and OPTIMIZE_INLINING does
not un-inline it. I'll send a v4 with an explicit inline.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v3] lib: optimize cpumask_next_and()
2017-10-26 12:58 [PATCH v3] lib: optimize cpumask_next_and() Alexey Dobriyan
2017-10-26 15:52 ` [PATCH v2] " Clement Courbet
@ 2017-10-28 13:24 ` Yury Norov
1 sibling, 0 replies; 12+ messages in thread
From: Yury Norov @ 2017-10-28 13:24 UTC (permalink / raw)
To: Alexey Dobriyan
Cc: courbet, arnd, linux, akpm, mawilcox, mingo, linux-arch,
linux-kernel
On Thu, Oct 26, 2017 at 02:58:00PM +0200, Alexey Dobriyan wrote:
> > - Refactored _find_next_common_bit into _find_next_bit., as suggested
> > by Yury Norov. This has no adverse effects on the performance side,
> > as the compiler successfully inlines the code.
>
> 1)
> Gentoo ships 5.4.0 which doesn't inline this code on x86_64 defconfig
> (which has OPTIMIZE_INLINING).
>
>
> ffffffff813556c0 <find_next_bit>:
> ffffffff813556c0: 55 push rbp
> ffffffff813556c1: 48 89 d1 mov rcx,rdx
> ffffffff813556c4: 45 31 c0 xor r8d,r8d
> ffffffff813556c7: 48 89 f2 mov rdx,rsi
> ffffffff813556ca: 31 f6 xor esi,esi
> ffffffff813556cc: 48 89 e5 mov rbp,rsp
> ffffffff813556cf: e8 7c ff ff ff call ffffffff81355650 <_find_next_bit>
> ffffffff813556d4: 5d pop rbp
> ffffffff813556d5: c3 ret
GCC 7 for ARM64 doesn't inline it either. I wrote a test to measure
the effect of inlining:
http://www.spinics.net/lists/kernel/msg2635338.html
The performance impact of this patch without inlining:
Before:
[ 96.856195] Start testing find_bit() with random-filled bitmap
[ 96.868322] find_next_bit: 34529 cycles, 16304 iterations
[ 96.879525] find_next_zero_bit: 35771 cycles, 16465 iterations
[ 96.891409] find_last_bit: 17444 cycles, 16304 iterations
[ 96.914445] find_first_bit: 1219671 cycles, 16305 iterations
[ 96.925802] Start testing find_bit() with sparse bitmap
[ 96.936308] find_next_bit: 301 cycles, 66 iterations
[ 96.946981] find_next_zero_bit: 70897 cycles, 32703 iterations
[ 96.958694] find_last_bit: 286 cycles, 66 iterations
[ 96.968710] find_first_bit: 5260 cycles, 66 iterations
After:
[ 116.205594] Start testing find_bit() with random-filled bitmap
[ 116.217621] find_next_bit: 24979 cycles, 16449 iterations
[ 116.228719] find_next_zero_bit: 25666 cycles, 16320 iterations
[ 116.240620] find_last_bit: 19407 cycles, 16449 iterations
[ 116.268368] find_first_bit: 1690945 cycles, 16449 iterations
[ 116.279718] Start testing find_bit() with sparse bitmap
[ 116.290219] find_next_bit: 352 cycles, 66 iterations
[ 116.300692] find_next_zero_bit: 50916 cycles, 32703 iterations
[ 116.312400] find_last_bit: 295 cycles, 66 iterations
[ 116.322427] find_first_bit: 6742 cycles, 66 iterations
And with inlining:
Before:
[ 169.464229] Start testing find_bit() with random-filled bitmap
[ 169.476191] find_next_bit: 17520 cycles, 16336 iterations
[ 169.487210] find_next_zero_bit: 17622 cycles, 16433 iterations
[ 169.499111] find_last_bit: 19272 cycles, 16335 iterations
[ 169.519735] find_first_bit: 978657 cycles, 16337 iterations
[ 169.530912] Start testing find_bit() with sparse bitmap
[ 169.541414] find_next_bit: 252 cycles, 66 iterations
[ 169.551726] find_next_zero_bit: 34554 cycles, 32703 iterations
[ 169.563436] find_last_bit: 294 cycles, 66 iterations
[ 169.573439] find_first_bit: 3964 cycles, 66 iterations
After
[ 191.191170] Start testing find_bit() with random-filled bitmap
[ 191.203133] find_next_bit: 17530 cycles, 16346 iterations
[ 191.214150] find_next_zero_bit: 17630 cycles, 16423 iterations
[ 191.226037] find_last_bit: 17489 cycles, 16347 iterations
[ 191.246672] find_first_bit: 979961 cycles, 16347 iterations
[ 191.257849] Start testing find_bit() with sparse bitmap
[ 191.268351] find_next_bit: 257 cycles, 66 iterations
[ 191.278660] find_next_zero_bit: 34547 cycles, 32703 iterations
[ 191.290370] find_last_bit: 292 cycles, 66 iterations
[ 191.300376] find_first_bit: 4269 cycles, 66 iterations
I didn't investigate why the non-inlined version of this patch works
faster than the vanilla code, but the inlined one is even faster and is
as fast as the inlined version of the existing code. I think we should
finally go with it.
It would be great if someone could test it on x86.
> 2)
> Making the "and" operation the centerpiece of this code is kind of meh;
> find_next_or_bit() will be hard to implement.
Not so hard actually. :)
https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1521775.html
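For illustration, a minimal standalone userspace sketch of such an OR-based
search (this is not the implementation linked above; the function name and the
use of __builtin_ctzl in place of the kernel's __ffs() are local to this
example):

#include <stdio.h>

#define BITS_PER_LONG (8 * sizeof(unsigned long))

/* Return the index of the next bit set in (*addr1 | *addr2) at or after
 * 'start', or 'nbits' if there is none. */
static unsigned long next_or_bit(const unsigned long *addr1,
				 const unsigned long *addr2,
				 unsigned long nbits, unsigned long start)
{
	unsigned long tmp;

	if (start >= nbits)
		return nbits;

	/* First word: OR the two words and drop the bits below 'start'. */
	tmp = addr1[start / BITS_PER_LONG] | addr2[start / BITS_PER_LONG];
	tmp &= ~0UL << (start % BITS_PER_LONG);
	start -= start % BITS_PER_LONG;	/* round down to a word boundary */

	while (!tmp) {
		start += BITS_PER_LONG;
		if (start >= nbits)
			return nbits;
		tmp = addr1[start / BITS_PER_LONG] | addr2[start / BITS_PER_LONG];
	}

	start += (unsigned long)__builtin_ctzl(tmp);
	return start < nbits ? start : nbits;
}

int main(void)
{
	unsigned long a[1] = { 0x0UL };
	unsigned long b[1] = { 0x100UL };

	/* Bit 8 is set only in b, so the OR-based search finds it. */
	printf("%lu\n", next_or_bit(a, b, BITS_PER_LONG, 0));	/* prints 8 */
	return 0;
}

The only difference from the AND case is the operator applied when each word
is fetched, which is why extending the common helper stays cheap.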
Yury
^ permalink raw reply [flat|nested] 12+ messages in thread
* RE: [PATCH v2] lib: optimize cpumask_next_and()
2017-10-24 10:51 [PATCH v2] " Clement Courbet
@ 2017-10-25 7:28 ` Matthew Wilcox
2017-10-25 7:28 ` Matthew Wilcox
2017-10-25 14:59 ` Yury Norov
` (2 subsequent siblings)
3 siblings, 1 reply; 12+ messages in thread
From: Matthew Wilcox @ 2017-10-25 7:28 UTC (permalink / raw)
To: Clement Courbet, Arnd Bergmann, Rasmus Villemoes, Andrew Morton,
Yury Norov
Cc: Ingo Molnar, linux-arch@vger.kernel.org,
linux-kernel@vger.kernel.org
Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com>
> -----Original Message-----
> From: Clement Courbet [mailto:courbet@google.com]
> Sent: Tuesday, October 24, 2017 6:52 AM
> To: Arnd Bergmann <arnd@arndb.de>; Rasmus Villemoes
> <linux@rasmusvillemoes.dk>; Andrew Morton <akpm@linux-foundation.org>;
> Matthew Wilcox <mawilcox@microsoft.com>; Yury Norov
> <ynorov@caviumnetworks.com>
> Cc: Clement Courbet <courbet@google.com>; Ingo Molnar
> <mingo@kernel.org>; linux-arch@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: [PATCH v2] lib: optimize cpumask_next_and()
>
> We've measured that we spend ~0.6% of sys cpu time in cpumask_next_and().
> It's essentially a joined iteration in search for a non-zero bit, which
> is currently implemented as a lookup join (find a nonzero bit on the
> lhs, lookup the rhs to see if it's set there).
>
> Implement a direct join (find a nonzero bit on the incrementally built
> join). Direct benchmarking shows that it's 1.17x to 14x faster with a
> geometric mean of 2.1 on 32 CPUs. No impact on memory usage.
>
> Approximate benchmark code:
>
> ```
> unsigned long src1p[nr_cpumask_longs] = {pattern1};
> unsigned long src2p[nr_cpumask_longs] = {pattern2};
> for (/*a bunch of repetitions*/) {
> for (int n = -1; n <= nr_cpu_ids; ++n) {
> asm volatile("" : "+rm"(src1p)); // prevent any optimization
> asm volatile("" : "+rm"(src2p));
> unsigned long result = cpumask_next_and(n, src1p, src2p);
> asm volatile("" : "+rm"(result));
> }
> }
> ```
>
> Results:
> pattern1 pattern2 time_before/time_after
> 0x0000ffff 0x0000ffff 1.65
> 0x0000ffff 0x00005555 2.24
> 0x0000ffff 0x00001111 2.94
> 0x0000ffff 0x00000000 14.0
> 0x00005555 0x0000ffff 1.67
> 0x00005555 0x00005555 1.71
> 0x00005555 0x00001111 1.90
> 0x00005555 0x00000000 6.58
> 0x00001111 0x0000ffff 1.46
> 0x00001111 0x00005555 1.49
> 0x00001111 0x00001111 1.45
> 0x00001111 0x00000000 3.10
> 0x00000000 0x0000ffff 1.18
> 0x00000000 0x00005555 1.18
> 0x00000000 0x00001111 1.17
> 0x00000000 0x00000000 1.25
> -----------------------------
> geo.mean 2.06
>
> Signed-off-by: Clement Courbet <courbet@google.com>
> ---
> Changes in v2:
> - Refactored _find_next_common_bit into _find_next_bit., as suggested
> by Yury Norov. This has no adverse effects on the performance side,
> as the compiler successfully inlines the code.
>
> include/asm-generic/bitops/find.h | 16 ++++++++++++++
> include/linux/bitmap.h | 2 ++
> lib/cpumask.c | 9 ++++----
> lib/find_bit.c | 37 +++++++++++++++++++++++++--------
> tools/include/asm-generic/bitops/find.h | 16 ++++++++++++++
> 5 files changed, 67 insertions(+), 13 deletions(-)
>
> diff --git a/include/asm-generic/bitops/find.h b/include/asm-
> generic/bitops/find.h
> index 998d4d544f18..130962f3a264 100644
> --- a/include/asm-generic/bitops/find.h
> +++ b/include/asm-generic/bitops/find.h
> @@ -15,6 +15,22 @@ extern unsigned long find_next_bit(const unsigned long
> *addr, unsigned long
> size, unsigned long offset);
> #endif
>
> +#ifndef find_next_and_bit
> +/**
> + * find_next_and_bit - find the next set bit in both memory regions
> + * @addr1: The first address to base the search on
> + * @addr2: The second address to base the search on
> + * @offset: The bitnumber to start searching at
> + * @size: The bitmap size in bits
> + *
> + * Returns the bit number for the next set bit
> + * If no bits are set, returns @size.
> + */
> +extern unsigned long find_next_and_bit(const unsigned long *addr1,
> + const unsigned long *addr2, unsigned long size,
> + unsigned long offset);
> +#endif
> +
> #ifndef find_next_zero_bit
> /**
> * find_next_zero_bit - find the next cleared bit in a memory region
> diff --git a/include/linux/bitmap.h b/include/linux/bitmap.h
> index 700cf5f67118..b4606bfda52f 100644
> --- a/include/linux/bitmap.h
> +++ b/include/linux/bitmap.h
> @@ -77,6 +77,8 @@
> * find_first_bit(addr, nbits) Position first set bit in *addr
> * find_next_zero_bit(addr, nbits, bit) Position next zero bit in *addr >= bit
> * find_next_bit(addr, nbits, bit) Position next set bit in *addr >= bit
> + * find_next_and_bit(addr1, addr2, nbits, bit) Same as find_first_bit, but in
> + * (*addr1 & *addr2)
> */
>
> /*
> diff --git a/lib/cpumask.c b/lib/cpumask.c
> index 8b1a1bd77539..5602223837fa 100644
> --- a/lib/cpumask.c
> +++ b/lib/cpumask.c
> @@ -32,10 +32,11 @@ EXPORT_SYMBOL(cpumask_next);
> int cpumask_next_and(int n, const struct cpumask *src1p,
> const struct cpumask *src2p)
> {
> - while ((n = cpumask_next(n, src1p)) < nr_cpu_ids)
> - if (cpumask_test_cpu(n, src2p))
> - break;
> - return n;
> + /* -1 is a legal arg here. */
> + if (n != -1)
> + cpumask_check(n);
> + return find_next_and_bit(cpumask_bits(src1p), cpumask_bits(src2p),
> + nr_cpumask_bits, n + 1);
> }
> EXPORT_SYMBOL(cpumask_next_and);
>
> diff --git a/lib/find_bit.c b/lib/find_bit.c
> index 6ed74f78380c..ebc08fd9fdf8 100644
> --- a/lib/find_bit.c
> +++ b/lib/find_bit.c
> @@ -24,19 +24,25 @@
> #if !defined(find_next_bit) || !defined(find_next_zero_bit)
>
> /*
> - * This is a common helper function for find_next_bit and
> - * find_next_zero_bit. The difference is the "invert" argument, which
> - * is XORed with each fetched word before searching it for one bits.
> + * This is a common helper function for find_next_bit, find_next_zero_bit, and
> + * find_next_and_bit. The differences are:
> + * - The "invert" argument, which is XORed with each fetched word before
> + * searching it for one bits.
> + * - The optional "addr2", which is anded with "addr1" if present.
> */
> -static unsigned long _find_next_bit(const unsigned long *addr,
> - unsigned long nbits, unsigned long start, unsigned long invert)
> +static unsigned long _find_next_bit(const unsigned long *addr1,
> + const unsigned long *addr2, unsigned long nbits,
> + unsigned long start, unsigned long invert)
> {
> unsigned long tmp;
>
> if (unlikely(start >= nbits))
> return nbits;
>
> - tmp = addr[start / BITS_PER_LONG] ^ invert;
> + tmp = addr1[start / BITS_PER_LONG];
> + if (addr2)
> + tmp &= addr2[start / BITS_PER_LONG];
> + tmp ^= invert;
>
> /* Handle 1st word. */
> tmp &= BITMAP_FIRST_WORD_MASK(start);
> @@ -47,7 +53,10 @@ static unsigned long _find_next_bit(const unsigned long
> *addr,
> if (start >= nbits)
> return nbits;
>
> - tmp = addr[start / BITS_PER_LONG] ^ invert;
> + tmp = addr1[start / BITS_PER_LONG];
> + if (addr2)
> + tmp &= addr2[start / BITS_PER_LONG];
> + tmp ^= invert;
> }
>
> return min(start + __ffs(tmp), nbits);
> @@ -61,7 +70,7 @@ static unsigned long _find_next_bit(const unsigned long
> *addr,
> unsigned long find_next_bit(const unsigned long *addr, unsigned long size,
> unsigned long offset)
> {
> - return _find_next_bit(addr, size, offset, 0UL);
> + return _find_next_bit(addr, NULL, size, offset, 0UL);
> }
> EXPORT_SYMBOL(find_next_bit);
> #endif
> @@ -70,11 +79,21 @@ EXPORT_SYMBOL(find_next_bit);
> unsigned long find_next_zero_bit(const unsigned long *addr, unsigned long
> size,
> unsigned long offset)
> {
> - return _find_next_bit(addr, size, offset, ~0UL);
> + return _find_next_bit(addr, NULL, size, offset, ~0UL);
> }
> EXPORT_SYMBOL(find_next_zero_bit);
> #endif
>
> +#if !defined(find_next_and_bit)
> +unsigned long find_next_and_bit(const unsigned long *addr1,
> + const unsigned long *addr2, unsigned long nbits,
> + unsigned long start)
> +{
> + return _find_next_bit(addr1, addr2, size, offset, ~0UL);
> +}
> +EXPORT_SYMBOL(find_next_and_bit);
> +#endif
> +
> #ifndef find_first_bit
> /*
> * Find the first set bit in a memory region.
> diff --git a/tools/include/asm-generic/bitops/find.h b/tools/include/asm-
> generic/bitops/find.h
> index 5538ecdc964a..435c7d002f33 100644
> --- a/tools/include/asm-generic/bitops/find.h
> +++ b/tools/include/asm-generic/bitops/find.h
> @@ -15,6 +15,22 @@ extern unsigned long find_next_bit(const unsigned long
> *addr, unsigned long
> size, unsigned long offset);
> #endif
>
> +#ifndef find_next_and_bit
> +/**
> + * find_next_and_bit - find the next set bit in both memory regions
> + * @addr1: The first address to base the search on
> + * @addr2: The second address to base the search on
> + * @offset: The bitnumber to start searching at
> + * @size: The bitmap size in bits
> + *
> + * Returns the bit number for the next set bit
> + * If no bits are set, returns @size.
> + */
> +extern unsigned long find_next_and_bit(const unsigned long *addr1,
> + const unsigned long *addr2, unsigned long size,
> + unsigned long offset);
> +#endif
> +
> #ifndef find_next_zero_bit
>
> /**
> --
> 2.15.0.rc0.271.g36b669edcc-goog
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2] lib: optimize cpumask_next_and()
2017-10-24 10:51 [PATCH v2] " Clement Courbet
2017-10-25 7:28 ` Matthew Wilcox
@ 2017-10-25 14:59 ` Yury Norov
2017-10-25 14:59 ` Yury Norov
2017-10-26 12:12 ` kbuild test robot
2017-10-26 12:22 ` kbuild test robot
3 siblings, 1 reply; 12+ messages in thread
From: Yury Norov @ 2017-10-25 14:59 UTC (permalink / raw)
To: Clement Courbet
Cc: Arnd Bergmann, Rasmus Villemoes, Andrew Morton, Matthew Wilcox,
Ingo Molnar, linux-arch, linux-kernel
On Tue, Oct 24, 2017 at 12:51:59PM +0200, Clement Courbet wrote:
> We've measured that we spend ~0.6% of sys cpu time in cpumask_next_and().
> It's essentially a joined iteration in search for a non-zero bit, which
> is currently implemented as a lookup join (find a nonzero bit on the
> lhs, lookup the rhs to see if it's set there).
>
> Implement a direct join (find a nonzero bit on the incrementally built
> join). Direct benchmarking shows that it's 1.17x to 14x faster with a
> geometric mean of 2.1 on 32 CPUs. No impact on memory usage.
>
> Approximate benchmark code:
>
> ```
> unsigned long src1p[nr_cpumask_longs] = {pattern1};
> unsigned long src2p[nr_cpumask_longs] = {pattern2};
> for (/*a bunch of repetitions*/) {
> for (int n = -1; n <= nr_cpu_ids; ++n) {
> asm volatile("" : "+rm"(src1p)); // prevent any optimization
> asm volatile("" : "+rm"(src2p));
> unsigned long result = cpumask_next_and(n, src1p, src2p);
> asm volatile("" : "+rm"(result));
> }
> }
> ```
> Signed-off-by: Clement Courbet <courbet@google.com>
> ---
> Changes in v2:
> - Refactored _find_next_common_bit into _find_next_bit., as suggested
> by Yury Norov.
What I actually suggested was to make _find_next_and_bit() similar to
_find_next_bit(), not to extend _find_next_bit(). But what you did
looks OK.
> This has no adverse effects on the performance side,
> as the compiler successfully inlines the code.
I think it's not about inlining; the compiler just optimizes out branches
known to be false at compile time.
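As a minimal standalone illustration of that constant folding (the names here
are local to this example, not the kernel's): when the static helper is
visible at the call site and addr2 is a literal NULL, the compiler specializes
the inlined copy and removes the dead branch, so single-bitmap callers pay
nothing for the extra parameter.

#include <stddef.h>

static unsigned long first_word(const unsigned long *addr1,
				const unsigned long *addr2,
				unsigned long invert)
{
	unsigned long tmp = addr1[0];

	if (addr2)	/* folded away when addr2 is a compile-time NULL */
		tmp &= addr2[0];
	return tmp ^ invert;
}

unsigned long first_word_plain(const unsigned long *addr)
{
	/* addr2 == NULL is known at compile time: the branch disappears. */
	return first_word(addr, NULL, 0UL);
}

unsigned long first_word_and(const unsigned long *addr1,
			     const unsigned long *addr2)
{
	/* addr2 is a real pointer here, so the AND step is kept. */
	return first_word(addr1, addr2, 0UL);
}

Building this at -O2 and inspecting the assembly typically shows
first_word_plain() reduced to a load and an xor, with no test of addr2.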
> include/asm-generic/bitops/find.h | 16 ++++++++++++++
> include/linux/bitmap.h | 2 ++
> lib/cpumask.c | 9 ++++----
> lib/find_bit.c | 37 +++++++++++++++++++++++++--------
> tools/include/asm-generic/bitops/find.h | 16 ++++++++++++++
> 5 files changed, 67 insertions(+), 13 deletions(-)
>
> diff --git a/include/asm-generic/bitops/find.h b/include/asm-generic/bitops/find.h
> index 998d4d544f18..130962f3a264 100644
> --- a/include/asm-generic/bitops/find.h
> +++ b/include/asm-generic/bitops/find.h
> @@ -15,6 +15,22 @@ extern unsigned long find_next_bit(const unsigned long *addr, unsigned long
> size, unsigned long offset);
> #endif
>
> +#ifndef find_next_and_bit
> +/**
> + * find_next_and_bit - find the next set bit in both memory regions
> + * @addr1: The first address to base the search on
> + * @addr2: The second address to base the search on
> + * @offset: The bitnumber to start searching at
> + * @size: The bitmap size in bits
> + *
> + * Returns the bit number for the next set bit
> + * If no bits are set, returns @size.
> + */
> +extern unsigned long find_next_and_bit(const unsigned long *addr1,
> + const unsigned long *addr2, unsigned long size,
> + unsigned long offset);
> +#endif
> +
> #ifndef find_next_zero_bit
> /**
> * find_next_zero_bit - find the next cleared bit in a memory region
> diff --git a/include/linux/bitmap.h b/include/linux/bitmap.h
> index 700cf5f67118..b4606bfda52f 100644
> --- a/include/linux/bitmap.h
> +++ b/include/linux/bitmap.h
> @@ -77,6 +77,8 @@
> * find_first_bit(addr, nbits) Position first set bit in *addr
> * find_next_zero_bit(addr, nbits, bit) Position next zero bit in *addr >= bit
> * find_next_bit(addr, nbits, bit) Position next set bit in *addr >= bit
> + * find_next_and_bit(addr1, addr2, nbits, bit) Same as find_first_bit, but in
> + * (*addr1 & *addr2)
> */
>
> /*
> diff --git a/lib/cpumask.c b/lib/cpumask.c
> index 8b1a1bd77539..5602223837fa 100644
> --- a/lib/cpumask.c
> +++ b/lib/cpumask.c
> @@ -32,10 +32,11 @@ EXPORT_SYMBOL(cpumask_next);
> int cpumask_next_and(int n, const struct cpumask *src1p,
> const struct cpumask *src2p)
> {
> - while ((n = cpumask_next(n, src1p)) < nr_cpu_ids)
> - if (cpumask_test_cpu(n, src2p))
> - break;
> - return n;
> + /* -1 is a legal arg here. */
> + if (n != -1)
> + cpumask_check(n);
> + return find_next_and_bit(cpumask_bits(src1p), cpumask_bits(src2p),
> + nr_cpumask_bits, n + 1);
> }
> EXPORT_SYMBOL(cpumask_next_and);
>
> diff --git a/lib/find_bit.c b/lib/find_bit.c
> index 6ed74f78380c..ebc08fd9fdf8 100644
> --- a/lib/find_bit.c
> +++ b/lib/find_bit.c
> @@ -24,19 +24,25 @@
> #if !defined(find_next_bit) || !defined(find_next_zero_bit)
>
> /*
> - * This is a common helper function for find_next_bit and
> - * find_next_zero_bit. The difference is the "invert" argument, which
> - * is XORed with each fetched word before searching it for one bits.
> + * This is a common helper function for find_next_bit, find_next_zero_bit, and
> + * find_next_and_bit. The differences are:
> + * - The "invert" argument, which is XORed with each fetched word before
> + * searching it for one bits.
> + * - The optional "addr2", which is anded with "addr1" if present.
> */
> -static unsigned long _find_next_bit(const unsigned long *addr,
> - unsigned long nbits, unsigned long start, unsigned long invert)
> +static unsigned long _find_next_bit(const unsigned long *addr1,
> + const unsigned long *addr2, unsigned long nbits,
> + unsigned long start, unsigned long invert)
> {
> unsigned long tmp;
>
> if (unlikely(start >= nbits))
> return nbits;
>
> - tmp = addr[start / BITS_PER_LONG] ^ invert;
> + tmp = addr1[start / BITS_PER_LONG];
> + if (addr2)
> + tmp &= addr2[start / BITS_PER_LONG];
> + tmp ^= invert;
>
> /* Handle 1st word. */
> tmp &= BITMAP_FIRST_WORD_MASK(start);
> @@ -47,7 +53,10 @@ static unsigned long _find_next_bit(const unsigned long *addr,
> if (start >= nbits)
> return nbits;
>
> - tmp = addr[start / BITS_PER_LONG] ^ invert;
> + tmp = addr1[start / BITS_PER_LONG];
> + if (addr2)
> + tmp &= addr2[start / BITS_PER_LONG];
> + tmp ^= invert;
> }
>
> return min(start + __ffs(tmp), nbits);
> @@ -61,7 +70,7 @@ static unsigned long _find_next_bit(const unsigned long *addr,
> unsigned long find_next_bit(const unsigned long *addr, unsigned long size,
> unsigned long offset)
> {
> - return _find_next_bit(addr, size, offset, 0UL);
> + return _find_next_bit(addr, NULL, size, offset, 0UL);
> }
> EXPORT_SYMBOL(find_next_bit);
> #endif
> @@ -70,11 +79,21 @@ EXPORT_SYMBOL(find_next_bit);
> unsigned long find_next_zero_bit(const unsigned long *addr, unsigned long size,
> unsigned long offset)
> {
> - return _find_next_bit(addr, size, offset, ~0UL);
> + return _find_next_bit(addr, NULL, size, offset, ~0UL);
> }
> EXPORT_SYMBOL(find_next_zero_bit);
> #endif
>
> +#if !defined(find_next_and_bit)
> +unsigned long find_next_and_bit(const unsigned long *addr1,
> + const unsigned long *addr2, unsigned long nbits,
> + unsigned long start)
It should be:
unsigned long find_next_and_bit(const unsigned long *addr1,
const unsigned long *addr2, unsigned long size,
unsigned long offset)
> +{
> + return _find_next_bit(addr1, addr2, size, offset, ~0UL);
> +}
> +EXPORT_SYMBOL(find_next_and_bit);
> +#endif
> +
> #ifndef find_first_bit
> /*
> * Find the first set bit in a memory region.
If we continue this way, we'll most probably end up like this,
sooner or later:
diff --git a/lib/find_bit.c b/lib/find_bit.c
index ebc08fd9fdf8..1b0b4aedc93a 100644
--- a/lib/find_bit.c
+++ b/lib/find_bit.c
@@ -31,8 +31,12 @@
* - The optional "addr2", which is anded with "addr1" if present.
*/
static unsigned long _find_next_bit(const unsigned long *addr1,
- const unsigned long *addr2, unsigned long nbits,
- unsigned long start, unsigned long invert)
+ const unsigned long *and,
+ const unsigned long *or,
+ const unsigned long *xor,
+ unsigned long nbits,
+ unsigned long start,
+ unsigned long invert)
{
unsigned long tmp;
@@ -40,8 +44,12 @@ static unsigned long _find_next_bit(const unsigned long *addr1,
return nbits;
tmp = addr1[start / BITS_PER_LONG];
- if (addr2)
- tmp &= addr2[start / BITS_PER_LONG];
+ if (and)
+ tmp &= and[start / BITS_PER_LONG];
+ if (or)
+ tmp |= or[start / BITS_PER_LONG];
+ if (xor)
+ tmp ^= xor[start / BITS_PER_LONG];
tmp ^= invert;
/* Handle 1st word. */
@@ -54,8 +62,12 @@ static unsigned long _find_next_bit(const unsigned long *addr1,
return nbits;
tmp = addr1[start / BITS_PER_LONG];
- if (addr2)
- tmp &= addr2[start / BITS_PER_LONG];
+ if (and)
+ tmp &= and[start / BITS_PER_LONG];
+ if (or)
+ tmp |= or[start / BITS_PER_LONG];
+ if (xor)
+ tmp ^= xor[start / BITS_PER_LONG];
tmp ^= invert;
}
Just a fantasy of course.
I'm generally fine with proceeding this way. It makes _find_next_bit() more
complex, but it lets us avoid code duplication, which is a better deal for
long-term maintenance.
But I'd also like to keep _find_next_bit() consistent with _find_next_bit_le().
Could you please send a v3 with the fixed find_next_and_bit() declaration
and synced LE routines?
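For reference, a sketch of what the fixed wrapper could look like once the
parameter names match what is actually passed to _find_next_bit(); the invert
value of 0UL is an assumption here (set bits of *addr1 & *addr2 are wanted),
and the actual v3 may of course differ:

#if !defined(find_next_and_bit)
unsigned long find_next_and_bit(const unsigned long *addr1,
		const unsigned long *addr2, unsigned long size,
		unsigned long offset)
{
	/* invert == 0UL: look for set bits in the AND of the two bitmaps. */
	return _find_next_bit(addr1, addr2, size, offset, 0UL);
}
EXPORT_SYMBOL(find_next_and_bit);
#endif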
Yury
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH v2] lib: optimize cpumask_next_and()
2017-10-24 10:51 [PATCH v2] " Clement Courbet
2017-10-25 7:28 ` Matthew Wilcox
2017-10-25 14:59 ` Yury Norov
@ 2017-10-26 12:12 ` kbuild test robot
2017-10-26 12:12 ` kbuild test robot
2017-10-26 12:22 ` kbuild test robot
3 siblings, 1 reply; 12+ messages in thread
From: kbuild test robot @ 2017-10-26 12:12 UTC (permalink / raw)
Cc: kbuild-all, Arnd Bergmann, Rasmus Villemoes, Andrew Morton,
Matthew Wilcox, Yury Norov, Clement Courbet, Ingo Molnar,
linux-arch, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 1751 bytes --]
Hi Clement,
[auto build test ERROR on linus/master]
[also build test ERROR on v4.14-rc6 next-20171018]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
url: https://github.com/0day-ci/linux/commits/Clement-Courbet/lib-optimize-cpumask_next_and/20171026-184850
config: i386-randconfig-x001-201743 (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
# save the attached .config to linux build tree
make ARCH=i386
All error/warnings (new ones prefixed by >>):
lib/find_bit.c: In function 'find_next_and_bit':
>> lib/find_bit.c:92:38: error: 'size' undeclared (first use in this function)
return _find_next_bit(addr1, addr2, size, offset, ~0UL);
^~~~
lib/find_bit.c:92:38: note: each undeclared identifier is reported only once for each function it appears in
>> lib/find_bit.c:92:44: error: 'offset' undeclared (first use in this function)
return _find_next_bit(addr1, addr2, size, offset, ~0UL);
^~~~~~
>> lib/find_bit.c:93:1: warning: control reaches end of non-void function [-Wreturn-type]
}
^
vim +/size +92 lib/find_bit.c
86
87 #if !defined(find_next_and_bit)
88 unsigned long find_next_and_bit(const unsigned long *addr1,
89 const unsigned long *addr2, unsigned long nbits,
90 unsigned long start)
91 {
> 92 return _find_next_bit(addr1, addr2, size, offset, ~0UL);
> 93 }
94 EXPORT_SYMBOL(find_next_and_bit);
95 #endif
96
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 26064 bytes --]
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2] lib: optimize cpumask_next_and()
2017-10-24 10:51 [PATCH v2] " Clement Courbet
` (2 preceding siblings ...)
2017-10-26 12:12 ` kbuild test robot
@ 2017-10-26 12:22 ` kbuild test robot
2017-10-26 12:22 ` kbuild test robot
3 siblings, 1 reply; 12+ messages in thread
From: kbuild test robot @ 2017-10-26 12:22 UTC (permalink / raw)
Cc: kbuild-all, Arnd Bergmann, Rasmus Villemoes, Andrew Morton,
Matthew Wilcox, Yury Norov, Clement Courbet, Ingo Molnar,
linux-arch, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 2106 bytes --]
Hi Clement,
[auto build test ERROR on linus/master]
[also build test ERROR on v4.14-rc6 next-20171018]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
url: https://github.com/0day-ci/linux/commits/Clement-Courbet/lib-optimize-cpumask_next_and/20171026-184850
config: m68k-allyesconfig (attached as .config)
compiler: m68k-linux-gcc (GCC) 4.9.0
reproduce:
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=m68k
All errors (new ones prefixed by >>):
lib/find_bit.c: In function 'find_next_and_bit':
>> lib/find_bit.c:92:2: error: implicit declaration of function '_find_next_bit' [-Werror=implicit-function-declaration]
return _find_next_bit(addr1, addr2, size, offset, ~0UL);
^
lib/find_bit.c:92:38: error: 'size' undeclared (first use in this function)
return _find_next_bit(addr1, addr2, size, offset, ~0UL);
^
lib/find_bit.c:92:38: note: each undeclared identifier is reported only once for each function it appears in
lib/find_bit.c:92:44: error: 'offset' undeclared (first use in this function)
return _find_next_bit(addr1, addr2, size, offset, ~0UL);
^
lib/find_bit.c:93:1: warning: control reaches end of non-void function [-Wreturn-type]
}
^
cc1: some warnings being treated as errors
vim +/_find_next_bit +92 lib/find_bit.c
86
87 #if !defined(find_next_and_bit)
88 unsigned long find_next_and_bit(const unsigned long *addr1,
89 const unsigned long *addr2, unsigned long nbits,
90 unsigned long start)
91 {
> 92 return _find_next_bit(addr1, addr2, size, offset, ~0UL);
93 }
94 EXPORT_SYMBOL(find_next_and_bit);
95 #endif
96
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 44325 bytes --]
^ permalink raw reply [flat|nested] 12+ messages in thread