* [PATCH v2] lib: optimize cpumask_next_and()
2017-10-26 12:58 [PATCH v3] lib: optimize cpumask_next_and() Alexey Dobriyan
@ 2017-10-26 15:52 ` Clement Courbet
2017-10-28 13:24 ` [PATCH v3] " Yury Norov
1 sibling, 0 replies; 12+ messages in thread
From: Clement Courbet @ 2017-10-26 15:52 UTC (permalink / raw)
To: Arnd Bergmann, Rasmus Villemoes, Andrew Morton, Matthew Wilcox,
Yury Norov
Cc: Clement Courbet, Ingo Molnar, linux-arch, linux-kernel
Hi Alexey,
> Gentoo ships 5.4.0 which doesn't inline this code on x86_64 defconfig
> (which has OPTIMIZE_INLINING).
I have not actually marked _find_next_bit() inline; it just turns out
that my compiler inlines it.
I've tried marking the function inline, and OPTIMIZE_INLINING does
not un-inline it. I'll send a v4 with an explicit inline.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v3] lib: optimize cpumask_next_and()
2017-10-26 12:58 [PATCH v3] lib: optimize cpumask_next_and() Alexey Dobriyan
2017-10-26 15:52 ` [PATCH v2] " Clement Courbet
@ 2017-10-28 13:24 ` Yury Norov
1 sibling, 0 replies; 12+ messages in thread
From: Yury Norov @ 2017-10-28 13:24 UTC (permalink / raw)
To: Alexey Dobriyan
Cc: courbet, arnd, linux, akpm, mawilcox, mingo, linux-arch,
linux-kernel
On Thu, Oct 26, 2017 at 02:58:00PM +0200, Alexey Dobriyan wrote:
> > - Refactored _find_next_common_bit into _find_next_bit., as suggested
> > by Yury Norov. This has no adverse effects on the performance side,
> > as the compiler successfully inlines the code.
>
> 1)
> Gentoo ships 5.4.0 which doesn't inline this code on x86_64 defconfig
> (which has OPTIMIZE_INLINING).
>
>
> ffffffff813556c0 <find_next_bit>:
> ffffffff813556c0: 55 push rbp
> ffffffff813556c1: 48 89 d1 mov rcx,rdx
> ffffffff813556c4: 45 31 c0 xor r8d,r8d
> ffffffff813556c7: 48 89 f2 mov rdx,rsi
> ffffffff813556ca: 31 f6 xor esi,esi
> ffffffff813556cc: 48 89 e5 mov rbp,rsp
> ffffffff813556cf: e8 7c ff ff ff call ffffffff81355650 <_find_next_bit>
> ffffffff813556d4: 5d pop rbp
> ffffffff813556d5: c3 ret
GCC 7 for ARM64 doesn't inline it either. I wrote a test to measure
the effect of inlining:
http://www.spinics.net/lists/kernel/msg2635338.html
The performance impact of this patch without inlining:
Before:
[ 96.856195] Start testing find_bit() with random-filled bitmap
[ 96.868322] find_next_bit: 34529 cycles, 16304 iterations
[ 96.879525] find_next_zero_bit: 35771 cycles, 16465 iterations
[ 96.891409] find_last_bit: 17444 cycles, 16304 iterations
[ 96.914445] find_first_bit: 1219671 cycles, 16305 iterations
[ 96.925802] Start testing find_bit() with sparse bitmap
[ 96.936308] find_next_bit: 301 cycles, 66 iterations
[ 96.946981] find_next_zero_bit: 70897 cycles, 32703 iterations
[ 96.958694] find_last_bit: 286 cycles, 66 iterations
[ 96.968710] find_first_bit: 5260 cycles, 66 iterations
After:
[ 116.205594] Start testing find_bit() with random-filled bitmap
[ 116.217621] find_next_bit: 24979 cycles, 16449 iterations
[ 116.228719] find_next_zero_bit: 25666 cycles, 16320 iterations
[ 116.240620] find_last_bit: 19407 cycles, 16449 iterations
[ 116.268368] find_first_bit: 1690945 cycles, 16449 iterations
[ 116.279718] Start testing find_bit() with sparse bitmap
[ 116.290219] find_next_bit: 352 cycles, 66 iterations
[ 116.300692] find_next_zero_bit: 50916 cycles, 32703 iterations
[ 116.312400] find_last_bit: 295 cycles, 66 iterations
[ 116.322427] find_first_bit: 6742 cycles, 66 iterations
And with inlining:
Before:
[ 169.464229] Start testing find_bit() with random-filled bitmap
[ 169.476191] find_next_bit: 17520 cycles, 16336 iterations
[ 169.487210] find_next_zero_bit: 17622 cycles, 16433 iterations
[ 169.499111] find_last_bit: 19272 cycles, 16335 iterations
[ 169.519735] find_first_bit: 978657 cycles, 16337 iterations
[ 169.530912] Start testing find_bit() with sparse bitmap
[ 169.541414] find_next_bit: 252 cycles, 66 iterations
[ 169.551726] find_next_zero_bit: 34554 cycles, 32703 iterations
[ 169.563436] find_last_bit: 294 cycles, 66 iterations
[ 169.573439] find_first_bit: 3964 cycles, 66 iterations
After
[ 191.191170] Start testing find_bit() with random-filled bitmap
[ 191.203133] find_next_bit: 17530 cycles, 16346 iterations
[ 191.214150] find_next_zero_bit: 17630 cycles, 16423 iterations
[ 191.226037] find_last_bit: 17489 cycles, 16347 iterations
[ 191.246672] find_first_bit: 979961 cycles, 16347 iterations
[ 191.257849] Start testing find_bit() with sparse bitmap
[ 191.268351] find_next_bit: 257 cycles, 66 iterations
[ 191.278660] find_next_zero_bit: 34547 cycles, 32703 iterations
[ 191.290370] find_last_bit: 292 cycles, 66 iterations
[ 191.300376] find_first_bit: 4269 cycles, 66 iterations
I didn't investigate why the non-inlined version of this patch works
faster than the vanilla code, but the inlined one is even faster and is
as fast as the inlined version of the existing code. I think we should
finally go with it.
It would be great if someone could test it on x86.
> 2)
> Making the "and" operation the centerpiece of this code is kind of meh;
> find_next_or_bit() will be hard to implement.
Not so hard actually. :)
https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1521775.html
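For illustration, a minimal standalone userspace sketch of such an OR-based
search (this is not the implementation linked above; the function name and the
use of __builtin_ctzl in place of the kernel's __ffs() are local to this
example):

#include <stdio.h>

#define BITS_PER_LONG (8 * sizeof(unsigned long))

/* Return the index of the next bit set in (*addr1 | *addr2) at or after
 * 'start', or 'nbits' if there is none. */
static unsigned long next_or_bit(const unsigned long *addr1,
				 const unsigned long *addr2,
				 unsigned long nbits, unsigned long start)
{
	unsigned long tmp;

	if (start >= nbits)
		return nbits;

	/* First word: OR the two words and drop the bits below 'start'. */
	tmp = addr1[start / BITS_PER_LONG] | addr2[start / BITS_PER_LONG];
	tmp &= ~0UL << (start % BITS_PER_LONG);
	start -= start % BITS_PER_LONG;	/* round down to a word boundary */

	while (!tmp) {
		start += BITS_PER_LONG;
		if (start >= nbits)
			return nbits;
		tmp = addr1[start / BITS_PER_LONG] | addr2[start / BITS_PER_LONG];
	}

	start += (unsigned long)__builtin_ctzl(tmp);
	return start < nbits ? start : nbits;
}

int main(void)
{
	unsigned long a[1] = { 0x0UL };
	unsigned long b[1] = { 0x100UL };

	/* Bit 8 is set only in b, so the OR-based search finds it. */
	printf("%lu\n", next_or_bit(a, b, BITS_PER_LONG, 0));	/* prints 8 */
	return 0;
}

The only difference from the AND case is the operator applied when each word
is fetched, which is why extending the common helper stays cheap.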
Yury
^ permalink raw reply [flat|nested] 12+ messages in thread
* RE: [PATCH v2] lib: optimize cpumask_next_and()
2017-10-24 10:51 [PATCH v2] " Clement Courbet
@ 2017-10-25 7:28 ` Matthew Wilcox
2017-10-25 7:28 ` Matthew Wilcox
2017-10-25 14:59 ` Yury Norov
` (2 subsequent siblings)
3 siblings, 1 reply; 12+ messages in thread
From: Matthew Wilcox @ 2017-10-25 7:28 UTC (permalink / raw)
To: Clement Courbet, Arnd Bergmann, Rasmus Villemoes, Andrew Morton,
Yury Norov
Cc: Ingo Molnar, linux-arch@vger.kernel.org,
linux-kernel@vger.kernel.org
Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com>
> -----Original Message-----
> From: Clement Courbet [mailto:courbet@google.com]
> Sent: Tuesday, October 24, 2017 6:52 AM
> To: Arnd Bergmann <arnd@arndb.de>; Rasmus Villemoes
> <linux@rasmusvillemoes.dk>; Andrew Morton <akpm@linux-foundation.org>;
> Matthew Wilcox <mawilcox@microsoft.com>; Yury Norov
> <ynorov@caviumnetworks.com>
> Cc: Clement Courbet <courbet@google.com>; Ingo Molnar
> <mingo@kernel.org>; linux-arch@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: [PATCH v2] lib: optimize cpumask_next_and()
>
> We've measured that we spend ~0.6% of sys cpu time in cpumask_next_and().
> It's essentially a joined iteration in search for a non-zero bit, which
> is currently implemented as a lookup join (find a nonzero bit on the
> lhs, lookup the rhs to see if it's set there).
>
> Implement a direct join (find a nonzero bit on the incrementally built
> join). Direct benchmarking shows that it's 1.17x to 14x faster with a
> geometric mean of 2.1 on 32 CPUs. No impact on memory usage.
>
> Approximate benchmark code:
>
> ```
> unsigned long src1p[nr_cpumask_longs] = {pattern1};
> unsigned long src2p[nr_cpumask_longs] = {pattern2};
> for (/*a bunch of repetitions*/) {
> for (int n = -1; n <= nr_cpu_ids; ++n) {
> asm volatile("" : "+rm"(src1p)); // prevent any optimization
> asm volatile("" : "+rm"(src2p));
> unsigned long result = cpumask_next_and(n, src1p, src2p);
> asm volatile("" : "+rm"(result));
> }
> }
> ```
>
> Results:
> pattern1 pattern2 time_before/time_after
> 0x0000ffff 0x0000ffff 1.65
> 0x0000ffff 0x00005555 2.24
> 0x0000ffff 0x00001111 2.94
> 0x0000ffff 0x00000000 14.0
> 0x00005555 0x0000ffff 1.67
> 0x00005555 0x00005555 1.71
> 0x00005555 0x00001111 1.90
> 0x00005555 0x00000000 6.58
> 0x00001111 0x0000ffff 1.46
> 0x00001111 0x00005555 1.49
> 0x00001111 0x00001111 1.45
> 0x00001111 0x00000000 3.10
> 0x00000000 0x0000ffff 1.18
> 0x00000000 0x00005555 1.18
> 0x00000000 0x00001111 1.17
> 0x00000000 0x00000000 1.25
> -----------------------------
> geo.mean 2.06
>
> Signed-off-by: Clement Courbet <courbet@google.com>
> ---
> Changes in v2:
> - Refactored _find_next_common_bit into _find_next_bit., as suggested
> by Yury Norov. This has no adverse effects on the performance side,
> as the compiler successfully inlines the code.
>
> include/asm-generic/bitops/find.h | 16 ++++++++++++++
> include/linux/bitmap.h | 2 ++
> lib/cpumask.c | 9 ++++----
> lib/find_bit.c | 37 +++++++++++++++++++++++++--------
> tools/include/asm-generic/bitops/find.h | 16 ++++++++++++++
> 5 files changed, 67 insertions(+), 13 deletions(-)
>
> diff --git a/include/asm-generic/bitops/find.h b/include/asm-
> generic/bitops/find.h
> index 998d4d544f18..130962f3a264 100644
> --- a/include/asm-generic/bitops/find.h
> +++ b/include/asm-generic/bitops/find.h
> @@ -15,6 +15,22 @@ extern unsigned long find_next_bit(const unsigned long
> *addr, unsigned long
> size, unsigned long offset);
> #endif
>
> +#ifndef find_next_and_bit
> +/**
> + * find_next_and_bit - find the next set bit in both memory regions
> + * @addr1: The first address to base the search on
> + * @addr2: The second address to base the search on
> + * @offset: The bitnumber to start searching at
> + * @size: The bitmap size in bits
> + *
> + * Returns the bit number for the next set bit
> + * If no bits are set, returns @size.
> + */
> +extern unsigned long find_next_and_bit(const unsigned long *addr1,
> + const unsigned long *addr2, unsigned long size,
> + unsigned long offset);
> +#endif
> +
> #ifndef find_next_zero_bit
> /**
> * find_next_zero_bit - find the next cleared bit in a memory region
> diff --git a/include/linux/bitmap.h b/include/linux/bitmap.h
> index 700cf5f67118..b4606bfda52f 100644
> --- a/include/linux/bitmap.h
> +++ b/include/linux/bitmap.h
> @@ -77,6 +77,8 @@
> * find_first_bit(addr, nbits) Position first set bit in *addr
> * find_next_zero_bit(addr, nbits, bit) Position next zero bit in *addr >= bit
> * find_next_bit(addr, nbits, bit) Position next set bit in *addr >= bit
> + * find_next_and_bit(addr1, addr2, nbits, bit) Same as find_first_bit, but in
> + * (*addr1 & *addr2)
> */
>
> /*
> diff --git a/lib/cpumask.c b/lib/cpumask.c
> index 8b1a1bd77539..5602223837fa 100644
> --- a/lib/cpumask.c
> +++ b/lib/cpumask.c
> @@ -32,10 +32,11 @@ EXPORT_SYMBOL(cpumask_next);
> int cpumask_next_and(int n, const struct cpumask *src1p,
> const struct cpumask *src2p)
> {
> - while ((n = cpumask_next(n, src1p)) < nr_cpu_ids)
> - if (cpumask_test_cpu(n, src2p))
> - break;
> - return n;
> + /* -1 is a legal arg here. */
> + if (n != -1)
> + cpumask_check(n);
> + return find_next_and_bit(cpumask_bits(src1p), cpumask_bits(src2p),
> + nr_cpumask_bits, n + 1);
> }
> EXPORT_SYMBOL(cpumask_next_and);
>
> diff --git a/lib/find_bit.c b/lib/find_bit.c
> index 6ed74f78380c..ebc08fd9fdf8 100644
> --- a/lib/find_bit.c
> +++ b/lib/find_bit.c
> @@ -24,19 +24,25 @@
> #if !defined(find_next_bit) || !defined(find_next_zero_bit)
>
> /*
> - * This is a common helper function for find_next_bit and
> - * find_next_zero_bit. The difference is the "invert" argument, which
> - * is XORed with each fetched word before searching it for one bits.
> + * This is a common helper function for find_next_bit, find_next_zero_bit, and
> + * find_next_and_bit. The differences are:
> + * - The "invert" argument, which is XORed with each fetched word before
> + * searching it for one bits.
> + * - The optional "addr2", which is anded with "addr1" if present.
> */
> -static unsigned long _find_next_bit(const unsigned long *addr,
> - unsigned long nbits, unsigned long start, unsigned long invert)
> +static unsigned long _find_next_bit(const unsigned long *addr1,
> + const unsigned long *addr2, unsigned long nbits,
> + unsigned long start, unsigned long invert)
> {
> unsigned long tmp;
>
> if (unlikely(start >= nbits))
> return nbits;
>
> - tmp = addr[start / BITS_PER_LONG] ^ invert;
> + tmp = addr1[start / BITS_PER_LONG];
> + if (addr2)
> + tmp &= addr2[start / BITS_PER_LONG];
> + tmp ^= invert;
>
> /* Handle 1st word. */
> tmp &= BITMAP_FIRST_WORD_MASK(start);
> @@ -47,7 +53,10 @@ static unsigned long _find_next_bit(const unsigned long
> *addr,
> if (start >= nbits)
> return nbits;
>
> - tmp = addr[start / BITS_PER_LONG] ^ invert;
> + tmp = addr1[start / BITS_PER_LONG];
> + if (addr2)
> + tmp &= addr2[start / BITS_PER_LONG];
> + tmp ^= invert;
> }
>
> return min(start + __ffs(tmp), nbits);
> @@ -61,7 +70,7 @@ static unsigned long _find_next_bit(const unsigned long
> *addr,
> unsigned long find_next_bit(const unsigned long *addr, unsigned long size,
> unsigned long offset)
> {
> - return _find_next_bit(addr, size, offset, 0UL);
> + return _find_next_bit(addr, NULL, size, offset, 0UL);
> }
> EXPORT_SYMBOL(find_next_bit);
> #endif
> @@ -70,11 +79,21 @@ EXPORT_SYMBOL(find_next_bit);
> unsigned long find_next_zero_bit(const unsigned long *addr, unsigned long
> size,
> unsigned long offset)
> {
> - return _find_next_bit(addr, size, offset, ~0UL);
> + return _find_next_bit(addr, NULL, size, offset, ~0UL);
> }
> EXPORT_SYMBOL(find_next_zero_bit);
> #endif
>
> +#if !defined(find_next_and_bit)
> +unsigned long find_next_and_bit(const unsigned long *addr1,
> + const unsigned long *addr2, unsigned long nbits,
> + unsigned long start)
> +{
> + return _find_next_bit(addr1, addr2, size, offset, ~0UL);
> +}
> +EXPORT_SYMBOL(find_next_and_bit);
> +#endif
> +
> #ifndef find_first_bit
> /*
> * Find the first set bit in a memory region.
> diff --git a/tools/include/asm-generic/bitops/find.h b/tools/include/asm-
> generic/bitops/find.h
> index 5538ecdc964a..435c7d002f33 100644
> --- a/tools/include/asm-generic/bitops/find.h
> +++ b/tools/include/asm-generic/bitops/find.h
> @@ -15,6 +15,22 @@ extern unsigned long find_next_bit(const unsigned long
> *addr, unsigned long
> size, unsigned long offset);
> #endif
>
> +#ifndef find_next_and_bit
> +/**
> + * find_next_and_bit - find the next set bit in both memory regions
> + * @addr1: The first address to base the search on
> + * @addr2: The second address to base the search on
> + * @offset: The bitnumber to start searching at
> + * @size: The bitmap size in bits
> + *
> + * Returns the bit number for the next set bit
> + * If no bits are set, returns @size.
> + */
> +extern unsigned long find_next_and_bit(const unsigned long *addr1,
> + const unsigned long *addr2, unsigned long size,
> + unsigned long offset);
> +#endif
> +
> #ifndef find_next_zero_bit
>
> /**
> --
> 2.15.0.rc0.271.g36b669edcc-goog
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2] lib: optimize cpumask_next_and()
2017-10-24 10:51 [PATCH v2] " Clement Courbet
2017-10-25 7:28 ` Matthew Wilcox
@ 2017-10-25 14:59 ` Yury Norov
2017-10-25 14:59 ` Yury Norov
2017-10-26 12:12 ` kbuild test robot
2017-10-26 12:22 ` kbuild test robot
3 siblings, 1 reply; 12+ messages in thread
From: Yury Norov @ 2017-10-25 14:59 UTC (permalink / raw)
To: Clement Courbet
Cc: Arnd Bergmann, Rasmus Villemoes, Andrew Morton, Matthew Wilcox,
Ingo Molnar, linux-arch, linux-kernel
On Tue, Oct 24, 2017 at 12:51:59PM +0200, Clement Courbet wrote:
> We've measured that we spend ~0.6% of sys cpu time in cpumask_next_and().
> It's essentially a joined iteration in search for a non-zero bit, which
> is currently implemented as a lookup join (find a nonzero bit on the
> lhs, lookup the rhs to see if it's set there).
>
> Implement a direct join (find a nonzero bit on the incrementally built
> join). Direct benchmarking shows that it's 1.17x to 14x faster with a
> geometric mean of 2.1 on 32 CPUs. No impact on memory usage.
>
> Approximate benchmark code:
>
> ```
> unsigned long src1p[nr_cpumask_longs] = {pattern1};
> unsigned long src2p[nr_cpumask_longs] = {pattern2};
> for (/*a bunch of repetitions*/) {
> for (int n = -1; n <= nr_cpu_ids; ++n) {
> asm volatile("" : "+rm"(src1p)); // prevent any optimization
> asm volatile("" : "+rm"(src2p));
> unsigned long result = cpumask_next_and(n, src1p, src2p);
> asm volatile("" : "+rm"(result));
> }
> }
> ```
> Signed-off-by: Clement Courbet <courbet@google.com>
> ---
> Changes in v2:
> - Refactored _find_next_common_bit into _find_next_bit., as suggested
> by Yury Norov.
What I actually suggested was to make _find_next_and_bit() similar to
_find_next_bit(), not to extend _find_next_bit(). But what you did
looks OK.
> This has no adverse effects on the performance side,
> as the compiler successfully inlines the code.
I think it's not about inlining; the compiler just optimizes out branches
known to be false at compile time.
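As a minimal standalone illustration of that constant folding (the names here
are local to this example, not the kernel's): when the static helper is
visible at the call site and addr2 is a literal NULL, the compiler specializes
the inlined copy and removes the dead branch, so single-bitmap callers pay
nothing for the extra parameter.

#include <stddef.h>

static unsigned long first_word(const unsigned long *addr1,
				const unsigned long *addr2,
				unsigned long invert)
{
	unsigned long tmp = addr1[0];

	if (addr2)	/* folded away when addr2 is a compile-time NULL */
		tmp &= addr2[0];
	return tmp ^ invert;
}

unsigned long first_word_plain(const unsigned long *addr)
{
	/* addr2 == NULL is known at compile time: the branch disappears. */
	return first_word(addr, NULL, 0UL);
}

unsigned long first_word_and(const unsigned long *addr1,
			     const unsigned long *addr2)
{
	/* addr2 is a real pointer here, so the AND step is kept. */
	return first_word(addr1, addr2, 0UL);
}

Building this at -O2 and inspecting the assembly typically shows
first_word_plain() reduced to a load and an xor, with no test of addr2.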
> include/asm-generic/bitops/find.h | 16 ++++++++++++++
> include/linux/bitmap.h | 2 ++
> lib/cpumask.c | 9 ++++----
> lib/find_bit.c | 37 +++++++++++++++++++++++++--------
> tools/include/asm-generic/bitops/find.h | 16 ++++++++++++++
> 5 files changed, 67 insertions(+), 13 deletions(-)
>
> diff --git a/include/asm-generic/bitops/find.h b/include/asm-generic/bitops/find.h
> index 998d4d544f18..130962f3a264 100644
> --- a/include/asm-generic/bitops/find.h
> +++ b/include/asm-generic/bitops/find.h
> @@ -15,6 +15,22 @@ extern unsigned long find_next_bit(const unsigned long *addr, unsigned long
> size, unsigned long offset);
> #endif
>
> +#ifndef find_next_and_bit
> +/**
> + * find_next_and_bit - find the next set bit in both memory regions
> + * @addr1: The first address to base the search on
> + * @addr2: The second address to base the search on
> + * @offset: The bitnumber to start searching at
> + * @size: The bitmap size in bits
> + *
> + * Returns the bit number for the next set bit
> + * If no bits are set, returns @size.
> + */
> +extern unsigned long find_next_and_bit(const unsigned long *addr1,
> + const unsigned long *addr2, unsigned long size,
> + unsigned long offset);
> +#endif
> +
> #ifndef find_next_zero_bit
> /**
> * find_next_zero_bit - find the next cleared bit in a memory region
> diff --git a/include/linux/bitmap.h b/include/linux/bitmap.h
> index 700cf5f67118..b4606bfda52f 100644
> --- a/include/linux/bitmap.h
> +++ b/include/linux/bitmap.h
> @@ -77,6 +77,8 @@
> * find_first_bit(addr, nbits) Position first set bit in *addr
> * find_next_zero_bit(addr, nbits, bit) Position next zero bit in *addr >= bit
> * find_next_bit(addr, nbits, bit) Position next set bit in *addr >= bit
> + * find_next_and_bit(addr1, addr2, nbits, bit) Same as find_first_bit, but in
> + * (*addr1 & *addr2)
> */
>
> /*
> diff --git a/lib/cpumask.c b/lib/cpumask.c
> index 8b1a1bd77539..5602223837fa 100644
> --- a/lib/cpumask.c
> +++ b/lib/cpumask.c
> @@ -32,10 +32,11 @@ EXPORT_SYMBOL(cpumask_next);
> int cpumask_next_and(int n, const struct cpumask *src1p,
> const struct cpumask *src2p)
> {
> - while ((n = cpumask_next(n, src1p)) < nr_cpu_ids)
> - if (cpumask_test_cpu(n, src2p))
> - break;
> - return n;
> + /* -1 is a legal arg here. */
> + if (n != -1)
> + cpumask_check(n);
> + return find_next_and_bit(cpumask_bits(src1p), cpumask_bits(src2p),
> + nr_cpumask_bits, n + 1);
> }
> EXPORT_SYMBOL(cpumask_next_and);
>
> diff --git a/lib/find_bit.c b/lib/find_bit.c
> index 6ed74f78380c..ebc08fd9fdf8 100644
> --- a/lib/find_bit.c
> +++ b/lib/find_bit.c
> @@ -24,19 +24,25 @@
> #if !defined(find_next_bit) || !defined(find_next_zero_bit)
>
> /*
> - * This is a common helper function for find_next_bit and
> - * find_next_zero_bit. The difference is the "invert" argument, which
> - * is XORed with each fetched word before searching it for one bits.
> + * This is a common helper function for find_next_bit, find_next_zero_bit, and
> + * find_next_and_bit. The differences are:
> + * - The "invert" argument, which is XORed with each fetched word before
> + * searching it for one bits.
> + * - The optional "addr2", which is anded with "addr1" if present.
> */
> -static unsigned long _find_next_bit(const unsigned long *addr,
> - unsigned long nbits, unsigned long start, unsigned long invert)
> +static unsigned long _find_next_bit(const unsigned long *addr1,
> + const unsigned long *addr2, unsigned long nbits,
> + unsigned long start, unsigned long invert)
> {
> unsigned long tmp;
>
> if (unlikely(start >= nbits))
> return nbits;
>
> - tmp = addr[start / BITS_PER_LONG] ^ invert;
> + tmp = addr1[start / BITS_PER_LONG];
> + if (addr2)
> + tmp &= addr2[start / BITS_PER_LONG];
> + tmp ^= invert;
>
> /* Handle 1st word. */
> tmp &= BITMAP_FIRST_WORD_MASK(start);
> @@ -47,7 +53,10 @@ static unsigned long _find_next_bit(const unsigned long *addr,
> if (start >= nbits)
> return nbits;
>
> - tmp = addr[start / BITS_PER_LONG] ^ invert;
> + tmp = addr1[start / BITS_PER_LONG];
> + if (addr2)
> + tmp &= addr2[start / BITS_PER_LONG];
> + tmp ^= invert;
> }
>
> return min(start + __ffs(tmp), nbits);
> @@ -61,7 +70,7 @@ static unsigned long _find_next_bit(const unsigned long *addr,
> unsigned long find_next_bit(const unsigned long *addr, unsigned long size,
> unsigned long offset)
> {
> - return _find_next_bit(addr, size, offset, 0UL);
> + return _find_next_bit(addr, NULL, size, offset, 0UL);
> }
> EXPORT_SYMBOL(find_next_bit);
> #endif
> @@ -70,11 +79,21 @@ EXPORT_SYMBOL(find_next_bit);
> unsigned long find_next_zero_bit(const unsigned long *addr, unsigned long size,
> unsigned long offset)
> {
> - return _find_next_bit(addr, size, offset, ~0UL);
> + return _find_next_bit(addr, NULL, size, offset, ~0UL);
> }
> EXPORT_SYMBOL(find_next_zero_bit);
> #endif
>
> +#if !defined(find_next_and_bit)
> +unsigned long find_next_and_bit(const unsigned long *addr1,
> + const unsigned long *addr2, unsigned long nbits,
> + unsigned long start)
It should be:
unsigned long find_next_and_bit(const unsigned long *addr1,
const unsigned long *addr2, unsigned long size,
unsigned long offset)
> +{
> + return _find_next_bit(addr1, addr2, size, offset, ~0UL);
> +}
> +EXPORT_SYMBOL(find_next_and_bit);
> +#endif
> +
> #ifndef find_first_bit
> /*
> * Find the first set bit in a memory region.
If we continue this way, we'll most probably end up like this,
sooner or later:
diff --git a/lib/find_bit.c b/lib/find_bit.c
index ebc08fd9fdf8..1b0b4aedc93a 100644
--- a/lib/find_bit.c
+++ b/lib/find_bit.c
@@ -31,8 +31,12 @@
* - The optional "addr2", which is anded with "addr1" if present.
*/
static unsigned long _find_next_bit(const unsigned long *addr1,
- const unsigned long *addr2, unsigned long nbits,
- unsigned long start, unsigned long invert)
+ const unsigned long *and,
+ const unsigned long *or,
+ const unsigned long *xor,
+ unsigned long nbits,
+ unsigned long start,
+ unsigned long invert)
{
unsigned long tmp;
@@ -40,8 +44,12 @@ static unsigned long _find_next_bit(const unsigned long *addr1,
return nbits;
tmp = addr1[start / BITS_PER_LONG];
- if (addr2)
- tmp &= addr2[start / BITS_PER_LONG];
+ if (and)
+ tmp &= and[start / BITS_PER_LONG];
+ if (or)
+ tmp |= or[start / BITS_PER_LONG];
+ if (xor)
+ tmp ^= xor[start / BITS_PER_LONG];
tmp ^= invert;
/* Handle 1st word. */
@@ -54,8 +62,12 @@ static unsigned long _find_next_bit(const unsigned long *addr1,
return nbits;
tmp = addr1[start / BITS_PER_LONG];
- if (addr2)
- tmp &= addr2[start / BITS_PER_LONG];
+ if (and)
+ tmp &= and[start / BITS_PER_LONG];
+ if (or)
+ tmp |= or[start / BITS_PER_LONG];
+ if (xor)
+ tmp ^= xor[start / BITS_PER_LONG];
tmp ^= invert;
}
Just a fantasy of course.
I'm generally fine with proceeding this way. It makes _find_next_bit() more
complex, but it lets us avoid code duplication, which is a better deal for
long-term maintenance.
But I'd also like to keep _find_next_bit() consistent with _find_next_bit_le().
Could you please send a v3 with the fixed find_next_and_bit() declaration
and synced LE routines?
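For reference, a sketch of what the fixed wrapper could look like once the
parameter names match what is actually passed to _find_next_bit(); the invert
value of 0UL is an assumption here (set bits of *addr1 & *addr2 are wanted),
and the actual v3 may of course differ:

#if !defined(find_next_and_bit)
unsigned long find_next_and_bit(const unsigned long *addr1,
		const unsigned long *addr2, unsigned long size,
		unsigned long offset)
{
	/* invert == 0UL: look for set bits in the AND of the two bitmaps. */
	return _find_next_bit(addr1, addr2, size, offset, 0UL);
}
EXPORT_SYMBOL(find_next_and_bit);
#endif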
Yury
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH v2] lib: optimize cpumask_next_and()
2017-10-24 10:51 [PATCH v2] " Clement Courbet
2017-10-25 7:28 ` Matthew Wilcox
2017-10-25 14:59 ` Yury Norov
@ 2017-10-26 12:12 ` kbuild test robot
2017-10-26 12:12 ` kbuild test robot
2017-10-26 12:22 ` kbuild test robot
3 siblings, 1 reply; 12+ messages in thread
From: kbuild test robot @ 2017-10-26 12:12 UTC (permalink / raw)
Cc: kbuild-all, Arnd Bergmann, Rasmus Villemoes, Andrew Morton,
Matthew Wilcox, Yury Norov, Clement Courbet, Ingo Molnar,
linux-arch, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 1751 bytes --]
Hi Clement,
[auto build test ERROR on linus/master]
[also build test ERROR on v4.14-rc6 next-20171018]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
url: https://github.com/0day-ci/linux/commits/Clement-Courbet/lib-optimize-cpumask_next_and/20171026-184850
config: i386-randconfig-x001-201743 (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
# save the attached .config to linux build tree
make ARCH=i386
All error/warnings (new ones prefixed by >>):
lib/find_bit.c: In function 'find_next_and_bit':
>> lib/find_bit.c:92:38: error: 'size' undeclared (first use in this function)
return _find_next_bit(addr1, addr2, size, offset, ~0UL);
^~~~
lib/find_bit.c:92:38: note: each undeclared identifier is reported only once for each function it appears in
>> lib/find_bit.c:92:44: error: 'offset' undeclared (first use in this function)
return _find_next_bit(addr1, addr2, size, offset, ~0UL);
^~~~~~
>> lib/find_bit.c:93:1: warning: control reaches end of non-void function [-Wreturn-type]
}
^
vim +/size +92 lib/find_bit.c
86
87 #if !defined(find_next_and_bit)
88 unsigned long find_next_and_bit(const unsigned long *addr1,
89 const unsigned long *addr2, unsigned long nbits,
90 unsigned long start)
91 {
> 92 return _find_next_bit(addr1, addr2, size, offset, ~0UL);
> 93 }
94 EXPORT_SYMBOL(find_next_and_bit);
95 #endif
96
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 26064 bytes --]
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2] lib: optimize cpumask_next_and()
2017-10-24 10:51 [PATCH v2] " Clement Courbet
` (2 preceding siblings ...)
2017-10-26 12:12 ` kbuild test robot
@ 2017-10-26 12:22 ` kbuild test robot
2017-10-26 12:22 ` kbuild test robot
3 siblings, 1 reply; 12+ messages in thread
From: kbuild test robot @ 2017-10-26 12:22 UTC (permalink / raw)
Cc: kbuild-all, Arnd Bergmann, Rasmus Villemoes, Andrew Morton,
Matthew Wilcox, Yury Norov, Clement Courbet, Ingo Molnar,
linux-arch, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 2106 bytes --]
Hi Clement,
[auto build test ERROR on linus/master]
[also build test ERROR on v4.14-rc6 next-20171018]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
url: https://github.com/0day-ci/linux/commits/Clement-Courbet/lib-optimize-cpumask_next_and/20171026-184850
config: m68k-allyesconfig (attached as .config)
compiler: m68k-linux-gcc (GCC) 4.9.0
reproduce:
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=m68k
All errors (new ones prefixed by >>):
lib/find_bit.c: In function 'find_next_and_bit':
>> lib/find_bit.c:92:2: error: implicit declaration of function '_find_next_bit' [-Werror=implicit-function-declaration]
return _find_next_bit(addr1, addr2, size, offset, ~0UL);
^
lib/find_bit.c:92:38: error: 'size' undeclared (first use in this function)
return _find_next_bit(addr1, addr2, size, offset, ~0UL);
^
lib/find_bit.c:92:38: note: each undeclared identifier is reported only once for each function it appears in
lib/find_bit.c:92:44: error: 'offset' undeclared (first use in this function)
return _find_next_bit(addr1, addr2, size, offset, ~0UL);
^
lib/find_bit.c:93:1: warning: control reaches end of non-void function [-Wreturn-type]
}
^
cc1: some warnings being treated as errors
vim +/_find_next_bit +92 lib/find_bit.c
86
87 #if !defined(find_next_and_bit)
88 unsigned long find_next_and_bit(const unsigned long *addr1,
89 const unsigned long *addr2, unsigned long nbits,
90 unsigned long start)
91 {
> 92 return _find_next_bit(addr1, addr2, size, offset, ~0UL);
93 }
94 EXPORT_SYMBOL(find_next_and_bit);
95 #endif
96
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 44325 bytes --]
^ permalink raw reply [flat|nested] 12+ messages in thread