The Linux Kernel Mailing List
 help / color / mirror / Atom feed
* [PATCH v2 0/2] bitops: Optimize fns() for improved performance
@ 2024-04-30  5:49 Kuan-Wei Chiu
  2024-04-30  5:49 ` [PATCH v2 1/2] lib/find_bit_benchmark: Add benchmark test for fns() Kuan-Wei Chiu
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Kuan-Wei Chiu @ 2024-04-30  5:49 UTC (permalink / raw)
  To: akpm, yury.norov; +Cc: linux, linux-kernel, Kuan-Wei Chiu

Hello,

This patch series optimizes the fns() function by avoiding repeated
calls to __ffs(). Additionally, tests for fns() have been added in
lib/find_bit_benchmark.c.

Changes in v2:
- Add benchmark test for fns() in lib/find_bit_benchmark.c.
- Change the loop in fns() by counting down from n to 0.
- Add find_bit benchmark result for find_nth_bit in commit message.

Link to v1: https://lkml.kernel.org/20240406235532.613696-1-visitorckw@gmail.com

Kuan-Wei Chiu (2):
  lib/find_bit_benchmark: Add benchmark test for fns()
  bitops: Optimize fns() for improved performance

 include/linux/bitops.h   | 12 +++---------
 lib/find_bit_benchmark.c | 25 +++++++++++++++++++++++++
 2 files changed, 28 insertions(+), 9 deletions(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH v2 1/2] lib/find_bit_benchmark: Add benchmark test for fns()
  2024-04-30  5:49 [PATCH v2 0/2] bitops: Optimize fns() for improved performance Kuan-Wei Chiu
@ 2024-04-30  5:49 ` Kuan-Wei Chiu
  2024-04-30 17:24   ` Yury Norov
  2024-04-30 22:50   ` kernel test robot
  2024-04-30  5:49 ` [PATCH v2 2/2] bitops: Optimize fns() for improved performance Kuan-Wei Chiu
  2024-04-30  6:11 ` [PATCH v2 0/2] " Kuan-Wei Chiu
  2 siblings, 2 replies; 8+ messages in thread
From: Kuan-Wei Chiu @ 2024-04-30  5:49 UTC (permalink / raw)
  To: akpm, yury.norov; +Cc: linux, linux-kernel, Kuan-Wei Chiu

Introduce a benchmark test for the fns(). It measures the total time
taken by fns() to process 1,000,000 test data generated using
get_random_long() for each n in the range [0, BITS_PER_LONG].

Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
---
 lib/find_bit_benchmark.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/lib/find_bit_benchmark.c b/lib/find_bit_benchmark.c
index d3fb09e6eff1..8712eacf3bbd 100644
--- a/lib/find_bit_benchmark.c
+++ b/lib/find_bit_benchmark.c
@@ -146,6 +146,28 @@ static int __init test_find_next_and_bit(const void *bitmap,
 	return 0;
 }
 
+static int __init test_fns(void)
+{
+	const unsigned long round = 1000000;
+	s64 time[BITS_PER_LONG + 1];
+	unsigned int i, n;
+	volatile unsigned long x, y;
+
+	for (n = 0; n <= BITS_PER_LONG; n++) {
+		time[n] = ktime_get();
+		for (i = 0; i < round; i++) {
+			x = get_random_long();
+			y = fns(x, n);
+		}
+		time[n] = ktime_get() - time[n];
+	}
+
+	for (n = 0; n <= BITS_PER_LONG; n++)
+		pr_err("fns: n = %2u: %12lld ns\n", n, time[n]);
+
+	return 0;
+}
+
 static int __init find_bit_test(void)
 {
 	unsigned long nbits = BITMAP_LEN / SPARSE;
@@ -186,6 +208,9 @@ static int __init find_bit_test(void)
 	test_find_first_and_bit(bitmap, bitmap2, BITMAP_LEN);
 	test_find_next_and_bit(bitmap, bitmap2, BITMAP_LEN);
 
+	pr_err("\nStart testing for fns()\n");
+	test_fns();
+
 	/*
 	 * Everything is OK. Return error just to let user run benchmark
 	 * again without annoying rmmod.
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH v2 2/2] bitops: Optimize fns() for improved performance
  2024-04-30  5:49 [PATCH v2 0/2] bitops: Optimize fns() for improved performance Kuan-Wei Chiu
  2024-04-30  5:49 ` [PATCH v2 1/2] lib/find_bit_benchmark: Add benchmark test for fns() Kuan-Wei Chiu
@ 2024-04-30  5:49 ` Kuan-Wei Chiu
  2024-04-30 17:36   ` Yury Norov
  2024-04-30  6:11 ` [PATCH v2 0/2] " Kuan-Wei Chiu
  2 siblings, 1 reply; 8+ messages in thread
From: Kuan-Wei Chiu @ 2024-04-30  5:49 UTC (permalink / raw)
  To: akpm, yury.norov; +Cc: linux, linux-kernel, Kuan-Wei Chiu

The current fns() repeatedly uses __ffs() to find the index of the
least significant bit and then clears the corresponding bit using
__clear_bit(). The method for clearing the least significant bit can be
optimized by using word &= word - 1 instead.

Typically, the execution time of one __ffs() plus one __clear_bit() is
longer than that of a bitwise AND operation and a subtraction. To
improve performance, the loop for clearing the least significant bit
has been replaced with word &= word - 1, followed by a single __ffs()
operation to obtain the answer. This change reduces the number of
__ffs() iterations from n to just one, enhancing overall performance.

This change speeds up the find_nth_bit() in the find_bit benchmark by
approximately ~1.26 times:

Before:
find_nth_bit:                  4254313 ns,  16525 iterations

After:
find_nth_bit:                  3362863 ns,  16501 iterations

The following microbenchmark data, conducted on my x86-64 machine,
shows the execution time (in microseconds) required for 1000000 test
data generated by get_random_u64() and executed by fns() under
different values of n:

+-----+---------------+---------------+
|  n  |   time_old    |   time_new    |
+-----+---------------+---------------+
|  0  |     29194     |     25878     |
|  1  |     25510     |     25497     |
|  2  |     27836     |     25721     |
|  3  |     30140     |     25673     |
|  4  |     32569     |     25426     |
|  5  |     34792     |     25690     |
|  6  |     37117     |     25651     |
|  7  |     39742     |     25383     |
|  8  |     42360     |     25657     |
|  9  |     44672     |     25897     |
| 10  |     47237     |     25819     |
| 11  |     49884     |     26530     |
| 12  |     51864     |     26647     |
| 13  |     54265     |     28915     |
| 14  |     56440     |     28373     |
| 15  |     58839     |     28616     |
| 16  |     62383     |     29128     |
| 17  |     64257     |     30041     |
| 18  |     66805     |     29773     |
| 19  |     69368     |     33203     |
| 20  |     72942     |     33688     |
| 21  |     77006     |     34518     |
| 22  |     80926     |     34298     |
| 23  |     85723     |     35586     |
| 24  |     90324     |     36376     |
| 25  |     95992     |     37465     |
| 26  |    101101     |     37599     |
| 27  |    106520     |     37466     |
| 28  |    113287     |     38163     |
| 29  |    120552     |     38810     |
| 30  |    128040     |     39373     |
| 31  |    135624     |     40500     |
| 32  |    142580     |     40343     |
| 33  |    148915     |     40460     |
| 34  |    154005     |     41294     |
| 35  |    157996     |     41730     |
| 36  |    160806     |     41523     |
| 37  |    162975     |     42088     |
| 38  |    163426     |     41530     |
| 39  |    164872     |     41789     |
| 40  |    164477     |     42505     |
| 41  |    164758     |     41879     |
| 42  |    164182     |     41415     |
| 43  |    164842     |     42119     |
| 44  |    164881     |     42297     |
| 45  |    164870     |     42145     |
| 46  |    164673     |     42066     |
| 47  |    164616     |     42051     |
| 48  |    165055     |     41902     |
| 49  |    164847     |     41862     |
| 50  |    165171     |     41960     |
| 51  |    164851     |     42089     |
| 52  |    164763     |     41717     |
| 53  |    164635     |     42154     |
| 54  |    164757     |     41983     |
| 55  |    165095     |     41419     |
| 56  |    164641     |     42381     |
| 57  |    164601     |     41654     |
| 58  |    164864     |     41834     |
| 59  |    164594     |     41920     |
| 60  |    165207     |     42020     |
| 61  |    165056     |     41185     |
| 62  |    165160     |     41722     |
| 63  |    164923     |     41702     |
| 64  |    164777     |     41880     |
+-----+---------------+---------------+

Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
---

Changes in v2:
- Change the loop in fns() by counting down from n to 0.
- Add find_bit benchmark result for find_nth_bit in commit message.

 include/linux/bitops.h | 12 +++---------
 1 file changed, 3 insertions(+), 9 deletions(-)

diff --git a/include/linux/bitops.h b/include/linux/bitops.h
index 2ba557e067fe..57ecef354f47 100644
--- a/include/linux/bitops.h
+++ b/include/linux/bitops.h
@@ -254,16 +254,10 @@ static inline unsigned long __ffs64(u64 word)
  */
 static inline unsigned long fns(unsigned long word, unsigned int n)
 {
-	unsigned int bit;
+	while (word && n--)
+		word &= word - 1;
 
-	while (word) {
-		bit = __ffs(word);
-		if (n-- == 0)
-			return bit;
-		__clear_bit(bit, &word);
-	}
-
-	return BITS_PER_LONG;
+	return word ? __ffs(word) : BITS_PER_LONG;
 }
 
 /**
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 0/2] bitops: Optimize fns() for improved performance
  2024-04-30  5:49 [PATCH v2 0/2] bitops: Optimize fns() for improved performance Kuan-Wei Chiu
  2024-04-30  5:49 ` [PATCH v2 1/2] lib/find_bit_benchmark: Add benchmark test for fns() Kuan-Wei Chiu
  2024-04-30  5:49 ` [PATCH v2 2/2] bitops: Optimize fns() for improved performance Kuan-Wei Chiu
@ 2024-04-30  6:11 ` Kuan-Wei Chiu
  2 siblings, 0 replies; 8+ messages in thread
From: Kuan-Wei Chiu @ 2024-04-30  6:11 UTC (permalink / raw)
  To: akpm, yury.norov; +Cc: linux, linux-kernel

On Tue, Apr 30, 2024 at 01:49:10PM +0800, Kuan-Wei Chiu wrote:
> Hello,
> 
> This patch series optimizes the fns() function by avoiding repeated
> calls to __ffs(). Additionally, tests for fns() have been added in
> lib/find_bit_benchmark.c.
> 
> Changes in v2:
> - Add benchmark test for fns() in lib/find_bit_benchmark.c.
> - Change the loop in fns() by counting down from n to 0.
> - Add find_bit benchmark result for find_nth_bit in commit message.
> 
> Link to v1: https://lkml.kernel.org/20240406235532.613696-1-visitorckw@gmail.com

Sorry for pasting the wrong link, the link to v1 should be:
https://lkml.kernel.org/20240426035152.956702-1-visitorckw@gmail.com

Regards,
Kuan-Wei
> 
> Kuan-Wei Chiu (2):
>   lib/find_bit_benchmark: Add benchmark test for fns()
>   bitops: Optimize fns() for improved performance
> 
>  include/linux/bitops.h   | 12 +++---------
>  lib/find_bit_benchmark.c | 25 +++++++++++++++++++++++++
>  2 files changed, 28 insertions(+), 9 deletions(-)
> 
> -- 
> 2.34.1
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 1/2] lib/find_bit_benchmark: Add benchmark test for fns()
  2024-04-30  5:49 ` [PATCH v2 1/2] lib/find_bit_benchmark: Add benchmark test for fns() Kuan-Wei Chiu
@ 2024-04-30 17:24   ` Yury Norov
  2024-05-01  4:50     ` Kuan-Wei Chiu
  2024-04-30 22:50   ` kernel test robot
  1 sibling, 1 reply; 8+ messages in thread
From: Yury Norov @ 2024-04-30 17:24 UTC (permalink / raw)
  To: Kuan-Wei Chiu; +Cc: akpm, linux, linux-kernel

On Tue, Apr 30, 2024 at 01:49:11PM +0800, Kuan-Wei Chiu wrote:
> Introduce a benchmark test for the fns(). It measures the total time
> taken by fns() to process 1,000,000 test data generated using
> get_random_long() for each n in the range [0, BITS_PER_LONG].

Can you also print an example of test output?
 
> Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
> ---
>  lib/find_bit_benchmark.c | 25 +++++++++++++++++++++++++
>  1 file changed, 25 insertions(+)
> 
> diff --git a/lib/find_bit_benchmark.c b/lib/find_bit_benchmark.c
> index d3fb09e6eff1..8712eacf3bbd 100644
> --- a/lib/find_bit_benchmark.c
> +++ b/lib/find_bit_benchmark.c
> @@ -146,6 +146,28 @@ static int __init test_find_next_and_bit(const void *bitmap,
>  	return 0;
>  }
>  
> +static int __init test_fns(void)
> +{
> +	const unsigned long round = 1000000;
> +	s64 time[BITS_PER_LONG + 1];
> +	unsigned int i, n;
> +	volatile unsigned long x, y;
> +
> +	for (n = 0; n <= BITS_PER_LONG; n++) {

n == BITS_PER_LONG is an error. Testing error case together with
normal cases is even worse error because it fools readers.

> +		time[n] = ktime_get();
> +		for (i = 0; i < round; i++) {
> +			x = get_random_long();
> +			y = fns(x, n);
> +		}

Here you count fns() + get_random_long() time. For your microbench
purposes it would be better exclude a random number generation
overhead.

> +		time[n] = ktime_get() - time[n];
> +	}
> +
> +	for (n = 0; n <= BITS_PER_LONG; n++)
> +		pr_err("fns: n = %2u: %12lld ns\n", n, time[n]);

Nah, not like that. Each test in there prints one line in the
report. Let's keep it that way for test_fns() too. Unless we have
a strong evidence that fns() for a particular input is worth to be
tracked separately, let's just print a total gross?

> +
> +	return 0;
> +}

I'd suggest to modify it like:

        static unsigned long buf[1000000];

        static int __init test_fns(void)
        {
                get_random_bytes(buf, ARRAY_SIZE(buf));
                time = ktime_get();

                for (n = 0; n < BITS_PER_LONG; n++)
                        for (i = 0; i < 1000000; i++)
                                fns(buf[i], n);

                time = ktime_get() - time;
                pr_err(...);
        }

>  static int __init find_bit_test(void)
>  {
>  	unsigned long nbits = BITMAP_LEN / SPARSE;
> @@ -186,6 +208,9 @@ static int __init find_bit_test(void)
>  	test_find_first_and_bit(bitmap, bitmap2, BITMAP_LEN);
>  	test_find_next_and_bit(bitmap, bitmap2, BITMAP_LEN);
>  
> +	pr_err("\nStart testing for fns()\n");
> +	test_fns();

There are 2 sections in the test - one for regular, and another for
sparse data. Adding a new section for a just one function doesn't look
like a good idea.

Even more, the fns() is already tested here. Maybe test_bitops is a
better place for this test?

> +
>  	/*
>  	 * Everything is OK. Return error just to let user run benchmark
>  	 * again without annoying rmmod.
> -- 
> 2.34.1

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 2/2] bitops: Optimize fns() for improved performance
  2024-04-30  5:49 ` [PATCH v2 2/2] bitops: Optimize fns() for improved performance Kuan-Wei Chiu
@ 2024-04-30 17:36   ` Yury Norov
  0 siblings, 0 replies; 8+ messages in thread
From: Yury Norov @ 2024-04-30 17:36 UTC (permalink / raw)
  To: Kuan-Wei Chiu; +Cc: akpm, linux, linux-kernel

On Tue, Apr 30, 2024 at 01:49:12PM +0800, Kuan-Wei Chiu wrote:
> The current fns() repeatedly uses __ffs() to find the index of the
> least significant bit and then clears the corresponding bit using
> __clear_bit(). The method for clearing the least significant bit can be
> optimized by using word &= word - 1 instead.
> 
> Typically, the execution time of one __ffs() plus one __clear_bit() is
> longer than that of a bitwise AND operation and a subtraction. To
> improve performance, the loop for clearing the least significant bit
> has been replaced with word &= word - 1, followed by a single __ffs()
> operation to obtain the answer. This change reduces the number of
> __ffs() iterations from n to just one, enhancing overall performance.
> 
> This change speeds up the find_nth_bit() in the find_bit benchmark by
> approximately ~1.26 times:

26% sounds better.

> Before:
> find_nth_bit:                  4254313 ns,  16525 iterations
> 
> After:
> find_nth_bit:                  3362863 ns,  16501 iterations
> 
> The following microbenchmark data, conducted on my x86-64 machine,
> shows the execution time (in microseconds) required for 1000000 test
> data generated by get_random_u64() and executed by fns() under
> different values of n:
> 
> +-----+---------------+---------------+
> |  n  |   time_old    |   time_new    |
> +-----+---------------+---------------+
> |  0  |     29194     |     25878     |
> |  1  |     25510     |     25497     |
> |  2  |     27836     |     25721     |
> |  3  |     30140     |     25673     |
> |  4  |     32569     |     25426     |
> |  5  |     34792     |     25690     |
> |  6  |     37117     |     25651     |
> |  7  |     39742     |     25383     |
> |  8  |     42360     |     25657     |
> |  9  |     44672     |     25897     |
> | 10  |     47237     |     25819     |
> | 11  |     49884     |     26530     |
> | 12  |     51864     |     26647     |
> | 13  |     54265     |     28915     |
> | 14  |     56440     |     28373     |
> | 15  |     58839     |     28616     |
> | 16  |     62383     |     29128     |
> | 17  |     64257     |     30041     |
> | 18  |     66805     |     29773     |
> | 19  |     69368     |     33203     |
> | 20  |     72942     |     33688     |
> | 21  |     77006     |     34518     |
> | 22  |     80926     |     34298     |
> | 23  |     85723     |     35586     |
> | 24  |     90324     |     36376     |
> | 25  |     95992     |     37465     |
> | 26  |    101101     |     37599     |
> | 27  |    106520     |     37466     |
> | 28  |    113287     |     38163     |
> | 29  |    120552     |     38810     |
> | 30  |    128040     |     39373     |
> | 31  |    135624     |     40500     |
> | 32  |    142580     |     40343     |
> | 33  |    148915     |     40460     |
> | 34  |    154005     |     41294     |
> | 35  |    157996     |     41730     |
> | 36  |    160806     |     41523     |
> | 37  |    162975     |     42088     |
> | 38  |    163426     |     41530     |
> | 39  |    164872     |     41789     |
> | 40  |    164477     |     42505     |
> | 41  |    164758     |     41879     |
> | 42  |    164182     |     41415     |
> | 43  |    164842     |     42119     |
> | 44  |    164881     |     42297     |
> | 45  |    164870     |     42145     |
> | 46  |    164673     |     42066     |
> | 47  |    164616     |     42051     |
> | 48  |    165055     |     41902     |
> | 49  |    164847     |     41862     |
> | 50  |    165171     |     41960     |
> | 51  |    164851     |     42089     |
> | 52  |    164763     |     41717     |
> | 53  |    164635     |     42154     |
> | 54  |    164757     |     41983     |
> | 55  |    165095     |     41419     |
> | 56  |    164641     |     42381     |
> | 57  |    164601     |     41654     |
> | 58  |    164864     |     41834     |
> | 59  |    164594     |     41920     |
> | 60  |    165207     |     42020     |
> | 61  |    165056     |     41185     |
> | 62  |    165160     |     41722     |
> | 63  |    164923     |     41702     |
> | 64  |    164777     |     41880     |
> +-----+---------------+---------------+

Please, just a total gross in the commit message.
 
> Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
> ---
> 
> Changes in v2:
> - Change the loop in fns() by counting down from n to 0.
> - Add find_bit benchmark result for find_nth_bit in commit message.
> 
>  include/linux/bitops.h | 12 +++---------
>  1 file changed, 3 insertions(+), 9 deletions(-)
> 
> diff --git a/include/linux/bitops.h b/include/linux/bitops.h
> index 2ba557e067fe..57ecef354f47 100644
> --- a/include/linux/bitops.h
> +++ b/include/linux/bitops.h
> @@ -254,16 +254,10 @@ static inline unsigned long __ffs64(u64 word)
>   */
>  static inline unsigned long fns(unsigned long word, unsigned int n)
>  {
> -	unsigned int bit;
> +	while (word && n--)
> +		word &= word - 1;
>  
> -	while (word) {
> -		bit = __ffs(word);
> -		if (n-- == 0)
> -			return bit;
> -		__clear_bit(bit, &word);
> -	}
> -
> -	return BITS_PER_LONG;
> +	return word ? __ffs(word) : BITS_PER_LONG;
>  }
>  
>  /**
> -- 
> 2.34.1

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 1/2] lib/find_bit_benchmark: Add benchmark test for fns()
  2024-04-30  5:49 ` [PATCH v2 1/2] lib/find_bit_benchmark: Add benchmark test for fns() Kuan-Wei Chiu
  2024-04-30 17:24   ` Yury Norov
@ 2024-04-30 22:50   ` kernel test robot
  1 sibling, 0 replies; 8+ messages in thread
From: kernel test robot @ 2024-04-30 22:50 UTC (permalink / raw)
  To: Kuan-Wei Chiu, akpm, yury.norov
  Cc: oe-kbuild-all, linux, linux-kernel, Kuan-Wei Chiu

Hi Kuan-Wei,

kernel test robot noticed the following build warnings:

[auto build test WARNING on linus/master]
[also build test WARNING on v6.9-rc6 next-20240430]
[cannot apply to akpm-mm/mm-everything akpm-mm/mm-nonmm-unstable]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Kuan-Wei-Chiu/lib-find_bit_benchmark-Add-benchmark-test-for-fns/20240430-144241
base:   linus/master
patch link:    https://lore.kernel.org/r/20240430054912.124237-2-visitorckw%40gmail.com
patch subject: [PATCH v2 1/2] lib/find_bit_benchmark: Add benchmark test for fns()
config: i386-buildonly-randconfig-004-20240501 (https://download.01.org/0day-ci/archive/20240501/202405010642.tHmTpd1i-lkp@intel.com/config)
compiler: gcc-11 (Ubuntu 11.4.0-4ubuntu1) 11.4.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240501/202405010642.tHmTpd1i-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202405010642.tHmTpd1i-lkp@intel.com/

All warnings (new ones prefixed by >>):

   lib/find_bit_benchmark.c: In function 'test_fns':
>> lib/find_bit_benchmark.c:154:35: warning: variable 'y' set but not used [-Wunused-but-set-variable]
     154 |         volatile unsigned long x, y;
         |                                   ^


vim +/y +154 lib/find_bit_benchmark.c

   148	
   149	static int __init test_fns(void)
   150	{
   151		const unsigned long round = 1000000;
   152		s64 time[BITS_PER_LONG + 1];
   153		unsigned int i, n;
 > 154		volatile unsigned long x, y;
   155	
   156		for (n = 0; n <= BITS_PER_LONG; n++) {
   157			time[n] = ktime_get();
   158			for (i = 0; i < round; i++) {
   159				x = get_random_long();
   160				y = fns(x, n);
   161			}
   162			time[n] = ktime_get() - time[n];
   163		}
   164	
   165		for (n = 0; n <= BITS_PER_LONG; n++)
   166			pr_err("fns: n = %2u: %12lld ns\n", n, time[n]);
   167	
   168		return 0;
   169	}
   170	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 1/2] lib/find_bit_benchmark: Add benchmark test for fns()
  2024-04-30 17:24   ` Yury Norov
@ 2024-05-01  4:50     ` Kuan-Wei Chiu
  0 siblings, 0 replies; 8+ messages in thread
From: Kuan-Wei Chiu @ 2024-05-01  4:50 UTC (permalink / raw)
  To: Yury Norov; +Cc: akpm, linux, linux-kernel

On Tue, Apr 30, 2024 at 10:24:03AM -0700, Yury Norov wrote:
> On Tue, Apr 30, 2024 at 01:49:11PM +0800, Kuan-Wei Chiu wrote:
> > Introduce a benchmark test for the fns(). It measures the total time
> > taken by fns() to process 1,000,000 test data generated using
> > get_random_long() for each n in the range [0, BITS_PER_LONG].
> 
> Can you also print an example of test output?
>  
> > Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
> > ---
> >  lib/find_bit_benchmark.c | 25 +++++++++++++++++++++++++
> >  1 file changed, 25 insertions(+)
> > 
> > diff --git a/lib/find_bit_benchmark.c b/lib/find_bit_benchmark.c
> > index d3fb09e6eff1..8712eacf3bbd 100644
> > --- a/lib/find_bit_benchmark.c
> > +++ b/lib/find_bit_benchmark.c
> > @@ -146,6 +146,28 @@ static int __init test_find_next_and_bit(const void *bitmap,
> >  	return 0;
> >  }
> >  
> > +static int __init test_fns(void)
> > +{
> > +	const unsigned long round = 1000000;
> > +	s64 time[BITS_PER_LONG + 1];
> > +	unsigned int i, n;
> > +	volatile unsigned long x, y;
> > +
> > +	for (n = 0; n <= BITS_PER_LONG; n++) {
> 
> n == BITS_PER_LONG is an error. Testing error case together with
> normal cases is even worse error because it fools readers.
>
My initial intention was to add a test for fns() always returning
BITS_PER_LONG. However, I agree that this is not a good idea and may
confuse readers.

> > +		time[n] = ktime_get();
> > +		for (i = 0; i < round; i++) {
> > +			x = get_random_long();
> > +			y = fns(x, n);
> > +		}
> 
> Here you count fns() + get_random_long() time. For your microbench
> purposes it would be better exclude a random number generation
> overhead.
> 
> > +		time[n] = ktime_get() - time[n];
> > +	}
> > +
> > +	for (n = 0; n <= BITS_PER_LONG; n++)
> > +		pr_err("fns: n = %2u: %12lld ns\n", n, time[n]);
> 
> Nah, not like that. Each test in there prints one line in the
> report. Let's keep it that way for test_fns() too. Unless we have
> a strong evidence that fns() for a particular input is worth to be
> tracked separately, let's just print a total gross?
> 
> > +
> > +	return 0;
> > +}
> 
> I'd suggest to modify it like:
> 
>         static unsigned long buf[1000000];
> 
>         static int __init test_fns(void)
>         {
>                 get_random_bytes(buf, ARRAY_SIZE(buf));

Instead of ARRAY_SIZE(buf), it should be sizeof(buf).

>                 time = ktime_get();
> 
>                 for (n = 0; n < BITS_PER_LONG; n++)
>                         for (i = 0; i < 1000000; i++)
>                                 fns(buf[i], n);
> 
>                 time = ktime_get() - time;
>                 pr_err(...);
>         }
>

That does seem like a better approach. I'll move it to lib/test_bitops
and send a v3 patch series.

Regards,
Kuan-Wei

> >  static int __init find_bit_test(void)
> >  {
> >  	unsigned long nbits = BITMAP_LEN / SPARSE;
> > @@ -186,6 +208,9 @@ static int __init find_bit_test(void)
> >  	test_find_first_and_bit(bitmap, bitmap2, BITMAP_LEN);
> >  	test_find_next_and_bit(bitmap, bitmap2, BITMAP_LEN);
> >  
> > +	pr_err("\nStart testing for fns()\n");
> > +	test_fns();
> 
> There are 2 sections in the test - one for regular, and another for
> sparse data. Adding a new section for a just one function doesn't look
> like a good idea.
> 
> Even more, the fns() is already tested here. Maybe test_bitops is a
> better place for this test?
> 
> > +
> >  	/*
> >  	 * Everything is OK. Return error just to let user run benchmark
> >  	 * again without annoying rmmod.
> > -- 
> > 2.34.1

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2024-05-01  4:50 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-04-30  5:49 [PATCH v2 0/2] bitops: Optimize fns() for improved performance Kuan-Wei Chiu
2024-04-30  5:49 ` [PATCH v2 1/2] lib/find_bit_benchmark: Add benchmark test for fns() Kuan-Wei Chiu
2024-04-30 17:24   ` Yury Norov
2024-05-01  4:50     ` Kuan-Wei Chiu
2024-04-30 22:50   ` kernel test robot
2024-04-30  5:49 ` [PATCH v2 2/2] bitops: Optimize fns() for improved performance Kuan-Wei Chiu
2024-04-30 17:36   ` Yury Norov
2024-04-30  6:11 ` [PATCH v2 0/2] " Kuan-Wei Chiu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox