public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 0/2] bitops: Optimize fns() for improved performance
@ 2024-05-01  7:16 Kuan-Wei Chiu
  2024-05-01  7:16 ` [PATCH v3 1/2] lib/test_bitops: Add benchmark test for fns() Kuan-Wei Chiu
  2024-05-01  7:16 ` [PATCH v3 2/2] bitops: Optimize fns() for improved performance Kuan-Wei Chiu
  0 siblings, 2 replies; 6+ messages in thread
From: Kuan-Wei Chiu @ 2024-05-01  7:16 UTC (permalink / raw)
  To: akpm, yury.norov; +Cc: linux, n26122115, jserv, linux-kernel, Kuan-Wei Chiu

Hello,

This patch series optimizes the fns() function by avoiding repeated
calls to __ffs(). Additionally, tests for fns() have been added in
lib/test_bitops.c.

Changes in v3:
- Move the benchmark test for fns() to lib/test_bitops.c.
- Exclude the overhead of random number generation from the benchmark
  result.
- Change the output to print only a total gross instead of each n in
  the benchmark result.
- Update the commit message in the second patch.

Changes in v2:
- Add benchmark test for fns() in lib/find_bit_benchmark.c.
- Change the loop in fns() by counting down from n to 0.
- Add find_bit benchmark result for find_nth_bit in commit message.

Link to v2: https://lkml.kernel.org/20240430054912.124237-1-visitorckw@gmail.com
Link to v1: https://lkml.kernel.org/20240426035152.956702-1-visitorckw@gmail.com

Kuan-Wei Chiu (2):
  lib/test_bitops: Add benchmark test for fns()
  bitops: Optimize fns() for improved performance

 include/linux/bitops.h | 12 +++---------
 lib/test_bitops.c      | 22 ++++++++++++++++++++++
 2 files changed, 25 insertions(+), 9 deletions(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH v3 1/2] lib/test_bitops: Add benchmark test for fns()
  2024-05-01  7:16 [PATCH v3 0/2] bitops: Optimize fns() for improved performance Kuan-Wei Chiu
@ 2024-05-01  7:16 ` Kuan-Wei Chiu
  2024-05-05 13:03   ` David Laight
  2024-05-01  7:16 ` [PATCH v3 2/2] bitops: Optimize fns() for improved performance Kuan-Wei Chiu
  1 sibling, 1 reply; 6+ messages in thread
From: Kuan-Wei Chiu @ 2024-05-01  7:16 UTC (permalink / raw)
  To: akpm, yury.norov; +Cc: linux, n26122115, jserv, linux-kernel, Kuan-Wei Chiu

Introduce a benchmark test for the fns(). It measures the total time
taken by fns() to process 1,000,000 test data generated using
get_random_long() for each n in the range [0, BITS_PER_LONG).

example:
test_bitops: fns:          5876762553 ns, 64000000 iterations

Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
---

Changes in v3:
- Move the benchmark test for fns() to lib/test_bitops.c.
- Exclude the overhead of random number generation from the benchmark
  result.
- Change the output to print only a total gross instead of each n in
  the benchmark result.

 lib/test_bitops.c | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/lib/test_bitops.c b/lib/test_bitops.c
index 3b7bcbee84db..ed939f124417 100644
--- a/lib/test_bitops.c
+++ b/lib/test_bitops.c
@@ -50,6 +50,26 @@ static unsigned long order_comb_long[][2] = {
 };
 #endif
 
+static unsigned long buf[1000000];
+
+static int __init test_fns(void)
+{
+	unsigned int i, n;
+	ktime_t time;
+
+	get_random_bytes(buf, sizeof(buf));
+	time = ktime_get();
+
+	for (n = 0; n < BITS_PER_LONG; n++)
+		for (i = 0; i < 1000000; i++)
+			fns(buf[i], n);
+
+	time = ktime_get() - time;
+	pr_err("fns:  %18llu ns, %6d iterations\n", time, BITS_PER_LONG * 1000000);
+
+	return 0;
+}
+
 static int __init test_bitops_startup(void)
 {
 	int i, bit_set;
@@ -94,6 +114,8 @@ static int __init test_bitops_startup(void)
 	if (bit_set != BITOPS_LAST)
 		pr_err("ERROR: FOUND SET BIT %d\n", bit_set);
 
+	test_fns();
+
 	pr_info("Completed bitops test\n");
 
 	return 0;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH v3 2/2] bitops: Optimize fns() for improved performance
  2024-05-01  7:16 [PATCH v3 0/2] bitops: Optimize fns() for improved performance Kuan-Wei Chiu
  2024-05-01  7:16 ` [PATCH v3 1/2] lib/test_bitops: Add benchmark test for fns() Kuan-Wei Chiu
@ 2024-05-01  7:16 ` Kuan-Wei Chiu
  1 sibling, 0 replies; 6+ messages in thread
From: Kuan-Wei Chiu @ 2024-05-01  7:16 UTC (permalink / raw)
  To: akpm, yury.norov; +Cc: linux, n26122115, jserv, linux-kernel, Kuan-Wei Chiu

The current fns() repeatedly uses __ffs() to find the index of the
least significant bit and then clears the corresponding bit using
__clear_bit(). The method for clearing the least significant bit can be
optimized by using word &= word - 1 instead.

Typically, the execution time of one __ffs() plus one __clear_bit() is
longer than that of a bitwise AND operation and a subtraction. To
improve performance, the loop for clearing the least significant bit
has been replaced with word &= word - 1, followed by a single __ffs()
operation to obtain the answer. This change reduces the number of
__ffs() iterations from n to just one, enhancing overall performance.

This modification significantly accelerates the fns() function in the
test_bitops benchmark, improving its speed by approximately 439 times.
Additionally, it enhances the performance of find_nth_bit() in the
find_bit benchmark by approximately 26%.

Before:
test_bitops: fns:          5876762553 ns, 64000000 iterations
find_nth_bit:                  4254313 ns,  16525 iterations

After:
test_bitops: fns:            13388431 ns, 64000000 iterations
find_nth_bit:                  3362863 ns,  16501 iterations

Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
---

Changes in v3:
- Add the fns() benchmark result from lib/test_bitops.c to the commit
  message.
- Modify the commit message to display only the total gross instead of
  each n values in the benchmark result.

 include/linux/bitops.h | 12 +++---------
 1 file changed, 3 insertions(+), 9 deletions(-)

diff --git a/include/linux/bitops.h b/include/linux/bitops.h
index 2ba557e067fe..57ecef354f47 100644
--- a/include/linux/bitops.h
+++ b/include/linux/bitops.h
@@ -254,16 +254,10 @@ static inline unsigned long __ffs64(u64 word)
  */
 static inline unsigned long fns(unsigned long word, unsigned int n)
 {
-	unsigned int bit;
+	while (word && n--)
+		word &= word - 1;
 
-	while (word) {
-		bit = __ffs(word);
-		if (n-- == 0)
-			return bit;
-		__clear_bit(bit, &word);
-	}
-
-	return BITS_PER_LONG;
+	return word ? __ffs(word) : BITS_PER_LONG;
 }
 
 /**
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* RE: [PATCH v3 1/2] lib/test_bitops: Add benchmark test for fns()
  2024-05-01  7:16 ` [PATCH v3 1/2] lib/test_bitops: Add benchmark test for fns() Kuan-Wei Chiu
@ 2024-05-05 13:03   ` David Laight
  2024-05-05 17:27     ` Kuan-Wei Chiu
  0 siblings, 1 reply; 6+ messages in thread
From: David Laight @ 2024-05-05 13:03 UTC (permalink / raw)
  To: 'Kuan-Wei Chiu', akpm@linux-foundation.org,
	yury.norov@gmail.com
  Cc: linux@rasmusvillemoes.dk, n26122115@gs.ncku.edu.tw,
	jserv@ccns.ncku.edu.tw, linux-kernel@vger.kernel.org

From: Kuan-Wei Chiu
> Sent: 01 May 2024 08:17
> 
> Introduce a benchmark test for the fns(). It measures the total time
> taken by fns() to process 1,000,000 test data generated using
> get_random_long() for each n in the range [0, BITS_PER_LONG).
> 
> example:
> test_bitops: fns:          5876762553 ns, 64000000 iterations

Great benchmark....

The compiler almost certainly optimises it all away.

Assigning the result of fns() to a file scope (global) volatile int
should stop that happening.

And a real test would actually check the result - just in case
someone does something silly.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v3 1/2] lib/test_bitops: Add benchmark test for fns()
  2024-05-05 13:03   ` David Laight
@ 2024-05-05 17:27     ` Kuan-Wei Chiu
  2024-05-05 17:29       ` Kuan-Wei Chiu
  0 siblings, 1 reply; 6+ messages in thread
From: Kuan-Wei Chiu @ 2024-05-05 17:27 UTC (permalink / raw)
  To: David Laight
  Cc: akpm@linux-foundation.org, yury.norov@gmail.com,
	linux@rasmusvillemoes.dk, n26122115@gs.ncku.edu.tw,
	jserv@ccns.ncku.edu.tw, linux-kernel@vger.kernel.org

Hi David,

On Sun, May 05, 2024 at 01:03:23PM +0000, David Laight wrote:
> From: Kuan-Wei Chiu
> > Sent: 01 May 2024 08:17
> > 
> > Introduce a benchmark test for the fns(). It measures the total time
> > taken by fns() to process 1,000,000 test data generated using
> > get_random_long() for each n in the range [0, BITS_PER_LONG).
> > 
> > example:
> > test_bitops: fns:          5876762553 ns, 64000000 iterations
> 
> Great benchmark....
> 
> The compiler almost certainly optimises it all away.
> 
> Assigning the result of fns() to a file scope (global) volatile int
> should stop that happening.
> 
Thank you for your review. There is an updated v5 of this patch [1],
which has already been accepted and included in Yury's bitmap-for-next
branch of the bitmap tree. In the v5 patch, we have addressed the issue
you mentioned regarding the use of volatile variables to avoid compiler
optimizations.

> And a real test would actually check the result - just in case
> someone does something silly.
>
The fns() function is mainly a helper for find_nth_bit(), so its
accuracy should have been checked in find_nth_bit()'s tests. If you
want unit tests for fns() here too, that sounds good to me, but it
would likely be a separate patch. I'm happy to do it if you'd like.

Regards,
Kuan-Wei

> 	David
> 
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> Registration No: 1397386 (Wales)
> 

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v3 1/2] lib/test_bitops: Add benchmark test for fns()
  2024-05-05 17:27     ` Kuan-Wei Chiu
@ 2024-05-05 17:29       ` Kuan-Wei Chiu
  0 siblings, 0 replies; 6+ messages in thread
From: Kuan-Wei Chiu @ 2024-05-05 17:29 UTC (permalink / raw)
  To: David Laight
  Cc: akpm@linux-foundation.org, yury.norov@gmail.com,
	linux@rasmusvillemoes.dk, n26122115@gs.ncku.edu.tw,
	jserv@ccns.ncku.edu.tw, linux-kernel@vger.kernel.org

On Mon, May 06, 2024 at 01:27:25AM +0800, Kuan-Wei Chiu wrote:
> Hi David,
> 
> On Sun, May 05, 2024 at 01:03:23PM +0000, David Laight wrote:
> > From: Kuan-Wei Chiu
> > > Sent: 01 May 2024 08:17
> > > 
> > > Introduce a benchmark test for the fns(). It measures the total time
> > > taken by fns() to process 1,000,000 test data generated using
> > > get_random_long() for each n in the range [0, BITS_PER_LONG).
> > > 
> > > example:
> > > test_bitops: fns:          5876762553 ns, 64000000 iterations
> > 
> > Great benchmark....
> > 
> > The compiler almost certainly optimises it all away.
> > 
> > Assigning the result of fns() to a file scope (global) volatile int
> > should stop that happening.
> > 
> Thank you for your review. There is an updated v5 of this patch [1],
> which has already been accepted and included in Yury's bitmap-for-next
> branch of the bitmap tree. In the v5 patch, we have addressed the issue
> you mentioned regarding the use of volatile variables to avoid compiler
> optimizations.
>
[1]: https://lore.kernel.org/lkml/20240502092443.6845-2-visitorckw@gmail.com/

> > And a real test would actually check the result - just in case
> > someone does something silly.
> >
> The fns() function is mainly a helper for find_nth_bit(), so its
> accuracy should have been checked in find_nth_bit()'s tests. If you
> want unit tests for fns() here too, that sounds good to me, but it
> would likely be a separate patch. I'm happy to do it if you'd like.
> 
> Regards,
> Kuan-Wei
> 
> > 	David
> > 
> > -
> > Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> > Registration No: 1397386 (Wales)
> > 

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2024-05-05 17:29 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-05-01  7:16 [PATCH v3 0/2] bitops: Optimize fns() for improved performance Kuan-Wei Chiu
2024-05-01  7:16 ` [PATCH v3 1/2] lib/test_bitops: Add benchmark test for fns() Kuan-Wei Chiu
2024-05-05 13:03   ` David Laight
2024-05-05 17:27     ` Kuan-Wei Chiu
2024-05-05 17:29       ` Kuan-Wei Chiu
2024-05-01  7:16 ` [PATCH v3 2/2] bitops: Optimize fns() for improved performance Kuan-Wei Chiu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox