* [PATCH v3 0/2] bitops: Optimize fns() for improved performance @ 2024-05-01 7:16 Kuan-Wei Chiu 2024-05-01 7:16 ` [PATCH v3 1/2] lib/test_bitops: Add benchmark test for fns() Kuan-Wei Chiu 2024-05-01 7:16 ` [PATCH v3 2/2] bitops: Optimize fns() for improved performance Kuan-Wei Chiu 0 siblings, 2 replies; 6+ messages in thread From: Kuan-Wei Chiu @ 2024-05-01 7:16 UTC (permalink / raw) To: akpm, yury.norov; +Cc: linux, n26122115, jserv, linux-kernel, Kuan-Wei Chiu Hello, This patch series optimizes the fns() function by avoiding repeated calls to __ffs(). Additionally, tests for fns() have been added in lib/test_bitops.c. Changes in v3: - Move the benchmark test for fns() to lib/test_bitops.c. - Exclude the overhead of random number generation from the benchmark result. - Change the output to print only a total gross instead of each n in the benchmark result. - Update the commit message in the second patch. Changes in v2: - Add benchmark test for fns() in lib/find_bit_benchmark.c. - Change the loop in fns() by counting down from n to 0. - Add find_bit benchmark result for find_nth_bit in commit message. Link to v2: https://lkml.kernel.org/20240430054912.124237-1-visitorckw@gmail.com Link to v1: https://lkml.kernel.org/20240426035152.956702-1-visitorckw@gmail.com Kuan-Wei Chiu (2): lib/test_bitops: Add benchmark test for fns() bitops: Optimize fns() for improved performance include/linux/bitops.h | 12 +++--------- lib/test_bitops.c | 22 ++++++++++++++++++++++ 2 files changed, 25 insertions(+), 9 deletions(-) -- 2.34.1 ^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH v3 1/2] lib/test_bitops: Add benchmark test for fns() 2024-05-01 7:16 [PATCH v3 0/2] bitops: Optimize fns() for improved performance Kuan-Wei Chiu @ 2024-05-01 7:16 ` Kuan-Wei Chiu 2024-05-05 13:03 ` David Laight 2024-05-01 7:16 ` [PATCH v3 2/2] bitops: Optimize fns() for improved performance Kuan-Wei Chiu 1 sibling, 1 reply; 6+ messages in thread From: Kuan-Wei Chiu @ 2024-05-01 7:16 UTC (permalink / raw) To: akpm, yury.norov; +Cc: linux, n26122115, jserv, linux-kernel, Kuan-Wei Chiu Introduce a benchmark test for the fns(). It measures the total time taken by fns() to process 1,000,000 test data generated using get_random_long() for each n in the range [0, BITS_PER_LONG). example: test_bitops: fns: 5876762553 ns, 64000000 iterations Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com> --- Changes in v3: - Move the benchmark test for fns() to lib/test_bitops.c. - Exclude the overhead of random number generation from the benchmark result. - Change the output to print only a total gross instead of each n in the benchmark result. lib/test_bitops.c | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+) diff --git a/lib/test_bitops.c b/lib/test_bitops.c index 3b7bcbee84db..ed939f124417 100644 --- a/lib/test_bitops.c +++ b/lib/test_bitops.c @@ -50,6 +50,26 @@ static unsigned long order_comb_long[][2] = { }; #endif +static unsigned long buf[1000000]; + +static int __init test_fns(void) +{ + unsigned int i, n; + ktime_t time; + + get_random_bytes(buf, sizeof(buf)); + time = ktime_get(); + + for (n = 0; n < BITS_PER_LONG; n++) + for (i = 0; i < 1000000; i++) + fns(buf[i], n); + + time = ktime_get() - time; + pr_err("fns: %18llu ns, %6d iterations\n", time, BITS_PER_LONG * 1000000); + + return 0; +} + static int __init test_bitops_startup(void) { int i, bit_set; @@ -94,6 +114,8 @@ static int __init test_bitops_startup(void) if (bit_set != BITOPS_LAST) pr_err("ERROR: FOUND SET BIT %d\n", bit_set); + test_fns(); + pr_info("Completed bitops test\n"); return 0; -- 2.34.1 ^ permalink raw reply related [flat|nested] 6+ messages in thread
* RE: [PATCH v3 1/2] lib/test_bitops: Add benchmark test for fns() 2024-05-01 7:16 ` [PATCH v3 1/2] lib/test_bitops: Add benchmark test for fns() Kuan-Wei Chiu @ 2024-05-05 13:03 ` David Laight 2024-05-05 17:27 ` Kuan-Wei Chiu 0 siblings, 1 reply; 6+ messages in thread From: David Laight @ 2024-05-05 13:03 UTC (permalink / raw) To: 'Kuan-Wei Chiu', akpm@linux-foundation.org, yury.norov@gmail.com Cc: linux@rasmusvillemoes.dk, n26122115@gs.ncku.edu.tw, jserv@ccns.ncku.edu.tw, linux-kernel@vger.kernel.org From: Kuan-Wei Chiu > Sent: 01 May 2024 08:17 > > Introduce a benchmark test for the fns(). It measures the total time > taken by fns() to process 1,000,000 test data generated using > get_random_long() for each n in the range [0, BITS_PER_LONG). > > example: > test_bitops: fns: 5876762553 ns, 64000000 iterations Great benchmark.... The compiler almost certainly optimises it all away. Assigning the result of fns() to a file scope (global) volatile int should stop that happening. And a real test would actually check the result - just in case someone does something silly. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales) ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v3 1/2] lib/test_bitops: Add benchmark test for fns() 2024-05-05 13:03 ` David Laight @ 2024-05-05 17:27 ` Kuan-Wei Chiu 2024-05-05 17:29 ` Kuan-Wei Chiu 0 siblings, 1 reply; 6+ messages in thread From: Kuan-Wei Chiu @ 2024-05-05 17:27 UTC (permalink / raw) To: David Laight Cc: akpm@linux-foundation.org, yury.norov@gmail.com, linux@rasmusvillemoes.dk, n26122115@gs.ncku.edu.tw, jserv@ccns.ncku.edu.tw, linux-kernel@vger.kernel.org Hi David, On Sun, May 05, 2024 at 01:03:23PM +0000, David Laight wrote: > From: Kuan-Wei Chiu > > Sent: 01 May 2024 08:17 > > > > Introduce a benchmark test for the fns(). It measures the total time > > taken by fns() to process 1,000,000 test data generated using > > get_random_long() for each n in the range [0, BITS_PER_LONG). > > > > example: > > test_bitops: fns: 5876762553 ns, 64000000 iterations > > Great benchmark.... > > The compiler almost certainly optimises it all away. > > Assigning the result of fns() to a file scope (global) volatile int > should stop that happening. > Thank you for your review. There is an updated v5 of this patch [1], which has already been accepted and included in Yury's bitmap-for-next branch of the bitmap tree. In the v5 patch, we have addressed the issue you mentioned regarding the use of volatile variables to avoid compiler optimizations. > And a real test would actually check the result - just in case > someone does something silly. > The fns() function is mainly a helper for find_nth_bit(), so its accuracy should have been checked in find_nth_bit()'s tests. If you want unit tests for fns() here too, that sounds good to me, but it would likely be a separate patch. I'm happy to do it if you'd like. Regards, Kuan-Wei > David > > - > Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK > Registration No: 1397386 (Wales) > ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v3 1/2] lib/test_bitops: Add benchmark test for fns() 2024-05-05 17:27 ` Kuan-Wei Chiu @ 2024-05-05 17:29 ` Kuan-Wei Chiu 0 siblings, 0 replies; 6+ messages in thread From: Kuan-Wei Chiu @ 2024-05-05 17:29 UTC (permalink / raw) To: David Laight Cc: akpm@linux-foundation.org, yury.norov@gmail.com, linux@rasmusvillemoes.dk, n26122115@gs.ncku.edu.tw, jserv@ccns.ncku.edu.tw, linux-kernel@vger.kernel.org On Mon, May 06, 2024 at 01:27:25AM +0800, Kuan-Wei Chiu wrote: > Hi David, > > On Sun, May 05, 2024 at 01:03:23PM +0000, David Laight wrote: > > From: Kuan-Wei Chiu > > > Sent: 01 May 2024 08:17 > > > > > > Introduce a benchmark test for the fns(). It measures the total time > > > taken by fns() to process 1,000,000 test data generated using > > > get_random_long() for each n in the range [0, BITS_PER_LONG). > > > > > > example: > > > test_bitops: fns: 5876762553 ns, 64000000 iterations > > > > Great benchmark.... > > > > The compiler almost certainly optimises it all away. > > > > Assigning the result of fns() to a file scope (global) volatile int > > should stop that happening. > > > Thank you for your review. There is an updated v5 of this patch [1], > which has already been accepted and included in Yury's bitmap-for-next > branch of the bitmap tree. In the v5 patch, we have addressed the issue > you mentioned regarding the use of volatile variables to avoid compiler > optimizations. > [1]: https://lore.kernel.org/lkml/20240502092443.6845-2-visitorckw@gmail.com/ > > And a real test would actually check the result - just in case > > someone does something silly. > > > The fns() function is mainly a helper for find_nth_bit(), so its > accuracy should have been checked in find_nth_bit()'s tests. If you > want unit tests for fns() here too, that sounds good to me, but it > would likely be a separate patch. I'm happy to do it if you'd like. > > Regards, > Kuan-Wei > > > David > > > > - > > Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK > > Registration No: 1397386 (Wales) > > ^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH v3 2/2] bitops: Optimize fns() for improved performance 2024-05-01 7:16 [PATCH v3 0/2] bitops: Optimize fns() for improved performance Kuan-Wei Chiu 2024-05-01 7:16 ` [PATCH v3 1/2] lib/test_bitops: Add benchmark test for fns() Kuan-Wei Chiu @ 2024-05-01 7:16 ` Kuan-Wei Chiu 1 sibling, 0 replies; 6+ messages in thread From: Kuan-Wei Chiu @ 2024-05-01 7:16 UTC (permalink / raw) To: akpm, yury.norov; +Cc: linux, n26122115, jserv, linux-kernel, Kuan-Wei Chiu The current fns() repeatedly uses __ffs() to find the index of the least significant bit and then clears the corresponding bit using __clear_bit(). The method for clearing the least significant bit can be optimized by using word &= word - 1 instead. Typically, the execution time of one __ffs() plus one __clear_bit() is longer than that of a bitwise AND operation and a subtraction. To improve performance, the loop for clearing the least significant bit has been replaced with word &= word - 1, followed by a single __ffs() operation to obtain the answer. This change reduces the number of __ffs() iterations from n to just one, enhancing overall performance. This modification significantly accelerates the fns() function in the test_bitops benchmark, improving its speed by approximately 439 times. Additionally, it enhances the performance of find_nth_bit() in the find_bit benchmark by approximately 26%. Before: test_bitops: fns: 5876762553 ns, 64000000 iterations find_nth_bit: 4254313 ns, 16525 iterations After: test_bitops: fns: 13388431 ns, 64000000 iterations find_nth_bit: 3362863 ns, 16501 iterations Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com> --- Changes in v3: - Add the fns() benchmark result from lib/test_bitops.c to the commit message. - Modify the commit message to display only the total gross instead of each n values in the benchmark result. include/linux/bitops.h | 12 +++--------- 1 file changed, 3 insertions(+), 9 deletions(-) diff --git a/include/linux/bitops.h b/include/linux/bitops.h index 2ba557e067fe..57ecef354f47 100644 --- a/include/linux/bitops.h +++ b/include/linux/bitops.h @@ -254,16 +254,10 @@ static inline unsigned long __ffs64(u64 word) */ static inline unsigned long fns(unsigned long word, unsigned int n) { - unsigned int bit; + while (word && n--) + word &= word - 1; - while (word) { - bit = __ffs(word); - if (n-- == 0) - return bit; - __clear_bit(bit, &word); - } - - return BITS_PER_LONG; + return word ? __ffs(word) : BITS_PER_LONG; } /** -- 2.34.1 ^ permalink raw reply related [flat|nested] 6+ messages in thread
end of thread, other threads:[~2024-05-05 17:29 UTC | newest] Thread overview: 6+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2024-05-01 7:16 [PATCH v3 0/2] bitops: Optimize fns() for improved performance Kuan-Wei Chiu 2024-05-01 7:16 ` [PATCH v3 1/2] lib/test_bitops: Add benchmark test for fns() Kuan-Wei Chiu 2024-05-05 13:03 ` David Laight 2024-05-05 17:27 ` Kuan-Wei Chiu 2024-05-05 17:29 ` Kuan-Wei Chiu 2024-05-01 7:16 ` [PATCH v3 2/2] bitops: Optimize fns() for improved performance Kuan-Wei Chiu
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox