* [PATCH v4 0/2] Improve the performance of bitmap_find_next_zero_area_off()
@ 2026-06-01 9:42 Yi Sun
2026-06-01 9:42 ` [PATCH v4 1/2] lib: bitmap: reduce the number of goto again in bitmap_find_next_zero_area_off() Yi Sun
` (3 more replies)
0 siblings, 4 replies; 10+ messages in thread
From: Yi Sun @ 2026-06-01 9:42 UTC (permalink / raw)
To: ynorov, yury.norov
Cc: yi.sun, mnazarewicz, akpm, mina86, akinobu.mita, linux-kernel
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="y", Size: 1941 bytes --]
Test code has been added to PATCH v2.
No new APIs were introduced.
Testing with the test code showed a performance improvement
of approximately 70%.
Test result(random):
orig_ns orig_cnt orig_average new_ns new_cnt new_average ratio
test1 1388885 1154 1203 462923 1308 353 70.7%
test2 1393616 1324 1052 736193 1212 607 42.3%
test3 1391693 1216 1144 735808 1260 583 49%
test4 1393231 1275 1092 742731 1402 529 51.6%
test5 1390731 1260 1103 737231 1274 578 47.6%
Test result(sparse):
orig_ns orig_cnt orig_average new_ns new_cnt new_average ratio
test1 4496077 322477 13 2419462 322480 7 46.2%
test2 7514731 322482 23 5785808 322476 17 26.1%
test3 7490692 322493 23 7654423 322483 23 0%
test4 7474500 322469 23 7628230 322483 23 0%
test5 7452692 322481 23 7663116 322478 23 0%
Test result explanation:
Test both random and sparse five times.
@orig_ns/cnt: Original version results.
@new_ns/cnt: Optimized test results.
@orig_average = orig_ns / orig_cnt
@new_average = new_ns / new_cnt
@ratio = (orig_average - new_average) / orig_average
The test results show that the optimized version
improved performance in almost every test.
---
v3: https://lore.kernel.org/all/20260514090607.231387-1-yi.sun@unisoc.com
- Based on Michał Nazarewicz's suggestion,
code optimization was performed on PATCH v1.
v2: https://lore.kernel.org/all/20260514035644.4118050-1-yi.sun@unisoc.com
- Do not introduce find_last_bit_from().
v1: https://lore.kernel.org/all/20260512040659.2992142-1-yi.sun@unisoc.com
Yi Sun (2):
lib: bitmap: reduce the number of goto again in
bitmap_find_next_zero_area_off()
lib/bitmap: add tests for bitmap_find_next_zero_area_off()
lib/bitmap.c | 10 +++++++---
lib/find_bit_benchmark.c | 17 +++++++++++++++++
lib/test_bitmap.c | 28 ++++++++++++++++++++++++++++
3 files changed, 52 insertions(+), 3 deletions(-)
--
2.34.1
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v4 1/2] lib: bitmap: reduce the number of goto again in bitmap_find_next_zero_area_off()
2026-06-01 9:42 [PATCH v4 0/2] Improve the performance of bitmap_find_next_zero_area_off() Yi Sun
@ 2026-06-01 9:42 ` Yi Sun
2026-06-08 22:15 ` Yury Norov
2026-06-01 9:42 ` [PATCH v4 2/2] lib/bitmap: add tests for bitmap_find_next_zero_area_off() Yi Sun
` (2 subsequent siblings)
3 siblings, 1 reply; 10+ messages in thread
From: Yi Sun @ 2026-06-01 9:42 UTC (permalink / raw)
To: ynorov, yury.norov
Cc: yi.sun, mnazarewicz, akpm, mina86, akinobu.mita, linux-kernel
Finding a contiguous free region in a highly fragmented
bitmap is not easy and may require many repeated attempts.
Therefore, find_next_bit(map, end, index) is not the optimal choice.
This is because there may be multiple scattered free regions
within the range [index, end) and none of them will meet the length
requirement of @nr.
Instead, it's sufficient to directly find the last bit within
the range [index, end), thus reducing unnecessary "goto again" calls.
Signed-off-by: Yi Sun <yi.sun@unisoc.com>
---
lib/bitmap.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/lib/bitmap.c b/lib/bitmap.c
index b9bfa157e095..7d0fbcbe6973 100644
--- a/lib/bitmap.c
+++ b/lib/bitmap.c
@@ -432,7 +432,7 @@ unsigned long bitmap_find_next_zero_area_off(unsigned long *map,
unsigned long align_mask,
unsigned long align_offset)
{
- unsigned long index, end, i;
+ unsigned long index, end, i, index_bits_align, index_idx;
again:
index = find_next_zero_bit(map, size, start);
@@ -442,8 +442,12 @@ unsigned long bitmap_find_next_zero_area_off(unsigned long *map,
end = index + nr;
if (end > size)
return end;
- i = find_next_bit(map, end, index);
- if (i < end) {
+
+ index_bits_align = round_down(index, BITS_PER_LONG);
+ index_idx = index / BITS_PER_LONG;
+
+ i = find_last_bit(map + index_idx, end - index_bits_align) + index_bits_align;
+ if (i > index && i < end) {
start = i + 1;
goto again;
}
--
2.34.1
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH v4 2/2] lib/bitmap: add tests for bitmap_find_next_zero_area_off()
2026-06-01 9:42 [PATCH v4 0/2] Improve the performance of bitmap_find_next_zero_area_off() Yi Sun
2026-06-01 9:42 ` [PATCH v4 1/2] lib: bitmap: reduce the number of goto again in bitmap_find_next_zero_area_off() Yi Sun
@ 2026-06-01 9:42 ` Yi Sun
2026-06-08 19:14 ` Yury Norov
2026-06-08 7:44 ` 答复: [PATCH v4 0/2] Improve the performance of bitmap_find_next_zero_area_off() 孙毅 (Yi Sun)
2026-06-08 21:54 ` Yury Norov
3 siblings, 1 reply; 10+ messages in thread
From: Yi Sun @ 2026-06-01 9:42 UTC (permalink / raw)
To: ynorov, yury.norov
Cc: yi.sun, mnazarewicz, akpm, mina86, akinobu.mita, linux-kernel
Add functional and performance tests
for bitmap_find_next_zero_area_off().
Signed-off-by: Yi Sun <yi.sun@unisoc.com>
---
lib/find_bit_benchmark.c | 17 +++++++++++++++++
lib/test_bitmap.c | 28 ++++++++++++++++++++++++++++
2 files changed, 45 insertions(+)
diff --git a/lib/find_bit_benchmark.c b/lib/find_bit_benchmark.c
index 00d9dc61cd46..37fe76ad322e 100644
--- a/lib/find_bit_benchmark.c
+++ b/lib/find_bit_benchmark.c
@@ -149,6 +149,21 @@ static int __init test_find_next_and_bit(const void *bitmap,
return 0;
}
+static int __init
+test_bitmap_find_next_zero_area_off(unsigned long *bitmap, unsigned long len)
+{
+ unsigned long i, cnt;
+ ktime_t time;
+
+ time = ktime_get();
+ for (cnt = i = 0; i < BITMAP_LEN; cnt++)
+ i = bitmap_find_next_zero_area_off(bitmap, BITMAP_LEN, i, 8, 0, 0) + 1;
+ time = ktime_get() - time;
+ pr_err("bitmap_find_next_zero_area_off: %18llu ns, %6ld iterations\n", time, cnt);
+
+ return 0;
+}
+
static int __init find_bit_test(void)
{
unsigned long nbits = BITMAP_LEN / SPARSE;
@@ -158,6 +173,7 @@ static int __init find_bit_test(void)
get_random_bytes(bitmap, sizeof(bitmap));
get_random_bytes(bitmap2, sizeof(bitmap2));
+ test_bitmap_find_next_zero_area_off(bitmap, BITMAP_LEN);
test_find_next_bit(bitmap, BITMAP_LEN);
test_find_next_zero_bit(bitmap, BITMAP_LEN);
test_find_last_bit(bitmap, BITMAP_LEN);
@@ -181,6 +197,7 @@ static int __init find_bit_test(void)
__set_bit(get_random_u32_below(BITMAP_LEN), bitmap2);
}
+ test_bitmap_find_next_zero_area_off(bitmap, BITMAP_LEN);
test_find_next_bit(bitmap, BITMAP_LEN);
test_find_next_zero_bit(bitmap, BITMAP_LEN);
test_find_last_bit(bitmap, BITMAP_LEN);
diff --git a/lib/test_bitmap.c b/lib/test_bitmap.c
index 69813c10e6c0..fad2b19760c2 100644
--- a/lib/test_bitmap.c
+++ b/lib/test_bitmap.c
@@ -234,6 +234,34 @@ static void __init test_find_nth_bit(void)
}
}
+static void __init
+test_bitmap_find_next_zero_area_off(void)
+{
+ int bmap_len = 64 * 3;
+ DECLARE_BITMAP(bmap, bmap_len);
+
+ bitmap_set(bmap, 0, bmap_len);
+
+ bitmap_clear(bmap, 0, 8);
+ __clear_bit(50, bmap);
+ bitmap_clear(bmap, 60, 10);
+ __clear_bit(80, bmap);
+ bitmap_clear(bmap, 100, 10);
+ __clear_bit(120, bmap);
+ bitmap_clear(bmap, 160, 32);
+
+ expect_eq_uint(0,
+ bitmap_find_next_zero_area_off(bmap, bmap_len, 0, 8, 0, 0));
+ expect_eq_uint(60,
+ bitmap_find_next_zero_area_off(bmap, bmap_len, 1, 8, 0, 0));
+ expect_eq_uint(60,
+ bitmap_find_next_zero_area_off(bmap, bmap_len, 0, 10, 0, 0));
+ expect_eq_uint(160,
+ bitmap_find_next_zero_area_off(bmap, bmap_len, 0, 32, 0, 0));
+ expect_eq_uint(1,
+ !!(bitmap_find_next_zero_area_off(bmap, bmap_len, 0, 33, 0, 0) > bmap_len));
+}
+
static void __init test_fill_set(void)
{
DECLARE_BITMAP(bmap, 1024);
--
2.34.1
^ permalink raw reply related [flat|nested] 10+ messages in thread
* 答复: [PATCH v4 0/2] Improve the performance of bitmap_find_next_zero_area_off()
2026-06-01 9:42 [PATCH v4 0/2] Improve the performance of bitmap_find_next_zero_area_off() Yi Sun
2026-06-01 9:42 ` [PATCH v4 1/2] lib: bitmap: reduce the number of goto again in bitmap_find_next_zero_area_off() Yi Sun
2026-06-01 9:42 ` [PATCH v4 2/2] lib/bitmap: add tests for bitmap_find_next_zero_area_off() Yi Sun
@ 2026-06-08 7:44 ` 孙毅 (Yi Sun)
2026-06-08 21:54 ` Yury Norov
3 siblings, 0 replies; 10+ messages in thread
From: 孙毅 (Yi Sun) @ 2026-06-08 7:44 UTC (permalink / raw)
To: ynorov@nvidia.com, yury.norov@gmail.com
Cc: mnazarewicz@gmail.com, akpm@linux-foundation.org,
mina86@mina86.com, akinobu.mita@gmail.com,
linux-kernel@vger.kernel.org, 王科 (Ke Wang)
Hi Yury,
Just a gentle ping.
Is there anything I can do?
> -----邮件原件-----
> 发件人: 孙毅 (Yi Sun) <Yi.Sun@unisoc.com>
> 发送时间: 2026年6月1日 17:43
> 收件人: ynorov@nvidia.com; yury.norov@gmail.com
> 抄送: 孙毅 (Yi Sun) <Yi.Sun@unisoc.com>; mnazarewicz@gmail.com;
> akpm@linux-foundation.org; mina86@mina86.com; akinobu.mita@gmail.com;
> linux-kernel@vger.kernel.org
> 主题: [PATCH v4 0/2] Improve the performance of
> bitmap_find_next_zero_area_off()
>
> Test code has been added to PATCH v2.
> No new APIs were introduced.
>
> Testing with the test code showed a performance improvement
> of approximately 70%.
>
> Test result(random):
> orig_ns orig_cnt orig_average new_ns
> new_cnt new_average ratio
> test1 1388885 1154 1203 462923
> 1308 353 70.7%
> test2 1393616 1324 1052 736193
> 1212 607 42.3%
> test3 1391693 1216 1144 735808
> 1260 583 49%
> test4 1393231 1275 1092 742731
> 1402 529 51.6%
> test5 1390731 1260 1103 737231
> 1274 578 47.6%
>
> Test result(sparse):
> orig_ns orig_cnt orig_average new_ns
> new_cnt new_average ratio
> test1 4496077 322477 13 2419462
> 322480 7 46.2%
> test2 7514731 322482 23 5785808
> 322476 17 26.1%
> test3 7490692 322493 23 7654423
> 322483 23 0%
> test4 7474500 322469 23 7628230
> 322483 23 0%
> test5 7452692 322481 23 7663116
> 322478 23 0%
>
> Test result explanation:
> Test both random and sparse five times.
> @orig_ns/cnt: Original version results.
> @new_ns/cnt: Optimized test results.
> @orig_average = orig_ns / orig_cnt
> @new_average = new_ns / new_cnt
> @ratio = (orig_average - new_average) / orig_average
>
> The test results show that the optimized version
> improved performance in almost every test.
>
> ---
> v3: https://lore.kernel.org/all/20260514090607.231387-1-yi.sun@unisoc.com
> - Based on Michał Nazarewicz's suggestion,
> code optimization was performed on PATCH v1.
>
> v2: https://lore.kernel.org/all/20260514035644.4118050-1-yi.sun@unisoc.com
> - Do not introduce find_last_bit_from().
>
> v1: https://lore.kernel.org/all/20260512040659.2992142-1-yi.sun@unisoc.com
>
>
> Yi Sun (2):
> lib: bitmap: reduce the number of goto again in
> bitmap_find_next_zero_area_off()
> lib/bitmap: add tests for bitmap_find_next_zero_area_off()
>
> lib/bitmap.c | 10 +++++++---
> lib/find_bit_benchmark.c | 17 +++++++++++++++++
> lib/test_bitmap.c | 28 ++++++++++++++++++++++++++++
> 3 files changed, 52 insertions(+), 3 deletions(-)
>
> --
> 2.34.1
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v4 2/2] lib/bitmap: add tests for bitmap_find_next_zero_area_off()
2026-06-01 9:42 ` [PATCH v4 2/2] lib/bitmap: add tests for bitmap_find_next_zero_area_off() Yi Sun
@ 2026-06-08 19:14 ` Yury Norov
2026-06-08 19:24 ` Yury Norov
0 siblings, 1 reply; 10+ messages in thread
From: Yury Norov @ 2026-06-08 19:14 UTC (permalink / raw)
To: Yi Sun; +Cc: yury.norov, mnazarewicz, akpm, mina86, akinobu.mita, linux-kernel
On Mon, Jun 01, 2026 at 05:42:34PM +0800, Yi Sun wrote:
> Add functional and performance tests
> for bitmap_find_next_zero_area_off().
>
> Signed-off-by: Yi Sun <yi.sun@unisoc.com>
> ---
> lib/find_bit_benchmark.c | 17 +++++++++++++++++
> lib/test_bitmap.c | 28 ++++++++++++++++++++++++++++
> 2 files changed, 45 insertions(+)
>
> diff --git a/lib/find_bit_benchmark.c b/lib/find_bit_benchmark.c
> index 00d9dc61cd46..37fe76ad322e 100644
> --- a/lib/find_bit_benchmark.c
> +++ b/lib/find_bit_benchmark.c
> @@ -149,6 +149,21 @@ static int __init test_find_next_and_bit(const void *bitmap,
> return 0;
> }
>
> +static int __init
> +test_bitmap_find_next_zero_area_off(unsigned long *bitmap, unsigned long len)
> +{
> + unsigned long i, cnt;
> + ktime_t time;
> +
> + time = ktime_get();
> + for (cnt = i = 0; i < BITMAP_LEN; cnt++)
> + i = bitmap_find_next_zero_area_off(bitmap, BITMAP_LEN, i, 8, 0, 0) + 1;
> + time = ktime_get() - time;
> + pr_err("bitmap_find_next_zero_area_off: %18llu ns, %6ld iterations\n", time, cnt);
> +
> + return 0;
> +}
> +
> static int __init find_bit_test(void)
> {
> unsigned long nbits = BITMAP_LEN / SPARSE;
> @@ -158,6 +173,7 @@ static int __init find_bit_test(void)
> get_random_bytes(bitmap, sizeof(bitmap));
> get_random_bytes(bitmap2, sizeof(bitmap2));
>
> + test_bitmap_find_next_zero_area_off(bitmap, BITMAP_LEN);
> test_find_next_bit(bitmap, BITMAP_LEN);
> test_find_next_zero_bit(bitmap, BITMAP_LEN);
> test_find_last_bit(bitmap, BITMAP_LEN);
> @@ -181,6 +197,7 @@ static int __init find_bit_test(void)
> __set_bit(get_random_u32_below(BITMAP_LEN), bitmap2);
> }
>
> + test_bitmap_find_next_zero_area_off(bitmap, BITMAP_LEN);
> test_find_next_bit(bitmap, BITMAP_LEN);
> test_find_next_zero_bit(bitmap, BITMAP_LEN);
> test_find_last_bit(bitmap, BITMAP_LEN);
> diff --git a/lib/test_bitmap.c b/lib/test_bitmap.c
> index 69813c10e6c0..fad2b19760c2 100644
> --- a/lib/test_bitmap.c
> +++ b/lib/test_bitmap.c
> @@ -234,6 +234,34 @@ static void __init test_find_nth_bit(void)
> }
> }
>
> +static void __init
> +test_bitmap_find_next_zero_area_off(void)
warning: ‘test_bitmap_find_next_zero_area_off’ defined but not used [-Wunused-function]
238 | test_bitmap_find_next_zero_area_off(void)
> +{
> + int bmap_len = 64 * 3;
> + DECLARE_BITMAP(bmap, bmap_len);
I believe, everyone can realize that 64*3 == 192, and we don't need a
variable for it.
> + bitmap_set(bmap, 0, bmap_len);
> +
> + bitmap_clear(bmap, 0, 8);
> + __clear_bit(50, bmap);
> + bitmap_clear(bmap, 60, 10);
> + __clear_bit(80, bmap);
> + bitmap_clear(bmap, 100, 10);
> + __clear_bit(120, bmap);
> + bitmap_clear(bmap, 160, 32);
Can you also test a 'dirty' region:
bitmap_clear(60, 18);
__set_bit(69);
// find nothing
bitmap_find_next_zero_area_off(bmap, bmap_len, 0, 10, 0, 0));
> +
> + expect_eq_uint(0,
> + bitmap_find_next_zero_area_off(bmap, bmap_len, 0, 8, 0, 0));
> + expect_eq_uint(60,
> + bitmap_find_next_zero_area_off(bmap, bmap_len, 1, 8, 0, 0));
> + expect_eq_uint(60,
> + bitmap_find_next_zero_area_off(bmap, bmap_len, 0, 10, 0, 0));
> + expect_eq_uint(160,
> + bitmap_find_next_zero_area_off(bmap, bmap_len, 0, 32, 0, 0));
> + expect_eq_uint(1,
> + !!(bitmap_find_next_zero_area_off(bmap, bmap_len, 0, 33, 0, 0) > bmap_len));
Two last parameters are zero in all cases. Doesn't sound like an
exhaustive testing. Real users provide non-zero alignments, so please
you do.
> +}
> +
> static void __init test_fill_set(void)
> {
> DECLARE_BITMAP(bmap, 1024);
> --
> 2.34.1
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v4 2/2] lib/bitmap: add tests for bitmap_find_next_zero_area_off()
2026-06-08 19:14 ` Yury Norov
@ 2026-06-08 19:24 ` Yury Norov
0 siblings, 0 replies; 10+ messages in thread
From: Yury Norov @ 2026-06-08 19:24 UTC (permalink / raw)
To: Yury Norov; +Cc: Yi Sun, mnazarewicz, akpm, mina86, akinobu.mita, linux-kernel
(Sorry, hit 'send' prematurely with a fat finger)
On Mon, Jun 08, 2026 at 03:14:22PM -0400, Yury Norov wrote:
> On Mon, Jun 01, 2026 at 05:42:34PM +0800, Yi Sun wrote:
> > Add functional and performance tests
> > for bitmap_find_next_zero_area_off().
> >
> > Signed-off-by: Yi Sun <yi.sun@unisoc.com>
> > ---
> > lib/find_bit_benchmark.c | 17 +++++++++++++++++
> > lib/test_bitmap.c | 28 ++++++++++++++++++++++++++++
> > 2 files changed, 45 insertions(+)
> >
> > diff --git a/lib/find_bit_benchmark.c b/lib/find_bit_benchmark.c
> > index 00d9dc61cd46..37fe76ad322e 100644
> > --- a/lib/find_bit_benchmark.c
> > +++ b/lib/find_bit_benchmark.c
> > @@ -149,6 +149,21 @@ static int __init test_find_next_and_bit(const void *bitmap,
> > return 0;
> > }
> >
> > +static int __init
> > +test_bitmap_find_next_zero_area_off(unsigned long *bitmap, unsigned long len)
> > +{
> > + unsigned long i, cnt;
> > + ktime_t time;
> > +
> > + time = ktime_get();
> > + for (cnt = i = 0; i < BITMAP_LEN; cnt++)
> > + i = bitmap_find_next_zero_area_off(bitmap, BITMAP_LEN, i, 8, 0, 0) + 1;
> > + time = ktime_get() - time;
> > + pr_err("bitmap_find_next_zero_area_off: %18llu ns, %6ld iterations\n", time, cnt);
> > +
> > + return 0;
> > +}
I asked you at least 3 times to print the exact test output in the
commit message. If you didn't ignore it, you'd see something like:
[ 0.230280] Start testing find_bit() with random-filled bitmap
[ 0.231438] bitmap_find_next_zero_area_off: 352393 ns, 1235 iterations
[ 0.232771] find_next_bit: 586225 ns, 163838 iterations
[ 0.234022] find_next_zero_bit: 631854 ns, 163843 iterations
[ 0.235154] find_last_bit: 489426 ns, 163839 iterations
[ 0.238080] find_nth_bit: 2283610 ns, 16471 iterations
[ 0.239545] find_first_bit: 844551 ns, 16472 iterations
[ 0.248981] find_first_and_bit: 8840955 ns, 41232 iterations
[ 0.250030] find_next_and_bit: 334729 ns, 81913 iterations
[ 0.250728]
[ 0.250728] Start testing find_bit() with sparse bitmap
[ 0.253094] bitmap_find_next_zero_area_off: 1658964 ns, 322485 iterations
[ 0.253851] find_next_bit: 8053 ns, 655 iterations
[ 0.255664] find_next_zero_bit: 1247380 ns, 327026 iterations
[ 0.256292] find_last_bit: 7444 ns, 655 iterations
[ 0.257756] find_nth_bit: 865494 ns, 654 iterations
[ 0.258720] find_first_bit: 284844 ns, 655 iterations
[ 0.259347] find_first_and_bit: 1824 ns, 1 iterations
[ 0.259992] find_next_and_bit: 1763 ns, 1 iterations
And it clearly breaks the test format. Please, when it comes to v5,
double check that you have all reviewers requests addressed.
> > +
> > static int __init find_bit_test(void)
> > {
> > unsigned long nbits = BITMAP_LEN / SPARSE;
> > @@ -158,6 +173,7 @@ static int __init find_bit_test(void)
> > get_random_bytes(bitmap, sizeof(bitmap));
> > get_random_bytes(bitmap2, sizeof(bitmap2));
> >
> > + test_bitmap_find_next_zero_area_off(bitmap, BITMAP_LEN);
> > test_find_next_bit(bitmap, BITMAP_LEN);
> > test_find_next_zero_bit(bitmap, BITMAP_LEN);
> > test_find_last_bit(bitmap, BITMAP_LEN);
> > @@ -181,6 +197,7 @@ static int __init find_bit_test(void)
> > __set_bit(get_random_u32_below(BITMAP_LEN), bitmap2);
> > }
> >
> > + test_bitmap_find_next_zero_area_off(bitmap, BITMAP_LEN);
> > test_find_next_bit(bitmap, BITMAP_LEN);
> > test_find_next_zero_bit(bitmap, BITMAP_LEN);
> > test_find_last_bit(bitmap, BITMAP_LEN);
> > diff --git a/lib/test_bitmap.c b/lib/test_bitmap.c
> > index 69813c10e6c0..fad2b19760c2 100644
> > --- a/lib/test_bitmap.c
> > +++ b/lib/test_bitmap.c
> > @@ -234,6 +234,34 @@ static void __init test_find_nth_bit(void)
> > }
> > }
> >
> > +static void __init
> > +test_bitmap_find_next_zero_area_off(void)
>
> warning: ‘test_bitmap_find_next_zero_area_off’ defined but not used [-Wunused-function]
> 238 | test_bitmap_find_next_zero_area_off(void)
>
> > +{
> > + int bmap_len = 64 * 3;
> > + DECLARE_BITMAP(bmap, bmap_len);
>
> I believe, everyone can realize that 64*3 == 192, and we don't need a
> variable for it.
>
> > + bitmap_set(bmap, 0, bmap_len);
> > +
> > + bitmap_clear(bmap, 0, 8);
> > + __clear_bit(50, bmap);
> > + bitmap_clear(bmap, 60, 10);
> > + __clear_bit(80, bmap);
> > + bitmap_clear(bmap, 100, 10);
> > + __clear_bit(120, bmap);
> > + bitmap_clear(bmap, 160, 32);
>
> Can you also test a 'dirty' region:
>
> bitmap_clear(60, 18);
> __set_bit(69);
> // find nothing
> bitmap_find_next_zero_area_off(bmap, bmap_len, 0, 10, 0, 0));
>
> > +
> > + expect_eq_uint(0,
> > + bitmap_find_next_zero_area_off(bmap, bmap_len, 0, 8, 0, 0));
> > + expect_eq_uint(60,
> > + bitmap_find_next_zero_area_off(bmap, bmap_len, 1, 8, 0, 0));
> > + expect_eq_uint(60,
> > + bitmap_find_next_zero_area_off(bmap, bmap_len, 0, 10, 0, 0));
> > + expect_eq_uint(160,
> > + bitmap_find_next_zero_area_off(bmap, bmap_len, 0, 32, 0, 0));
> > + expect_eq_uint(1,
> > + !!(bitmap_find_next_zero_area_off(bmap, bmap_len, 0, 33, 0, 0) > bmap_len));
>
> Two last parameters are zero in all cases. Doesn't sound like an
> exhaustive testing. Real users provide non-zero alignments, so please
> you do.
>
> > +}
> > +
> > static void __init test_fill_set(void)
> > {
> > DECLARE_BITMAP(bmap, 1024);
> > --
> > 2.34.1
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v4 0/2] Improve the performance of bitmap_find_next_zero_area_off()
2026-06-01 9:42 [PATCH v4 0/2] Improve the performance of bitmap_find_next_zero_area_off() Yi Sun
` (2 preceding siblings ...)
2026-06-08 7:44 ` 答复: [PATCH v4 0/2] Improve the performance of bitmap_find_next_zero_area_off() 孙毅 (Yi Sun)
@ 2026-06-08 21:54 ` Yury Norov
2026-06-09 1:06 ` Yury Norov
3 siblings, 1 reply; 10+ messages in thread
From: Yury Norov @ 2026-06-08 21:54 UTC (permalink / raw)
To: Yi Sun; +Cc: yury.norov, mnazarewicz, akpm, mina86, akinobu.mita, linux-kernel
On Mon, Jun 01, 2026 at 05:42:32PM +0800, Yi Sun wrote:
> Test code has been added to PATCH v2.
> No new APIs were introduced.
>
> Testing with the test code showed a performance improvement
> of approximately 70%.
No, it's not. Your numbers show approximately 50% improvement for
the dense case, and approximately 2% slowdown for the sparse case.
> Test result(random):
> orig_ns orig_cnt orig_average new_ns new_cnt new_average ratio
> test1 1388885 1154 1203 462923 1308 353 70.7%
> test2 1393616 1324 1052 736193 1212 607 42.3%
> test3 1391693 1216 1144 735808 1260 583 49%
> test4 1393231 1275 1092 742731 1402 529 51.6%
> test5 1390731 1260 1103 737231 1274 578 47.6%
>
> Test result(sparse):
> orig_ns orig_cnt orig_average new_ns new_cnt new_average ratio
> test1 4496077 322477 13 2419462 322480 7 46.2%
> test2 7514731 322482 23 5785808 322476 17 26.1%
> test3 7490692 322493 23 7654423 322483 23 0%
> test4 7474500 322469 23 7628230 322483 23 0%
> test5 7452692 322481 23 7663116 322478 23 0%
The numbers look quite inconsistent. The first measurements are
significantly faster for almost all experiments. In the 'new sparse'
case the first run is 4 times faster than the others. And the ratio
0% is simply wrong.
Please, run the test on a real hardware, not virtualized. Please
built-in the test, so it's executed at boot time, or make sure you're
not running anything on parallel, like a GUI or networking.
I gave your code a brief test on my qemu, and I have 43% improvement
in the dense case, with p-value 0.001; and -8% for sparse bitmap,
with the p-value 0.044, still significant.
Overall not bad. But if some critical user has actually a sparse bitmap,
he'll be disappointed. There's not that many actual users of the
function. For v5, can you CC those from non-driver part, at least.
(The ARM GIC counts as the non-driver, I believe.)
> Test result explanation:
> Test both random and sparse five times.
> @orig_ns/cnt: Original version results.
> @new_ns/cnt: Optimized test results.
> @orig_average = orig_ns / orig_cnt
> @new_average = new_ns / new_cnt
> @ratio = (orig_average - new_average) / orig_average
>
> The test results show that the optimized version
> improved performance in almost every test.
>
> ---
> v3: https://lore.kernel.org/all/20260514090607.231387-1-yi.sun@unisoc.com
> - Based on Michał Nazarewicz's suggestion,
> code optimization was performed on PATCH v1.
We've got a special tag for it: Suggested-by. If the optimization is
still there, please use the tag. Can you point to that suggestion?
> v2: https://lore.kernel.org/all/20260514035644.4118050-1-yi.sun@unisoc.com
> - Do not introduce find_last_bit_from().
>
> v1: https://lore.kernel.org/all/20260512040659.2992142-1-yi.sun@unisoc.com
>
>
> Yi Sun (2):
> lib: bitmap: reduce the number of goto again in
> bitmap_find_next_zero_area_off()
> lib/bitmap: add tests for bitmap_find_next_zero_area_off()
The patch order is wrong. You'd introduce the test first, then the
improvement. I want to apply the 1st patch, then run the test, then
apply the 2nd patch, and run the test again to compare. The way you're
doing it now makes me reverting the patches, the useless work.
Thanks,
Yury
>
> lib/bitmap.c | 10 +++++++---
> lib/find_bit_benchmark.c | 17 +++++++++++++++++
> lib/test_bitmap.c | 28 ++++++++++++++++++++++++++++
> 3 files changed, 52 insertions(+), 3 deletions(-)
>
> --
> 2.34.1
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v4 1/2] lib: bitmap: reduce the number of goto again in bitmap_find_next_zero_area_off()
2026-06-01 9:42 ` [PATCH v4 1/2] lib: bitmap: reduce the number of goto again in bitmap_find_next_zero_area_off() Yi Sun
@ 2026-06-08 22:15 ` Yury Norov
0 siblings, 0 replies; 10+ messages in thread
From: Yury Norov @ 2026-06-08 22:15 UTC (permalink / raw)
To: Yi Sun; +Cc: yury.norov, mnazarewicz, akpm, mina86, akinobu.mita, linux-kernel
On Mon, Jun 01, 2026 at 05:42:33PM +0800, Yi Sun wrote:
> Finding a contiguous free region in a highly fragmented
> bitmap is not easy and may require many repeated attempts.
> Therefore, find_next_bit(map, end, index) is not the optimal choice.
> This is because there may be multiple scattered free regions
> within the range [index, end) and none of them will meet the length
> requirement of @nr.
> Instead, it's sufficient to directly find the last bit within
> the range [index, end), thus reducing unnecessary "goto again" calls.
>
> Signed-off-by: Yi Sun <yi.sun@unisoc.com>
> ---
> lib/bitmap.c | 10 +++++++---
> 1 file changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/lib/bitmap.c b/lib/bitmap.c
> index b9bfa157e095..7d0fbcbe6973 100644
> --- a/lib/bitmap.c
> +++ b/lib/bitmap.c
> @@ -432,7 +432,7 @@ unsigned long bitmap_find_next_zero_area_off(unsigned long *map,
> unsigned long align_mask,
> unsigned long align_offset)
> {
> - unsigned long index, end, i;
> + unsigned long index, end, i, index_bits_align, index_idx;
> again:
Can you make an extra step, and remove the 'goto again' completely
with the more high-level for(), do-while() or similar?
> index = find_next_zero_bit(map, size, start);
>
> @@ -442,8 +442,12 @@ unsigned long bitmap_find_next_zero_area_off(unsigned long *map,
> end = index + nr;
> if (end > size)
> return end;
> - i = find_next_bit(map, end, index);
> - if (i < end) {
> +
> + index_bits_align = round_down(index, BITS_PER_LONG);
> + index_idx = index / BITS_PER_LONG;
Too many indexes here. Can you find some better, meaningful names?
> +
> + i = find_last_bit(map + index_idx, end - index_bits_align) + index_bits_align;
> + if (i > index && i < end) {
> start = i + 1;
> goto again;
> }
I spent a couple cycles trying to understand why it works better,
comparing to find_next_bit(). The thing is that the distance between
first set bit and the beginning of the area is greater than between
the last bit and the beginning, so we make bigger steps traversing the
butmap. Is that right? Can you add a step-by-step explanation in the
commit message for both versions, for clarity?
Still wondering why the 'last_bit' version is slower here. Looking at
the find_bit_benchmark output, the 'last_bit' is generally faster than
the 'next_bit'...
Thanks,
Yury
> --
> 2.34.1
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v4 0/2] Improve the performance of bitmap_find_next_zero_area_off()
2026-06-08 21:54 ` Yury Norov
@ 2026-06-09 1:06 ` Yury Norov
2026-06-09 2:09 ` John Stultz
0 siblings, 1 reply; 10+ messages in thread
From: Yury Norov @ 2026-06-09 1:06 UTC (permalink / raw)
To: Yury Norov
Cc: Yi Sun, mnazarewicz, akpm, mina86, akinobu.mita, linux-kernel,
John Stultz, John Stultz
On Mon, Jun 08, 2026 at 05:54:20PM -0400, Yury Norov wrote:
> On Mon, Jun 01, 2026 at 05:42:32PM +0800, Yi Sun wrote:
> > Test code has been added to PATCH v2.
> > No new APIs were introduced.
> >
> > Testing with the test code showed a performance improvement
> > of approximately 70%.
>
> No, it's not. Your numbers show approximately 50% improvement for
> the dense case, and approximately 2% slowdown for the sparse case.
>
> > Test result(random):
> > orig_ns orig_cnt orig_average new_ns new_cnt new_average ratio
> > test1 1388885 1154 1203 462923 1308 353 70.7%
> > test2 1393616 1324 1052 736193 1212 607 42.3%
> > test3 1391693 1216 1144 735808 1260 583 49%
> > test4 1393231 1275 1092 742731 1402 529 51.6%
> > test5 1390731 1260 1103 737231 1274 578 47.6%
> >
> > Test result(sparse):
> > orig_ns orig_cnt orig_average new_ns new_cnt new_average ratio
> > test1 4496077 322477 13 2419462 322480 7 46.2%
> > test2 7514731 322482 23 5785808 322476 17 26.1%
> > test3 7490692 322493 23 7654423 322483 23 0%
> > test4 7474500 322469 23 7628230 322483 23 0%
> > test5 7452692 322481 23 7663116 322478 23 0%
>
> The numbers look quite inconsistent. The first measurements are
> significantly faster for almost all experiments. In the 'new sparse'
> case the first run is 4 times faster than the others. And the ratio
> 0% is simply wrong.
>
> Please, run the test on a real hardware, not virtualized. Please
> built-in the test, so it's executed at boot time, or make sure you're
> not running anything on parallel, like a GUI or networking.
>
> I gave your code a brief test on my qemu, and I have 43% improvement
> in the dense case, with p-value 0.001; and -8% for sparse bitmap,
> with the p-value 0.044, still significant.
>
> Overall not bad. But if some critical user has actually a sparse bitmap,
> he'll be disappointed. There's not that many actual users of the
> function. For v5, can you CC those from non-driver part, at least.
>
> (The ARM GIC counts as the non-driver, I believe.)
OK, I traced the cma_alloc(), which calls the bitmap function through
cma_range_alloc(), and the numbers are looking really strong:
Metric Before After Change
Trace span 194.0 ms 87.1 ms -55.1%
Total CMA alloc time 48.46 ms 16.11 ms -66.8%
Avg alloc latency 184.94 us 61.49 us -66.8%
Median alloc latency 73.72 us 20.59 us -72.1%
p90 alloc latency 329.76 us 55.63 us -83.1%
p99 alloc latency 1866.76 us 859.83 us -53.9%
Max alloc latency 4821.91 us 2324.41 us -51.8%
By request size:
Request Before Avg After Avg Change
1 page 79.68 us 34.47 us -56.7%
256 pages 285.50 us 87.30 us -69.4%
I ran it on qemu, but the numbers are so impressive that I believe
they will be reproduced baremetal.
The tracing command is:
sudo trace-cmd record \
-o cma-dmabuf.dat \
-b 65536 \
-e cma:cma_alloc_start \
-e cma:cma_alloc_finish \
-e cma:cma_alloc_busy_retry \
-e cma:cma_release \
-- kselftest/dmabuf-heaps/dmabuf-heap
Can you run it on your side before sending v5, and share your results?
Adding John Stultz, the test author.
Hi John.
This series improves the underlying bitmap_find_next_zero_area_off()
significantly for average bitmap, but shows ~8% slowdown for sparse
bitmaps. With your CMA allocator test, the results are even stronger,
comparing to the synthetic benchmark, and there seemingly are no
drawbacks.
Can you comment on the results and maybe reproduce it on your side?
Are you or anyone aware of any other useful tests for CMA allocator?
How important the sparse bitmap case overall?
Thanks,
Yury
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v4 0/2] Improve the performance of bitmap_find_next_zero_area_off()
2026-06-09 1:06 ` Yury Norov
@ 2026-06-09 2:09 ` John Stultz
0 siblings, 0 replies; 10+ messages in thread
From: John Stultz @ 2026-06-09 2:09 UTC (permalink / raw)
To: Yury Norov, T.J. Mercier
Cc: Yi Sun, mnazarewicz, akpm, mina86, akinobu.mita, linux-kernel
On Mon, Jun 8, 2026 at 6:06 PM Yury Norov <yury.norov@gmail.com> wrote:
> On Mon, Jun 08, 2026 at 05:54:20PM -0400, Yury Norov wrote:
> > On Mon, Jun 01, 2026 at 05:42:32PM +0800, Yi Sun wrote:
> > > Test code has been added to PATCH v2.
> > > No new APIs were introduced.
> > >
> > > Testing with the test code showed a performance improvement
> > > of approximately 70%.
> >
> > No, it's not. Your numbers show approximately 50% improvement for
> > the dense case, and approximately 2% slowdown for the sparse case.
> >
> > > Test result(random):
> > > orig_ns orig_cnt orig_average new_ns new_cnt new_average ratio
> > > test1 1388885 1154 1203 462923 1308 353 70.7%
> > > test2 1393616 1324 1052 736193 1212 607 42.3%
> > > test3 1391693 1216 1144 735808 1260 583 49%
> > > test4 1393231 1275 1092 742731 1402 529 51.6%
> > > test5 1390731 1260 1103 737231 1274 578 47.6%
> > >
> > > Test result(sparse):
> > > orig_ns orig_cnt orig_average new_ns new_cnt new_average ratio
> > > test1 4496077 322477 13 2419462 322480 7 46.2%
> > > test2 7514731 322482 23 5785808 322476 17 26.1%
> > > test3 7490692 322493 23 7654423 322483 23 0%
> > > test4 7474500 322469 23 7628230 322483 23 0%
> > > test5 7452692 322481 23 7663116 322478 23 0%
> >
> > The numbers look quite inconsistent. The first measurements are
> > significantly faster for almost all experiments. In the 'new sparse'
> > case the first run is 4 times faster than the others. And the ratio
> > 0% is simply wrong.
> >
> > Please, run the test on a real hardware, not virtualized. Please
> > built-in the test, so it's executed at boot time, or make sure you're
> > not running anything on parallel, like a GUI or networking.
> >
> > I gave your code a brief test on my qemu, and I have 43% improvement
> > in the dense case, with p-value 0.001; and -8% for sparse bitmap,
> > with the p-value 0.044, still significant.
> >
> > Overall not bad. But if some critical user has actually a sparse bitmap,
> > he'll be disappointed. There's not that many actual users of the
> > function. For v5, can you CC those from non-driver part, at least.
> >
> > (The ARM GIC counts as the non-driver, I believe.)
>
> OK, I traced the cma_alloc(), which calls the bitmap function through
> cma_range_alloc(), and the numbers are looking really strong:
>
> Metric Before After Change
> Trace span 194.0 ms 87.1 ms -55.1%
> Total CMA alloc time 48.46 ms 16.11 ms -66.8%
> Avg alloc latency 184.94 us 61.49 us -66.8%
> Median alloc latency 73.72 us 20.59 us -72.1%
> p90 alloc latency 329.76 us 55.63 us -83.1%
> p99 alloc latency 1866.76 us 859.83 us -53.9%
> Max alloc latency 4821.91 us 2324.41 us -51.8%
>
> By request size:
>
> Request Before Avg After Avg Change
> 1 page 79.68 us 34.47 us -56.7%
> 256 pages 285.50 us 87.30 us -69.4%
>
> I ran it on qemu, but the numbers are so impressive that I believe
> they will be reproduced baremetal.
>
> The tracing command is:
>
> sudo trace-cmd record \
> -o cma-dmabuf.dat \
> -b 65536 \
> -e cma:cma_alloc_start \
> -e cma:cma_alloc_finish \
> -e cma:cma_alloc_busy_retry \
> -e cma:cma_release \
> -- kselftest/dmabuf-heaps/dmabuf-heap
>
> Can you run it on your side before sending v5, and share your results?
>
> Adding John Stultz, the test author.
>
> Hi John.
>
> This series improves the underlying bitmap_find_next_zero_area_off()
> significantly for average bitmap, but shows ~8% slowdown for sparse
> bitmaps. With your CMA allocator test, the results are even stronger,
> comparing to the synthetic benchmark, and there seemingly are no
> drawbacks.
>
> Can you comment on the results and maybe reproduce it on your side?
> Are you or anyone aware of any other useful tests for CMA allocator?
> How important the sparse bitmap case overall?
>
Pulling in TJ who has more recent context here.
thanks
-john
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2026-06-09 2:09 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-01 9:42 [PATCH v4 0/2] Improve the performance of bitmap_find_next_zero_area_off() Yi Sun
2026-06-01 9:42 ` [PATCH v4 1/2] lib: bitmap: reduce the number of goto again in bitmap_find_next_zero_area_off() Yi Sun
2026-06-08 22:15 ` Yury Norov
2026-06-01 9:42 ` [PATCH v4 2/2] lib/bitmap: add tests for bitmap_find_next_zero_area_off() Yi Sun
2026-06-08 19:14 ` Yury Norov
2026-06-08 19:24 ` Yury Norov
2026-06-08 7:44 ` 答复: [PATCH v4 0/2] Improve the performance of bitmap_find_next_zero_area_off() 孙毅 (Yi Sun)
2026-06-08 21:54 ` Yury Norov
2026-06-09 1:06 ` Yury Norov
2026-06-09 2:09 ` John Stultz
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.