* [PATCH v4 0/2] Improve the performance of bitmap_find_next_zero_area_off()
@ 2026-06-01 9:42 Yi Sun
2026-06-01 9:42 ` [PATCH v4 1/2] lib: bitmap: reduce the number of goto again in bitmap_find_next_zero_area_off() Yi Sun
` (3 more replies)
0 siblings, 4 replies; 10+ messages in thread
From: Yi Sun @ 2026-06-01 9:42 UTC (permalink / raw)
To: ynorov, yury.norov
Cc: yi.sun, mnazarewicz, akpm, mina86, akinobu.mita, linux-kernel
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="y", Size: 1941 bytes --]
Test code has been added to PATCH v2.
No new APIs were introduced.
Testing with the test code showed a performance improvement
of approximately 70%.
Test result(random):
orig_ns orig_cnt orig_average new_ns new_cnt new_average ratio
test1 1388885 1154 1203 462923 1308 353 70.7%
test2 1393616 1324 1052 736193 1212 607 42.3%
test3 1391693 1216 1144 735808 1260 583 49%
test4 1393231 1275 1092 742731 1402 529 51.6%
test5 1390731 1260 1103 737231 1274 578 47.6%
Test result(sparse):
orig_ns orig_cnt orig_average new_ns new_cnt new_average ratio
test1 4496077 322477 13 2419462 322480 7 46.2%
test2 7514731 322482 23 5785808 322476 17 26.1%
test3 7490692 322493 23 7654423 322483 23 0%
test4 7474500 322469 23 7628230 322483 23 0%
test5 7452692 322481 23 7663116 322478 23 0%
Test result explanation:
Test both random and sparse five times.
@orig_ns/cnt: Original version results.
@new_ns/cnt: Optimized test results.
@orig_average = orig_ns / orig_cnt
@new_average = new_ns / new_cnt
@ratio = (orig_average - new_average) / orig_average
The test results show that the optimized version
improved performance in almost every test.
---
v3: https://lore.kernel.org/all/20260514090607.231387-1-yi.sun@unisoc.com
- Based on Michał Nazarewicz's suggestion,
code optimization was performed on PATCH v1.
v2: https://lore.kernel.org/all/20260514035644.4118050-1-yi.sun@unisoc.com
- Do not introduce find_last_bit_from().
v1: https://lore.kernel.org/all/20260512040659.2992142-1-yi.sun@unisoc.com
Yi Sun (2):
lib: bitmap: reduce the number of goto again in
bitmap_find_next_zero_area_off()
lib/bitmap: add tests for bitmap_find_next_zero_area_off()
lib/bitmap.c | 10 +++++++---
lib/find_bit_benchmark.c | 17 +++++++++++++++++
lib/test_bitmap.c | 28 ++++++++++++++++++++++++++++
3 files changed, 52 insertions(+), 3 deletions(-)
--
2.34.1
^ permalink raw reply [flat|nested] 10+ messages in thread* [PATCH v4 1/2] lib: bitmap: reduce the number of goto again in bitmap_find_next_zero_area_off() 2026-06-01 9:42 [PATCH v4 0/2] Improve the performance of bitmap_find_next_zero_area_off() Yi Sun @ 2026-06-01 9:42 ` Yi Sun 2026-06-08 22:15 ` Yury Norov 2026-06-01 9:42 ` [PATCH v4 2/2] lib/bitmap: add tests for bitmap_find_next_zero_area_off() Yi Sun ` (2 subsequent siblings) 3 siblings, 1 reply; 10+ messages in thread From: Yi Sun @ 2026-06-01 9:42 UTC (permalink / raw) To: ynorov, yury.norov Cc: yi.sun, mnazarewicz, akpm, mina86, akinobu.mita, linux-kernel Finding a contiguous free region in a highly fragmented bitmap is not easy and may require many repeated attempts. Therefore, find_next_bit(map, end, index) is not the optimal choice. This is because there may be multiple scattered free regions within the range [index, end) and none of them will meet the length requirement of @nr. Instead, it's sufficient to directly find the last bit within the range [index, end), thus reducing unnecessary "goto again" calls. Signed-off-by: Yi Sun <yi.sun@unisoc.com> --- lib/bitmap.c | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/lib/bitmap.c b/lib/bitmap.c index b9bfa157e095..7d0fbcbe6973 100644 --- a/lib/bitmap.c +++ b/lib/bitmap.c @@ -432,7 +432,7 @@ unsigned long bitmap_find_next_zero_area_off(unsigned long *map, unsigned long align_mask, unsigned long align_offset) { - unsigned long index, end, i; + unsigned long index, end, i, index_bits_align, index_idx; again: index = find_next_zero_bit(map, size, start); @@ -442,8 +442,12 @@ unsigned long bitmap_find_next_zero_area_off(unsigned long *map, end = index + nr; if (end > size) return end; - i = find_next_bit(map, end, index); - if (i < end) { + + index_bits_align = round_down(index, BITS_PER_LONG); + index_idx = index / BITS_PER_LONG; + + i = find_last_bit(map + index_idx, end - index_bits_align) + index_bits_align; + if (i > index && i < end) { start = i + 1; goto again; } -- 2.34.1 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH v4 1/2] lib: bitmap: reduce the number of goto again in bitmap_find_next_zero_area_off() 2026-06-01 9:42 ` [PATCH v4 1/2] lib: bitmap: reduce the number of goto again in bitmap_find_next_zero_area_off() Yi Sun @ 2026-06-08 22:15 ` Yury Norov 0 siblings, 0 replies; 10+ messages in thread From: Yury Norov @ 2026-06-08 22:15 UTC (permalink / raw) To: Yi Sun; +Cc: yury.norov, mnazarewicz, akpm, mina86, akinobu.mita, linux-kernel On Mon, Jun 01, 2026 at 05:42:33PM +0800, Yi Sun wrote: > Finding a contiguous free region in a highly fragmented > bitmap is not easy and may require many repeated attempts. > Therefore, find_next_bit(map, end, index) is not the optimal choice. > This is because there may be multiple scattered free regions > within the range [index, end) and none of them will meet the length > requirement of @nr. > Instead, it's sufficient to directly find the last bit within > the range [index, end), thus reducing unnecessary "goto again" calls. > > Signed-off-by: Yi Sun <yi.sun@unisoc.com> > --- > lib/bitmap.c | 10 +++++++--- > 1 file changed, 7 insertions(+), 3 deletions(-) > > diff --git a/lib/bitmap.c b/lib/bitmap.c > index b9bfa157e095..7d0fbcbe6973 100644 > --- a/lib/bitmap.c > +++ b/lib/bitmap.c > @@ -432,7 +432,7 @@ unsigned long bitmap_find_next_zero_area_off(unsigned long *map, > unsigned long align_mask, > unsigned long align_offset) > { > - unsigned long index, end, i; > + unsigned long index, end, i, index_bits_align, index_idx; > again: Can you make an extra step, and remove the 'goto again' completely with the more high-level for(), do-while() or similar? > index = find_next_zero_bit(map, size, start); > > @@ -442,8 +442,12 @@ unsigned long bitmap_find_next_zero_area_off(unsigned long *map, > end = index + nr; > if (end > size) > return end; > - i = find_next_bit(map, end, index); > - if (i < end) { > + > + index_bits_align = round_down(index, BITS_PER_LONG); > + index_idx = index / BITS_PER_LONG; Too many indexes here. Can you find some better, meaningful names? > + > + i = find_last_bit(map + index_idx, end - index_bits_align) + index_bits_align; > + if (i > index && i < end) { > start = i + 1; > goto again; > } I spent a couple cycles trying to understand why it works better, comparing to find_next_bit(). The thing is that the distance between first set bit and the beginning of the area is greater than between the last bit and the beginning, so we make bigger steps traversing the butmap. Is that right? Can you add a step-by-step explanation in the commit message for both versions, for clarity? Still wondering why the 'last_bit' version is slower here. Looking at the find_bit_benchmark output, the 'last_bit' is generally faster than the 'next_bit'... Thanks, Yury > -- > 2.34.1 ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v4 2/2] lib/bitmap: add tests for bitmap_find_next_zero_area_off() 2026-06-01 9:42 [PATCH v4 0/2] Improve the performance of bitmap_find_next_zero_area_off() Yi Sun 2026-06-01 9:42 ` [PATCH v4 1/2] lib: bitmap: reduce the number of goto again in bitmap_find_next_zero_area_off() Yi Sun @ 2026-06-01 9:42 ` Yi Sun 2026-06-08 19:14 ` Yury Norov 2026-06-08 7:44 ` 答复: [PATCH v4 0/2] Improve the performance of bitmap_find_next_zero_area_off() 孙毅 (Yi Sun) 2026-06-08 21:54 ` Yury Norov 3 siblings, 1 reply; 10+ messages in thread From: Yi Sun @ 2026-06-01 9:42 UTC (permalink / raw) To: ynorov, yury.norov Cc: yi.sun, mnazarewicz, akpm, mina86, akinobu.mita, linux-kernel Add functional and performance tests for bitmap_find_next_zero_area_off(). Signed-off-by: Yi Sun <yi.sun@unisoc.com> --- lib/find_bit_benchmark.c | 17 +++++++++++++++++ lib/test_bitmap.c | 28 ++++++++++++++++++++++++++++ 2 files changed, 45 insertions(+) diff --git a/lib/find_bit_benchmark.c b/lib/find_bit_benchmark.c index 00d9dc61cd46..37fe76ad322e 100644 --- a/lib/find_bit_benchmark.c +++ b/lib/find_bit_benchmark.c @@ -149,6 +149,21 @@ static int __init test_find_next_and_bit(const void *bitmap, return 0; } +static int __init +test_bitmap_find_next_zero_area_off(unsigned long *bitmap, unsigned long len) +{ + unsigned long i, cnt; + ktime_t time; + + time = ktime_get(); + for (cnt = i = 0; i < BITMAP_LEN; cnt++) + i = bitmap_find_next_zero_area_off(bitmap, BITMAP_LEN, i, 8, 0, 0) + 1; + time = ktime_get() - time; + pr_err("bitmap_find_next_zero_area_off: %18llu ns, %6ld iterations\n", time, cnt); + + return 0; +} + static int __init find_bit_test(void) { unsigned long nbits = BITMAP_LEN / SPARSE; @@ -158,6 +173,7 @@ static int __init find_bit_test(void) get_random_bytes(bitmap, sizeof(bitmap)); get_random_bytes(bitmap2, sizeof(bitmap2)); + test_bitmap_find_next_zero_area_off(bitmap, BITMAP_LEN); test_find_next_bit(bitmap, BITMAP_LEN); test_find_next_zero_bit(bitmap, BITMAP_LEN); test_find_last_bit(bitmap, BITMAP_LEN); @@ -181,6 +197,7 @@ static int __init find_bit_test(void) __set_bit(get_random_u32_below(BITMAP_LEN), bitmap2); } + test_bitmap_find_next_zero_area_off(bitmap, BITMAP_LEN); test_find_next_bit(bitmap, BITMAP_LEN); test_find_next_zero_bit(bitmap, BITMAP_LEN); test_find_last_bit(bitmap, BITMAP_LEN); diff --git a/lib/test_bitmap.c b/lib/test_bitmap.c index 69813c10e6c0..fad2b19760c2 100644 --- a/lib/test_bitmap.c +++ b/lib/test_bitmap.c @@ -234,6 +234,34 @@ static void __init test_find_nth_bit(void) } } +static void __init +test_bitmap_find_next_zero_area_off(void) +{ + int bmap_len = 64 * 3; + DECLARE_BITMAP(bmap, bmap_len); + + bitmap_set(bmap, 0, bmap_len); + + bitmap_clear(bmap, 0, 8); + __clear_bit(50, bmap); + bitmap_clear(bmap, 60, 10); + __clear_bit(80, bmap); + bitmap_clear(bmap, 100, 10); + __clear_bit(120, bmap); + bitmap_clear(bmap, 160, 32); + + expect_eq_uint(0, + bitmap_find_next_zero_area_off(bmap, bmap_len, 0, 8, 0, 0)); + expect_eq_uint(60, + bitmap_find_next_zero_area_off(bmap, bmap_len, 1, 8, 0, 0)); + expect_eq_uint(60, + bitmap_find_next_zero_area_off(bmap, bmap_len, 0, 10, 0, 0)); + expect_eq_uint(160, + bitmap_find_next_zero_area_off(bmap, bmap_len, 0, 32, 0, 0)); + expect_eq_uint(1, + !!(bitmap_find_next_zero_area_off(bmap, bmap_len, 0, 33, 0, 0) > bmap_len)); +} + static void __init test_fill_set(void) { DECLARE_BITMAP(bmap, 1024); -- 2.34.1 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH v4 2/2] lib/bitmap: add tests for bitmap_find_next_zero_area_off() 2026-06-01 9:42 ` [PATCH v4 2/2] lib/bitmap: add tests for bitmap_find_next_zero_area_off() Yi Sun @ 2026-06-08 19:14 ` Yury Norov 2026-06-08 19:24 ` Yury Norov 0 siblings, 1 reply; 10+ messages in thread From: Yury Norov @ 2026-06-08 19:14 UTC (permalink / raw) To: Yi Sun; +Cc: yury.norov, mnazarewicz, akpm, mina86, akinobu.mita, linux-kernel On Mon, Jun 01, 2026 at 05:42:34PM +0800, Yi Sun wrote: > Add functional and performance tests > for bitmap_find_next_zero_area_off(). > > Signed-off-by: Yi Sun <yi.sun@unisoc.com> > --- > lib/find_bit_benchmark.c | 17 +++++++++++++++++ > lib/test_bitmap.c | 28 ++++++++++++++++++++++++++++ > 2 files changed, 45 insertions(+) > > diff --git a/lib/find_bit_benchmark.c b/lib/find_bit_benchmark.c > index 00d9dc61cd46..37fe76ad322e 100644 > --- a/lib/find_bit_benchmark.c > +++ b/lib/find_bit_benchmark.c > @@ -149,6 +149,21 @@ static int __init test_find_next_and_bit(const void *bitmap, > return 0; > } > > +static int __init > +test_bitmap_find_next_zero_area_off(unsigned long *bitmap, unsigned long len) > +{ > + unsigned long i, cnt; > + ktime_t time; > + > + time = ktime_get(); > + for (cnt = i = 0; i < BITMAP_LEN; cnt++) > + i = bitmap_find_next_zero_area_off(bitmap, BITMAP_LEN, i, 8, 0, 0) + 1; > + time = ktime_get() - time; > + pr_err("bitmap_find_next_zero_area_off: %18llu ns, %6ld iterations\n", time, cnt); > + > + return 0; > +} > + > static int __init find_bit_test(void) > { > unsigned long nbits = BITMAP_LEN / SPARSE; > @@ -158,6 +173,7 @@ static int __init find_bit_test(void) > get_random_bytes(bitmap, sizeof(bitmap)); > get_random_bytes(bitmap2, sizeof(bitmap2)); > > + test_bitmap_find_next_zero_area_off(bitmap, BITMAP_LEN); > test_find_next_bit(bitmap, BITMAP_LEN); > test_find_next_zero_bit(bitmap, BITMAP_LEN); > test_find_last_bit(bitmap, BITMAP_LEN); > @@ -181,6 +197,7 @@ static int __init find_bit_test(void) > __set_bit(get_random_u32_below(BITMAP_LEN), bitmap2); > } > > + test_bitmap_find_next_zero_area_off(bitmap, BITMAP_LEN); > test_find_next_bit(bitmap, BITMAP_LEN); > test_find_next_zero_bit(bitmap, BITMAP_LEN); > test_find_last_bit(bitmap, BITMAP_LEN); > diff --git a/lib/test_bitmap.c b/lib/test_bitmap.c > index 69813c10e6c0..fad2b19760c2 100644 > --- a/lib/test_bitmap.c > +++ b/lib/test_bitmap.c > @@ -234,6 +234,34 @@ static void __init test_find_nth_bit(void) > } > } > > +static void __init > +test_bitmap_find_next_zero_area_off(void) warning: ‘test_bitmap_find_next_zero_area_off’ defined but not used [-Wunused-function] 238 | test_bitmap_find_next_zero_area_off(void) > +{ > + int bmap_len = 64 * 3; > + DECLARE_BITMAP(bmap, bmap_len); I believe, everyone can realize that 64*3 == 192, and we don't need a variable for it. > + bitmap_set(bmap, 0, bmap_len); > + > + bitmap_clear(bmap, 0, 8); > + __clear_bit(50, bmap); > + bitmap_clear(bmap, 60, 10); > + __clear_bit(80, bmap); > + bitmap_clear(bmap, 100, 10); > + __clear_bit(120, bmap); > + bitmap_clear(bmap, 160, 32); Can you also test a 'dirty' region: bitmap_clear(60, 18); __set_bit(69); // find nothing bitmap_find_next_zero_area_off(bmap, bmap_len, 0, 10, 0, 0)); > + > + expect_eq_uint(0, > + bitmap_find_next_zero_area_off(bmap, bmap_len, 0, 8, 0, 0)); > + expect_eq_uint(60, > + bitmap_find_next_zero_area_off(bmap, bmap_len, 1, 8, 0, 0)); > + expect_eq_uint(60, > + bitmap_find_next_zero_area_off(bmap, bmap_len, 0, 10, 0, 0)); > + expect_eq_uint(160, > + bitmap_find_next_zero_area_off(bmap, bmap_len, 0, 32, 0, 0)); > + expect_eq_uint(1, > + !!(bitmap_find_next_zero_area_off(bmap, bmap_len, 0, 33, 0, 0) > bmap_len)); Two last parameters are zero in all cases. Doesn't sound like an exhaustive testing. Real users provide non-zero alignments, so please you do. > +} > + > static void __init test_fill_set(void) > { > DECLARE_BITMAP(bmap, 1024); > -- > 2.34.1 ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v4 2/2] lib/bitmap: add tests for bitmap_find_next_zero_area_off() 2026-06-08 19:14 ` Yury Norov @ 2026-06-08 19:24 ` Yury Norov 0 siblings, 0 replies; 10+ messages in thread From: Yury Norov @ 2026-06-08 19:24 UTC (permalink / raw) To: Yury Norov; +Cc: Yi Sun, mnazarewicz, akpm, mina86, akinobu.mita, linux-kernel (Sorry, hit 'send' prematurely with a fat finger) On Mon, Jun 08, 2026 at 03:14:22PM -0400, Yury Norov wrote: > On Mon, Jun 01, 2026 at 05:42:34PM +0800, Yi Sun wrote: > > Add functional and performance tests > > for bitmap_find_next_zero_area_off(). > > > > Signed-off-by: Yi Sun <yi.sun@unisoc.com> > > --- > > lib/find_bit_benchmark.c | 17 +++++++++++++++++ > > lib/test_bitmap.c | 28 ++++++++++++++++++++++++++++ > > 2 files changed, 45 insertions(+) > > > > diff --git a/lib/find_bit_benchmark.c b/lib/find_bit_benchmark.c > > index 00d9dc61cd46..37fe76ad322e 100644 > > --- a/lib/find_bit_benchmark.c > > +++ b/lib/find_bit_benchmark.c > > @@ -149,6 +149,21 @@ static int __init test_find_next_and_bit(const void *bitmap, > > return 0; > > } > > > > +static int __init > > +test_bitmap_find_next_zero_area_off(unsigned long *bitmap, unsigned long len) > > +{ > > + unsigned long i, cnt; > > + ktime_t time; > > + > > + time = ktime_get(); > > + for (cnt = i = 0; i < BITMAP_LEN; cnt++) > > + i = bitmap_find_next_zero_area_off(bitmap, BITMAP_LEN, i, 8, 0, 0) + 1; > > + time = ktime_get() - time; > > + pr_err("bitmap_find_next_zero_area_off: %18llu ns, %6ld iterations\n", time, cnt); > > + > > + return 0; > > +} I asked you at least 3 times to print the exact test output in the commit message. If you didn't ignore it, you'd see something like: [ 0.230280] Start testing find_bit() with random-filled bitmap [ 0.231438] bitmap_find_next_zero_area_off: 352393 ns, 1235 iterations [ 0.232771] find_next_bit: 586225 ns, 163838 iterations [ 0.234022] find_next_zero_bit: 631854 ns, 163843 iterations [ 0.235154] find_last_bit: 489426 ns, 163839 iterations [ 0.238080] find_nth_bit: 2283610 ns, 16471 iterations [ 0.239545] find_first_bit: 844551 ns, 16472 iterations [ 0.248981] find_first_and_bit: 8840955 ns, 41232 iterations [ 0.250030] find_next_and_bit: 334729 ns, 81913 iterations [ 0.250728] [ 0.250728] Start testing find_bit() with sparse bitmap [ 0.253094] bitmap_find_next_zero_area_off: 1658964 ns, 322485 iterations [ 0.253851] find_next_bit: 8053 ns, 655 iterations [ 0.255664] find_next_zero_bit: 1247380 ns, 327026 iterations [ 0.256292] find_last_bit: 7444 ns, 655 iterations [ 0.257756] find_nth_bit: 865494 ns, 654 iterations [ 0.258720] find_first_bit: 284844 ns, 655 iterations [ 0.259347] find_first_and_bit: 1824 ns, 1 iterations [ 0.259992] find_next_and_bit: 1763 ns, 1 iterations And it clearly breaks the test format. Please, when it comes to v5, double check that you have all reviewers requests addressed. > > + > > static int __init find_bit_test(void) > > { > > unsigned long nbits = BITMAP_LEN / SPARSE; > > @@ -158,6 +173,7 @@ static int __init find_bit_test(void) > > get_random_bytes(bitmap, sizeof(bitmap)); > > get_random_bytes(bitmap2, sizeof(bitmap2)); > > > > + test_bitmap_find_next_zero_area_off(bitmap, BITMAP_LEN); > > test_find_next_bit(bitmap, BITMAP_LEN); > > test_find_next_zero_bit(bitmap, BITMAP_LEN); > > test_find_last_bit(bitmap, BITMAP_LEN); > > @@ -181,6 +197,7 @@ static int __init find_bit_test(void) > > __set_bit(get_random_u32_below(BITMAP_LEN), bitmap2); > > } > > > > + test_bitmap_find_next_zero_area_off(bitmap, BITMAP_LEN); > > test_find_next_bit(bitmap, BITMAP_LEN); > > test_find_next_zero_bit(bitmap, BITMAP_LEN); > > test_find_last_bit(bitmap, BITMAP_LEN); > > diff --git a/lib/test_bitmap.c b/lib/test_bitmap.c > > index 69813c10e6c0..fad2b19760c2 100644 > > --- a/lib/test_bitmap.c > > +++ b/lib/test_bitmap.c > > @@ -234,6 +234,34 @@ static void __init test_find_nth_bit(void) > > } > > } > > > > +static void __init > > +test_bitmap_find_next_zero_area_off(void) > > warning: ‘test_bitmap_find_next_zero_area_off’ defined but not used [-Wunused-function] > 238 | test_bitmap_find_next_zero_area_off(void) > > > +{ > > + int bmap_len = 64 * 3; > > + DECLARE_BITMAP(bmap, bmap_len); > > I believe, everyone can realize that 64*3 == 192, and we don't need a > variable for it. > > > + bitmap_set(bmap, 0, bmap_len); > > + > > + bitmap_clear(bmap, 0, 8); > > + __clear_bit(50, bmap); > > + bitmap_clear(bmap, 60, 10); > > + __clear_bit(80, bmap); > > + bitmap_clear(bmap, 100, 10); > > + __clear_bit(120, bmap); > > + bitmap_clear(bmap, 160, 32); > > Can you also test a 'dirty' region: > > bitmap_clear(60, 18); > __set_bit(69); > // find nothing > bitmap_find_next_zero_area_off(bmap, bmap_len, 0, 10, 0, 0)); > > > + > > + expect_eq_uint(0, > > + bitmap_find_next_zero_area_off(bmap, bmap_len, 0, 8, 0, 0)); > > + expect_eq_uint(60, > > + bitmap_find_next_zero_area_off(bmap, bmap_len, 1, 8, 0, 0)); > > + expect_eq_uint(60, > > + bitmap_find_next_zero_area_off(bmap, bmap_len, 0, 10, 0, 0)); > > + expect_eq_uint(160, > > + bitmap_find_next_zero_area_off(bmap, bmap_len, 0, 32, 0, 0)); > > + expect_eq_uint(1, > > + !!(bitmap_find_next_zero_area_off(bmap, bmap_len, 0, 33, 0, 0) > bmap_len)); > > Two last parameters are zero in all cases. Doesn't sound like an > exhaustive testing. Real users provide non-zero alignments, so please > you do. > > > +} > > + > > static void __init test_fill_set(void) > > { > > DECLARE_BITMAP(bmap, 1024); > > -- > > 2.34.1 ^ permalink raw reply [flat|nested] 10+ messages in thread
* 答复: [PATCH v4 0/2] Improve the performance of bitmap_find_next_zero_area_off() 2026-06-01 9:42 [PATCH v4 0/2] Improve the performance of bitmap_find_next_zero_area_off() Yi Sun 2026-06-01 9:42 ` [PATCH v4 1/2] lib: bitmap: reduce the number of goto again in bitmap_find_next_zero_area_off() Yi Sun 2026-06-01 9:42 ` [PATCH v4 2/2] lib/bitmap: add tests for bitmap_find_next_zero_area_off() Yi Sun @ 2026-06-08 7:44 ` 孙毅 (Yi Sun) 2026-06-08 21:54 ` Yury Norov 3 siblings, 0 replies; 10+ messages in thread From: 孙毅 (Yi Sun) @ 2026-06-08 7:44 UTC (permalink / raw) To: ynorov@nvidia.com, yury.norov@gmail.com Cc: mnazarewicz@gmail.com, akpm@linux-foundation.org, mina86@mina86.com, akinobu.mita@gmail.com, linux-kernel@vger.kernel.org, 王科 (Ke Wang) Hi Yury, Just a gentle ping. Is there anything I can do? > -----邮件原件----- > 发件人: 孙毅 (Yi Sun) <Yi.Sun@unisoc.com> > 发送时间: 2026年6月1日 17:43 > 收件人: ynorov@nvidia.com; yury.norov@gmail.com > 抄送: 孙毅 (Yi Sun) <Yi.Sun@unisoc.com>; mnazarewicz@gmail.com; > akpm@linux-foundation.org; mina86@mina86.com; akinobu.mita@gmail.com; > linux-kernel@vger.kernel.org > 主题: [PATCH v4 0/2] Improve the performance of > bitmap_find_next_zero_area_off() > > Test code has been added to PATCH v2. > No new APIs were introduced. > > Testing with the test code showed a performance improvement > of approximately 70%. > > Test result(random): > orig_ns orig_cnt orig_average new_ns > new_cnt new_average ratio > test1 1388885 1154 1203 462923 > 1308 353 70.7% > test2 1393616 1324 1052 736193 > 1212 607 42.3% > test3 1391693 1216 1144 735808 > 1260 583 49% > test4 1393231 1275 1092 742731 > 1402 529 51.6% > test5 1390731 1260 1103 737231 > 1274 578 47.6% > > Test result(sparse): > orig_ns orig_cnt orig_average new_ns > new_cnt new_average ratio > test1 4496077 322477 13 2419462 > 322480 7 46.2% > test2 7514731 322482 23 5785808 > 322476 17 26.1% > test3 7490692 322493 23 7654423 > 322483 23 0% > test4 7474500 322469 23 7628230 > 322483 23 0% > test5 7452692 322481 23 7663116 > 322478 23 0% > > Test result explanation: > Test both random and sparse five times. > @orig_ns/cnt: Original version results. > @new_ns/cnt: Optimized test results. > @orig_average = orig_ns / orig_cnt > @new_average = new_ns / new_cnt > @ratio = (orig_average - new_average) / orig_average > > The test results show that the optimized version > improved performance in almost every test. > > --- > v3: https://lore.kernel.org/all/20260514090607.231387-1-yi.sun@unisoc.com > - Based on MichaÅ‚ Nazarewicz's suggestion, > code optimization was performed on PATCH v1. > > v2: https://lore.kernel.org/all/20260514035644.4118050-1-yi.sun@unisoc.com > - Do not introduce find_last_bit_from(). > > v1: https://lore.kernel.org/all/20260512040659.2992142-1-yi.sun@unisoc.com > > > Yi Sun (2): > lib: bitmap: reduce the number of goto again in > bitmap_find_next_zero_area_off() > lib/bitmap: add tests for bitmap_find_next_zero_area_off() > > lib/bitmap.c | 10 +++++++--- > lib/find_bit_benchmark.c | 17 +++++++++++++++++ > lib/test_bitmap.c | 28 ++++++++++++++++++++++++++++ > 3 files changed, 52 insertions(+), 3 deletions(-) > > -- > 2.34.1 ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v4 0/2] Improve the performance of bitmap_find_next_zero_area_off() 2026-06-01 9:42 [PATCH v4 0/2] Improve the performance of bitmap_find_next_zero_area_off() Yi Sun ` (2 preceding siblings ...) 2026-06-08 7:44 ` 答复: [PATCH v4 0/2] Improve the performance of bitmap_find_next_zero_area_off() 孙毅 (Yi Sun) @ 2026-06-08 21:54 ` Yury Norov 2026-06-09 1:06 ` Yury Norov 3 siblings, 1 reply; 10+ messages in thread From: Yury Norov @ 2026-06-08 21:54 UTC (permalink / raw) To: Yi Sun; +Cc: yury.norov, mnazarewicz, akpm, mina86, akinobu.mita, linux-kernel On Mon, Jun 01, 2026 at 05:42:32PM +0800, Yi Sun wrote: > Test code has been added to PATCH v2. > No new APIs were introduced. > > Testing with the test code showed a performance improvement > of approximately 70%. No, it's not. Your numbers show approximately 50% improvement for the dense case, and approximately 2% slowdown for the sparse case. > Test result(random): > orig_ns orig_cnt orig_average new_ns new_cnt new_average ratio > test1 1388885 1154 1203 462923 1308 353 70.7% > test2 1393616 1324 1052 736193 1212 607 42.3% > test3 1391693 1216 1144 735808 1260 583 49% > test4 1393231 1275 1092 742731 1402 529 51.6% > test5 1390731 1260 1103 737231 1274 578 47.6% > > Test result(sparse): > orig_ns orig_cnt orig_average new_ns new_cnt new_average ratio > test1 4496077 322477 13 2419462 322480 7 46.2% > test2 7514731 322482 23 5785808 322476 17 26.1% > test3 7490692 322493 23 7654423 322483 23 0% > test4 7474500 322469 23 7628230 322483 23 0% > test5 7452692 322481 23 7663116 322478 23 0% The numbers look quite inconsistent. The first measurements are significantly faster for almost all experiments. In the 'new sparse' case the first run is 4 times faster than the others. And the ratio 0% is simply wrong. Please, run the test on a real hardware, not virtualized. Please built-in the test, so it's executed at boot time, or make sure you're not running anything on parallel, like a GUI or networking. I gave your code a brief test on my qemu, and I have 43% improvement in the dense case, with p-value 0.001; and -8% for sparse bitmap, with the p-value 0.044, still significant. Overall not bad. But if some critical user has actually a sparse bitmap, he'll be disappointed. There's not that many actual users of the function. For v5, can you CC those from non-driver part, at least. (The ARM GIC counts as the non-driver, I believe.) > Test result explanation: > Test both random and sparse five times. > @orig_ns/cnt: Original version results. > @new_ns/cnt: Optimized test results. > @orig_average = orig_ns / orig_cnt > @new_average = new_ns / new_cnt > @ratio = (orig_average - new_average) / orig_average > > The test results show that the optimized version > improved performance in almost every test. > > --- > v3: https://lore.kernel.org/all/20260514090607.231387-1-yi.sun@unisoc.com > - Based on Michał Nazarewicz's suggestion, > code optimization was performed on PATCH v1. We've got a special tag for it: Suggested-by. If the optimization is still there, please use the tag. Can you point to that suggestion? > v2: https://lore.kernel.org/all/20260514035644.4118050-1-yi.sun@unisoc.com > - Do not introduce find_last_bit_from(). > > v1: https://lore.kernel.org/all/20260512040659.2992142-1-yi.sun@unisoc.com > > > Yi Sun (2): > lib: bitmap: reduce the number of goto again in > bitmap_find_next_zero_area_off() > lib/bitmap: add tests for bitmap_find_next_zero_area_off() The patch order is wrong. You'd introduce the test first, then the improvement. I want to apply the 1st patch, then run the test, then apply the 2nd patch, and run the test again to compare. The way you're doing it now makes me reverting the patches, the useless work. Thanks, Yury > > lib/bitmap.c | 10 +++++++--- > lib/find_bit_benchmark.c | 17 +++++++++++++++++ > lib/test_bitmap.c | 28 ++++++++++++++++++++++++++++ > 3 files changed, 52 insertions(+), 3 deletions(-) > > -- > 2.34.1 ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v4 0/2] Improve the performance of bitmap_find_next_zero_area_off() 2026-06-08 21:54 ` Yury Norov @ 2026-06-09 1:06 ` Yury Norov 2026-06-09 2:09 ` John Stultz 0 siblings, 1 reply; 10+ messages in thread From: Yury Norov @ 2026-06-09 1:06 UTC (permalink / raw) To: Yury Norov Cc: Yi Sun, mnazarewicz, akpm, mina86, akinobu.mita, linux-kernel, John Stultz, John Stultz On Mon, Jun 08, 2026 at 05:54:20PM -0400, Yury Norov wrote: > On Mon, Jun 01, 2026 at 05:42:32PM +0800, Yi Sun wrote: > > Test code has been added to PATCH v2. > > No new APIs were introduced. > > > > Testing with the test code showed a performance improvement > > of approximately 70%. > > No, it's not. Your numbers show approximately 50% improvement for > the dense case, and approximately 2% slowdown for the sparse case. > > > Test result(random): > > orig_ns orig_cnt orig_average new_ns new_cnt new_average ratio > > test1 1388885 1154 1203 462923 1308 353 70.7% > > test2 1393616 1324 1052 736193 1212 607 42.3% > > test3 1391693 1216 1144 735808 1260 583 49% > > test4 1393231 1275 1092 742731 1402 529 51.6% > > test5 1390731 1260 1103 737231 1274 578 47.6% > > > > Test result(sparse): > > orig_ns orig_cnt orig_average new_ns new_cnt new_average ratio > > test1 4496077 322477 13 2419462 322480 7 46.2% > > test2 7514731 322482 23 5785808 322476 17 26.1% > > test3 7490692 322493 23 7654423 322483 23 0% > > test4 7474500 322469 23 7628230 322483 23 0% > > test5 7452692 322481 23 7663116 322478 23 0% > > The numbers look quite inconsistent. The first measurements are > significantly faster for almost all experiments. In the 'new sparse' > case the first run is 4 times faster than the others. And the ratio > 0% is simply wrong. > > Please, run the test on a real hardware, not virtualized. Please > built-in the test, so it's executed at boot time, or make sure you're > not running anything on parallel, like a GUI or networking. > > I gave your code a brief test on my qemu, and I have 43% improvement > in the dense case, with p-value 0.001; and -8% for sparse bitmap, > with the p-value 0.044, still significant. > > Overall not bad. But if some critical user has actually a sparse bitmap, > he'll be disappointed. There's not that many actual users of the > function. For v5, can you CC those from non-driver part, at least. > > (The ARM GIC counts as the non-driver, I believe.) OK, I traced the cma_alloc(), which calls the bitmap function through cma_range_alloc(), and the numbers are looking really strong: Metric Before After Change Trace span 194.0 ms 87.1 ms -55.1% Total CMA alloc time 48.46 ms 16.11 ms -66.8% Avg alloc latency 184.94 us 61.49 us -66.8% Median alloc latency 73.72 us 20.59 us -72.1% p90 alloc latency 329.76 us 55.63 us -83.1% p99 alloc latency 1866.76 us 859.83 us -53.9% Max alloc latency 4821.91 us 2324.41 us -51.8% By request size: Request Before Avg After Avg Change 1 page 79.68 us 34.47 us -56.7% 256 pages 285.50 us 87.30 us -69.4% I ran it on qemu, but the numbers are so impressive that I believe they will be reproduced baremetal. The tracing command is: sudo trace-cmd record \ -o cma-dmabuf.dat \ -b 65536 \ -e cma:cma_alloc_start \ -e cma:cma_alloc_finish \ -e cma:cma_alloc_busy_retry \ -e cma:cma_release \ -- kselftest/dmabuf-heaps/dmabuf-heap Can you run it on your side before sending v5, and share your results? Adding John Stultz, the test author. Hi John. This series improves the underlying bitmap_find_next_zero_area_off() significantly for average bitmap, but shows ~8% slowdown for sparse bitmaps. With your CMA allocator test, the results are even stronger, comparing to the synthetic benchmark, and there seemingly are no drawbacks. Can you comment on the results and maybe reproduce it on your side? Are you or anyone aware of any other useful tests for CMA allocator? How important the sparse bitmap case overall? Thanks, Yury ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v4 0/2] Improve the performance of bitmap_find_next_zero_area_off() 2026-06-09 1:06 ` Yury Norov @ 2026-06-09 2:09 ` John Stultz 0 siblings, 0 replies; 10+ messages in thread From: John Stultz @ 2026-06-09 2:09 UTC (permalink / raw) To: Yury Norov, T.J. Mercier Cc: Yi Sun, mnazarewicz, akpm, mina86, akinobu.mita, linux-kernel On Mon, Jun 8, 2026 at 6:06 PM Yury Norov <yury.norov@gmail.com> wrote: > On Mon, Jun 08, 2026 at 05:54:20PM -0400, Yury Norov wrote: > > On Mon, Jun 01, 2026 at 05:42:32PM +0800, Yi Sun wrote: > > > Test code has been added to PATCH v2. > > > No new APIs were introduced. > > > > > > Testing with the test code showed a performance improvement > > > of approximately 70%. > > > > No, it's not. Your numbers show approximately 50% improvement for > > the dense case, and approximately 2% slowdown for the sparse case. > > > > > Test result(random): > > > orig_ns orig_cnt orig_average new_ns new_cnt new_average ratio > > > test1 1388885 1154 1203 462923 1308 353 70.7% > > > test2 1393616 1324 1052 736193 1212 607 42.3% > > > test3 1391693 1216 1144 735808 1260 583 49% > > > test4 1393231 1275 1092 742731 1402 529 51.6% > > > test5 1390731 1260 1103 737231 1274 578 47.6% > > > > > > Test result(sparse): > > > orig_ns orig_cnt orig_average new_ns new_cnt new_average ratio > > > test1 4496077 322477 13 2419462 322480 7 46.2% > > > test2 7514731 322482 23 5785808 322476 17 26.1% > > > test3 7490692 322493 23 7654423 322483 23 0% > > > test4 7474500 322469 23 7628230 322483 23 0% > > > test5 7452692 322481 23 7663116 322478 23 0% > > > > The numbers look quite inconsistent. The first measurements are > > significantly faster for almost all experiments. In the 'new sparse' > > case the first run is 4 times faster than the others. And the ratio > > 0% is simply wrong. > > > > Please, run the test on a real hardware, not virtualized. Please > > built-in the test, so it's executed at boot time, or make sure you're > > not running anything on parallel, like a GUI or networking. > > > > I gave your code a brief test on my qemu, and I have 43% improvement > > in the dense case, with p-value 0.001; and -8% for sparse bitmap, > > with the p-value 0.044, still significant. > > > > Overall not bad. But if some critical user has actually a sparse bitmap, > > he'll be disappointed. There's not that many actual users of the > > function. For v5, can you CC those from non-driver part, at least. > > > > (The ARM GIC counts as the non-driver, I believe.) > > OK, I traced the cma_alloc(), which calls the bitmap function through > cma_range_alloc(), and the numbers are looking really strong: > > Metric Before After Change > Trace span 194.0 ms 87.1 ms -55.1% > Total CMA alloc time 48.46 ms 16.11 ms -66.8% > Avg alloc latency 184.94 us 61.49 us -66.8% > Median alloc latency 73.72 us 20.59 us -72.1% > p90 alloc latency 329.76 us 55.63 us -83.1% > p99 alloc latency 1866.76 us 859.83 us -53.9% > Max alloc latency 4821.91 us 2324.41 us -51.8% > > By request size: > > Request Before Avg After Avg Change > 1 page 79.68 us 34.47 us -56.7% > 256 pages 285.50 us 87.30 us -69.4% > > I ran it on qemu, but the numbers are so impressive that I believe > they will be reproduced baremetal. > > The tracing command is: > > sudo trace-cmd record \ > -o cma-dmabuf.dat \ > -b 65536 \ > -e cma:cma_alloc_start \ > -e cma:cma_alloc_finish \ > -e cma:cma_alloc_busy_retry \ > -e cma:cma_release \ > -- kselftest/dmabuf-heaps/dmabuf-heap > > Can you run it on your side before sending v5, and share your results? > > Adding John Stultz, the test author. > > Hi John. > > This series improves the underlying bitmap_find_next_zero_area_off() > significantly for average bitmap, but shows ~8% slowdown for sparse > bitmaps. With your CMA allocator test, the results are even stronger, > comparing to the synthetic benchmark, and there seemingly are no > drawbacks. > > Can you comment on the results and maybe reproduce it on your side? > Are you or anyone aware of any other useful tests for CMA allocator? > How important the sparse bitmap case overall? > Pulling in TJ who has more recent context here. thanks -john ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2026-06-09 2:09 UTC | newest] Thread overview: 10+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-06-01 9:42 [PATCH v4 0/2] Improve the performance of bitmap_find_next_zero_area_off() Yi Sun 2026-06-01 9:42 ` [PATCH v4 1/2] lib: bitmap: reduce the number of goto again in bitmap_find_next_zero_area_off() Yi Sun 2026-06-08 22:15 ` Yury Norov 2026-06-01 9:42 ` [PATCH v4 2/2] lib/bitmap: add tests for bitmap_find_next_zero_area_off() Yi Sun 2026-06-08 19:14 ` Yury Norov 2026-06-08 19:24 ` Yury Norov 2026-06-08 7:44 ` 答复: [PATCH v4 0/2] Improve the performance of bitmap_find_next_zero_area_off() 孙毅 (Yi Sun) 2026-06-08 21:54 ` Yury Norov 2026-06-09 1:06 ` Yury Norov 2026-06-09 2:09 ` John Stultz
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.