From: Yury Norov <yury.norov@gmail.com>
To: Yury Norov <yury.norov@gmail.com>
Cc: Yi Sun <yi.sun@unisoc.com>,
mnazarewicz@gmail.com, akpm@linux-foundation.org,
mina86@mina86.com, akinobu.mita@gmail.com,
linux-kernel@vger.kernel.org,
John Stultz <john.stultz@linaro.org>,
John Stultz <jstultz@google.com>
Subject: Re: [PATCH v4 0/2] Improve the performance of bitmap_find_next_zero_area_off()
Date: Mon, 8 Jun 2026 21:06:26 -0400
Message-ID: <aidnElRQBPoBdOIx@yury>
In-Reply-To: <aic6DPHDOOny_56B@yury>

On Mon, Jun 08, 2026 at 05:54:20PM -0400, Yury Norov wrote:
> On Mon, Jun 01, 2026 at 05:42:32PM +0800, Yi Sun wrote:
> > Test code has been added to PATCH v2.
> > No new APIs were introduced.
> >
> > Testing with the test code showed a performance improvement
> > of approximately 70%.
>
> No, it's not. Your numbers show approximately 50% improvement for
> the dense case, and approximately 2% slowdown for the sparse case.
>
> > Test result(random):
> >        orig_ns  orig_cnt  orig_average  new_ns   new_cnt  new_average  ratio
> > test1  1388885  1154      1203          462923   1308     353          70.7%
> > test2  1393616  1324      1052          736193   1212     607          42.3%
> > test3  1391693  1216      1144          735808   1260     583          49%
> > test4  1393231  1275      1092          742731   1402     529          51.6%
> > test5  1390731  1260      1103          737231   1274     578          47.6%
> >
> > Test result(sparse):
> >        orig_ns  orig_cnt  orig_average  new_ns   new_cnt  new_average  ratio
> > test1  4496077  322477    13            2419462  322480   7            46.2%
> > test2  7514731  322482    23            5785808  322476   17           26.1%
> > test3  7490692  322493    23            7654423  322483   23           0%
> > test4  7474500  322469    23            7628230  322483   23           0%
> > test5  7452692  322481    23            7663116  322478   23           0%
>
> The numbers look quite inconsistent. The first measurements are
> significantly faster for almost all experiments. In the 'new sparse'
> case the first run is 4 times faster than the others. And the ratio
> 0% is simply wrong.
>
> Please run the test on real hardware, not virtualized. Please build
> the test in, so it's executed at boot time, or make sure you're not
> running anything in parallel, like a GUI or networking.
>
> I gave your code a brief test on my qemu, and I see a 43% improvement
> in the dense case, with p-value 0.001; and -8% for the sparse bitmap,
> with p-value 0.044, still significant.
>
> Overall not bad. But if some critical user actually has a sparse bitmap,
> they'll be disappointed. There aren't that many actual users of the
> function. For v5, can you CC at least those from the non-driver part?
>
> (The ARM GIC counts as non-driver, I believe.)
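
On the 0% ratios quoted above: the ratio column is consistent with
being computed from integer-truncated per-call averages, which hides
sub-nanosecond differences in the sparse case. A minimal Python sketch,
assuming ratio = (orig_average - new_average) / orig_average (the
helper names are mine, not taken from the test code), recomputes sparse
test3 both ways:

```python
def ratio_truncated(orig_ns, orig_cnt, new_ns, new_cnt):
    """Improvement in percent, from integer-truncated per-call averages."""
    orig_avg = orig_ns // orig_cnt
    new_avg = new_ns // new_cnt
    return 100.0 * (orig_avg - new_avg) / orig_avg


def ratio_exact(orig_ns, orig_cnt, new_ns, new_cnt):
    """Improvement in percent, from floating-point per-call averages."""
    orig_avg = orig_ns / orig_cnt
    new_avg = new_ns / new_cnt
    return 100.0 * (orig_avg - new_avg) / orig_avg


# Sparse test3 from the quoted table: orig_ns, orig_cnt, new_ns, new_cnt.
sparse_test3 = (7490692, 322493, 7654423, 322483)

print(f"truncated: {ratio_truncated(*sparse_test3):.1f}%")
print(f"exact:     {ratio_exact(*sparse_test3):.1f}%")
```

With truncation both averages collapse to 23 ns and the ratio reads as
zero; without it the result is negative, i.e. a slowdown of about 2%.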

OK, I traced cma_alloc(), which calls the bitmap function through
cma_range_alloc(), and the numbers look really strong:

Metric                 Before       After        Change
Trace span             194.0 ms     87.1 ms      -55.1%
Total CMA alloc time   48.46 ms     16.11 ms     -66.8%
Avg alloc latency      184.94 us    61.49 us     -66.8%
Median alloc latency   73.72 us     20.59 us     -72.1%
p90 alloc latency      329.76 us    55.63 us     -83.1%
p99 alloc latency      1866.76 us   859.83 us    -53.9%
Max alloc latency      4821.91 us   2324.41 us   -51.8%

By request size:

Request     Before Avg   After Avg   Change
1 page      79.68 us     34.47 us    -56.7%
256 pages   285.50 us    87.30 us    -69.4%

I ran it on qemu, but the numbers are so impressive that I believe
they will reproduce on bare metal.
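
For anyone comparing their own runs: the Change column is the plain
relative difference (after - before) / before. A small Python sketch
that recomputes a few rows from the tables above (the helper name and
the row selection are mine, for illustration only):

```python
def percent_change(before, after):
    """Relative change from before to after, in percent (negative = faster)."""
    return 100.0 * (after - before) / before


# (before, after) pairs copied from the tables above, units as reported.
rows = {
    "Trace span (ms)": (194.0, 87.1),
    "Median alloc latency (us)": (73.72, 20.59),
    "p90 alloc latency (us)": (329.76, 55.63),
    "256 pages avg (us)": (285.50, 87.30),
}

for name, (before, after) in rows.items():
    print(f"{name:28s} {percent_change(before, after):+.1f}%")
```

Running the same helper over numbers from another machine gives
figures that are directly comparable with the ones in this message.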

The tracing command is:

    sudo trace-cmd record \
        -o cma-dmabuf.dat \
        -b 65536 \
        -e cma:cma_alloc_start \
        -e cma:cma_alloc_finish \
        -e cma:cma_alloc_busy_retry \
        -e cma:cma_release \
        -- kselftest/dmabuf-heaps/dmabuf-heap

Can you run it on your side before sending v5, and share your results?

Adding John Stultz, the test author.

Hi John.

This series improves the underlying bitmap_find_next_zero_area_off()
significantly for the average bitmap, but shows a ~8% slowdown for sparse
bitmaps. With your CMA allocator test, the results are even stronger
than in the synthetic benchmark, and there seem to be no drawbacks.
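
For context, the function looks for a contiguous, aligned run of zero
bits at or after a start position, and restarts the search whenever
the candidate run contains a set bit. The Python model below is only a
simplified sketch of that search-and-retry shape: it uses a list of 0/1
values instead of machine words, returns None on failure where the
kernel returns an index past the bitmap size, and does not reflect the
changes made by this series.

```python
def find_next_zero_area(bits, start, nr, align_mask=0, align_offset=0):
    """Model of an aligned zero-area search over a list of 0/1 values.

    Returns the start index of the first suitable run of nr zero bits,
    or None if no such run fits (a simplification of the kernel API).
    """
    size = len(bits)
    while True:
        # Find the first zero bit at or after start.
        index = start
        while index < size and bits[index]:
            index += 1

        # Round the candidate up to the requested alignment.
        index = ((index + align_offset + align_mask) & ~align_mask) - align_offset

        end = index + nr
        if end > size:
            return None

        # Look for a set bit inside the candidate run [index, end).
        busy = next((i for i in range(index, end) if bits[i]), None)
        if busy is None:
            return index

        # Retry just past the set bit that broke the run.
        start = busy + 1


bitmap = [1, 1, 0, 1, 0, 0, 0, 1]
print(find_next_zero_area(bitmap, 0, 2))
print(find_next_zero_area(bitmap, 0, 4))
```

Each pass through the loop corresponds to one restart of the search,
which is the cost that the dense and sparse benchmarks exercise
differently.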

Can you comment on the results and maybe reproduce them on your side?
Are you or anyone else aware of any other useful tests for the CMA
allocator? How important is the sparse bitmap case overall?

Thanks,
Yury