From: Yury Norov <yury.norov@gmail.com>
To: Yury Norov <yury.norov@gmail.com>
Cc: Yi Sun <yi.sun@unisoc.com>,
	mnazarewicz@gmail.com, akpm@linux-foundation.org,
	mina86@mina86.com, akinobu.mita@gmail.com,
	linux-kernel@vger.kernel.org,
	John Stultz <john.stultz@linaro.org>,
	John Stultz <jstultz@google.com>
Subject: Re: [PATCH v4 0/2] Improve the performance of bitmap_find_next_zero_area_off()
Date: Mon, 8 Jun 2026 21:06:26 -0400
Message-ID: <aidnElRQBPoBdOIx@yury>
In-Reply-To: <aic6DPHDOOny_56B@yury>

On Mon, Jun 08, 2026 at 05:54:20PM -0400, Yury Norov wrote:
> On Mon, Jun 01, 2026 at 05:42:32PM +0800, Yi Sun wrote:
> > Test code has been added to PATCH v2.
> > No new APIs were introduced.
> > 
> > Testing with the test code showed a performance improvement
> > of approximately 70%.
> 
> No, it's not. Your numbers show approximately 50% improvement for
> the dense case, and approximately 2% slowdown for the sparse case. 
>  
> > Test result(random):
> > 	orig_ns		orig_cnt	orig_average	new_ns		new_cnt		new_average	ratio
> > test1	1388885		1154		1203		462923		1308		353		70.7%
> > test2	1393616		1324		1052		736193		1212		607		42.3%
> > test3	1391693		1216		1144		735808		1260		583		49%
> > test4	1393231		1275		1092		742731		1402		529		51.6%
> > test5	1390731		1260		1103		737231		1274		578		47.6%
> > 
> > Test result(sparse):
> > 	orig_ns		orig_cnt	orig_average	new_ns		new_cnt		new_average	ratio
> > test1	4496077		322477		13		2419462		322480		7		46.2%
> > test2	7514731		322482		23		5785808		322476		17		26.1%
> > test3	7490692		322493		23		7654423		322483		23		0%
> > test4	7474500		322469		23		7628230		322483		23		0%
> > test5	7452692		322481		23		7663116		322478		23		0%
> 
> The numbers look quite inconsistent. The first measurements are
> significantly faster for almost all experiments. In the 'new sparse'
> case the first run is 4 times faster than the others. And the ratio
> 0% is simply wrong.
> 
> Please run the test on real hardware, not virtualized. Please build
> the test in, so it's executed at boot time, or make sure you're
> not running anything in parallel, like a GUI or networking.
> 
> I gave your code a brief test on my qemu, and I have 43% improvement
> in the dense case, with p-value 0.001; and -8% for sparse bitmap,
> with the p-value 0.044, still significant.
> 
> Overall not bad. But if some critical user actually has a sparse bitmap,
> they'll be disappointed. There aren't that many actual users of the
> function. For v5, can you CC at least those from the non-driver part?
> 
> (The ARM GIC counts as non-driver, I believe.)

OK, I traced cma_alloc(), which calls the bitmap function through
cma_range_alloc(), and the numbers look really strong:

Metric                      Before         After          Change
Trace span                194.0 ms       87.1 ms          -55.1%
Total CMA alloc time      48.46 ms      16.11 ms          -66.8%
Avg alloc latency        184.94 us      61.49 us          -66.8%
Median alloc latency      73.72 us      20.59 us          -72.1%
p90 alloc latency        329.76 us      55.63 us          -83.1%
p99 alloc latency       1866.76 us     859.83 us          -53.9%
Max alloc latency       4821.91 us    2324.41 us          -51.8%

By request size:

Request      Before Avg    After Avg          Change
1 page         79.68 us     34.47 us          -56.7%
256 pages     285.50 us     87.30 us          -69.4%
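
The Change column in both tables is the relative difference
(after - before) / before. A minimal Python sketch that recomputes it
from the Before/After values above; the helper name percent_change is
illustrative only and not part of any tool used in this thread:

```python
def percent_change(before, after):
    """Relative change in percent; negative means 'after' is smaller."""
    return (after - before) / before * 100.0


# (before, after) pairs copied from the two tables above.
rows = {
    "Trace span (ms)": (194.0, 87.1),
    "Total CMA alloc time (ms)": (48.46, 16.11),
    "Avg alloc latency (us)": (184.94, 61.49),
    "Median alloc latency (us)": (73.72, 20.59),
    "p90 alloc latency (us)": (329.76, 55.63),
    "p99 alloc latency (us)": (1866.76, 859.83),
    "Max alloc latency (us)": (4821.91, 2324.41),
    "1 page avg (us)": (79.68, 34.47),
    "256 pages avg (us)": (285.50, 87.30),
}

for name, (before, after) in rows.items():
    print(f"{name:<28}{percent_change(before, after):+7.1f}%")
```

The same arithmetic applies to any other before/after pair collected
from a re-run of the trace.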

I ran it on qemu, but the numbers are so impressive that I believe
they will reproduce on bare metal.

The tracing command is:

  sudo trace-cmd record \
    -o cma-dmabuf.dat \
    -b 65536 \
    -e cma:cma_alloc_start \
    -e cma:cma_alloc_finish \
    -e cma:cma_alloc_busy_retry \
    -e cma:cma_release \
    -- kselftest/dmabuf-heaps/dmabuf-heap
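
One possible way to turn such a trace into per-allocation latencies is
to pair each cma_alloc_start with the next cma_alloc_finish from the
same PID. The Python sketch below does that on the text output of
`trace-cmd report`. It is not necessarily how the numbers above were
produced. The assumed line layout ("task-pid [cpu] timestamp: event:
fields", with an optional flags column before the timestamp), the
helper names, and the sample lines are all assumptions for
illustration:

```python
import re
import statistics

# Assumed layout: "<task>-<pid> [<cpu>] [flags] <seconds>: <event>: <fields>"
EVENT_RE = re.compile(
    r"^\s*.+-(?P<pid>\d+)\s+\[\d+\]\s+(?:\S+\s+)?"
    r"(?P<ts>\d+\.\d+):\s+(?P<event>\w+):"
)


def alloc_latencies_us(lines):
    """Pair start/finish events per PID; return latencies in microseconds."""
    pending = {}      # pid -> timestamp of the outstanding cma_alloc_start
    latencies = []
    for line in lines:
        m = EVENT_RE.match(line)
        if not m:
            continue
        pid, ts, event = m["pid"], float(m["ts"]), m["event"]
        if event == "cma_alloc_start":
            pending[pid] = ts
        elif event == "cma_alloc_finish" and pid in pending:
            latencies.append((ts - pending.pop(pid)) * 1e6)
    return latencies


def percentile(values, p):
    """Nearest-rank percentile for an integer p in 1..100."""
    ordered = sorted(values)
    rank = max(1, (p * len(ordered) + 99) // 100)
    return ordered[rank - 1]


# Synthetic sample lines, made up for this sketch.
sample = [
    "dmabuf-heap-812 [002] 100.000000: cma_alloc_start: name=reserved count=1 align=0",
    "dmabuf-heap-812 [002] 100.000080: cma_alloc_finish: name=reserved count=1 align=0",
    "dmabuf-heap-812 [002] 100.001000: cma_alloc_start: name=reserved count=256 align=8",
    "dmabuf-heap-812 [002] 100.001300: cma_alloc_finish: name=reserved count=256 align=8",
]
lat = alloc_latencies_us(sample)
print(f"n={len(lat)} avg={statistics.mean(lat):.2f} us "
      f"median={statistics.median(lat):.2f} us p90={percentile(lat, 90):.2f} us")
```

Feeding it the real report would look like
`alloc_latencies_us(open("report.txt"))`, where report.txt is a
hypothetical file holding the output of `trace-cmd report -i cma-dmabuf.dat`.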

Can you run it on your side before sending v5, and share your results?

Adding John Stultz, the test author.

Hi John.

This series significantly improves the underlying
bitmap_find_next_zero_area_off() for an average bitmap, but shows an
~8% slowdown for sparse bitmaps. With your CMA allocator test, the
results are even stronger than in the synthetic benchmark, and there
are seemingly no drawbacks.
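
For readers unfamiliar with the function, here is a minimal Python
model of the search loop as it is understood to work in mainline
lib/bitmap.c before this series: find the first zero bit, align it,
check the candidate window for a set bit, and restart just past that
bit if one is found. It is not the patched code. The name
find_zero_area and the restarts counter are illustrative, and
align_offset is left out for brevity:

```python
def find_zero_area(bits, start, nr, align_mask=0):
    """Model of the zero-area search over a list of 0/1 values.

    Returns (index, restarts); index is None when no window of `nr`
    zero bits fits. `align_mask` is assumed to be 2**k - 1.
    """
    size = len(bits)
    restarts = 0
    while True:
        # First zero bit at or after `start`.
        index = start
        while index < size and bits[index]:
            index += 1
        # Round the candidate up to the requested alignment.
        index = (index + align_mask) & ~align_mask
        end = index + nr
        if end > size:
            return None, restarts
        # First set bit inside the candidate window [index, end).
        hit = next((i for i in range(index, end) if bits[i]), None)
        if hit is None:
            return index, restarts
        # The window is blocked: restart the search just past that bit.
        start = hit + 1
        restarts += 1


empty = [0] * 12
scattered = [1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0]
print(find_zero_area(empty, 0, 4))
print(find_zero_area(scattered, 0, 4))
```

In this model a bitmap with many scattered set bits goes through the
restart path repeatedly, while a mostly empty one returns on the first
pass.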

Can you comment on the results and maybe reproduce them on your side?
Are you or anyone else aware of any other useful tests for the CMA
allocator? How important is the sparse bitmap case overall?

Thanks,
Yury
