Re: [RFC PATCH] mm/damon/ops-common: optimize damon_hot_score() using fls()


From: SeongJae Park <sj@kernel.org>
To: Liew Rui Yan <aethernet65535@gmail.com>
Cc: SeongJae Park <sj@kernel.org>, damon@lists.linux.dev, linux-mm@kvack.org
Subject: Re: [RFC PATCH] mm/damon/ops-common: optimize damon_hot_score() using fls()
Date: Fri, 20 Mar 2026 08:05:36 -0700
Message-ID: <20260320150536.98893-1-sj@kernel.org>
In-Reply-To: <20260320072431.248235-1-aethernet65535@gmail.com>

Hello Liew,

On Fri, 20 Mar 2026 15:24:31 +0800 Liew Rui Yan <aethernet65535@gmail.com> wrote:

> The current implementation of damon_hot_score() uses a manual for-loop
> to calculate the value of 'age_in_log'. This can be replaced more
> efficiently by fls().
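The equivalence behind this replacement can be checked in userspace. The sketch below is a minimal model, not the kernel code: model_fls() is a hypothetical stand-in for the kernel's fls() built on the GCC/Clang builtin __builtin_clz() and assumes a 32-bit unsigned int, and the cap is passed as a parameter instead of using the real DAMON_MAX_AGE_IN_LOG value. The loop counts how many right shifts it takes to reach zero, which is the bit length that fls() returns, so clamping fls() to the cap reproduces it.

```c
#include <assert.h>

/*
 * Hypothetical userspace stand-in for the kernel's fls(): 1-based index
 * of the most significant set bit, or 0 when x == 0.  Assumes a 32-bit
 * unsigned int and a GCC/Clang-compatible compiler.
 */
static int model_fls(unsigned int x)
{
	return x ? 32 - __builtin_clz(x) : 0;
}

/* The original loop from damon_hot_score(), with the cap as a parameter. */
static int age_in_log_loop(unsigned int age_in_sec, int max_log)
{
	int age_in_log;

	for (age_in_log = 0; age_in_log < max_log && age_in_sec;
			age_in_log++, age_in_sec >>= 1)
		;
	return age_in_log;
}

/* The patched form: min(fls(age_in_sec), cap). */
static int age_in_log_fls(unsigned int age_in_sec, int max_log)
{
	int bits = model_fls(age_in_sec);

	return bits < max_log ? bits : max_log;
}

/* Exhaustively compare both forms over [0, limit] for one cap value. */
static void check_equivalence(unsigned int limit, int max_log)
{
	for (unsigned int x = 0; x <= limit; x++)
		assert(age_in_log_loop(x, max_log) == age_in_log_fls(x, max_log));
}
```

For instance, an age of 1000 seconds gives 10 with either form, because 2^9 <= 1000 < 2^10; with a cap of 5, both forms return 5.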

This makes sense.  But ilog2() seems closer to what the current code is trying
to do.  How about using ilog2() instead of fls()?
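For reference, ilog2() and fls() differ by one on nonzero input: ilog2(x) is floor(log2(x)), which equals fls(x) - 1. The hedged userspace sketch below illustrates only that arithmetic. model_fls(), model_ilog2() and age_in_log_ilog2() are hypothetical names, the models assume a 32-bit unsigned int and GCC/Clang builtins, and zero is guarded explicitly because log2(0) is not defined.

```c
#include <assert.h>

/* Hypothetical stand-in for fls(); assumes 32-bit unsigned int, GCC/Clang. */
static int model_fls(unsigned int x)
{
	return x ? 32 - __builtin_clz(x) : 0;
}

/* Hypothetical stand-in for ilog2(): floor(log2(x)), valid only for x > 0. */
static int model_ilog2(unsigned int x)
{
	assert(x != 0);	/* __builtin_clz(0) is undefined */
	return 31 - __builtin_clz(x);
}

/*
 * Hypothetical ilog2()-based form of the age_in_log computation.  The
 * zero check and the "+ 1" are what make it match the original loop.
 */
static int age_in_log_ilog2(unsigned int age_in_sec, int max_log)
{
	int bits;

	if (!age_in_sec)
		return 0;
	bits = model_ilog2(age_in_sec) + 1;
	return bits < max_log ? bits : max_log;
}

/* Check fls(x) == ilog2(x) + 1 for every x in [1, limit]. */
static void check_fls_ilog2_relation(unsigned int limit)
{
	for (unsigned int x = 1; x <= limit; x++)
		assert(model_fls(x) == model_ilog2(x) + 1);
}
```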

> 
> In a simulated performance test with 10,000,000 iterations, this
> optimization showed a significant reduction in latency:
> - Average Latency: Reduced from ~9ns to ~1ns.
> - P99 Latency: Reduced from ~60ns to ~41ns.
> - Latency distribution: The loop-based version mostly fell into the
>   40-59ns range, while about 35% of fls-based calls fell into the
>   20-39ns range in the test environment.
> 
> Although these results are based on a simulated kernel module test
> environment [1], they indicate a clear instruction-level optimization.
> 
> [1] https://github.com/aethernet65535/damon-hot-score-fls-optimize/blob/master/test-kernel-module/fls.c
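The module in [1] is not reproduced here. As a rough userspace approximation of such a comparison, the sketch below times a batch of calls with clock_gettime(CLOCK_MONOTONIC) and reports the average nanoseconds per call. loop_version(), fls_version(), model_fls() and ns_per_call() are hypothetical names; the fls() model assumes a 32-bit unsigned int and GCC/Clang. Calling through a function pointer adds overhead to both variants, and userspace timings will not match in-kernel latency.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <time.h>

/* Hypothetical stand-in for fls(); assumes 32-bit unsigned int, GCC/Clang. */
static int model_fls(unsigned int x)
{
	return x ? 32 - __builtin_clz(x) : 0;
}

/* Original shift loop, capped at max_log. */
static int loop_version(unsigned int age_in_sec, int max_log)
{
	int age_in_log;

	for (age_in_log = 0; age_in_log < max_log && age_in_sec;
			age_in_log++, age_in_sec >>= 1)
		;
	return age_in_log;
}

/* fls()-based replacement, capped at max_log. */
static int fls_version(unsigned int age_in_sec, int max_log)
{
	int bits = model_fls(age_in_sec);

	return bits < max_log ? bits : max_log;
}

/*
 * Average wall-clock nanoseconds per call of fn over iters inputs.
 * Results are summed into *sum so the calls cannot be optimized away
 * and the two variants can be checked for identical output.
 */
static double ns_per_call(int (*fn)(unsigned int, int), unsigned int iters,
			  long long *sum)
{
	struct timespec t0, t1;
	long long acc = 0;

	assert(iters > 0);
	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (unsigned int i = 0; i < iters; i++)
		/* Multiplicative hashing spreads inputs over the 32-bit range. */
		acc += fn(i * 2654435761u, 32);
	clock_gettime(CLOCK_MONOTONIC, &t1);

	*sum = acc;
	return ((double)(t1.tv_sec - t0.tv_sec) * 1e9 +
		(double)(t1.tv_nsec - t0.tv_nsec)) / iters;
}
```

Running both variants over the same inputs should produce equal sums; the per-call timings depend on the machine and compiler flags.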

Makes sense!

> 
> Signed-off-by: Liew Rui Yan <aethernet65535@gmail.com>
> ---
> Note on testing methodology:
> I attempted to measure the performance directly within the kernel using
> bpftrace, perf, and ktime inside damon_hot_score(). However, the ktime
> results were highly unstable, and with perf and bpftrace the function
> was difficult to trace reliably (likely due to my own tracing
> limitations).
> 
> Despite the instability of in-kernel ktime measurements, one thing
> remained consistent: the fls-based version significantly improves the
> "long tail" latency compared to the for-loop.
> 
> Test results from the simulated module:
> - fls-based:
>     DAMON Perf Test: Starting 10000000 iterations
>     =============================================
>      Total Iterations : 10000000
>      Average Latency  : 1 ns
>      P95 Latency      : 40 ns
>      P99 Latency      : 41 ns
>     ---------------------------------------------
>      Range (ns)      | Count    | Percent
>     ---------------------------------------------
>      20-39           | 3522000  |   35%
>      40-59           | 6478000  |   64%
>      60-79           | 0        |    0%
>     =============================================
> 
> - for-loop:
>     DAMON Perf Test: Starting 10000000 iterations
>     =============================================
>      Total Iterations : 10000000
>      Average Latency  : 9 ns
>      P95 Latency      : 51 ns
>      P99 Latency      : 60 ns
>     ---------------------------------------------
>      Range (ns)      | Count    | Percent
>     ---------------------------------------------
>      20-39           | 0        |    0%
>      40-59           | 9894000  |   98%
>      60-79           | 98000    |    0%
>     =============================================
> 
> Full raw benchmark results can be found at [2].
> 
> If anyone could suggest a more robust way to profile this specific
> function within a live DAMON context, I would greatly appreciate the
> guidance.
> 
> [2] https://github.com/aethernet65535/damon-hot-score-fls-optimize/tree/master/result-raw

Nice test results!  I think this deserves to be in the git history.  Could you
please add this to the commit message, rather than the commentary area, in the
next version?

> 
>  mm/damon/ops-common.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/mm/damon/ops-common.c b/mm/damon/ops-common.c
> index 8c6d613425c1..0294de61a23a 100644
> --- a/mm/damon/ops-common.c
> +++ b/mm/damon/ops-common.c
> @@ -117,9 +117,7 @@ int damon_hot_score(struct damon_ctx *c, struct damon_region *r,
>  		damon_max_nr_accesses(&c->attrs);
>  
>  	age_in_sec = (unsigned long)r->age * c->attrs.aggr_interval / 1000000;
> -	for (age_in_log = 0; age_in_log < DAMON_MAX_AGE_IN_LOG && age_in_sec;
> -			age_in_log++, age_in_sec >>= 1)
> -		;
> +	age_in_log = min_t(int, fls(age_in_sec), DAMON_MAX_AGE_IN_LOG);
>  
>  	/* If frequency is 0, higher age means it's colder */
>  	if (freq_subscore == 0)
> -- 
> 2.53.0


Thanks,
SJ

Thread overview: 4+ messages
2026-03-20  7:24 [RFC PATCH] mm/damon/ops-common: optimize damon_hot_score() using fls() Liew Rui Yan
2026-03-20 15:05 ` SeongJae Park [this message]
2026-03-20 19:20   ` [PATCH v2] mm/damon/ops-common: optimize damon_hot_score() using ilog2() Liew Rui Yan
2026-03-21  0:23     ` SeongJae Park
