[RFC PATCH] mm/damon/ops-common: optimize damon_hot_score() using fls()

From: Liew Rui Yan <aethernet65535@gmail.com>
To: sj@kernel.org
Cc: damon@lists.linux.dev, linux-mm@kvack.org,
	Liew Rui Yan <aethernet65535@gmail.com>
Subject: [RFC PATCH] mm/damon/ops-common: optimize damon_hot_score() using fls()
Date: Fri, 20 Mar 2026 15:24:31 +0800
Message-ID: <20260320072431.248235-1-aethernet65535@gmail.com>

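The current implementation of damon_hot_score() uses a manual for-loop
to calculate 'age_in_log', the number of bits needed to represent
'age_in_sec', capped at DAMON_MAX_AGE_IN_LOG. fls() returns the same bit
count directly, so the loop can be replaced by fls() plus a min_t() cap.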
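
As a rough illustration of the equivalence claimed above, here is a
user-space sketch that compares the removed loop against an fls()-style
bit count. Everything in it is a stand-in: fls_sketch() mimics the
kernel's fls() with the GCC/Clang builtin __builtin_clz() and assumes a
32-bit unsigned int, and MAX_AGE_IN_LOG is an assumed value, not
necessarily the kernel's DAMON_MAX_AGE_IN_LOG.

```c
#include <assert.h>

/* Assumed cap for illustration; the real DAMON_MAX_AGE_IN_LOG lives in
 * mm/damon/ops-common.c. */
#define MAX_AGE_IN_LOG 32

/* Stand-in for the kernel's fls(): 1-based index of the most
 * significant set bit, or 0 for x == 0. Assumes 32-bit unsigned int. */
static int fls_sketch(unsigned int x)
{
	return x ? 32 - __builtin_clz(x) : 0;
}

/* Same shape as the loop removed by the patch. */
static int age_in_log_loop(unsigned long age_in_sec)
{
	int age_in_log;

	for (age_in_log = 0; age_in_log < MAX_AGE_IN_LOG && age_in_sec;
			age_in_log++, age_in_sec >>= 1)
		;
	return age_in_log;
}

/* Same shape as the replacement: bit count, then cap. */
static int age_in_log_fls(unsigned long age_in_sec)
{
	int bits = fls_sketch((unsigned int)age_in_sec);

	return bits < MAX_AGE_IN_LOG ? bits : MAX_AGE_IN_LOG;
}

/* Count inputs in [0, limit) where the two versions disagree. */
static unsigned long count_mismatches(unsigned long limit)
{
	unsigned long v, bad = 0;

	for (v = 0; v < limit; v++)
		if (age_in_log_loop(v) != age_in_log_fls(v))
			bad++;
	return bad;
}

/* Small self-check over the first 65536 inputs. */
static void self_check(void)
{
	assert(count_mismatches(1UL << 16) == 0);
}
```

Note that the sketch only checks inputs that fit in an unsigned int.
The kernel's fls() takes an unsigned int while 'age_in_sec' is an
unsigned long, so on 64-bit builds a wider value would be truncated
before the bit count; whether such values can occur in practice is not
examined here.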

In a simulated performance test with 10,000,000 iterations, this
optimization showed a reduction in latency:
- Average Latency: Reduced from ~9ns to ~1ns.
- P99 Latency: Reduced from ~60ns to ~41ns.
- Latency distribution: About 98% of the loop-based samples fell into
  the 40-59ns range, while about 35% of the fls-based samples moved down
  into the 20-39ns range in the test environment.

Although these results are based on a simulated kernel module test
environment [1], they indicate a clear instruction-level optimization.

[1] https://github.com/aethernet65535/damon-hot-score-fls-optimize/blob/master/test-kernel-module/fls.c

Signed-off-by: Liew Rui Yan <aethernet65535@gmail.com>
---
Note on testing methodology:
I attempted to measure the performance directly within the kernel using
bpftrace, perf, and ktime inside damon_hot_score(). However, the ktime
results were highly unstable, and with perf and bpftrace the function
was difficult to trace reliably (likely due to limitations in my own
tracing setup).

Despite the instability of in-kernel ktime measurements, one thing
remained consistent: the fls-based version significantly improves the
"long tail" latency compared to the for-loop.

Test results from the simulated module:
- fls-based:
    DAMON Perf Test: Starting 10000000 iterations
    =============================================
     Total Iterations : 10000000
     Average Latency  : 1 ns
     P95 Latency      : 40 ns
     P99 Latency      : 41 ns
    ---------------------------------------------
     Range (ns)      | Count    | Percent
    ---------------------------------------------
     20-39           | 3522000  |   35%
     40-59           | 6478000  |   64%
     60-79           | 0        |    0%
    =============================================

- for-loop:
    DAMON Perf Test: Starting 10000000 iterations
    =============================================
     Total Iterations : 10000000
     Average Latency  : 9 ns
     P95 Latency      : 51 ns
     P99 Latency      : 60 ns
    ---------------------------------------------
     Range (ns)      | Count    | Percent
    ---------------------------------------------
     20-39           | 0        |    0%
     40-59           | 9894000  |   98%
     60-79           | 98000    |    0%
    =============================================
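
For reading the tables above: the Percent column is consistent with
integer division truncated toward zero, which is why 98000 of 10000000
samples shows as 0%. The sketch below reproduces that arithmetic. Both
helpers are hypothetical and only assume 20ns-wide buckets and truncated
percentages; the module source in [1] is the authority on how the
numbers were actually produced.

```c
#include <assert.h>

/* Hypothetical helper: 20ns-wide histogram bucket for a latency sample,
 * so 20-39ns maps to 1, 40-59ns to 2, and 60-79ns to 3. */
static unsigned int bucket_index(unsigned long latency_ns)
{
	return (unsigned int)(latency_ns / 20);
}

/* Hypothetical helper: share of 'count' in 'total' as a whole
 * percentage, truncated toward zero. */
static unsigned int bucket_percent(unsigned long long count,
				   unsigned long long total)
{
	return total ? (unsigned int)(count * 100 / total) : 0;
}

/* Re-derive the Percent column from the Count column of both tables. */
static void self_check(void)
{
	assert(bucket_percent(3522000ULL, 10000000ULL) == 35);
	assert(bucket_percent(6478000ULL, 10000000ULL) == 64);
	assert(bucket_percent(9894000ULL, 10000000ULL) == 98);
	assert(bucket_percent(98000ULL, 10000000ULL) == 0);
}
```

Under these assumptions the 60-79ns row of the for-loop run is just
under 1% of the samples rather than exactly zero.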

Full raw benchmark results can be found at [2].

If anyone could suggest a more robust way to profile this specific
function within a live DAMON context, I would greatly appreciate the
guidance.

[2] https://github.com/aethernet65535/damon-hot-score-fls-optimize/tree/master/result-raw

 mm/damon/ops-common.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/mm/damon/ops-common.c b/mm/damon/ops-common.c
index 8c6d613425c1..0294de61a23a 100644
--- a/mm/damon/ops-common.c
+++ b/mm/damon/ops-common.c
@@ -117,9 +117,7 @@ int damon_hot_score(struct damon_ctx *c, struct damon_region *r,
 		damon_max_nr_accesses(&c->attrs);
 
 	age_in_sec = (unsigned long)r->age * c->attrs.aggr_interval / 1000000;
-	for (age_in_log = 0; age_in_log < DAMON_MAX_AGE_IN_LOG && age_in_sec;
-			age_in_log++, age_in_sec >>= 1)
-		;
+	age_in_log = min_t(int, fls(age_in_sec), DAMON_MAX_AGE_IN_LOG);
 
 	/* If frequency is 0, higher age means it's colder */
 	if (freq_subscore == 0)
-- 
2.53.0

Thread overview: 4+ messages
2026-03-20  7:24 Liew Rui Yan [this message]
2026-03-20 15:05 ` [RFC PATCH] mm/damon/ops-common: optimize damon_hot_score() using fls() SeongJae Park
2026-03-20 19:20   ` [PATCH v2] mm/damon/ops-common: optimize damon_hot_score() using ilog2() Liew Rui Yan
2026-03-21  0:23     ` SeongJae Park
