* [PATCH v3 0/2] mm/damon/core: Performance optimizations for the kdamond hot path
@ 2026-03-22 21:43 Josh Law
2026-03-22 21:43 ` [PATCH v3 1/2] mm/damon/core: optimize kdamond_apply_schemes() by inverting scheme and region loops Josh Law
` (2 more replies)
0 siblings, 3 replies; 8+ messages in thread
From: Josh Law @ 2026-03-22 21:43 UTC (permalink / raw)
To: sj, akpm; +Cc: damon, linux-mm, linux-kernel, Josh Law
Hello,
This patch series provides two performance optimizations for the DAMON
core, specifically targeting the hot paths in kdamond.
The first patch optimizes kdamond_apply_schemes() by inverting the loop
order. By iterating over schemes first and regions second, we can
evaluate scheme-level invariants (like activation status and quotas)
once per scheme rather than for every single region. This significantly
reduces CPU overhead when multiple schemes are present or when quotas
are reached.
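The effect of the inversion can be illustrated with a standalone sketch (the struct and function names below are simplified stand-ins for illustration, not the actual DAMON structures):

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-in for a scheme with its invariant state */
struct scheme { int active; unsigned long quota, charged; };

/* Before: scheme-level checks re-evaluated for every region */
static int apply_nested(struct scheme *ss, size_t ns, size_t nregions)
{
	int checks = 0;

	for (size_t r = 0; r < nregions; r++)
		for (size_t i = 0; i < ns; i++) {
			checks++;	/* invariant evaluated per region */
			if (!ss[i].active)
				continue;
			if (ss[i].quota && ss[i].charged >= ss[i].quota)
				continue;
			/* ... apply scheme to region ... */
		}
	return checks;
}

/* After: inverted loops evaluate the invariants once per scheme */
static int apply_inverted(struct scheme *ss, size_t ns, size_t nregions)
{
	int checks = 0;

	for (size_t i = 0; i < ns; i++) {
		checks++;		/* invariant evaluated per scheme */
		if (!ss[i].active)
			continue;
		if (ss[i].quota && ss[i].charged >= ss[i].quota)
			continue;
		for (size_t r = 0; r < nregions; r++)
			;		/* ... apply scheme to region ... */
	}
	return checks;
}
```

With two schemes that are both skippable (one inactive, one over quota) and 1000 regions, the nested form evaluates the skip conditions 2000 times while the inverted form evaluates them twice.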
The second patch eliminates a hardware integer division in
damon_max_nr_accesses() by using the pre-cached aggr_samples value.
Since this function is called once per region per sampling interval,
removing the division provides a measurable reduction in CPU cycles
spent in the access rate update path.
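The general pattern is to pay the division once on the (rare) attribute-update path and read a cached value on the hot path. A minimal sketch, with simplified stand-ins for the real struct and helpers:

```c
#include <assert.h>
#include <limits.h>

/* Illustrative stand-in for struct damon_attrs; field names simplified */
struct attrs {
	unsigned long sample_interval;
	unsigned long aggr_interval;
	unsigned long aggr_samples;	/* cached aggr_interval / sample_interval */
};

/* Cold path: recompute the cached ratio whenever the intervals change */
static void attrs_update(struct attrs *a)
{
	a->aggr_samples = a->aggr_interval / a->sample_interval;
}

/* Hot path: no hardware division, only a clamp to the return type's range */
static unsigned int max_nr_accesses(const struct attrs *a)
{
	return a->aggr_samples < UINT_MAX ?
		(unsigned int)a->aggr_samples : UINT_MAX;
}
```

The trade-off is that every writer of the intervals must keep the cache coherent, which is exactly the invariant the second patch relies on.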
Changes from v2:
- Fix multi-line if statement alignment in the first patch to satisfy
checkpatch --strict.
Changes from v1:
- Use min_t(unsigned long, ...) in damon_max_nr_accesses() to satisfy
checkpatch warnings and improve readability.
Josh Law (2):
mm/damon/core: optimize kdamond_apply_schemes() by inverting scheme
and region loops
mm/damon/core: eliminate hot-path integer division in
damon_max_nr_accesses()
include/linux/damon.h |  3 +--
mm/damon/core.c | 72 +++++++++++++++++++++----------------------------
2 files changed, 31 insertions(+), 44 deletions(-)
--
2.43.0
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH v3 1/2] mm/damon/core: optimize kdamond_apply_schemes() by inverting scheme and region loops
2026-03-22 21:43 [PATCH v3 0/2] mm/damon/core: Performance optimizations for the kdamond hot path Josh Law
@ 2026-03-22 21:43 ` Josh Law
2026-03-23 14:07 ` SeongJae Park
2026-03-22 21:43 ` [PATCH v3 2/2] mm/damon/core: eliminate hot-path integer division in damon_max_nr_accesses() Josh Law
2026-03-23 14:06 ` [PATCH v3 0/2] mm/damon/core: Performance optimizations for the kdamond hot path SeongJae Park
2 siblings, 1 reply; 8+ messages in thread
From: Josh Law @ 2026-03-22 21:43 UTC (permalink / raw)
To: sj, akpm; +Cc: damon, linux-mm, linux-kernel, Josh Law
Currently, kdamond_apply_schemes() iterates over all targets, then over all
regions, and finally calls damon_do_apply_schemes() which iterates over
all schemes. This nested structure causes scheme-level invariants (such as
time intervals, activation status, and quota limits) to be evaluated inside
the innermost loop for every single region.
If a scheme is inactive, has not reached its apply interval, or has already
fulfilled its quota (quota->charged_sz >= quota->esz), the kernel still
needlessly iterates through thousands of regions only to repeatedly
evaluate these same scheme-level conditions and continue.
Inline damon_do_apply_schemes() into kdamond_apply_schemes() and invert
the loop ordering, so that the code iterates over schemes on the outside
and targets/regions on the inside.
This allows the code to evaluate scheme-level limits once per scheme.
If a scheme's quota is met or it is inactive, we completely bypass the
O(Targets * Regions) inner loop for that scheme. This drastically reduces
unnecessary branching, cache thrashing, and CPU overhead in the kdamond
hot path.
Signed-off-by: Josh Law <objecting@objecting.org>
---
mm/damon/core.c | 72 +++++++++++++++++++++----------------------------
1 file changed, 30 insertions(+), 42 deletions(-)
diff --git a/mm/damon/core.c b/mm/damon/core.c
index c884bb31c9b8..a9cfbd6ce3d4 100644
--- a/mm/damon/core.c
+++ b/mm/damon/core.c
@@ -2112,40 +2112,6 @@ static void damos_apply_scheme(struct damon_ctx *c, struct damon_target *t,
damos_update_stat(s, sz, sz_applied, sz_ops_filter_passed);
}
-static void damon_do_apply_schemes(struct damon_ctx *c,
- struct damon_target *t,
- struct damon_region *r)
-{
- struct damos *s;
-
- damon_for_each_scheme(s, c) {
- struct damos_quota *quota = &s->quota;
-
- if (time_before(c->passed_sample_intervals, s->next_apply_sis))
- continue;
-
- if (!s->wmarks.activated)
- continue;
-
- /* Check the quota */
- if (quota->esz && quota->charged_sz >= quota->esz)
- continue;
-
- if (damos_skip_charged_region(t, r, s, c->min_region_sz))
- continue;
-
- if (s->max_nr_snapshots &&
- s->max_nr_snapshots <= s->stat.nr_snapshots)
- continue;
-
- if (damos_valid_target(c, r, s))
- damos_apply_scheme(c, t, r, s);
-
- if (damon_is_last_region(r, t))
- s->stat.nr_snapshots++;
- }
-}
-
/*
* damon_feed_loop_next_input() - get next input to achieve a target score.
* @last_input The last input.
@@ -2494,17 +2460,39 @@ static void kdamond_apply_schemes(struct damon_ctx *c)
return;
mutex_lock(&c->walk_control_lock);
- damon_for_each_target(t, c) {
- if (c->ops.target_valid && c->ops.target_valid(t) == false)
- continue;
-
- damon_for_each_region(r, t)
- damon_do_apply_schemes(c, t, r);
- }
-
damon_for_each_scheme(s, c) {
+ struct damos_quota *quota = &s->quota;
+
if (time_before(c->passed_sample_intervals, s->next_apply_sis))
continue;
+
+ if (!s->wmarks.activated)
+ continue;
+
+ damon_for_each_target(t, c) {
+ if (c->ops.target_valid && c->ops.target_valid(t) == false)
+ continue;
+
+ damon_for_each_region(r, t) {
+ /* Check the quota */
+ if (quota->esz && quota->charged_sz >= quota->esz)
+ goto next_scheme;
+
+ if (s->max_nr_snapshots &&
+ s->max_nr_snapshots <= s->stat.nr_snapshots)
+ goto next_scheme;
+
+ if (damos_skip_charged_region(t, r, s, c->min_region_sz))
+ continue;
+
+ if (damos_valid_target(c, r, s))
+ damos_apply_scheme(c, t, r, s);
+
+ if (damon_is_last_region(r, t))
+ s->stat.nr_snapshots++;
+ }
+ }
+next_scheme:
damos_walk_complete(c, s);
damos_set_next_apply_sis(s, c);
s->last_applied = NULL;
--
2.34.1
* [PATCH v3 2/2] mm/damon/core: eliminate hot-path integer division in damon_max_nr_accesses()
2026-03-22 21:43 [PATCH v3 0/2] mm/damon/core: Performance optimizations for the kdamond hot path Josh Law
2026-03-22 21:43 ` [PATCH v3 1/2] mm/damon/core: optimize kdamond_apply_schemes() by inverting scheme and region loops Josh Law
@ 2026-03-22 21:43 ` Josh Law
2026-03-23 14:10 ` SeongJae Park
2026-03-24 7:19 ` SeongJae Park
2026-03-23 14:06 ` [PATCH v3 0/2] mm/damon/core: Performance optimizations for the kdamond hot path SeongJae Park
2 siblings, 2 replies; 8+ messages in thread
From: Josh Law @ 2026-03-22 21:43 UTC (permalink / raw)
To: sj, akpm; +Cc: damon, linux-mm, linux-kernel, Josh Law
Hardware integer division is slow. The function damon_max_nr_accesses(),
which is called very frequently (e.g., once per region per sample
interval inside damon_update_region_access_rate), performs an integer
division: attrs->aggr_interval / attrs->sample_interval.
However, the struct damon_attrs already caches this exact ratio in the
internal field aggr_samples (since earlier commits). We can eliminate
the hardware division in the hot path by simply returning aggr_samples.
This significantly reduces the CPU cycle overhead of updating the access
rates for thousands of regions.
Signed-off-by: Josh Law <objecting@objecting.org>
---
include/linux/damon.h | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/include/linux/damon.h b/include/linux/damon.h
index 6bd71546f7b2..438fe6f3eab4 100644
--- a/include/linux/damon.h
+++ b/include/linux/damon.h
@@ -960,8 +960,7 @@ static inline bool damon_target_has_pid(const struct damon_ctx *ctx)
static inline unsigned int damon_max_nr_accesses(const struct damon_attrs *attrs)
{
/* {aggr,sample}_interval are unsigned long, hence could overflow */
- return min(attrs->aggr_interval / attrs->sample_interval,
- (unsigned long)UINT_MAX);
+ return min_t(unsigned long, attrs->aggr_samples, UINT_MAX);
}
--
2.34.1
* Re: [PATCH v3 0/2] mm/damon/core: Performance optimizations for the kdamond hot path
2026-03-22 21:43 [PATCH v3 0/2] mm/damon/core: Performance optimizations for the kdamond hot path Josh Law
2026-03-22 21:43 ` [PATCH v3 1/2] mm/damon/core: optimize kdamond_apply_schemes() by inverting scheme and region loops Josh Law
2026-03-22 21:43 ` [PATCH v3 2/2] mm/damon/core: eliminate hot-path integer division in damon_max_nr_accesses() Josh Law
@ 2026-03-23 14:06 ` SeongJae Park
2 siblings, 0 replies; 8+ messages in thread
From: SeongJae Park @ 2026-03-23 14:06 UTC (permalink / raw)
To: Josh Law; +Cc: SeongJae Park, akpm, damon, linux-mm, linux-kernel
On Sun, 22 Mar 2026 21:43:23 +0000 Josh Law <objecting@objecting.org> wrote:
> Hello,
>
> This patch series provides two performance optimizations for the DAMON
> core, specifically targeting the hot paths in kdamond.
>
> The first patch optimizes kdamond_apply_schemes() by inverting the loop
> order. By iterating over schemes first and regions second, we can
> evaluate scheme-level invariants (like activation status and quotas)
> once per scheme rather than for every single region. This significantly
> reduces CPU overhead when multiple schemes are present or when quotas
> are reached.
>
> The second patch eliminates a hardware integer division in
> damon_max_nr_accesses() by using the pre-cached aggr_samples value.
> Since this function is called once per region per sampling interval,
> removing the division provides a measurable reduction in CPU cycles
> spent in the access rate update path.
>
> Changes from v2:
> - Fix multi-line if statement alignment in the first patch to satisfy
> checkpatch --strict.
>
> Changes from v1:
> - Use min_t(unsigned long, ...) in damon_max_nr_accesses() to satisfy
> checkpatch warnings and improve readability.
Thank you for adding the change log. Please also consider adding links [1] to
previous versions.
Also, please consider waiting at least about one day before sending a new
revision of a series, so that people get a chance to review. If you find
something that you need to change in a new version, you can comment first
about your planned change, and wait for others' comments.
[1] https://docs.kernel.org/process/submitting-patches.html#commentary
Thanks,
SJ
[...]
* Re: [PATCH v3 1/2] mm/damon/core: optimize kdamond_apply_schemes() by inverting scheme and region loops
2026-03-22 21:43 ` [PATCH v3 1/2] mm/damon/core: optimize kdamond_apply_schemes() by inverting scheme and region loops Josh Law
@ 2026-03-23 14:07 ` SeongJae Park
0 siblings, 0 replies; 8+ messages in thread
From: SeongJae Park @ 2026-03-23 14:07 UTC (permalink / raw)
To: Josh Law; +Cc: SeongJae Park, akpm, damon, linux-mm, linux-kernel
I see you posted a new version [1] of this patch, so I'm skipping this one.
[1] https://lore.kernel.org/20260322225627.263202-1-objecting@objecting.org
Thanks,
SJ
[...]
* Re: [PATCH v3 2/2] mm/damon/core: eliminate hot-path integer division in damon_max_nr_accesses()
2026-03-22 21:43 ` [PATCH v3 2/2] mm/damon/core: eliminate hot-path integer division in damon_max_nr_accesses() Josh Law
@ 2026-03-23 14:10 ` SeongJae Park
2026-03-24 7:19 ` SeongJae Park
1 sibling, 0 replies; 8+ messages in thread
From: SeongJae Park @ 2026-03-23 14:10 UTC (permalink / raw)
To: Josh Law; +Cc: SeongJae Park, akpm, damon, linux-mm, linux-kernel
On Sun, 22 Mar 2026 21:43:25 +0000 Josh Law <objecting@objecting.org> wrote:
> Hardware integer division is slow. The function damon_max_nr_accesses(),
> which is called very frequently (e.g., once per region per sample
> interval inside damon_update_region_access_rate), performs an integer
> division: attrs->aggr_interval / attrs->sample_interval.
>
> However, the struct damon_attrs already caches this exact ratio in the
> internal field aggr_samples (since earlier commits). We can eliminate
> the hardware division in the hot path by simply returning aggr_samples.
>
> This significantly reduces the CPU cycle overhead of updating the access
> rates for thousands of regions.
>
> Signed-off-by: Josh Law <objecting@objecting.org>
Reviewed-by: SeongJae Park <sj@kernel.org>
Thanks,
SJ
[...]
* Re: [PATCH v3 2/2] mm/damon/core: eliminate hot-path integer division in damon_max_nr_accesses()
2026-03-22 21:43 ` [PATCH v3 2/2] mm/damon/core: eliminate hot-path integer division in damon_max_nr_accesses() Josh Law
2026-03-23 14:10 ` SeongJae Park
@ 2026-03-24 7:19 ` SeongJae Park
2026-03-24 7:22 ` Josh Law
1 sibling, 1 reply; 8+ messages in thread
From: SeongJae Park @ 2026-03-24 7:19 UTC (permalink / raw)
To: Josh Law; +Cc: SeongJae Park, akpm, damon, linux-mm, linux-kernel
On Sun, 22 Mar 2026 21:43:25 +0000 Josh Law <objecting@objecting.org> wrote:
> Hardware integer division is slow. The function damon_max_nr_accesses(),
> which is called very frequently (e.g., once per region per sample
> interval inside damon_update_region_access_rate), performs an integer
> division: attrs->aggr_interval / attrs->sample_interval.
>
> However, the struct damon_attrs already caches this exact ratio in the
> internal field aggr_samples (since earlier commits). We can eliminate
> the hardware division in the hot path by simply returning aggr_samples.
>
> This significantly reduces the CPU cycle overhead of updating the access
> rates for thousands of regions.
>
> Signed-off-by: Josh Law <objecting@objecting.org>
> ---
> include/linux/damon.h | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/include/linux/damon.h b/include/linux/damon.h
> index 6bd71546f7b2..438fe6f3eab4 100644
> --- a/include/linux/damon.h
> +++ b/include/linux/damon.h
> @@ -960,8 +960,7 @@ static inline bool damon_target_has_pid(const struct damon_ctx *ctx)
> static inline unsigned int damon_max_nr_accesses(const struct damon_attrs *attrs)
> {
> /* {aggr,sample}_interval are unsigned long, hence could overflow */
> - return min(attrs->aggr_interval / attrs->sample_interval,
> - (unsigned long)UINT_MAX);
> + return min_t(unsigned long, attrs->aggr_samples, UINT_MAX);
> }
I just found that this patch causes the divide-by-zero below when
tools/testing/selftests/damon/sysfs.py is executed.
'''
[ 42.462039] Oops: divide error: 0000 [#1] SMP NOPTI
[ 42.463673] CPU: 4 UID: 0 PID: 2044 Comm: kdamond.0 Not tainted 7.0.0-rc4-mm-new-damon+ #354 PREEMPT(full)
[ 42.465193] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[ 42.466590] RIP: 0010:damon_set_attrs (mm/damon/core.c:769 (discriminator 1) mm/damon/core.c:775 (discriminator 1) mm/damon/core.c:786 (discriminator 1) mm/damon/core.c:827 (discriminator 1) mm/damon/core.c:897 (discriminator 1))
[ 42.467287] Code: 48 39 c5 0f 84 dd 00 00 00 41 bb 59 17 b7 d1 48 8b 43 10 4c 8d 43 10 48 8d 48 e0 49 39 c0 75 5b e9 b0 00 00 00 8b 41 18 31 d2 <41> f7 f6 41 89 c5 69 c2 10 27 00 00 31 d2 45 69 ed 10 27 00 00 41
All code
========
0: 48 39 c5 cmp %rax,%rbp
3: 0f 84 dd 00 00 00 je 0xe6
9: 41 bb 59 17 b7 d1 mov $0xd1b71759,%r11d
f: 48 8b 43 10 mov 0x10(%rbx),%rax
13: 4c 8d 43 10 lea 0x10(%rbx),%r8
17: 48 8d 48 e0 lea -0x20(%rax),%rcx
1b: 49 39 c0 cmp %rax,%r8
1e: 75 5b jne 0x7b
20: e9 b0 00 00 00 jmp 0xd5
25: 8b 41 18 mov 0x18(%rcx),%eax
28: 31 d2 xor %edx,%edx
2a:* 41 f7 f6 div %r14d <-- trapping instruction
2d: 41 89 c5 mov %eax,%r13d
30: 69 c2 10 27 00 00 imul $0x2710,%edx,%eax
36: 31 d2 xor %edx,%edx
38: 45 69 ed 10 27 00 00 imul $0x2710,%r13d,%r13d
3f: 41 rex.B
Code starting with the faulting instruction
===========================================
0: 41 f7 f6 div %r14d
3: 41 89 c5 mov %eax,%r13d
6: 69 c2 10 27 00 00 imul $0x2710,%edx,%eax
c: 31 d2 xor %edx,%edx
e: 45 69 ed 10 27 00 00 imul $0x2710,%r13d,%r13d
15: 41 rex.B
[ 42.470046] RSP: 0018:ffffd25c4586bcb0 EFLAGS: 00010246
[ 42.470818] RAX: 0000000000000000 RBX: ffff891346919400 RCX: ffff8913502dd040
[ 42.471923] RDX: 0000000000000000 RSI: ffff891348527600 RDI: ffff891344d94400
[ 42.472972] RBP: ffff891344d94598 R08: ffff891346919410 R09: 0000000000000000
[ 42.474028] R10: 0000000000000000 R11: 00000000d1b71759 R12: 0000000000000014
[ 42.475104] R13: ffff891348527778 R14: 0000000000000000 R15: ffff891348527798
[ 42.476191] FS: 0000000000000000(0000) GS:ffff89149efd8000(0000) knlGS:0000000000000000
[ 42.477375] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 42.478235] CR2: 000000003a84f080 CR3: 000000004a824000 CR4: 00000000000006f0
[ 42.479291] Call Trace:
[ 42.479670] <TASK>
[ 42.480044] damon_commit_ctx (mm/damon/core.c:1538)
[ 42.480660] damon_sysfs_commit_input (mm/damon/sysfs.c:2153 mm/damon/sysfs.c:2181)
[ 42.481389] kdamond_call (mm/damon/core.c:3186)
[ 42.482492] kdamond_fn (mm/damon/core.c:3428)
[ 42.483041] ? kthread_affine_node (kernel/kthread.c:377)
[ 42.483766] ? kfree (include/linux/kmemleak.h:50 mm/slub.c:2610 mm/slub.c:6165 mm/slub.c:6483)
[ 42.484257] ? __pfx_kdamond_fn (mm/damon/core.c:3368)
[ 42.484855] ? __pfx_kdamond_fn (mm/damon/core.c:3368)
[ 42.485459] kthread (kernel/kthread.c:436)
[ 42.485959] ? __pfx_kthread (kernel/kthread.c:381)
[ 42.486524] ret_from_fork (arch/x86/kernel/process.c:164)
[ 42.487105] ? __pfx_kthread (kernel/kthread.c:381)
[ 42.487668] ret_from_fork_asm (arch/x86/entry/entry_64.S:258)
[ 42.488304] </TASK>
'''
That's because damon_commit_ctx() is called on a context that was just
generated using damon_new_ctx(), which doesn't set aggr_samples. After
applying the change below, the divide-by-zero is gone.
'''
--- a/mm/damon/core.c
+++ b/mm/damon/core.c
@@ -676,6 +676,7 @@ struct damon_ctx *damon_new_ctx(void)
ctx->attrs.sample_interval = 5 * 1000;
ctx->attrs.aggr_interval = 100 * 1000;
ctx->attrs.ops_update_interval = 60 * 1000 * 1000;
+ ctx->attrs.aggr_samples = 20;
ctx->passed_sample_intervals = 0;
/* These will be set from kdamond_init_ctx() */
'''
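As a quick sanity check on the hard-coded value in the fixup above (a standalone sketch using the default intervals visible in the diff): the proposed default of 20 is exactly the ratio of the default aggregation and sampling intervals, so the cache starts out coherent with the intervals.

```c
#include <assert.h>

/* Default intervals from damon_new_ctx(), as shown in the diff above */
static unsigned long default_aggr_samples(void)
{
	unsigned long sample_interval = 5 * 1000;
	unsigned long aggr_interval = 100 * 1000;

	/* the cached ratio the fixup hard-codes as 20 */
	return aggr_interval / sample_interval;
}
```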
Also, KUnit crashes as below.
'''
$ ./tools/testing/kunit/kunit.py run --kunitconfig mm/damon/tests/
[00:07:19] Configuring KUnit Kernel ...
[00:07:19] Building KUnit Kernel ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
Building with:
$ make all compile_commands.json scripts_gdb ARCH=um O=.kunit --jobs=8
[00:08:11] Starting KUnit Kernel (1/1)...
[00:08:11] ============================================================
Running tests with:
$ .kunit/linux kunit.enable=1 mem=1G console=tty kunit_shutdown=halt
[00:08:11] =================== damon (28 subtests) ====================
[00:08:11] [PASSED] damon_test_target
[00:08:11] [PASSED] damon_test_regions
[00:08:11] [PASSED] damon_test_aggregate
[00:08:11] [PASSED] damon_test_split_at
[00:08:11] [PASSED] damon_test_merge_two
[00:08:11] [PASSED] damon_test_merge_regions_of
[00:08:11] [PASSED] damon_test_split_regions_of
[00:08:11] [PASSED] damon_test_ops_registration
[00:08:11] [PASSED] damon_test_set_regions
[00:08:11] [ERROR] Test: damon: missing expected subtest!
[00:08:11] Kernel panic - not syncing: Kernel mode signal 4
'''
It runs without the error after the changes below are applied:
'''
--- a/mm/damon/tests/core-kunit.h
+++ b/mm/damon/tests/core-kunit.h
@@ -514,6 +514,8 @@ static void damon_test_nr_accesses_to_accesses_bp(struct kunit *test)
.aggr_interval = ((unsigned long)UINT_MAX + 1) * 10
};
+ attrs.aggr_samples = attrs.aggr_interval / attrs.sample_interval;
+
/*
* In some cases such as 32bit architectures where UINT_MAX is
* ULONG_MAX, attrs.aggr_interval becomes zero. Calling
@@ -532,7 +534,8 @@ static void damon_test_nr_accesses_to_accesses_bp(struct kunit *test)
static void damon_test_update_monitoring_result(struct kunit *test)
{
struct damon_attrs old_attrs = {
- .sample_interval = 10, .aggr_interval = 1000,};
+ .sample_interval = 10, .aggr_interval = 1000,
+ .aggr_samples = 100,};
struct damon_attrs new_attrs;
struct damon_region *r = damon_new_region(3, 7);
@@ -544,19 +547,24 @@ static void damon_test_update_monitoring_result(struct kunit *test)
r->age = 20;
new_attrs = (struct damon_attrs){
- .sample_interval = 100, .aggr_interval = 10000,};
+ .sample_interval = 100, .aggr_interval = 10000,
+ .aggr_samples = 100,};
damon_update_monitoring_result(r, &old_attrs, &new_attrs, false);
KUNIT_EXPECT_EQ(test, r->nr_accesses, 15);
KUNIT_EXPECT_EQ(test, r->age, 2);
new_attrs = (struct damon_attrs){
- .sample_interval = 1, .aggr_interval = 1000};
+ .sample_interval = 1, .aggr_interval = 1000,
+ .aggr_samples = 1000,
+ };
damon_update_monitoring_result(r, &old_attrs, &new_attrs, false);
KUNIT_EXPECT_EQ(test, r->nr_accesses, 150);
KUNIT_EXPECT_EQ(test, r->age, 2);
new_attrs = (struct damon_attrs){
- .sample_interval = 1, .aggr_interval = 100};
+ .sample_interval = 1, .aggr_interval = 100,
+ .aggr_samples = 100,
+ };
damon_update_monitoring_result(r, &old_attrs, &new_attrs, false);
KUNIT_EXPECT_EQ(test, r->nr_accesses, 150);
KUNIT_EXPECT_EQ(test, r->age, 20);
'''
Josh, could you please take a look and check whether these fixups are
sufficient? Once sufficient fixes are found on your side, could you please
post a new version of this patch with the fixes applied? Also, please drop
my Reviewed-by: from the new version; I will review it again.
Thanks,
SJ
[...]
* Re: [PATCH v3 2/2] mm/damon/core: eliminate hot-path integer division in damon_max_nr_accesses()
2026-03-24 7:19 ` SeongJae Park
@ 2026-03-24 7:22 ` Josh Law
0 siblings, 0 replies; 8+ messages in thread
From: Josh Law @ 2026-03-24 7:22 UTC (permalink / raw)
To: SeongJae Park; +Cc: akpm, damon, linux-mm, linux-kernel
On 24 March 2026 07:19:08 GMT, SeongJae Park <sj@kernel.org> wrote:
>[...]
Aw, I didn't notice that. I'm so sorry, I'll fix this soon.
V/R
Josh Law