DAMON development mailing list
From: SeongJae Park <sj@kernel.org>
To: SeongJae Park <sj@kernel.org>
Cc: sashiko-bot@kernel.org, damon@lists.linux.dev
Subject: Re: [RFC PATCH v1.2 03/17] mm/damon/core: always update ->last_nr_accesses for intervals change
Date: Sun, 21 Jun 2026 14:13:02 -0700
Message-ID: <20260621211302.11854-1-sj@kernel.org>
In-Reply-To: <20260621202307.10276-1-sj@kernel.org>

On Sun, 21 Jun 2026 13:23:16 -0700 SeongJae Park <sj@kernel.org> wrote:

> On Sun, 21 Jun 2026 16:11:04 +0000 sashiko-bot@kernel.org wrote:
> 
> > Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
> > 
> > Pre-existing issues:
> > - [High] Arithmetic overflow in DAMON access scaling functions when interval ratio is large.
> > - [High] The patch attempts to fix a 'wrong value' returned by `damon_nr_accesses_mvsum()`, but misdiagnoses the root cause and leaves a major double-counting bug unresolved.
> > --
> > 
> > commit 6636564dd42488eb260d7102569887a00bdd3d71
> > Author: SeongJae Park <sj@kernel.org>
> > 
> > mm/damon/core: always update ->last_nr_accesses for intervals change
> > 
> > This commit modifies damon_update_monitoring_result() to unconditionally
> > update last_nr_accesses when aggregation intervals change. It aims to
> > prevent incorrect calculations in damon_nr_accesses_mvsum() that could occur
> > if called with a stale last_nr_accesses value alongside a fresh aggregation
> > timestamp.
> > 
> > > diff --git a/mm/damon/core.c b/mm/damon/core.c
> > > index addd14e4e70bf..bc40a6d10f277 100644
> > > --- a/mm/damon/core.c
> > > +++ b/mm/damon/core.c
> > > @@ -873,6 +873,8 @@ static void damon_update_monitoring_result(struct damon_region *r,
> > >  		struct damon_attrs *old_attrs, struct damon_attrs *new_attrs,
> > >  		bool aggregating)
> > >  {
> > > +	r->last_nr_accesses = damon_nr_accesses_for_new_attrs(
> > > +			r->last_nr_accesses, old_attrs, new_attrs);
> > 
> > [Severity: High]
> > This is a pre-existing issue, but does this scaling trigger an arithmetic
> > overflow when the interval ratio is large?
> > 
> > When damon_nr_accesses_for_new_attrs() is called, it performs 32-bit
> > arithmetic in damon_nr_accesses_to_accesses_bp() and
> > damon_accesses_bp_to_nr_accesses().
> > 
> > Specifically:
> > 
> > mm/damon/core.c:damon_nr_accesses_to_accesses_bp() {
> >     ...
> >     return mult_frac(nr_accesses, 10000, damon_max_nr_accesses(attrs));
> > }
> > 
> > mm/damon/core.c:damon_accesses_bp_to_nr_accesses() {
> >     ...
> >     return accesses_bp * damon_max_nr_accesses(attrs) / 10000;
> > }
> > 
> > If the ratio of aggr_interval to sample_interval exceeds 429,496 (for
> > example, a 1s aggregation and 1us sample), the intermediate multiplications
> > like r * 10000 inside mult_frac() are computed as 32-bit unsigned int,
> > which overflows the 32-bit maximum.
> > 
> > Can this silently corrupt the region access counts upon attribute updates,
> > leading to completely incorrect monitoring results?
> 
> Known low priority issue.  Not a blocker of this patch.
> 
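
For illustration, a minimal user-space sketch of the wraparound the review
describes.  It assumes mult_frac() splits the computation into a quotient and
a remainder term and that every intermediate stays in 32 bits.  The helper
names mult_frac_u32() and mult_frac_u64() are hypothetical and do not exist in
the kernel.

```c
#include <assert.h>
#include <stdint.h>

/*
 * User-space model of a mult_frac()-style x * n / d with all intermediates
 * kept in 32 bits.  Hypothetical helper, not the kernel macro itself.
 */
static uint32_t mult_frac_u32(uint32_t x, uint32_t n, uint32_t d)
{
	uint32_t q, r;

	assert(d != 0);
	q = x / d;
	r = x % d;
	/* r * n wraps modulo 2^32 once the product exceeds UINT32_MAX */
	return q * n + r * n / d;
}

/* Same computation with both terms widened to 64 bits before multiplying. */
static uint32_t mult_frac_u64(uint32_t x, uint32_t n, uint32_t d)
{
	assert(d != 0);
	return (uint32_t)((uint64_t)(x / d) * n + (uint64_t)(x % d) * n / d);
}
```

With x = 500000, n = 10000 and d = 1000000 (half of the maximum count for a 1s
aggregation and 1us sampling), the 32-bit variant returns 705 while the
widened one returns the expected 5000.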
> > 
> > 
> > [Severity: High]
> > This isn't a bug introduced by this patch, but does scaling
> > last_nr_accesses fully resolve the incorrect values returned by
> > damon_nr_accesses_mvsum(), or does it leave a double-counting flaw intact?
> > 
> > The commit message notes that damon_nr_accesses_mvsum() can return the
> > wrong value. However, the root cause appears to be a state desynchronization
> > rather than just a stale last_nr_accesses.
> > 
> > When damon_set_attrs() is called, it advances ctx->next_aggregation_sis
> > into the future:
> > 
> > mm/damon/core.c:damon_set_attrs() {
> >     ...
> >     ctx->next_aggregation_sis = ctx->passed_sample_intervals +
> >                                 attrs->aggr_interval / sample_interval;
> > }
> > 
> > If a subsequent call to damon_nr_accesses_mvsum() evaluates this future
> > timestamp, it assumes it is at the start of a new interval, meaning
> > left_window_bp evaluates to 10000.
> > 
> > mm/damon/core.c:damon_nr_accesses_mvsum() {
> >     ...
> >     return damon_mvsum(r->nr_accesses, r->last_nr_accesses, left_window_bp);
> > }
> > 
> > Because kdamond_reset_aggregated() has not yet executed, r->nr_accesses
> > still holds the fully accumulated count of the just-completed interval.
> > This means damon_mvsum() adds two full intervals' worth of access counts
> > together, effectively calculating:
> > r->nr_accesses + (r->last_nr_accesses * 10000 / 10000).
> > 
> > Does unconditionally scaling last_nr_accesses merely adjust the second term
> > of this sum while preserving the underlying double-counting logic flaw?
> 
> Good point.  In the long term, damon_update_monitoring_result() should modify
> nr_accesses.  We cannot do that at this moment, since we have to support both
> nr_accesses_bp and damon_nr_accesses_mvsum().  I will add a temporary
> preparatory hack to damon_nr_accesses_mvsum() for this issue, like below.
> 
> '''
> --- a/mm/damon/core.c
> +++ b/mm/damon/core.c
> @@ -268,6 +268,8 @@ unsigned int damon_nr_accesses_mvsum(struct damon_region *r,
>                         ctx->passed_sample_intervals;
>         left_window_bp = mult_frac(left_window, 10000, window_len);
> 
> +       if (left_window_bp == 10000)
> +               return r->last_nr_accesses;
>         return damon_mvsum(r->nr_accesses, r->last_nr_accesses,
>                         left_window_bp);
>  }
> '''
> 
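
A small user-space sketch of the double counting and of the guard in the diff
above.  It assumes damon_mvsum() behaves like the formula quoted in the
review, nr_accesses + last_nr_accesses * left_window_bp / 10000.  The real
helper may differ, and the names below (region_sketch, mvsum_sketch(),
nr_accesses_mvsum_guarded()) are hypothetical.

```c
#include <assert.h>

#define BP_FULL 10000u	/* basis points for a full window */

/* Hypothetical stand-in for the two damon_region fields of interest. */
struct region_sketch {
	unsigned int nr_accesses;	/* count of the ongoing interval */
	unsigned int last_nr_accesses;	/* count of the previous interval */
};

/* Assumed model of damon_mvsum(), taken from the formula in the review. */
static unsigned int mvsum_sketch(unsigned int nr, unsigned int last,
		unsigned int left_window_bp)
{
	assert(left_window_bp <= BP_FULL);
	return nr + last * left_window_bp / BP_FULL;
}

/* Guarded variant: a full left window returns only the previous count. */
static unsigned int nr_accesses_mvsum_guarded(const struct region_sketch *r,
		unsigned int left_window_bp)
{
	if (left_window_bp == BP_FULL)
		return r->last_nr_accesses;
	return mvsum_sketch(r->nr_accesses, r->last_nr_accesses,
			left_window_bp);
}
```

With nr_accesses = 80, last_nr_accesses = 100 and left_window_bp = 10000, the
unguarded sum is 180 (two intervals' worth), while the guarded variant
returns 100.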
> At the end of the series, we will update damon_update_monitoring_result() to
> always reset nr_accesses, and remove this temporary hack.

No, I will not.

After kdamond_call(), which could call damon_set_attrs(), if it was the last
iteration for the current aggregation, kdamond_fn() will further invoke
aggregation interval operations based on the timestamps cached before
damon_set_attrs().  Those operations include DAMOS (if the apply interval is
aligned with the aggregation interval), regions split, regions reset and
intervals auto-tuning.  Some of those operations still use nr_accesses, so
nr_accesses cannot be unconditionally reset in
damon_update_monitoring_result().

I will have this series merged with the damon_nr_accesses_mvsum() change, and
clean up and simplify the logic after this series.
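
The ordering problem can be sketched in user space as below.  This is a toy
model only: toy_ctx, toy_region, toy_set_attrs() and toy_iteration() are
hypothetical names, the new aggregation length of 20 sample intervals is an
arbitrary value, and the real kdamond_fn() does much more per iteration.

```c
#include <stdbool.h>

/* Toy stand-ins for the kernel structures; all names are hypothetical. */
struct toy_ctx {
	unsigned long passed_sample_intervals;
	unsigned long next_aggregation_sis;
};

struct toy_region {
	unsigned int nr_accesses;
};

/*
 * Models an attrs update that moves the next aggregation into the future.
 * When reset_nr_accesses is true, it also models the rejected idea of
 * clearing nr_accesses at update time.
 */
static void toy_set_attrs(struct toy_ctx *ctx, struct toy_region *r,
		bool reset_nr_accesses)
{
	ctx->next_aggregation_sis = ctx->passed_sample_intervals + 20;
	if (reset_nr_accesses)
		r->nr_accesses = 0;
}

/*
 * One loop iteration.  Returns the nr_accesses value that the
 * end-of-aggregation work observes, or -1 if that work did not run.
 */
static long toy_iteration(struct toy_ctx *ctx, struct toy_region *r,
		bool update_attrs, bool reset_nr_accesses)
{
	/* Cached before the callback, as described above. */
	unsigned long cached_next_aggregation_sis = ctx->next_aggregation_sis;

	ctx->passed_sample_intervals++;
	if (update_attrs)
		toy_set_attrs(ctx, r, reset_nr_accesses);

	/* The check uses the cached timestamp, not the updated one. */
	if (ctx->passed_sample_intervals >= cached_next_aggregation_sis)
		return (long)r->nr_accesses;
	return -1;
}
```

In this model, an attrs update on the last iteration of an aggregation still
lets the end-of-aggregation work run, so resetting nr_accesses in the update
path would make that work see zero instead of the accumulated count.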


Thanks,
SJ

[...]
