DAMON development mailing list
 help / color / mirror / Atom feed
* [PATCH 6.1.y] mm/damon/core: disallow time-quota setting zero esz
       [not found] <2026050324-shed-cultivate-ed89@gregkh>
@ 2026-05-04 12:58 ` SeongJae Park
  2026-05-04 18:10   ` sashiko-bot
  0 siblings, 1 reply; 3+ messages in thread
From: SeongJae Park @ 2026-05-04 12:58 UTC (permalink / raw)
  To: stable; +Cc: damon, SeongJae Park, Andrew Morton

When the throughput of a DAMOS scheme is very slow, DAMOS time quota can
make the effective size quota smaller than damon_ctx->min_region_sz.  In
the case, damos_apply_scheme() will skip applying the action, because the
action is tried at region level, which requires >=min_region_sz size.
That is, the quota is effectively exceeded for the quota charge window.

Because no action will be applied, the total_charged_sz and
total_charged_ns are also not updated.  damos_set_effective_quota() will
try to update the effective size quota before starting the next charge
window.  However, because the total_charged_sz and total_charged_ns have
not updated, the throughput and effective size quota are also not changed.
Since effective size quota can only be decreased, other effective size
quota update factors including DAMOS quota goals and size quota cannot
make any change, either.

As a result, the scheme is unexpectedly deactivated until the user notices
and mitigates the situation.  The users can mitigate this situation by
changing the time quota online or re-install the scheme.  While the
mitigation is somewhat straightforward, finding the situation would be
challenging, because DAMON is not providing good observabilities for that.
Even if such observability is provided, doing the additional monitoring
and the mitigation is somewhat cumbersome and not aligned to the intention
of the time quota.  The time quota was intended to help reduce the user's
administration overhead.

Fix the problem by setting time quota-modified effective size quota be at
least min_region_sz always.

The issue was discovered [1] by sashiko.

Link: https://lore.kernel.org/20260407003153.79589-1-sj@kernel.org
Link: https://lore.kernel.org/20260405192504.110014-1-sj@kernel.org [1]
Fixes: 1cd243030059 ("mm/damon/schemes: implement time quota")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: <stable@vger.kernel.org> # 5.16.x
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 8bbde987c2b84f80da0853f739f0a920386f8b99)
Signed-off-by: SeongJae Park <sj@kernel.org>
---
 mm/damon/core.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/mm/damon/core.c b/mm/damon/core.c
index ab5c351b276c..94f8450f6e84 100644
--- a/mm/damon/core.c
+++ b/mm/damon/core.c
@@ -802,6 +802,7 @@ static void damos_set_effective_quota(struct damos_quota *quota)
 	else
 		throughput = PAGE_SIZE * 1024;
 	esz = throughput * quota->ms;
+	esz = max(DAMON_MIN_REGION, esz);
 
 	if (quota->sz && quota->sz < esz)
 		esz = quota->sz;
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH 6.1.y] mm/damon/core: disallow time-quota setting zero esz
  2026-05-04 12:58 ` [PATCH 6.1.y] mm/damon/core: disallow time-quota setting zero esz SeongJae Park
@ 2026-05-04 18:10   ` sashiko-bot
  2026-05-05 15:00     ` SeongJae Park
  0 siblings, 1 reply; 3+ messages in thread
From: sashiko-bot @ 2026-05-04 18:10 UTC (permalink / raw)
  To: SeongJae Park; +Cc: damon

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [High] Unmitigated integer overflow in the `throughput` calculation permanently breaks time quotas.
- [High] Incorrect initialization of `charged_from` completely bypasses quotas for the first 5 minutes of system boot.
--

commit 558d8bfd17c7ba00954bbbf18a4eb2783d713b59
Author: SeongJae Park <sj@kernel.org>
mm/damon/core: disallow time-quota setting zero esz

This commit addresses an issue where DAMOS time quotas could make the
effective size quota smaller than the minimum region size. By forcing
the effective size quota to be at least DAMON_MIN_REGION, it prevents
schemes from being unexpectedly deactivated.

Link: https://lore.kernel.org/20260407003153.79589-1-sj@kernel.org
Link: https://lore.kernel.org/20260405192504.110014-1-sj@kernel.org [1]

> diff --git a/mm/damon/core.c b/mm/damon/core.c
> index ab5c351b276ce..94f8450f6e843 100644
> --- a/mm/damon/core.c
> +++ b/mm/damon/core.c
> @@ -802,6 +802,7 @@ static void damos_set_effective_quota(struct damos_quota *quota)
>  	else
>  		throughput = PAGE_SIZE * 1024;
>  	esz = throughput * quota->ms;
> +	esz = max(DAMON_MIN_REGION, esz);

Can the throughput calculation overflow here?

Looking at damos_set_effective_quota() before this change, throughput is
calculated using:

    throughput = quota->total_charged_sz * 1000000 / quota->total_charged_ns;

Both total_charged_sz and total_charged_ns accumulate indefinitely over the
scheme's lifetime. The intermediate multiplication by 1000000 is performed
in unsigned long precision.

Could this overflow after a large amount of memory is processed? For
instance, it could wrap around after around 18.4 TB on 64-bit systems, or
just 4KB on 32-bit systems where unsigned long is 32-bit.

When the multiplication wraps around, the numerator becomes artificially
small, causing the computed throughput to collapse to near-zero. While
clamping esz to DAMON_MIN_REGION prevents total deactivation, does the
overflow permanently trap the scheme into processing exactly one minimal
region per charge window?

Also related to the effective size quota updates, is there a regression
during early system boot?

Looking at damos_quota_init_priv(), quota->charged_from is initialized
to 0 rather than the current jiffies.

In kdamond_apply_schemes(), the charge window only starts when:

    time_after_eq(jiffies, quota->charged_from + interval)

Because jiffies starts at INITIAL_JIFFIES (which casts to a negative long,
typically -300*HZ), does time_after_eq() fail for the first 5 minutes of
system uptime?

During this 5-minute period, damos_set_effective_quota() would never be
invoked, leaving quota->esz at its default initialized value of 0.

Because quota->esz == 0 is treated as no limit by the scheme application
checks, do early-boot schemes (like damon_reclaim) run entirely
unthrottled for the first 5 minutes, completely ignoring user-configured
CPU and size quotas?

>  
>  	if (quota->sz && quota->sz < esz)
>  		esz = quota->sz;

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260504125808.22145-1-sj@kernel.org?part=1

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH 6.1.y] mm/damon/core: disallow time-quota setting zero esz
  2026-05-04 18:10   ` sashiko-bot
@ 2026-05-05 15:00     ` SeongJae Park
  0 siblings, 0 replies; 3+ messages in thread
From: SeongJae Park @ 2026-05-05 15:00 UTC (permalink / raw)
  To: sashiko-bot; +Cc: SeongJae Park, damon

On Mon, 04 May 2026 18:10:48 +0000 sashiko-bot@kernel.org wrote:

> Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
> - [High] Unmitigated integer overflow in the `throughput` calculation permanently breaks time quotas.
> - [High] Incorrect initialization of `charged_from` completely bypasses quotas for the first 5 minutes of system boot.

No real issues for this patch.

> --
> 
> commit 558d8bfd17c7ba00954bbbf18a4eb2783d713b59
> Author: SeongJae Park <sj@kernel.org>
> mm/damon/core: disallow time-quota setting zero esz
> 
> This commit addresses an issue where DAMOS time quotas could make the
> effective size quota smaller than the minimum region size. By forcing
> the effective size quota to be at least DAMON_MIN_REGION, it prevents
> schemes from being unexpectedly deactivated.
> 
> Link: https://lore.kernel.org/20260407003153.79589-1-sj@kernel.org
> Link: https://lore.kernel.org/20260405192504.110014-1-sj@kernel.org [1]
> 
> > diff --git a/mm/damon/core.c b/mm/damon/core.c
> > index ab5c351b276ce..94f8450f6e843 100644
> > --- a/mm/damon/core.c
> > +++ b/mm/damon/core.c
> > @@ -802,6 +802,7 @@ static void damos_set_effective_quota(struct damos_quota *quota)
> >  	else
> >  		throughput = PAGE_SIZE * 1024;
> >  	esz = throughput * quota->ms;
> > +	esz = max(DAMON_MIN_REGION, esz);
> 
> Can the throughput calculation overflow here?
> 
> Looking at damos_set_effective_quota() before this change, throughput is
> calculated using:
> 
>     throughput = quota->total_charged_sz * 1000000 / quota->total_charged_ns;
> 
> Both total_charged_sz and total_charged_ns accumulate indefinitely over the
> scheme's lifetime. The intermediate multiplication by 1000000 is performed
> in unsigned long precision.
> 
> Could this overflow after a large amount of memory is processed? For
> instance, it could wrap around after around 18.4 TB on 64-bit systems, or
> just 4KB on 32-bit systems where unsigned long is 32-bit.
> 
> When the multiplication wraps around, the numerator becomes artificially
> small, causing the computed throughput to collapse to near-zero. While
> clamping esz to DAMON_MIN_REGION prevents total deactivation, does the
> overflow permanently trap the scheme into processing exactly one minimal
> region per charge window?

Good catch.  But not this patch's fault.  On 32bit, use of addr_unit would
mitigate the issue so that reasonable amount of progress can be made.  Also,
the impact should not be significant.

> 
> Also related to the effective size quota updates, is there a regression
> during early system boot?
> 
> Looking at damos_quota_init_priv(), quota->charged_from is initialized
> to 0 rather than the current jiffies.
> 
> In kdamond_apply_schemes(), the charge window only starts when:
> 
>     time_after_eq(jiffies, quota->charged_from + interval)
> 
> Because jiffies starts at INITIAL_JIFFIES (which casts to a negative long,
> typically -300*HZ), does time_after_eq() fail for the first 5 minutes of
> system uptime?
> 
> During this 5-minute period, damos_set_effective_quota() would never be
> invoked, leaving quota->esz at its default initialized value of 0.
> 
> Because quota->esz == 0 is treated as no limit by the scheme application
> checks, do early-boot schemes (like damon_reclaim) run entirely
> unthrottled for the first 5 minutes, completely ignoring user-configured
> CPU and size quotas?

We initialize charged_from as jiffies for every charge window, on
damos_adjust_quota().  So the issue shouldn't exist.


Thanks,
SJ

[...]

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2026-05-05 15:00 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <2026050324-shed-cultivate-ed89@gregkh>
2026-05-04 12:58 ` [PATCH 6.1.y] mm/damon/core: disallow time-quota setting zero esz SeongJae Park
2026-05-04 18:10   ` sashiko-bot
2026-05-05 15:00     ` SeongJae Park

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox