DAMON development mailing list
 help / color / mirror / Atom feed
* [PATCH 6.6.y] mm/damon/core: disallow time-quota setting zero esz
       [not found] <2026050319-verbally-voltage-207f@gregkh>
@ 2026-05-04 12:55 ` SeongJae Park
  2026-05-04 14:26   ` sashiko-bot
  0 siblings, 1 reply; 3+ messages in thread
From: SeongJae Park @ 2026-05-04 12:55 UTC (permalink / raw)
  To: stable; +Cc: damon, SeongJae Park, Andrew Morton

When the throughput of a DAMOS scheme is very slow, DAMOS time quota can
make the effective size quota smaller than damon_ctx->min_region_sz.  In
the case, damos_apply_scheme() will skip applying the action, because the
action is tried at region level, which requires >=min_region_sz size.
That is, the quota is effectively exceeded for the quota charge window.

Because no action will be applied, the total_charged_sz and
total_charged_ns are also not updated.  damos_set_effective_quota() will
try to update the effective size quota before starting the next charge
window.  However, because the total_charged_sz and total_charged_ns have
not updated, the throughput and effective size quota are also not changed.
Since effective size quota can only be decreased, other effective size
quota update factors including DAMOS quota goals and size quota cannot
make any change, either.

As a result, the scheme is unexpectedly deactivated until the user notices
and mitigates the situation.  The users can mitigate this situation by
changing the time quota online or re-install the scheme.  While the
mitigation is somewhat straightforward, finding the situation would be
challenging, because DAMON is not providing good observabilities for that.
Even if such observability is provided, doing the additional monitoring
and the mitigation is somewhat cumbersome and not aligned to the intention
of the time quota.  The time quota was intended to help reduce the user's
administration overhead.

Fix the problem by setting time quota-modified effective size quota be at
least min_region_sz always.

The issue was discovered [1] by sashiko.

Link: https://lore.kernel.org/20260407003153.79589-1-sj@kernel.org
Link: https://lore.kernel.org/20260405192504.110014-1-sj@kernel.org [1]
Fixes: 1cd243030059 ("mm/damon/schemes: implement time quota")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: <stable@vger.kernel.org> # 5.16.x
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 8bbde987c2b84f80da0853f739f0a920386f8b99)
Signed-off-by: SeongJae Park <sj@kernel.org>
---
 mm/damon/core.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/mm/damon/core.c b/mm/damon/core.c
index 48747236c21c..12c2e584659a 100644
--- a/mm/damon/core.c
+++ b/mm/damon/core.c
@@ -1026,6 +1026,7 @@ static void damos_set_effective_quota(struct damos_quota *quota)
 	else
 		throughput = PAGE_SIZE * 1024;
 	esz = throughput * quota->ms;
+	esz = max(DAMON_MIN_REGION, esz);
 
 	if (quota->sz && quota->sz < esz)
 		esz = quota->sz;
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH 6.6.y] mm/damon/core: disallow time-quota setting zero esz
  2026-05-04 12:55 ` [PATCH 6.6.y] mm/damon/core: disallow time-quota setting zero esz SeongJae Park
@ 2026-05-04 14:26   ` sashiko-bot
  2026-05-06 17:04     ` SeongJae Park
  0 siblings, 1 reply; 3+ messages in thread
From: sashiko-bot @ 2026-05-04 14:26 UTC (permalink / raw)
  To: SeongJae Park; +Cc: damon

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [High] Integer overflow in throughput estimation permanently breaks DAMOS time quota mechanism.
--

commit 352051c643d14056b79b910e005f54b4d071e250
Author: SeongJae Park <sj@kernel.org>

mm/damon/core: disallow time-quota setting zero esz

This patch fixes an issue where a slow DAMOS scheme causes the effective size
quota to fall below the minimum region size, which prevents actions from being
applied. Because no actions are applied, the charge statistics stop updating
and the scheme is unexpectedly deactivated. The fix ensures the effective size
quota is always clamped to at least the minimum region size.

Link: https://lore.kernel.org/20260407003153.79589-1-sj@kernel.org
Link: https://lore.kernel.org/20260405192504.110014-1-sj@kernel.org [1]

> diff --git a/mm/damon/core.c b/mm/damon/core.c
> index 48747236c21c..12c2e584659a 100644
> --- a/mm/damon/core.c
> +++ b/mm/damon/core.c
> @@ -1026,6 +1026,7 @@ static void damos_set_effective_quota(struct damos_quota *quota)
>  	else
>  		throughput = PAGE_SIZE * 1024;
>  	esz = throughput * quota->ms;

Since total_charged_sz accumulates lifetime processed bytes without being
reset, can the throughput calculation that precedes this line cause a
regression by overflowing?

In damos_set_effective_quota(), throughput is calculated as:
throughput = quota->total_charged_sz * 1000000 / quota->total_charged_ns;

On 32-bit architectures, total_charged_sz * 1000000 would overflow when
total_charged_sz reaches just 4295 bytes. On 64-bit architectures, it would
overflow after about 18.44 TB of memory processed.

Once total_charged_sz * 1000000 wraps around, the numerator becomes extremely
small compared to the denominator (which on 64-bit takes 584 years to wrap),
causing the computed throughput to instantly drop to near zero.

> +	esz = max(DAMON_MIN_REGION, esz);

Although this clamping ensures esz never drops below DAMON_MIN_REGION, would
the scheme become permanently throttled down to this minimum region size
until the numerator catches up again after another 18 TB of processing?

>  
>  	if (quota->sz && quota->sz < esz)
>  		esz = quota->sz;

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260504125529.18220-1-sj@kernel.org?part=1

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH 6.6.y] mm/damon/core: disallow time-quota setting zero esz
  2026-05-04 14:26   ` sashiko-bot
@ 2026-05-06 17:04     ` SeongJae Park
  0 siblings, 0 replies; 3+ messages in thread
From: SeongJae Park @ 2026-05-06 17:04 UTC (permalink / raw)
  To: sashiko-bot; +Cc: SeongJae Park, damon

On Mon, 04 May 2026 14:26:22 +0000 sashiko-bot@kernel.org wrote:

> Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
> - [High] Integer overflow in throughput estimation permanently breaks DAMOS time quota mechanism.

Mild issue that not an issue for this patch.  Read below for more details.

> --
> 
> commit 352051c643d14056b79b910e005f54b4d071e250
> Author: SeongJae Park <sj@kernel.org>
> 
> mm/damon/core: disallow time-quota setting zero esz
> 
> This patch fixes an issue where a slow DAMOS scheme causes the effective size
> quota to fall below the minimum region size, which prevents actions from being
> applied. Because no actions are applied, the charge statistics stop updating
> and the scheme is unexpectedly deactivated. The fix ensures the effective size
> quota is always clamped to at least the minimum region size.
> 
> Link: https://lore.kernel.org/20260407003153.79589-1-sj@kernel.org
> Link: https://lore.kernel.org/20260405192504.110014-1-sj@kernel.org [1]
> 
> > diff --git a/mm/damon/core.c b/mm/damon/core.c
> > index 48747236c21c..12c2e584659a 100644
> > --- a/mm/damon/core.c
> > +++ b/mm/damon/core.c
> > @@ -1026,6 +1026,7 @@ static void damos_set_effective_quota(struct damos_quota *quota)
> >  	else
> >  		throughput = PAGE_SIZE * 1024;
> >  	esz = throughput * quota->ms;
> 
> Since total_charged_sz accumulates lifetime processed bytes without being
> reset, can the throughput calculation that precedes this line cause a
> regression by overflowing?
> 
> In damos_set_effective_quota(), throughput is calculated as:
> throughput = quota->total_charged_sz * 1000000 / quota->total_charged_ns;
> 
> On 32-bit architectures, total_charged_sz * 1000000 would overflow when
> total_charged_sz reaches just 4295 bytes. On 64-bit architectures, it would
> overflow after about 18.44 TB of memory processed.
> 
> Once total_charged_sz * 1000000 wraps around, the numerator becomes extremely
> small compared to the denominator (which on 64-bit takes 584 years to wrap),
> causing the computed throughput to instantly drop to near zero.
> 
> > +	esz = max(DAMON_MIN_REGION, esz);
> 
> Although this clamping ensures esz never drops below DAMON_MIN_REGION, would
> the scheme become permanently throttled down to this minimum region size
> until the numerator catches up again after another 18 TB of processing?

Not this patch's fault.  The user impact should also be trivial.  Please refer
to my previous answer [1] to the similar Sashiko question.

[1] https://lore.kernel.org/20260505150012.29007-1-sj@kernel.org


Thanks,
SJ

[...]

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2026-05-06 17:04 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <2026050319-verbally-voltage-207f@gregkh>
2026-05-04 12:55 ` [PATCH 6.6.y] mm/damon/core: disallow time-quota setting zero esz SeongJae Park
2026-05-04 14:26   ` sashiko-bot
2026-05-06 17:04     ` SeongJae Park

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox