* [PATCH 6.1.y] mm/damon/core: disallow time-quota setting zero esz [not found] <2026050324-shed-cultivate-ed89@gregkh> @ 2026-05-04 12:58 ` SeongJae Park 2026-05-04 18:10 ` sashiko-bot 0 siblings, 1 reply; 3+ messages in thread From: SeongJae Park @ 2026-05-04 12:58 UTC (permalink / raw) To: stable; +Cc: damon, SeongJae Park, Andrew Morton When the throughput of a DAMOS scheme is very slow, DAMOS time quota can make the effective size quota smaller than damon_ctx->min_region_sz. In the case, damos_apply_scheme() will skip applying the action, because the action is tried at region level, which requires >=min_region_sz size. That is, the quota is effectively exceeded for the quota charge window. Because no action will be applied, the total_charged_sz and total_charged_ns are also not updated. damos_set_effective_quota() will try to update the effective size quota before starting the next charge window. However, because the total_charged_sz and total_charged_ns have not updated, the throughput and effective size quota are also not changed. Since effective size quota can only be decreased, other effective size quota update factors including DAMOS quota goals and size quota cannot make any change, either. As a result, the scheme is unexpectedly deactivated until the user notices and mitigates the situation. The users can mitigate this situation by changing the time quota online or re-install the scheme. While the mitigation is somewhat straightforward, finding the situation would be challenging, because DAMON is not providing good observabilities for that. Even if such observability is provided, doing the additional monitoring and the mitigation is somewhat cumbersome and not aligned to the intention of the time quota. The time quota was intended to help reduce the user's administration overhead. Fix the problem by setting time quota-modified effective size quota be at least min_region_sz always. The issue was discovered [1] by sashiko. Link: https://lore.kernel.org/20260407003153.79589-1-sj@kernel.org Link: https://lore.kernel.org/20260405192504.110014-1-sj@kernel.org [1] Fixes: 1cd243030059 ("mm/damon/schemes: implement time quota") Signed-off-by: SeongJae Park <sj@kernel.org> Cc: <stable@vger.kernel.org> # 5.16.x Signed-off-by: Andrew Morton <akpm@linux-foundation.org> (cherry picked from commit 8bbde987c2b84f80da0853f739f0a920386f8b99) Signed-off-by: SeongJae Park <sj@kernel.org> --- mm/damon/core.c | 1 + 1 file changed, 1 insertion(+) diff --git a/mm/damon/core.c b/mm/damon/core.c index ab5c351b276c..94f8450f6e84 100644 --- a/mm/damon/core.c +++ b/mm/damon/core.c @@ -802,6 +802,7 @@ static void damos_set_effective_quota(struct damos_quota *quota) else throughput = PAGE_SIZE * 1024; esz = throughput * quota->ms; + esz = max(DAMON_MIN_REGION, esz); if (quota->sz && quota->sz < esz) esz = quota->sz; -- 2.47.3 ^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: [PATCH 6.1.y] mm/damon/core: disallow time-quota setting zero esz 2026-05-04 12:58 ` [PATCH 6.1.y] mm/damon/core: disallow time-quota setting zero esz SeongJae Park @ 2026-05-04 18:10 ` sashiko-bot 2026-05-05 15:00 ` SeongJae Park 0 siblings, 1 reply; 3+ messages in thread From: sashiko-bot @ 2026-05-04 18:10 UTC (permalink / raw) To: SeongJae Park; +Cc: damon Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider: - [High] Unmitigated integer overflow in the `throughput` calculation permanently breaks time quotas. - [High] Incorrect initialization of `charged_from` completely bypasses quotas for the first 5 minutes of system boot. -- commit 558d8bfd17c7ba00954bbbf18a4eb2783d713b59 Author: SeongJae Park <sj@kernel.org> mm/damon/core: disallow time-quota setting zero esz This commit addresses an issue where DAMOS time quotas could make the effective size quota smaller than the minimum region size. By forcing the effective size quota to be at least DAMON_MIN_REGION, it prevents schemes from being unexpectedly deactivated. Link: https://lore.kernel.org/20260407003153.79589-1-sj@kernel.org Link: https://lore.kernel.org/20260405192504.110014-1-sj@kernel.org [1] > diff --git a/mm/damon/core.c b/mm/damon/core.c > index ab5c351b276ce..94f8450f6e843 100644 > --- a/mm/damon/core.c > +++ b/mm/damon/core.c > @@ -802,6 +802,7 @@ static void damos_set_effective_quota(struct damos_quota *quota) > else > throughput = PAGE_SIZE * 1024; > esz = throughput * quota->ms; > + esz = max(DAMON_MIN_REGION, esz); Can the throughput calculation overflow here? Looking at damos_set_effective_quota() before this change, throughput is calculated using: throughput = quota->total_charged_sz * 1000000 / quota->total_charged_ns; Both total_charged_sz and total_charged_ns accumulate indefinitely over the scheme's lifetime. The intermediate multiplication by 1000000 is performed in unsigned long precision. Could this overflow after a large amount of memory is processed? For instance, it could wrap around after around 18.4 TB on 64-bit systems, or just 4KB on 32-bit systems where unsigned long is 32-bit. When the multiplication wraps around, the numerator becomes artificially small, causing the computed throughput to collapse to near-zero. While clamping esz to DAMON_MIN_REGION prevents total deactivation, does the overflow permanently trap the scheme into processing exactly one minimal region per charge window? Also related to the effective size quota updates, is there a regression during early system boot? Looking at damos_quota_init_priv(), quota->charged_from is initialized to 0 rather than the current jiffies. In kdamond_apply_schemes(), the charge window only starts when: time_after_eq(jiffies, quota->charged_from + interval) Because jiffies starts at INITIAL_JIFFIES (which casts to a negative long, typically -300*HZ), does time_after_eq() fail for the first 5 minutes of system uptime? During this 5-minute period, damos_set_effective_quota() would never be invoked, leaving quota->esz at its default initialized value of 0. Because quota->esz == 0 is treated as no limit by the scheme application checks, do early-boot schemes (like damon_reclaim) run entirely unthrottled for the first 5 minutes, completely ignoring user-configured CPU and size quotas? > > if (quota->sz && quota->sz < esz) > esz = quota->sz; -- Sashiko AI review · https://sashiko.dev/#/patchset/20260504125808.22145-1-sj@kernel.org?part=1 ^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH 6.1.y] mm/damon/core: disallow time-quota setting zero esz 2026-05-04 18:10 ` sashiko-bot @ 2026-05-05 15:00 ` SeongJae Park 0 siblings, 0 replies; 3+ messages in thread From: SeongJae Park @ 2026-05-05 15:00 UTC (permalink / raw) To: sashiko-bot; +Cc: SeongJae Park, damon On Mon, 04 May 2026 18:10:48 +0000 sashiko-bot@kernel.org wrote: > Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider: > - [High] Unmitigated integer overflow in the `throughput` calculation permanently breaks time quotas. > - [High] Incorrect initialization of `charged_from` completely bypasses quotas for the first 5 minutes of system boot. No real issues for this patch. > -- > > commit 558d8bfd17c7ba00954bbbf18a4eb2783d713b59 > Author: SeongJae Park <sj@kernel.org> > mm/damon/core: disallow time-quota setting zero esz > > This commit addresses an issue where DAMOS time quotas could make the > effective size quota smaller than the minimum region size. By forcing > the effective size quota to be at least DAMON_MIN_REGION, it prevents > schemes from being unexpectedly deactivated. > > Link: https://lore.kernel.org/20260407003153.79589-1-sj@kernel.org > Link: https://lore.kernel.org/20260405192504.110014-1-sj@kernel.org [1] > > > diff --git a/mm/damon/core.c b/mm/damon/core.c > > index ab5c351b276ce..94f8450f6e843 100644 > > --- a/mm/damon/core.c > > +++ b/mm/damon/core.c > > @@ -802,6 +802,7 @@ static void damos_set_effective_quota(struct damos_quota *quota) > > else > > throughput = PAGE_SIZE * 1024; > > esz = throughput * quota->ms; > > + esz = max(DAMON_MIN_REGION, esz); > > Can the throughput calculation overflow here? > > Looking at damos_set_effective_quota() before this change, throughput is > calculated using: > > throughput = quota->total_charged_sz * 1000000 / quota->total_charged_ns; > > Both total_charged_sz and total_charged_ns accumulate indefinitely over the > scheme's lifetime. The intermediate multiplication by 1000000 is performed > in unsigned long precision. > > Could this overflow after a large amount of memory is processed? For > instance, it could wrap around after around 18.4 TB on 64-bit systems, or > just 4KB on 32-bit systems where unsigned long is 32-bit. > > When the multiplication wraps around, the numerator becomes artificially > small, causing the computed throughput to collapse to near-zero. While > clamping esz to DAMON_MIN_REGION prevents total deactivation, does the > overflow permanently trap the scheme into processing exactly one minimal > region per charge window? Good catch. But not this patch's fault. On 32bit, use of addr_unit would mitigate the issue so that reasonable amount of progress can be made. Also, the impact should not be significant. > > Also related to the effective size quota updates, is there a regression > during early system boot? > > Looking at damos_quota_init_priv(), quota->charged_from is initialized > to 0 rather than the current jiffies. > > In kdamond_apply_schemes(), the charge window only starts when: > > time_after_eq(jiffies, quota->charged_from + interval) > > Because jiffies starts at INITIAL_JIFFIES (which casts to a negative long, > typically -300*HZ), does time_after_eq() fail for the first 5 minutes of > system uptime? > > During this 5-minute period, damos_set_effective_quota() would never be > invoked, leaving quota->esz at its default initialized value of 0. > > Because quota->esz == 0 is treated as no limit by the scheme application > checks, do early-boot schemes (like damon_reclaim) run entirely > unthrottled for the first 5 minutes, completely ignoring user-configured > CPU and size quotas? We initialize charged_from as jiffies for every charge window, on damos_adjust_quota(). So the issue shouldn't exist. Thanks, SJ [...] ^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2026-05-05 15:00 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <2026050324-shed-cultivate-ed89@gregkh>
2026-05-04 12:58 ` [PATCH 6.1.y] mm/damon/core: disallow time-quota setting zero esz SeongJae Park
2026-05-04 18:10 ` sashiko-bot
2026-05-05 15:00 ` SeongJae Park
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox