* [PATCH 6.12.y] mm/damon/core: disallow time-quota setting zero esz
  [not found] <2026050313-giddy-quizzical-4cc1@gregkh>
From: SeongJae Park @ 2026-05-04 12:52 UTC
To: stable; +Cc: damon, SeongJae Park, Andrew Morton

When the throughput of a DAMOS scheme is very slow, the DAMOS time quota
can make the effective size quota smaller than damon_ctx->min_region_sz.
In that case, damos_apply_scheme() will skip applying the action, because
the action is tried at region level, which requires >=min_region_sz size.
That is, the quota is effectively exceeded for the quota charge window.

Because no action will be applied, the total_charged_sz and
total_charged_ns are also not updated.  damos_set_effective_quota() will
try to update the effective size quota before starting the next charge
window.  However, because the total_charged_sz and total_charged_ns have
not been updated, the throughput and effective size quota are also not
changed.  Since the effective size quota can only be decreased, other
effective size quota update factors including DAMOS quota goals and size
quota cannot make any change, either.

As a result, the scheme is unexpectedly deactivated until the user notices
and mitigates the situation.  The users can mitigate this situation by
changing the time quota online or re-installing the scheme.  While the
mitigation is somewhat straightforward, finding the situation would be
challenging, because DAMON does not provide good observability for that.
Even if such observability is provided, doing the additional monitoring
and the mitigation is somewhat cumbersome and not aligned with the
intention of the time quota.  The time quota was intended to help reduce
the user's administration overhead.

Fix the problem by making the time quota-modified effective size quota
always at least min_region_sz.

The issue was discovered [1] by sashiko.

Link: https://lore.kernel.org/20260407003153.79589-1-sj@kernel.org
Link: https://lore.kernel.org/20260405192504.110014-1-sj@kernel.org [1]
Fixes: 1cd243030059 ("mm/damon/schemes: implement time quota")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: <stable@vger.kernel.org> # 5.16.x
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 8bbde987c2b84f80da0853f739f0a920386f8b99)
Signed-off-by: SeongJae Park <sj@kernel.org>
---
 mm/damon/core.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/mm/damon/core.c b/mm/damon/core.c
index ed2b75023181..69f8244324b3 100644
--- a/mm/damon/core.c
+++ b/mm/damon/core.c
@@ -1577,6 +1577,7 @@ static void damos_set_effective_quota(struct damos_quota *quota)
 			esz = min(throughput * quota->ms, esz);
 		else
 			esz = throughput * quota->ms;
+		esz = max(DAMON_MIN_REGION, esz);
 	}

 	if (quota->sz && quota->sz < esz)
--
2.47.3
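The stall described in the commit message can be illustrated with a small
user-space model.  This is only a sketch, not the kernel code: struct
model_quota, model_time_quota_esz(), model_run_window() and
MODEL_MIN_REGION are hypothetical names, MODEL_MIN_REGION assumes a 4 KiB
page, and quota goals and the size quota are left out.  Only the
throughput formula and the added clamp are taken from this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for DAMON_MIN_REGION, assuming a 4 KiB page. */
#define MODEL_MIN_REGION 4096ULL

/* Simplified model of the quota fields that the time quota uses. */
struct model_quota {
	uint64_t ms;               /* time quota per charge window, in ms */
	uint64_t total_charged_sz; /* bytes the action has been applied to */
	uint64_t total_charged_ns; /* time spent applying the action, in ns */
};

/*
 * Effective size quota (esz) derived from the time quota alone.
 * With clamp == true, the result is raised to MODEL_MIN_REGION,
 * which models the one-line change in the patch.
 */
static uint64_t model_time_quota_esz(const struct model_quota *q, bool clamp)
{
	uint64_t throughput, esz;

	assert(q && q->total_charged_ns);
	/* bytes per millisecond; integer division rounds slow rates to 0 */
	throughput = q->total_charged_sz * 1000000 / q->total_charged_ns;
	esz = throughput * q->ms;
	if (clamp && esz < MODEL_MIN_REGION)
		esz = MODEL_MIN_REGION;
	return esz;
}

/*
 * One charge window.  The action is applied to a single region only if
 * esz can hold at least one region; otherwise the totals stay unchanged,
 * so the next window computes exactly the same esz.
 */
static bool model_run_window(struct model_quota *q, bool clamp,
			     uint64_t ns_per_region)
{
	if (model_time_quota_esz(q, clamp) < MODEL_MIN_REGION)
		return false;
	q->total_charged_sz += MODEL_MIN_REGION;
	q->total_charged_ns += ns_per_region;
	return true;
}
```

With clamp set to false, a throughput that rounds down to zero gives an
esz of zero, model_run_window() returns false, and the totals never
change, so every later window repeats the same result.  With clamp set to
true, one region is processed per window and the totals move again.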
* Re: [PATCH 6.12.y] mm/damon/core: disallow time-quota setting zero esz
From: sashiko-bot @ 2026-05-04 18:02 UTC
To: SeongJae Park; +Cc: damon

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [High] The patch masks a severe pre-existing integer overflow in the time quota throughput calculation, which permanently throttles the scheme instead of fixing the root cause.
- [Medium] The patch forces the effective size quota to be at least DAMON_MIN_REGION when a time quota is configured, breaking the ability of quota goals to intentionally pause the scheme.
--

commit 648433abd4b11fd3f4a77018e7e47105abac1ce7
Author: SeongJae Park <sj@kernel.org>

mm/damon/core: disallow time-quota setting zero esz

When the throughput of a DAMOS scheme is very slow, the DAMOS time quota can
make the effective size quota smaller than the minimum region size, preventing
actions from being applied. This commit fixes the issue by setting the time
quota-modified effective size quota to at least DAMON_MIN_REGION.

Link: https://lore.kernel.org/20260407003153.79589-1-sj@kernel.org
Link: https://lore.kernel.org/20260405192504.110014-1-sj@kernel.org [1]

> diff --git a/mm/damon/core.c b/mm/damon/core.c
> index ed2b750231810..69f8244324b36 100644
> --- a/mm/damon/core.c
> +++ b/mm/damon/core.c
> @@ -1577,6 +1577,7 @@ static void damos_set_effective_quota(struct damos_quota *quota)
>  			esz = min(throughput * quota->ms, esz);
>  		else
>  			esz = throughput * quota->ms;
> +		esz = max(DAMON_MIN_REGION, esz);

Does this cause a regression by breaking the ability of quota goals to
intentionally pause the scheme?

If a dynamic quota goal determines that actions should be paused and requests
an esz smaller than DAMON_MIN_REGION, this max() macro forcibly overrides it
and bumps the quota back to DAMON_MIN_REGION.

This causes the scheme to process at least 1 page per charge window instead
of fully pausing.

Could the max() boundary be applied exclusively to the throughput-derived quota
before it is min()'ed with the goal-derived quota?

Additionally, does this code mask a pre-existing integer overflow regression
in the throughput calculation?

Looking earlier in damos_set_effective_quota():

    if (quota->total_charged_ns)
        throughput = quota->total_charged_sz * 1000000 /
            quota->total_charged_ns;

Because total_charged_sz and total_charged_ns are continuously accumulated
over the scheme's lifetime, total_charged_sz * 1000000 will overflow an
unsigned long on 64-bit systems after processing around 18.4 TB of memory.

When the numerator overflows, it wraps modulo 2^64 while total_charged_ns
continues to grow. This causes the calculated throughput to collapse to
near-zero.

Also, on 32-bit systems, total_charged_ns stores nanoseconds and overflows
a 32-bit unsigned long after 4.29 seconds of accumulated action time, causing
erratic throughput calculations.

By clamping the output to DAMON_MIN_REGION here, the patch seems to permanently
throttle the scheme to process exactly 1 page per charge window once the
overflow occurs, rather than fixing the underlying throughput collapse.

>  	}
>
>  	if (quota->sz && quota->sz < esz)

--
Sashiko AI review · https://sashiko.dev/#/patchset/20260504125220.15509-1-sj@kernel.org?part=1
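The two overflow thresholds quoted in the review follow from plain integer
arithmetic.  The following is a minimal user-space sketch, assuming
uint64_t and uint32_t as stand-ins for a 64-bit and a 32-bit unsigned
long; the function names are hypothetical and do not exist in the kernel.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Largest total_charged_sz for which sz * 1000000 still fits in 64 bits:
 * 18446744073709 bytes, i.e. roughly 18.4 TB.
 */
static uint64_t max_sz_before_wrap(void)
{
	return UINT64_MAX / 1000000;
}

/* The throughput formula from the review, evaluated in 64-bit arithmetic. */
static uint64_t wrapped_throughput(uint64_t sz, uint64_t ns)
{
	assert(ns != 0);
	/* unsigned multiplication wraps modulo 2^64 instead of failing */
	return sz * 1000000 / ns;
}

/*
 * What a 32-bit counter keeps of a nanosecond total.  2^32 ns is about
 * 4.29 seconds, after which the stored value starts again from zero.
 */
static uint32_t ns_as_u32(uint64_t ns)
{
	return (uint32_t)ns;
}
```

For example, with one second of charged time, a total of 18446744073709
bytes still gives a large throughput, while one more byte makes the
product wrap and the computed throughput drop to zero.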
* Re: [PATCH 6.12.y] mm/damon/core: disallow time-quota setting zero esz
From: SeongJae Park @ 2026-05-06 16:59 UTC
To: sashiko-bot; +Cc: SeongJae Park, damon

On Mon, 04 May 2026 18:02:15 +0000 sashiko-bot@kernel.org wrote:

> Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
> - [High] The patch masks a severe pre-existing integer overflow in the time quota throughput calculation, which permanently throttles the scheme instead of fixing the root cause.
> - [Medium] The patch forces the effective size quota to be at least DAMON_MIN_REGION when a time quota is configured, breaking the ability of quota goals to intentionally pause the scheme.

No issue for this patch.  Please read below for more details.

> --
>
> commit 648433abd4b11fd3f4a77018e7e47105abac1ce7
> Author: SeongJae Park <sj@kernel.org>
>
> mm/damon/core: disallow time-quota setting zero esz
>
> When the throughput of a DAMOS scheme is very slow, the DAMOS time quota can
> make the effective size quota smaller than the minimum region size, preventing
> actions from being applied. This commit fixes the issue by setting the time
> quota-modified effective size quota to at least DAMON_MIN_REGION.
>
> Link: https://lore.kernel.org/20260407003153.79589-1-sj@kernel.org
> Link: https://lore.kernel.org/20260405192504.110014-1-sj@kernel.org [1]
>
> > diff --git a/mm/damon/core.c b/mm/damon/core.c
> > index ed2b750231810..69f8244324b36 100644
> > --- a/mm/damon/core.c
> > +++ b/mm/damon/core.c
> > @@ -1577,6 +1577,7 @@ static void damos_set_effective_quota(struct damos_quota *quota)
> >  			esz = min(throughput * quota->ms, esz);
> >  		else
> >  			esz = throughput * quota->ms;
> > +		esz = max(DAMON_MIN_REGION, esz);
>
> Does this cause a regression by breaking the ability of quota goals to
> intentionally pause the scheme?
>
> If a dynamic quota goal determines that actions should be paused and requests
> an esz smaller than DAMON_MIN_REGION, this max() macro forcibly overrides it
> and bumps the quota back to DAMON_MIN_REGION.
>
> This causes the scheme to process at least 1 page per charge window instead
> of fully pausing.
>
> Could the max() boundary be applied exclusively to the throughput-derived quota
> before it is min()'ed with the goal-derived quota?

No, this should be fine.  This is not a very technically intended behavior, but
letting DAMOS process one page per interval only when time quota is set should
make no real issue.  Let's keep this behavior.

>
> Additionally, does this code mask a pre-existing integer overflow regression
> in the throughput calculation?
>
> Looking earlier in damos_set_effective_quota():
>
>     if (quota->total_charged_ns)
>         throughput = quota->total_charged_sz * 1000000 /
>             quota->total_charged_ns;
>
> Because total_charged_sz and total_charged_ns are continuously accumulated
> over the scheme's lifetime, total_charged_sz * 1000000 will overflow an
> unsigned long on 64-bit systems after processing around 18.4 TB of memory.
>
> When the numerator overflows, it wraps modulo 2^64 while total_charged_ns
> continues to grow. This causes the calculated throughput to collapse to
> near-zero.
>
> Also, on 32-bit systems, total_charged_ns stores nanoseconds and overflows
> a 32-bit unsigned long after 4.29 seconds of accumulated action time, causing
> erratic throughput calculations.
>
> By clamping the output to DAMON_MIN_REGION here, the patch seems to permanently
> throttle the scheme to process exactly 1 page per charge window once the
> overflow occurs, rather than fixing the underlying throughput collapse.

Good catch, but not this patch's fault.  Refer to my previous answer [1] to a
similar Sashiko question.

[1] https://lore.kernel.org/20260505150012.29007-1-sj@kernel.org


Thanks,
SJ

[...]
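The difference between the ordering used by the patch and the ordering
asked about in the review can be shown with two small helpers.  This is a
hedged sketch in user-space C, not kernel code: esz_clamp_last(),
esz_clamp_first(), min_u64(), max_u64() and MODEL_MIN_REGION are
hypothetical names, MODEL_MIN_REGION assumes a 4 KiB page, and goal_esz
stands for whatever value the quota goals produced.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for DAMON_MIN_REGION, assuming a 4 KiB page. */
#define MODEL_MIN_REGION 4096ULL

static_assert(MODEL_MIN_REGION > 0, "minimum region size must be non-zero");

static uint64_t min_u64(uint64_t a, uint64_t b) { return a < b ? a : b; }
static uint64_t max_u64(uint64_t a, uint64_t b) { return a > b ? a : b; }

/* Ordering in the patch: combine with the goal value first, then clamp. */
static uint64_t esz_clamp_last(uint64_t time_esz, uint64_t goal_esz)
{
	return max_u64(MODEL_MIN_REGION, min_u64(time_esz, goal_esz));
}

/* Ordering asked about in the review: clamp the time-derived value only. */
static uint64_t esz_clamp_first(uint64_t time_esz, uint64_t goal_esz)
{
	return min_u64(max_u64(MODEL_MIN_REGION, time_esz), goal_esz);
}
```

Both orderings lift a zero time-derived quota to one region.  They differ
only when the goal-derived value is below one region: the patch ordering
still yields MODEL_MIN_REGION, which is the one-page-per-window behavior
that the reply above chooses to keep, while the other ordering passes the
smaller goal value through.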