* [PATCH 6.1.y] mm/damon/reclaim: detect and use fresh enabled and kdamond_pid values [not found] <2026051257-daytime-precise-afd3@gregkh> @ 2026-05-13 5:05 ` SeongJae Park 2026-05-14 2:17 ` sashiko-bot 2026-05-14 5:45 ` SeongJae Park 0 siblings, 2 replies; 4+ messages in thread From: SeongJae Park @ 2026-05-13 5:05 UTC (permalink / raw) To: stable; +Cc: SeongJae Park, damon, Liew Rui Yan, Andrew Morton Patch series "mm/damon/modules: detect and use fresh status", v3. DAMON modules including DAMON_RECLAIM, DAMON_LRU_SORT and DAMON_STAT commonly expose the kdamond running status via their parameters. Under certain scenarios including wrong user inputs and memory allocation failures, those parameter values can be stale. It can confuse users. For DAMON_RECLAIM and DAMON_LRU_SORT, it even makes the kdamond unable to be restarted before the system reboot. The problem comes from the fact that there are multiple events for the status changes and it is difficult to follow up all the scenarios. Fix the issue by detecting and using the status on demand, instead of using a cached status that is difficult to be updated. Patches 1-3 fix the bugs in DAMON_RECLAIM, DAMON_LRU_SORT and DAMON_STAT in the order. This patch (of 3): DAMON_RECLAIM updates 'enabled' and 'kdamond_pid' parameter values, which represents the running status of its kdamond, when the user explicitly requests start/stop of the kdamond. The kdamond can, however, be stopped in events other than the explicit user request in the following three events. 1. ctx->regions_score_histogram allocation failure at beginning of the execution, 2. damon_commit_ctx() failure due to invalid user input, and 3. damon_commit_ctx() failure due to its internal allocation failures. Hence, if the kdamond is stopped by the above three events, the values of the status parameters can be stale. Users could show the stale values and be confused. This is already bad, but the real consequence is worse. DAMON_RECLAIM avoids unnecessary damon_start() and damon_stop() calls based on the 'enabled' parameter value. And the update of 'enabled' parameter value depends on the damon_start() and damon_stop() call results. Hence, once the kdamond has stopped by the unintentional events, the user cannot restart the kdamond before the system reboot. For example, the issue can be reproduced via below steps. # cd /sys/module/damon_reclaim/parameters # # # start DAMON_RECLAIM # echo Y > enabled # ps -ef | grep kdamond root 806 2 0 17:53 ? 00:00:00 [kdamond.0] root 808 803 0 17:53 pts/4 00:00:00 grep kdamond # # # commit wrong input to stop kdamond withou explicit stop request # echo 3 > addr_unit # echo Y > commit_inputs bash: echo: write error: Invalid argument # # # confirm kdamond is stopped # ps -ef | grep kdamond root 811 803 0 17:53 pts/4 00:00:00 grep kdamond # # # users casn now show stable status # cat enabled Y # cat kdamond_pid 806 # # # even after fixing the wrong parameter, # # kdamond cannot be restarted. # echo 1 > addr_unit # echo Y > enabled # ps -ef | grep kdamond root 815 803 0 17:54 pts/4 00:00:00 grep kdamond The problem will only rarely happen in real and common setups for the following reasons. The allocation failures are unlikely in such setups since those allocations are arguably too small to fail. Also sane users on real production environments may not commit wrong input parameters. But once it happens, the consequence is quite bad. And the bug is a bug. The issue stems from the fact that there are multiple events that can change the status, and following all the events is challenging. Dynamically detect and use the fresh status for the parameters when those are requested. Link: https://lore.kernel.org/20260419161003.79176-1-sj@kernel.org Link: https://lore.kernel.org/20260419161003.79176-2-sj@kernel.org Fixes: e035c280f6df ("mm/damon/reclaim: support online inputs update") Co-developed-by: Liew Rui Yan <aethernet65535@gmail.com> Signed-off-by: Liew Rui Yan <aethernet65535@gmail.com> Signed-off-by: SeongJae Park <sj@kernel.org> Cc: <stable@vger.kernel.org> # 5.19.x Signed-off-by: Andrew Morton <akpm@linux-foundation.org> (cherry picked from commit 64a140afa5ed1c6f5ba6d451512cbdbbab1ba339) Signed-off-by: SeongJae Park <sj@kernel.org> --- This depends on other two backported patches [1,2]. Please apply this after those. [1] https://lore.kernel.org/20260513043039.173237-1-sj@kernel.org [2] https://lore.kernel.org/20260513043039.173237-2-sj@kernel.org mm/damon/reclaim.c | 74 +++++++++++++++++++++++++++++++++------------- 1 file changed, 53 insertions(+), 21 deletions(-) diff --git a/mm/damon/reclaim.c b/mm/damon/reclaim.c index 7952a0b7f409d..dc435aefd44f5 100644 --- a/mm/damon/reclaim.c +++ b/mm/damon/reclaim.c @@ -100,15 +100,6 @@ module_param(monitor_region_start, ulong, 0600); static unsigned long monitor_region_end __read_mostly; module_param(monitor_region_end, ulong, 0600); -/* - * PID of the DAMON thread - * - * If DAMON_RECLAIM is enabled, this becomes the PID of the worker thread. - * Else, -1. - */ -static int kdamond_pid __read_mostly = -1; -module_param(kdamond_pid, int, 0400); - static struct damos_stat damon_reclaim_stat; DEFINE_DAMON_MODULES_DAMOS_STATS_PARAMS(damon_reclaim_stat, reclaim_tried_regions, reclaimed_regions, quota_exceeds); @@ -184,22 +175,14 @@ static int damon_reclaim_turn(bool on) { int err; - if (!on) { - err = damon_stop(&ctx, 1); - if (!err) - kdamond_pid = -1; - return err; - } + if (!on) + return damon_stop(&ctx, 1); err = damon_reclaim_apply_parameters(); if (err) return err; - err = damon_start(&ctx, 1, true); - if (err) - return err; - kdamond_pid = ctx->kdamond->pid; - return 0; + return damon_start(&ctx, 1, true); } static struct delayed_work damon_reclaim_timer; @@ -220,6 +203,13 @@ static DECLARE_DELAYED_WORK(damon_reclaim_timer, damon_reclaim_timer_fn); static bool damon_reclaim_initialized; +static bool damon_reclaim_enabled(void) +{ + if (!ctx) + return false; + return damon_is_running(ctx); +} + static int damon_reclaim_enabled_store(const char *val, const struct kernel_param *kp) { @@ -236,15 +226,57 @@ static int damon_reclaim_enabled_store(const char *val, return 0; } +static int damon_reclaim_enabled_load(char *buffer, + const struct kernel_param *kp) +{ + return sprintf(buffer, "%c\n", damon_reclaim_enabled() ? 'Y' : 'N'); +} + static const struct kernel_param_ops enabled_param_ops = { .set = damon_reclaim_enabled_store, - .get = param_get_bool, + .get = damon_reclaim_enabled_load, }; module_param_cb(enabled, &enabled_param_ops, &enabled, 0600); MODULE_PARM_DESC(enabled, "Enable or disable DAMON_RECLAIM (default: disabled)"); +static int damon_reclaim_kdamond_pid_store(const char *val, + const struct kernel_param *kp) +{ + /* + * kdamond_pid is read-only, but kernel command line could write it. + * Do nothing here. + */ + return 0; +} + +static int damon_reclaim_kdamond_pid_load(char *buffer, + const struct kernel_param *kp) +{ + int kdamond_pid = -1; + + if (ctx) { + kdamond_pid = damon_kdamond_pid(ctx); + if (kdamond_pid < 0) + kdamond_pid = -1; + } + return sprintf(buffer, "%d\n", kdamond_pid); +} + +static const struct kernel_param_ops kdamond_pid_param_ops = { + .set = damon_reclaim_kdamond_pid_store, + .get = damon_reclaim_kdamond_pid_load, +}; + +/* + * PID of the DAMON thread + * + * If DAMON_RECLAIM is enabled, this becomes the PID of the worker thread. + * Else, -1. + */ +module_param_cb(kdamond_pid, &kdamond_pid_param_ops, NULL, 0400); + static int damon_reclaim_handle_commit_inputs(void) { int err; -- 2.47.3 ^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH 6.1.y] mm/damon/reclaim: detect and use fresh enabled and kdamond_pid values 2026-05-13 5:05 ` [PATCH 6.1.y] mm/damon/reclaim: detect and use fresh enabled and kdamond_pid values SeongJae Park @ 2026-05-14 2:17 ` sashiko-bot 2026-05-14 5:42 ` SeongJae Park 2026-05-14 5:45 ` SeongJae Park 1 sibling, 1 reply; 4+ messages in thread From: sashiko-bot @ 2026-05-14 2:17 UTC (permalink / raw) To: SeongJae Park; +Cc: damon Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider: - [High] The patch fails to fix the state machine deadlock that prevents kdamond from restarting, leaving the bug described in the commit message unresolved. - [Low] The patch introduces calls to undefined functions `damon_is_running()` and `damon_kdamond_pid()`, causing a build failure. -- commit de09cefa1bba6b569eff518a49dbc5fb07b59452 Author: SeongJae Park <sj@kernel.org> mm/damon/reclaim: detect and use fresh enabled and kdamond_pid values This patch updates DAMON_RECLAIM to dynamically detect the status of the enabled and kdamond_pid parameters. It aims to prevent users from seeing stale status values and being unable to restart the kdamond thread when it stops unintentionally. Link: https://lore.kernel.org/20260419161003.79176-1-sj@kernel.org Link: https://lore.kernel.org/20260419161003.79176-2-sj@kernel.org > For example, the issue can be reproduced via below steps. [ ... ] > # # even after fixing the wrong parameter, > # # kdamond cannot be restarted. > # echo 1 > addr_unit > # echo Y > enabled > # ps -ef | grep kdamond > root 815 803 0 17:54 pts/4 00:00:00 grep kdamond Does this patch actually resolve the restart issue described here? Looking at damon_reclaim_timer_fn(), the state machine still appears to deadlock if kdamond stops unintentionally: now_enabled = enabled; if (last_enabled != now_enabled) { if (!damon_reclaim_turn(now_enabled)) last_enabled = now_enabled; else enabled = last_enabled; } When kdamond dies, the global enabled and static last_enabled variables remain true. If a user writes 'Y' to enabled, the timer sees last_enabled == now_enabled (true == true) and exits without restarting kdamond. If a user writes 'N' to enabled, the timer calls damon_reclaim_turn(false), which invokes damon_stop(). Since the thread is already dead, damon_stop() returns -EPERM. The timer misinterprets this as a failure to stop and executes enabled = last_enabled, reverting enabled back to true. Could this leave the state machine permanently deadlocked? > diff --git a/mm/damon/reclaim.c b/mm/damon/reclaim.c > index 7952a0b7f409d..dc435aefd44f5 100644 > --- a/mm/damon/reclaim.c > +++ b/mm/damon/reclaim.c [ ... ] > @@ -220,6 +203,13 @@ static DECLARE_DELAYED_WORK(damon_reclaim_timer, damon_reclaim_timer_fn); > > static bool damon_reclaim_initialized; > > +static bool damon_reclaim_enabled(void) > +{ > + if (!ctx) > + return false; > + return damon_is_running(ctx); Are damon_is_running() and damon_kdamond_pid() defined anywhere in the tree? They don't appear to be declared in include/linux/damon.h or defined in mm/damon/core.c. Could this patch depend on prerequisite patches that might be missing? Without them, this causes implicit function declaration errors and unresolved symbol linker errors. -- Sashiko AI review · https://sashiko.dev/#/patchset/20260513050501.216835-1-sj@kernel.org?part=1 ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH 6.1.y] mm/damon/reclaim: detect and use fresh enabled and kdamond_pid values 2026-05-14 2:17 ` sashiko-bot @ 2026-05-14 5:42 ` SeongJae Park 0 siblings, 0 replies; 4+ messages in thread From: SeongJae Park @ 2026-05-14 5:42 UTC (permalink / raw) To: sashiko-bot; +Cc: SeongJae Park, damon On Thu, 14 May 2026 02:17:24 +0000 sashiko-bot@kernel.org wrote: > Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider: > - [High] The patch fails to fix the state machine deadlock that prevents kdamond from restarting, leaving the bug described in the commit message unresolved. > - [Low] The patch introduces calls to undefined functions `damon_is_running()` and `damon_kdamond_pid()`, causing a build failure. > -- > > commit de09cefa1bba6b569eff518a49dbc5fb07b59452 > Author: SeongJae Park <sj@kernel.org> > > mm/damon/reclaim: detect and use fresh enabled and kdamond_pid values > > This patch updates DAMON_RECLAIM to dynamically detect the status of the > enabled and kdamond_pid parameters. It aims to prevent users from seeing > stale status values and being unable to restart the kdamond thread when > it stops unintentionally. > > Link: https://lore.kernel.org/20260419161003.79176-1-sj@kernel.org > Link: https://lore.kernel.org/20260419161003.79176-2-sj@kernel.org > > > For example, the issue can be reproduced via below steps. > [ ... ] > > # # even after fixing the wrong parameter, > > # # kdamond cannot be restarted. > > # echo 1 > addr_unit > > # echo Y > enabled > > # ps -ef | grep kdamond > > root 815 803 0 17:54 pts/4 00:00:00 grep kdamond > > Does this patch actually resolve the restart issue described here? > > Looking at damon_reclaim_timer_fn(), the state machine still appears to > deadlock if kdamond stops unintentionally: > > now_enabled = enabled; > if (last_enabled != now_enabled) { > if (!damon_reclaim_turn(now_enabled)) > last_enabled = now_enabled; > else > enabled = last_enabled; > } > > When kdamond dies, the global enabled and static last_enabled variables > remain true. > > If a user writes 'Y' to enabled, the timer sees last_enabled == now_enabled > (true == true) and exits without restarting kdamond. > > If a user writes 'N' to enabled, the timer calls damon_reclaim_turn(false), > which invokes damon_stop(). Since the thread is already dead, damon_stop() > returns -EPERM. The timer misinterprets this as a failure to stop and > executes enabled = last_enabled, reverting enabled back to true. > > Could this leave the state machine permanently deadlocked? Correct. The enabled_store() part porting was incomplete, similar to the matching lru_sort patch. I will post a complete porting. > > > diff --git a/mm/damon/reclaim.c b/mm/damon/reclaim.c > > index 7952a0b7f409d..dc435aefd44f5 100644 > > --- a/mm/damon/reclaim.c > > +++ b/mm/damon/reclaim.c > [ ... ] > > @@ -220,6 +203,13 @@ static DECLARE_DELAYED_WORK(damon_reclaim_timer, damon_reclaim_timer_fn); > > > > static bool damon_reclaim_initialized; > > > > +static bool damon_reclaim_enabled(void) > > +{ > > + if (!ctx) > > + return false; > > + return damon_is_running(ctx); > > Are damon_is_running() and damon_kdamond_pid() defined anywhere in the tree? > > They don't appear to be declared in include/linux/damon.h or defined in > mm/damon/core.c. > > Could this patch depend on prerequisite patches that might be missing? > Without them, this causes implicit function declaration errors and > unresolved symbol linker errors. I informed this and where the dependent patches can be found, on the commentary area of the patch. > > -- > Sashiko AI review · https://sashiko.dev/#/patchset/20260513050501.216835-1-sj@kernel.org?part=1 Thanks, SJ ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH 6.1.y] mm/damon/reclaim: detect and use fresh enabled and kdamond_pid values 2026-05-13 5:05 ` [PATCH 6.1.y] mm/damon/reclaim: detect and use fresh enabled and kdamond_pid values SeongJae Park 2026-05-14 2:17 ` sashiko-bot @ 2026-05-14 5:45 ` SeongJae Park 1 sibling, 0 replies; 4+ messages in thread From: SeongJae Park @ 2026-05-14 5:45 UTC (permalink / raw) To: SeongJae Park; +Cc: stable, damon, Liew Rui Yan, Andrew Morton Hello, On Tue, 12 May 2026 22:05:00 -0700 SeongJae Park <sj@kernel.org> wrote: [...] > Link: https://lore.kernel.org/20260419161003.79176-1-sj@kernel.org > Link: https://lore.kernel.org/20260419161003.79176-2-sj@kernel.org > Fixes: e035c280f6df ("mm/damon/reclaim: support online inputs update") > Co-developed-by: Liew Rui Yan <aethernet65535@gmail.com> > Signed-off-by: Liew Rui Yan <aethernet65535@gmail.com> > Signed-off-by: SeongJae Park <sj@kernel.org> > Cc: <stable@vger.kernel.org> # 5.19.x > Signed-off-by: Andrew Morton <akpm@linux-foundation.org> > (cherry picked from commit 64a140afa5ed1c6f5ba6d451512cbdbbab1ba339) > Signed-off-by: SeongJae Park <sj@kernel.org> Sashiko found this is incompletely porting the enabled_store() part change. Please read my reply [1] to Sashioko for more details. I will send v2 of this patch as a reply to this mail. [1] https://lore.kernel.org/20260514054253.116346-1-sj@kernel.org Thanks, SJ [...] ^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2026-05-14 5:46 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <2026051257-daytime-precise-afd3@gregkh>
2026-05-13 5:05 ` [PATCH 6.1.y] mm/damon/reclaim: detect and use fresh enabled and kdamond_pid values SeongJae Park
2026-05-14 2:17 ` sashiko-bot
2026-05-14 5:42 ` SeongJae Park
2026-05-14 5:45 ` SeongJae Park
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox