Linux kernel -stable discussions
 help / color / mirror / Atom feed
* [PATCH 6.12.y] mm/damon/reclaim: detect and use fresh enabled and kdamond_pid values
  2026-05-12 14:05 FAILED: patch "[PATCH] mm/damon/reclaim: detect and use fresh enabled and" failed to apply to 6.12-stable tree gregkh
@ 2026-05-13  4:52 ` SeongJae Park
  0 siblings, 0 replies; 2+ messages in thread
From: SeongJae Park @ 2026-05-13  4:52 UTC (permalink / raw)
  To: stable; +Cc: SeongJae Park, damon, Liew Rui Yan, Andrew Morton

Patch series "mm/damon/modules: detect and use fresh status", v3.

DAMON modules including DAMON_RECLAIM, DAMON_LRU_SORT and DAMON_STAT
commonly expose the kdamond running status via their parameters.  Under
certain scenarios including wrong user inputs and memory allocation
failures, those parameter values can be stale.  It can confuse users.  For
DAMON_RECLAIM and DAMON_LRU_SORT, it even makes the kdamond unable to be
restarted before the system reboot.

The problem comes from the fact that there are multiple events for the
status changes and it is difficult to follow up all the scenarios.  Fix
the issue by detecting and using the status on demand, instead of using a
cached status that is difficult to be updated.

Patches 1-3 fix the bugs in DAMON_RECLAIM, DAMON_LRU_SORT and DAMON_STAT
in the order.

This patch (of 3):

DAMON_RECLAIM updates 'enabled' and 'kdamond_pid' parameter values, which
represents the running status of its kdamond, when the user explicitly
requests start/stop of the kdamond.  The kdamond can, however, be stopped
in events other than the explicit user request in the following three
events.

1. ctx->regions_score_histogram allocation failure at beginning of the
   execution,
2. damon_commit_ctx() failure due to invalid user input, and
3. damon_commit_ctx() failure due to its internal allocation failures.

Hence, if the kdamond is stopped by the above three events, the values of
the status parameters can be stale.  Users could show the stale values and
be confused.  This is already bad, but the real consequence is worse.
DAMON_RECLAIM avoids unnecessary damon_start() and damon_stop() calls
based on the 'enabled' parameter value.  And the update of 'enabled'
parameter value depends on the damon_start() and damon_stop() call
results.  Hence, once the kdamond has stopped by the unintentional events,
the user cannot restart the kdamond before the system reboot.  For
example, the issue can be reproduced via below steps.

    # cd /sys/module/damon_reclaim/parameters
    #
    # # start DAMON_RECLAIM
    # echo Y > enabled
    # ps -ef | grep kdamond
    root         806       2  0 17:53 ?        00:00:00 [kdamond.0]
    root         808     803  0 17:53 pts/4    00:00:00 grep kdamond
    #
    # # commit wrong input to stop kdamond withou explicit stop request
    # echo 3 > addr_unit
    # echo Y > commit_inputs
    bash: echo: write error: Invalid argument
    #
    # # confirm kdamond is stopped
    # ps -ef | grep kdamond
    root         811     803  0 17:53 pts/4    00:00:00 grep kdamond
    #
    # # users casn now show stable status
    # cat enabled
    Y
    # cat kdamond_pid
    806
    #
    # # even after fixing the wrong parameter,
    # # kdamond cannot be restarted.
    # echo 1 > addr_unit
    # echo Y > enabled
    # ps -ef | grep kdamond
    root         815     803  0 17:54 pts/4    00:00:00 grep kdamond

The problem will only rarely happen in real and common setups for the
following reasons.  The allocation failures are unlikely in such setups
since those allocations are arguably too small to fail.  Also sane users
on real production environments may not commit wrong input parameters.
But once it happens, the consequence is quite bad.  And the bug is a bug.

The issue stems from the fact that there are multiple events that can
change the status, and following all the events is challenging.
Dynamically detect and use the fresh status for the parameters when those
are requested.

Link: https://lore.kernel.org/20260419161003.79176-1-sj@kernel.org
Link: https://lore.kernel.org/20260419161003.79176-2-sj@kernel.org
Fixes: e035c280f6df ("mm/damon/reclaim: support online inputs update")
Co-developed-by: Liew Rui Yan <aethernet65535@gmail.com>
Signed-off-by: Liew Rui Yan <aethernet65535@gmail.com>
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: <stable@vger.kernel.org> # 5.19.x
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 64a140afa5ed1c6f5ba6d451512cbdbbab1ba339)
Signed-off-by: SeongJae Park <sj@kernel.org>
---
This depends on other two backported patches [1,2].  Please apply this
after those.

[1] https://lore.kernel.org/20260513040734.144259-1-sj@kernel.org
[2] https://lore.kernel.org/20260513040734.144259-2-sj@kernel.org

 mm/damon/reclaim.c | 88 +++++++++++++++++++++++++++++-----------------
 1 file changed, 56 insertions(+), 32 deletions(-)

diff --git a/mm/damon/reclaim.c b/mm/damon/reclaim.c
index 65842e6854fd1..9df096218beb7 100644
--- a/mm/damon/reclaim.c
+++ b/mm/damon/reclaim.c
@@ -137,15 +137,6 @@ module_param(monitor_region_end, ulong, 0600);
 static bool skip_anon __read_mostly;
 module_param(skip_anon, bool, 0600);
 
-/*
- * PID of the DAMON thread
- *
- * If DAMON_RECLAIM is enabled, this becomes the PID of the worker thread.
- * Else, -1.
- */
-static int kdamond_pid __read_mostly = -1;
-module_param(kdamond_pid, int, 0400);
-
 static struct damos_stat damon_reclaim_stat;
 DEFINE_DAMON_MODULES_DAMOS_STATS_PARAMS(damon_reclaim_stat,
 		reclaim_tried_regions, reclaimed_regions, quota_exceeds);
@@ -247,60 +238,93 @@ static int damon_reclaim_turn(bool on)
 {
 	int err;
 
-	if (!on) {
-		err = damon_stop(&ctx, 1);
-		if (!err)
-			kdamond_pid = -1;
-		return err;
-	}
+	if (!on)
+		return damon_stop(&ctx, 1);
 
 	err = damon_reclaim_apply_parameters();
 	if (err)
 		return err;
 
-	err = damon_start(&ctx, 1, true);
-	if (err)
-		return err;
-	kdamond_pid = ctx->kdamond->pid;
-	return 0;
+	return damon_start(&ctx, 1, true);
+}
+
+static bool damon_reclaim_enabled(void)
+{
+	if (!ctx)
+		return false;
+	return damon_is_running(ctx);
 }
 
 static int damon_reclaim_enabled_store(const char *val,
 		const struct kernel_param *kp)
 {
-	bool is_enabled = enabled;
-	bool enable;
 	int err;
 
-	err = kstrtobool(val, &enable);
+	err = kstrtobool(val, &enabled);
 	if (err)
 		return err;
 
-	if (is_enabled == enable)
+	if (damon_reclaim_enabled() == enabled)
 		return 0;
 
 	/* Called before init function.  The function will handle this. */
 	if (!ctx)
-		goto set_param_out;
+		return 0;
 
-	err = damon_reclaim_turn(enable);
-	if (err)
-		return err;
+	return damon_reclaim_turn(enabled);
+}
 
-set_param_out:
-	enabled = enable;
-	return err;
+static int damon_reclaim_enabled_load(char *buffer,
+		const struct kernel_param *kp)
+{
+	return sprintf(buffer, "%c\n", damon_reclaim_enabled() ? 'Y' : 'N');
 }
 
 static const struct kernel_param_ops enabled_param_ops = {
 	.set = damon_reclaim_enabled_store,
-	.get = param_get_bool,
+	.get = damon_reclaim_enabled_load,
 };
 
 module_param_cb(enabled, &enabled_param_ops, &enabled, 0600);
 MODULE_PARM_DESC(enabled,
 	"Enable or disable DAMON_RECLAIM (default: disabled)");
 
+static int damon_reclaim_kdamond_pid_store(const char *val,
+		const struct kernel_param *kp)
+{
+	/*
+	 * kdamond_pid is read-only, but kernel command line could write it.
+	 * Do nothing here.
+	 */
+	return 0;
+}
+
+static int damon_reclaim_kdamond_pid_load(char *buffer,
+		const struct kernel_param *kp)
+{
+	int kdamond_pid = -1;
+
+	if (ctx) {
+		kdamond_pid = damon_kdamond_pid(ctx);
+		if (kdamond_pid < 0)
+			kdamond_pid = -1;
+	}
+	return sprintf(buffer, "%d\n", kdamond_pid);
+}
+
+static const struct kernel_param_ops kdamond_pid_param_ops = {
+	.set = damon_reclaim_kdamond_pid_store,
+	.get = damon_reclaim_kdamond_pid_load,
+};
+
+/*
+ * PID of the DAMON thread
+ *
+ * If DAMON_RECLAIM is enabled, this becomes the PID of the worker thread.
+ * Else, -1.
+ */
+module_param_cb(kdamond_pid, &kdamond_pid_param_ops, NULL, 0400);
+
 static int damon_reclaim_handle_commit_inputs(void)
 {
 	int err;
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 2+ messages in thread

* Re: [PATCH 6.12.y] mm/damon/reclaim: detect and use fresh enabled and kdamond_pid values
       [not found] <20260514021838.B3997C19425@smtp.kernel.org>
@ 2026-05-14  6:09 ` SeongJae Park
  0 siblings, 0 replies; 2+ messages in thread
From: SeongJae Park @ 2026-05-14  6:09 UTC (permalink / raw)
  To: Greg Kroah-Hartman, sashiko-bot; +Cc: SeongJae Park, stable, damon

On Thu, 14 May 2026 02:18:36 +0000 sashiko-bot@kernel.org wrote:

> Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
> - [High] A race condition during module initialization allows userspace to trigger a use-after-free kernel panic via the `enabled` sysfs parameter.
> - [High] `damon_reclaim_apply_parameters()` incorrectly applies schemes and attributes to the global `ctx` instead of `param_ctx`, causing them to be destroyed by `damon_commit_ctx()`.
> - [Low] The patch introduces calls to undeclared functions `damon_is_running()` and `damon_kdamond_pid()`, causing a build failure.

TL; DR: Sashiko found upstream commit fed48693bdfe ("mm/damon/reclaim: use
parameter context correctly") is required for 6.12.y regardless of this patch,
but not yet picked by stable kernel maintainers.  Stable kernel maintainers,
could you please add the patch to 6.12.y?

Other Sashiko-found issues are no issue in my opinion.

Please read below if you are interested in more details.

> --
> 
> commit e5b3e77c68d72139966a3f423a04c39c6310ef66
> Author: SeongJae Park <sj@kernel.org>
> 
> mm/damon/reclaim: detect and use fresh enabled and kdamond_pid values
> 
> This commit updates DAMON_RECLAIM to dynamically detect and use the fresh
> running status and process id for the enabled and kdamond_pid parameters.
> It avoids caching the status, which can become stale when the thread stops
> unexpectedly.
> 
> Link: https://lore.kernel.org/20260419161003.79176-1-sj@kernel.org
> Link: https://lore.kernel.org/20260419161003.79176-2-sj@kernel.org
> 
> > diff --git a/mm/damon/reclaim.c b/mm/damon/reclaim.c
> > index 65842e6854fd1..9df096218beb7 100644
> > --- a/mm/damon/reclaim.c
> > +++ b/mm/damon/reclaim.c
> > @@ -247,60 +238,93 @@ static int damon_reclaim_turn(bool on)
> >  {
> >  	int err;
> >  
> > -	if (!on) {
> > -		err = damon_stop(&ctx, 1);
> > -		if (!err)
> > -			kdamond_pid = -1;
> > -		return err;
> > -	}
> > +	if (!on)
> > +		return damon_stop(&ctx, 1);
> >  
> >  	err = damon_reclaim_apply_parameters();
> 
> Does damon_reclaim_apply_parameters() correctly configure the scheme?
> 
> Looking at its implementation in mm/damon/reclaim.c:
> 
>     err = damon_modules_new_paddr_ctx_target(&param_ctx, &param_target);
>     ...
>     err = damon_set_attrs(ctx, &damon_reclaim_mon_attrs);
>     ...
>     scheme = damon_reclaim_new_scheme();
>     ...
>     damon_set_schemes(ctx, &scheme, 1);
>     ...
>     err = damon_commit_ctx(ctx, param_ctx);
> 
> It seems damon_set_attrs() and damon_set_schemes() are called on the global
> ctx rather than param_ctx.
> 
> Since param_ctx has zero schemes when damon_commit_ctx(ctx, param_ctx) is
> called, will this aggressively destroy the schemes that were just added to
> ctx, leaving DAMON_RECLAIM non-functional?

The bug is introduced by commit 11ddcfc257a3 ("mm/damon/reclaim: use
damon_commit_ctx()"), which was merged into the mainline by 6.11.  So the
broken commit is in 6.12.y.

The fix was made by commit fed48693bdfe ("mm/damon/reclaim: use parameter
context correctly") which was merged into the mainline by 6.17.  And the fix
commit has correct Fixes: tag, but somehow didn't picked by stable kernel
maintainers.

Stable kernel maintainers, could you please backport the fix commit to 6.12.y?
Let me know if any different action from my side is needed.

> 
> >  	if (err)
> >  		return err;
> >  
> > -	err = damon_start(&ctx, 1, true);
> > -	if (err)
> > -		return err;
> > -	kdamond_pid = ctx->kdamond->pid;
> > -	return 0;
> > +	return damon_start(&ctx, 1, true);
> > +}
> > +
> > +static bool damon_reclaim_enabled(void)
> > +{
> > +	if (!ctx)
> > +		return false;
> > +	return damon_is_running(ctx);
> > +}
> 
> Does this code compile?
> 
> It appears that damon_is_running() was introduced in a newer kernel version.
> Because this is a cherry-pick, are we missing a prerequisite refactoring
> patch that provides this function?

I informed this and where the dependent patches can be found, on the commentary
area of the patch.

> 
> >  
> >  static int damon_reclaim_enabled_store(const char *val,
> >  		const struct kernel_param *kp)
> >  {
> > -	bool is_enabled = enabled;
> > -	bool enable;
> >  	int err;
> >  
> > -	err = kstrtobool(val, &enable);
> > +	err = kstrtobool(val, &enabled);
> >  	if (err)
> >  		return err;
> >  
> > -	if (is_enabled == enable)
> > +	if (damon_reclaim_enabled() == enabled)
> >  		return 0;
> >  
> >  	/* Called before init function.  The function will handle this. */
> >  	if (!ctx)
> > -		goto set_param_out;
> > +		return 0;
> >  
> > -	err = damon_reclaim_turn(enable);
> > -	if (err)
> > -		return err;
> > +	return damon_reclaim_turn(enabled);
> > +}
> 
> Is there a race condition between module initialization and this sysfs
> parameter store?
> 
> The sysfs file for enabled is exposed to userspace by load_module() before
> damon_reclaim_init() completes. damon_reclaim_init() allocates ctx early.
> 
> If userspace writes to enabled after ctx is allocated but before
> initialization finishes, this function will start the kdamond thread.
> 
> When damon_reclaim_init() resumes, it will see enabled is true and attempt to
> start the thread again, resulting in -EBUSY.
> 
> If initialization fails and the module memory is freed, could the
> already-running kdamond thread execute unmapped memory and cause a
> use-after-free regression?

DAMON_RECLAIM is a static module.  Userspace cannot write sysfs files before
the static module initializations are finished to my understanding.  So this
kind of race cannot happen.  Please let me know if I'm wrong.

> 
> [ ... ]
> 
> > +static int damon_reclaim_kdamond_pid_load(char *buffer,
> > +		const struct kernel_param *kp)
> > +{
> > +	int kdamond_pid = -1;
> > +
> > +	if (ctx) {
> > +		kdamond_pid = damon_kdamond_pid(ctx);
> > +		if (kdamond_pid < 0)
> > +			kdamond_pid = -1;
> > +	}
> > +	return sprintf(buffer, "%d\n", kdamond_pid);
> > +}
> 
> Likewise, is damon_kdamond_pid() available in this kernel tree, or will this
> result in an implicit function declaration error?

I informed this and where the dependent patches can be found on the commentary
area of this patch.

> 
> -- 
> Sashiko AI review · https://sashiko.dev/#/patchset/20260513045238.194161-1-sj@kernel.org?part=1


Thanks,
SJ

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2026-05-14  6:09 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20260514021838.B3997C19425@smtp.kernel.org>
2026-05-14  6:09 ` [PATCH 6.12.y] mm/damon/reclaim: detect and use fresh enabled and kdamond_pid values SeongJae Park
2026-05-12 14:05 FAILED: patch "[PATCH] mm/damon/reclaim: detect and use fresh enabled and" failed to apply to 6.12-stable tree gregkh
2026-05-13  4:52 ` [PATCH 6.12.y] mm/damon/reclaim: detect and use fresh enabled and kdamond_pid values SeongJae Park

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox