[RFC PATCH] mm/damon: reset thread status parameters upon kdamond termination

public inbox for linux-mm@kvack.org
 help / color / mirror / Atom feed

* [RFC PATCH] mm/damon: reset thread status parameters upon kdamond termination
@ 2026-03-30 16:43 Liew Rui Yan
  2026-03-30 18:53 ` (sashiko review) " Liew Rui Yan
  2026-03-31  5:02 ` SeongJae Park
  0 siblings, 2 replies; 17+ messages in thread
From: Liew Rui Yan @ 2026-03-30 16:43 UTC (permalink / raw)
  To: sj; +Cc: damon, linux-mm, Liew Rui Yan

Problem
=======
kdamond does not actively modify the module's
(DAMON_LRU_SORT/DAMON_RECLAIM) 'enabled' and 'kdamond_pid' after
exiting. After an unexpected termination, user will still see 'enabled'
as 'Y' and 'kdamond_pid' as a value other than -1. Furthermore, user
cannot restart the unexpectedly terminated kdamond by executing 'echo
Y/N > enabled' again.

Solution
========
Introduce a 'thread_status' structure to link the internal kdamond
state with module parameters ('enabled' and 'kdamond_pid').

Specifically:
1. Extend 'struct damon_ctx' to include pointers to the module's
   parameters.
2. Initialize these pointers in damon_lru_sort_apply_parameters()
   and damon_reclaim_apply_parameters() to point the respective
   module variables.
3. Implement damon_update_thread_status() to reset 'enabled' to
   false and 'kdamond_pid' to -1 when the kdamond thread finishes.

Signed-off-by: Liew Rui Yan <aethernet65535@gmail.com>
---
 include/linux/damon.h |  7 +++++++
 mm/damon/core.c       | 19 ++++++++++++++++++-
 mm/damon/lru_sort.c   |  5 +++--
 mm/damon/reclaim.c    |  5 +++--
 4 files changed, 31 insertions(+), 5 deletions(-)

diff --git a/include/linux/damon.h b/include/linux/damon.h
index d9a3babbafc1..66cdac1eea93 100644
--- a/include/linux/damon.h
+++ b/include/linux/damon.h
@@ -763,6 +763,11 @@ struct damon_attrs {
 	unsigned long aggr_samples;
 };
 
+struct damon_thread_status {
+	int *kdamond_pid;
+	bool *enabled;
+};
+
 /**
  * struct damon_ctx - Represents a context for each monitoring.  This is the
  * main interface that allows users to set the attributes and get the results
@@ -841,6 +846,8 @@ struct damon_ctx {
 
 	struct list_head adaptive_targets;
 	struct list_head schemes;
+
+	struct damon_thread_status thread_status;
 };
 
 static inline struct damon_region *damon_next_region(struct damon_region *r)
diff --git a/mm/damon/core.c b/mm/damon/core.c
index db6c67e52d2b..6c71203beec5 100644
--- a/mm/damon/core.c
+++ b/mm/damon/core.c
@@ -1353,6 +1353,9 @@ int damon_commit_ctx(struct damon_ctx *dst, struct damon_ctx *src)
 	dst->addr_unit = src->addr_unit;
 	dst->min_region_sz = src->min_region_sz;
 
+	dst->thread_status.kdamond_pid = src->thread_status.kdamond_pid;
+	dst->thread_status.enabled = src->thread_status.enabled;
+
 	dst->maybe_corrupted = false;
 	return 0;
 }
@@ -2941,6 +2944,14 @@ static void kdamond_init_ctx(struct damon_ctx *ctx)
 	}
 }
 
+static void damon_update_thread_status(struct damon_ctx *ctx)
+{
+	if (ctx->thread_status.kdamond_pid)
+		*ctx->thread_status.kdamond_pid = -1;
+	if (ctx->thread_status.enabled)
+		*ctx->thread_status.enabled = false;
+}
+
 /*
  * The monitoring daemon that runs as a kernel thread
  */
@@ -3065,17 +3076,23 @@ static int kdamond_fn(void *data)
 	kdamond_call(ctx, true);
 	damos_walk_cancel(ctx);
 
-	pr_debug("kdamond (%d) finishes\n", current->pid);
 	mutex_lock(&ctx->kdamond_lock);
 	ctx->kdamond = NULL;
 	mutex_unlock(&ctx->kdamond_lock);
 
+	if (ctx->thread_status.enabled && *ctx->thread_status.enabled)
+		pr_debug("kdamond (%d) crashed\n", current->pid);
+	else
+		pr_debug("kdamond (%d) finishes\n", current->pid);
+
 	mutex_lock(&damon_lock);
 	nr_running_ctxs--;
 	if (!nr_running_ctxs && running_exclusive_ctxs)
 		running_exclusive_ctxs = false;
 	mutex_unlock(&damon_lock);
 
+	damon_update_thread_status(ctx);
+
 	return 0;
 }
 
diff --git a/mm/damon/lru_sort.c b/mm/damon/lru_sort.c
index 554559d72976..7196010ee9b4 100644
--- a/mm/damon/lru_sort.c
+++ b/mm/damon/lru_sort.c
@@ -294,6 +294,9 @@ static int damon_lru_sort_apply_parameters(void)
 	param_ctx->addr_unit = addr_unit;
 	param_ctx->min_region_sz = max(DAMON_MIN_REGION_SZ / addr_unit, 1);
 
+	param_ctx->thread_status.kdamond_pid = &kdamond_pid;
+	param_ctx->thread_status.enabled = &enabled;
+
 	if (!damon_lru_sort_mon_attrs.sample_interval) {
 		err = -EINVAL;
 		goto out;
@@ -388,8 +391,6 @@ static int damon_lru_sort_turn(bool on)
 
 	if (!on) {
 		err = damon_stop(&ctx, 1);
-		if (!err)
-			kdamond_pid = -1;
 		return err;
 	}
 
diff --git a/mm/damon/reclaim.c b/mm/damon/reclaim.c
index 86da14778658..61592be438d3 100644
--- a/mm/damon/reclaim.c
+++ b/mm/damon/reclaim.c
@@ -204,6 +204,9 @@ static int damon_reclaim_apply_parameters(void)
 	param_ctx->addr_unit = addr_unit;
 	param_ctx->min_region_sz = max(DAMON_MIN_REGION_SZ / addr_unit, 1);
 
+	param_ctx->thread_status.kdamond_pid = &kdamond_pid;
+	param_ctx->thread_status.enabled = &enabled;
+
 	if (!damon_reclaim_mon_attrs.aggr_interval) {
 		err = -EINVAL;
 		goto out;
@@ -290,8 +293,6 @@ static int damon_reclaim_turn(bool on)
 
 	if (!on) {
 		err = damon_stop(&ctx, 1);
-		if (!err)
-			kdamond_pid = -1;
 		return err;
 	}
 
-- 
2.53.0



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: (sashiko review) [RFC PATCH] mm/damon: reset thread status parameters upon kdamond termination
  2026-03-30 16:43 [RFC PATCH] mm/damon: reset thread status parameters upon kdamond termination Liew Rui Yan
@ 2026-03-30 18:53 ` Liew Rui Yan
  2026-03-30 19:51   ` Liew Rui Yan
  2026-03-31  5:02 ` SeongJae Park
  1 sibling, 1 reply; 17+ messages in thread
From: Liew Rui Yan @ 2026-03-30 18:53 UTC (permalink / raw)
  To: aethernet65535; +Cc: damon, linux-mm, sj

Forwarding Sashiko review for doing discussions via mails.

# review url: https://sashiko.dev/#/patchset/20260330164347.12772-1-aethernet65535@gmail.com
# start of sashiko.dev inline review
commit 636b179cc5df5a638d7988bd9de56ce4d8602f4a
Author: Liew Rui Yan <aethernet65535@gmail.com>

mm/damon: reset thread status parameters upon kdamond termination

This commit introduces a thread status structure in the DAMON context to
link the kdamond state with module parameters. It updates these parameters
when the kdamond thread finishes to reflect its actual state.

> diff --git a/mm/damon/core.c b/mm/damon/core.c
> index db6c67e52d2b..6c71203beec5 100644
> --- a/mm/damon/core.c
> +++ b/mm/damon/core.c

[ ... ]

> @@ -2941,6 +2944,14 @@ static void kdamond_init_ctx(struct damon_ctx *ctx)
>  	}
>  }
>  
> +static void damon_update_thread_status(struct damon_ctx *ctx)
> +{
> +	if (ctx->thread_status.kdamond_pid)
> +		*ctx->thread_status.kdamond_pid = -1;
> +	if (ctx->thread_status.enabled)
> +		*ctx->thread_status.enabled = false;

Can this write race with the user enabling the module? If a user writes 'Y'
to 'enabled', the store function spawns kdamond and prepares to set 'enabled'
to true. If kdamond exits immediately (e.g., due to invalid targets), could
this line asynchronously set 'enabled' to false before the store function
overwrites it with true?

If so, the system would be left in a state where the thread is dead but
'enabled' is true. Subsequent attempts to write 'N' to 'enabled' would fail
on damon_stop(), leaving the module permanently locked.

> +}
> +
>  /*
>   * The monitoring daemon that runs as a kernel thread
>   */

[ ... ]

> @@ -3065,17 +3076,23 @@ static int kdamond_fn(void *data)
>  	kdamond_call(ctx, true);
>  	damos_walk_cancel(ctx);
>  
> -	pr_debug("kdamond (%d) finishes\n", current->pid);
>  	mutex_lock(&ctx->kdamond_lock);
>  	ctx->kdamond = NULL;
>  	mutex_unlock(&ctx->kdamond_lock);
>  
> +	if (ctx->thread_status.enabled && *ctx->thread_status.enabled)

Can this access freed memory? If the kdamond_lock is dropped and
ctx->kdamond is NULL, damon_is_running(ctx) becomes false. If a concurrent
sysfs operation removes the context, could ctx be freed by damon_destroy_ctx()
before these lines execute, causing a use-after-free and memory corruption in
damon_update_thread_status(ctx)?

> +		pr_debug("kdamond (%d) crashed\n", current->pid);

Does this log normal user-requested shutdowns as crashes? When a user stops
the module by writing 'N' to 'enabled', the parameter store blocks on
damon_stop(). Since the global 'enabled' variable is still true at this
point, won't this incorrectly print a crash message instead of finishing
normally?

> +	else
> +		pr_debug("kdamond (%d) finishes\n", current->pid);
> +
>  	mutex_lock(&damon_lock);
>  	nr_running_ctxs--;
>  	if (!nr_running_ctxs && running_exclusive_ctxs)
>  		running_exclusive_ctxs = false;
>  	mutex_unlock(&damon_lock);
>  
> +	damon_update_thread_status(ctx);
> +
>  	return 0;
>  }

# end of sashiko.dev inline review
# review url: https://sashiko.dev/#/patchset/20260330164347.12772-1-aethernet65535@gmail.com
#
# hkml [1] generated a draft of this mail.  It can be regenerated
# using below command:
#
#     hkml patch sashiko_dev --for_forwarding \
#             20260330164347.12772-1-aethernet65535@gmail.com
#
# [1] https://github.com/sjp38/hackermail

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: (sashiko review) [RFC PATCH] mm/damon: reset thread status parameters upon kdamond termination
  2026-03-30 18:53 ` (sashiko review) " Liew Rui Yan
@ 2026-03-30 19:51   ` Liew Rui Yan
  2026-03-30 22:42     ` Liew Rui Yan
  0 siblings, 1 reply; 17+ messages in thread
From: Liew Rui Yan @ 2026-03-30 19:51 UTC (permalink / raw)
  To: aethernet65535, sj; +Cc: damon, linux-mm

> [...]
> > @@ -2941,6 +2944,14 @@ static void kdamond_init_ctx(struct damon_ctx *ctx)
> >  	}
> >  }
> >  
> > +static void damon_update_thread_status(struct damon_ctx *ctx)
> > +{
> > +	if (ctx->thread_status.kdamond_pid)
> > +		*ctx->thread_status.kdamond_pid = -1;
> > +	if (ctx->thread_status.enabled)
> > +		*ctx->thread_status.enabled = false;
> 
> Can this write race with the user enabling the module? If a user writes 'Y'
> to 'enabled', the store function spawns kdamond and prepares to set 'enabled'
> to true. If kdamond exits immediately (e.g., due to invalid targets), could
> this line asynchronously set 'enabled' to false before the store function
> overwrites it with true?
> 
> If so, the system would be left in a state where the thread is dead but
> 'enabled' is true. Subsequent attempts to write 'N' to 'enabled' would fail
> on damon_stop(), leaving the module permanently locked.

You are right. I now see the potential race between kdamond exiting and
the enabled_store(). While it seems unlikely, the window exists and
could lead to an inconsistent state. I'm sitll thinking about the way to
synchronize this without introducing new issues. I will try to address
this is next-version once I have a solid plan.

> > +}
> > +
> >  /*
> >   * The monitoring daemon that runs as a kernel thread
> >   */
> 
> [ ... ]
> 
> > @@ -3065,17 +3076,23 @@ static int kdamond_fn(void *data)
> >  	kdamond_call(ctx, true);
> >  	damos_walk_cancel(ctx);
> >  
> > -	pr_debug("kdamond (%d) finishes\n", current->pid);
> >  	mutex_lock(&ctx->kdamond_lock);
> >  	ctx->kdamond = NULL;
> >  	mutex_unlock(&ctx->kdamond_lock);
> >  
> > +	if (ctx->thread_status.enabled && *ctx->thread_status.enabled)
> 
> Can this access freed memory? If the kdamond_lock is dropped and
> ctx->kdamond is NULL, damon_is_running(ctx) becomes false. If a concurrent
> sysfs operation removes the context, could ctx be freed by damon_destroy_ctx()
> before these lines execute, causing a use-after-free and memory corruption in
> damon_update_thread_status(ctx)?

I have performed tests with KASAN enabled on virtme-ng. During multiple
start/stop/fail cycles, KASAN did not report any UAF.

> > +		pr_debug("kdamond (%d) crashed\n", current->pid);
> 
> Does this log normal user-requested shutdowns as crashes? When a user stops
> the module by writing 'N' to 'enabled', the parameter store blocks on
> damon_stop(). Since the global 'enabled' variable is still true at this
> point, won't this incorrectly print a crash message instead of finishing
> normally?

Thank you for reminder. This logic is indeed redundant and pontentially
confusing to users. I will restore the original output in next-version.
:>

> [...]

Best regards,
Rui Yan


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: (sashiko review) [RFC PATCH] mm/damon: reset thread status parameters upon kdamond termination
  2026-03-30 19:51   ` Liew Rui Yan
@ 2026-03-30 22:42     ` Liew Rui Yan
  0 siblings, 0 replies; 17+ messages in thread
From: Liew Rui Yan @ 2026-03-30 22:42 UTC (permalink / raw)
  To: aethernet65535, sj; +Cc: damon, linux-mm

On Tue, 31 Mar 2026 03:51:07 +0800 Liew Rui Yan <aethernet65535@gmail.com> wrote:

> [...]
> > > +static void damon_update_thread_status(struct damon_ctx *ctx)
> > > +{
> > > +	if (ctx->thread_status.kdamond_pid)
> > > +		*ctx->thread_status.kdamond_pid = -1;
> > > +	if (ctx->thread_status.enabled)
> > > +		*ctx->thread_status.enabled = false;
> > 
> > Can this write race with the user enabling the module? If a user writes 'Y'
> > to 'enabled', the store function spawns kdamond and prepares to set 'enabled'
> > to true. If kdamond exits immediately (e.g., due to invalid targets), could
> > this line asynchronously set 'enabled' to false before the store function
> > overwrites it with true?
> > 
> > If so, the system would be left in a state where the thread is dead but
> > 'enabled' is true. Subsequent attempts to write 'N' to 'enabled' would fail
> > on damon_stop(), leaving the module permanently locked.
> 
> You are right. I now see the potential race between kdamond exiting and
> the enabled_store(). While it seems unlikely, the window exists and
> could lead to an inconsistent state. I'm sitll thinking about the way to
> synchronize this without introducing new issues. I will try to address
> this is next-version once I have a solid plan.

I think I've found a solution for the race.

In damon_lru_sort_enabled_store(), the current code sets 'enabled =
enable' only after damon_lru_sort_turn(enable) returns successfully.
This creates the window you mentioned.

I plan to update 'enabled' before calling damon_lru_sort_turn(), and
rollback the value only if the turn operation fails.

Best regards,
Rui Yan


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH] mm/damon: reset thread status parameters upon kdamond termination
  2026-03-30 16:43 [RFC PATCH] mm/damon: reset thread status parameters upon kdamond termination Liew Rui Yan
  2026-03-30 18:53 ` (sashiko review) " Liew Rui Yan
@ 2026-03-31  5:02 ` SeongJae Park
  2026-03-31  6:58   ` Liew Rui Yan
  1 sibling, 1 reply; 17+ messages in thread
From: SeongJae Park @ 2026-03-31  5:02 UTC (permalink / raw)
  To: Liew Rui Yan; +Cc: SeongJae Park, damon, linux-mm

On Tue, 31 Mar 2026 00:43:47 +0800 Liew Rui Yan <aethernet65535@gmail.com> wrote:

> Problem
> =======
> kdamond does not actively modify the module's
> (DAMON_LRU_SORT/DAMON_RECLAIM) 'enabled' and 'kdamond_pid' after
> exiting. After an unexpected termination, user will still see 'enabled'
> as 'Y' and 'kdamond_pid' as a value other than -1. Furthermore, user
> cannot restart the unexpectedly terminated kdamond by executing 'echo
> Y/N > enabled' again.

Nice catch!

I guess this can easily reproducible?  Sharing detailed reproduction steps
would be nice.

> 
> Solution
> ========
> Introduce a 'thread_status' structure to link the internal kdamond
> state with module parameters ('enabled' and 'kdamond_pid').
> 
> Specifically:
> 1. Extend 'struct damon_ctx' to include pointers to the module's
>    parameters.
> 2. Initialize these pointers in damon_lru_sort_apply_parameters()
>    and damon_reclaim_apply_parameters() to point the respective
>    module variables.
> 3. Implement damon_update_thread_status() to reset 'enabled' to
>    false and 'kdamond_pid' to -1 when the kdamond thread finishes.

This feels too much extension of core API for a problem that can more simply be
fixed.  Can't we detect the unexpected termination of kdamond from the modules
and update the paramter values accordingly?

If we cannot due to a limitation of the DAMON core API, I'd like to extend the
API for letting the caller detects the unexpected termination.

Because this is an RFC and I already have question for the high level
direction, I will skip code level review.


Thanks,
SJ

[...]


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH] mm/damon: reset thread status parameters upon kdamond termination
  2026-03-31  5:02 ` SeongJae Park
@ 2026-03-31  6:58   ` Liew Rui Yan
  2026-03-31 16:09     ` Liew Rui Yan
  2026-04-01  0:29     ` SeongJae Park
  0 siblings, 2 replies; 17+ messages in thread
From: Liew Rui Yan @ 2026-03-31  6:58 UTC (permalink / raw)
  To: sj; +Cc: aethernet65535, damon, linux-mm

Hi SeongJae,

On Mon, 30 Mar 2026 22:02:28 -0700 SeongJae Park <sj@kernel.org> wrote:

> Nice catch!
> 
> I guess this can easily reproducible?  Sharing detailed reproduction steps
> would be nice.

I will include the reproduction steps in the next version.

> > Solution
> > ========
> > Introduce a 'thread_status' structure to link the internal kdamond
> > state with module parameters ('enabled' and 'kdamond_pid').
> > 
> > Specifically:
> > 1. Extend 'struct damon_ctx' to include pointers to the module's
> >    parameters.
> > 2. Initialize these pointers in damon_lru_sort_apply_parameters()
> >    and damon_reclaim_apply_parameters() to point the respective
> >    module variables.
> > 3. Implement damon_update_thread_status() to reset 'enabled' to
> >    false and 'kdamond_pid' to -1 when the kdamond thread finishes.
> 
> This feels too much extension of core API for a problem that can more simply be
> fixed.  Can't we detect the unexpected termination of kdamond from the modules
> and update the paramter values accordingly?
> 
> If we cannot due to a limitation of the DAMON core API, I'd like to extend the
> API for letting the caller detects the unexpected termination.
> 
> Because this is an RFC and I already have question for the high level
> direction, I will skip code level review.
> I agree that the current approach was too heavy as it forced the DAMON
> Core to directly manipulate module-specific parameters, leading to tight
> coupling between layers.

To address this, I've brainstormed two alternative directions and would
love to hear your thoughts:

Option 1: Introduce a generic termination callback
==================================================
Add 'void *after_terminate_fn(void*)' and 'void *after_terminate_data'
to 'struct damon_ctx' or 'struct damon_operations'. While this extends
the Core API, it provides a clean notification mechanism. When kdamond
exits for any reason, the module can perform its own cleanup (e.g.,
resetting 'enabled' and 'kdamond_pid') within its own callback. This
keeps the core logic decoupled from module parameters.

Option 2: On-demand state correction in the module
==================================================
In damon_{lru_sort, reclaim}_enabled_store(), if damon_stop() fails, we
check is_kdamond_running(). If the kdamond is found to be terminated, we
forcibly reset 'enabled' and 'kdamond_pid'.

My perspective
--------------
I personally prefer OPTION-1 because it ensures the sysfs state in
synchronized actively.

OPTION-2 is simpler and avoids API changes, but it's a "passive" fix
that only triggers when user atttempts a write operation. User might
still see inconsistent value until they try to interact with the module.

I value your judgment on which path better aligns with DAMON's long-term
design philosophy, as your experience with real-world kernel maintenance
is far beyond mine. :>

Best regards,
Rui Yan

[...]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH] mm/damon: reset thread status parameters upon kdamond termination
  2026-03-31  6:58   ` Liew Rui Yan
@ 2026-03-31 16:09     ` Liew Rui Yan
  2026-04-01  0:44       ` SeongJae Park
  2026-04-01  0:29     ` SeongJae Park
  1 sibling, 1 reply; 17+ messages in thread
From: Liew Rui Yan @ 2026-03-31 16:09 UTC (permalink / raw)
  To: aethernet65535, sj; +Cc: damon, linux-mm

Hi SeongJae,

On Tue, 31 Mar 2026 14:58:36 +0800 Liew Rui Yan <aethernet65535@gmail.com> wrote:

> [...]
> Option 1: Introduce a generic termination callback
> ==================================================
> Add 'void *after_terminate_fn(void*)' and 'void *after_terminate_data'
> to 'struct damon_ctx' or 'struct damon_operations'. While this extends
> the Core API, it provides a clean notification mechanism. When kdamond
> exits for any reason, the module can perform its own cleanup (e.g.,
> resetting 'enabled' and 'kdamond_pid') within its own callback. This
> keeps the core logic decoupled from module parameters.
> 
> Option 2: On-demand state correction in the module
> ==================================================
> In damon_{lru_sort, reclaim}_enabled_store(), if damon_stop() fails, we
> check is_kdamond_running(). If the kdamond is found to be terminated, we
> forcibly reset 'enabled' and 'kdamond_pid'.
> 
> My perspective
> --------------
> I personally prefer OPTION-1 because it ensures the sysfs state in
> synchronized actively.
> 
> OPTION-2 is simpler and avoids API changes, but it's a "passive" fix
> that only triggers when user atttempts a write operation. User might
> still see inconsistent value until they try to interact with the module.
> [...]

To avoid over-complicating the Core API (Option 1 or current approach),
I've implemented a lightweight, on-demand synchronization mechanism in
the module layer.

By overriding the '.get' operator of the 'enabled' parameter, we can
detect if kdamond has terminated and reset the module-level state right
before the user reads them.

Since sysfs parameter access is protected by 'kernel_param_lock', this
approach is race-free and keeps the DAMON core decoupling intact.

Core Implementation:

	if (enabled && !damon_is_running(ctx)) {
		enabled = false;
		kdamond_pid = -1;
	}

	return param_get_bool(buffer, kp);

Test Log:

    # cd /sys/module/damon_lru_sort/parameters/
    # echo Y > enabled 
    # echo 3 > addr_unit 
    # echo Y > commit_inputs 
    # cat enabled 
    N
    # cat kdamond_pid 
    -1

I think this approach is better than my previous OPTION-1 and OPTION-2.
Does this looks good to you?

Best regards,
Rui Yan


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH] mm/damon: reset thread status parameters upon kdamond termination
  2026-03-31  6:58   ` Liew Rui Yan
  2026-03-31 16:09     ` Liew Rui Yan
@ 2026-04-01  0:29     ` SeongJae Park
  2026-04-01  8:23       ` Liew Rui Yan
  1 sibling, 1 reply; 17+ messages in thread
From: SeongJae Park @ 2026-04-01  0:29 UTC (permalink / raw)
  To: Liew Rui Yan; +Cc: SeongJae Park, damon, linux-mm

On Tue, 31 Mar 2026 14:58:36 +0800 Liew Rui Yan <aethernet65535@gmail.com> wrote:

> Hi SeongJae,
> 
> On Mon, 30 Mar 2026 22:02:28 -0700 SeongJae Park <sj@kernel.org> wrote:
> 
> > Nice catch!
> > 
> > I guess this can easily reproducible?  Sharing detailed reproduction steps
> > would be nice.
> 
> I will include the reproduction steps in the next version.

Please completely answer questions.  You didn't answer the first question.

> 
> > > Solution
> > > ========
> > > Introduce a 'thread_status' structure to link the internal kdamond
> > > state with module parameters ('enabled' and 'kdamond_pid').
> > > 
> > > Specifically:
> > > 1. Extend 'struct damon_ctx' to include pointers to the module's
> > >    parameters.
> > > 2. Initialize these pointers in damon_lru_sort_apply_parameters()
> > >    and damon_reclaim_apply_parameters() to point the respective
> > >    module variables.
> > > 3. Implement damon_update_thread_status() to reset 'enabled' to
> > >    false and 'kdamond_pid' to -1 when the kdamond thread finishes.
> > 
> > This feels too much extension of core API for a problem that can more simply be
> > fixed.  Can't we detect the unexpected termination of kdamond from the modules
> > and update the paramter values accordingly?
> > 
> > If we cannot due to a limitation of the DAMON core API, I'd like to extend the
> > API for letting the caller detects the unexpected termination.
> > 
> > Because this is an RFC and I already have question for the high level
> > direction, I will skip code level review.
> > I agree that the current approach was too heavy as it forced the DAMON
> > Core to directly manipulate module-specific parameters, leading to tight
> > coupling between layers.

Please do not add '> ' to your reply.  This confuses me.  Also, again, please
completely answer questions.  You didn't answer if we can or cannot detect the
unexpected termination.

I will reply to below in another reply, as you added a few things to that part,
while cutting the above part.  When you reply to your reply, please consider
keeping the previous reply part.  Then I can reply to your two mails at one
place.  That would help my life :)


Thanks,
SJ

[...]


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH] mm/damon: reset thread status parameters upon kdamond termination
  2026-03-31 16:09     ` Liew Rui Yan
@ 2026-04-01  0:44       ` SeongJae Park
  2026-04-01  8:24         ` Liew Rui Yan
  0 siblings, 1 reply; 17+ messages in thread
From: SeongJae Park @ 2026-04-01  0:44 UTC (permalink / raw)
  To: Liew Rui Yan; +Cc: SeongJae Park, damon, linux-mm

On Wed,  1 Apr 2026 00:09:56 +0800 Liew Rui Yan <aethernet65535@gmail.com> wrote:

> Hi SeongJae,
> 
> On Tue, 31 Mar 2026 14:58:36 +0800 Liew Rui Yan <aethernet65535@gmail.com> wrote:
> 
> > [...]
> > Option 1: Introduce a generic termination callback
> > ==================================================
> > Add 'void *after_terminate_fn(void*)' and 'void *after_terminate_data'
> > to 'struct damon_ctx' or 'struct damon_operations'. While this extends
> > the Core API, it provides a clean notification mechanism. When kdamond
> > exits for any reason, the module can perform its own cleanup (e.g.,
> > resetting 'enabled' and 'kdamond_pid') within its own callback. This
> > keeps the core logic decoupled from module parameters.

Maybe this is the right long term direction.  But to my understanding the fix
should be backported to stable kernels.  Is that correct?  If so, I'd prefer
simple quick fix that can easily backported.

> > 
> > Option 2: On-demand state correction in the module
> > ==================================================
> > In damon_{lru_sort, reclaim}_enabled_store(), if damon_stop() fails, we
> > check is_kdamond_running(). If the kdamond is found to be terminated, we
> > forcibly reset 'enabled' and 'kdamond_pid'.

I think this can work, and simple.

> > 
> > My perspective
> > --------------
> > I personally prefer OPTION-1 because it ensures the sysfs state in
> > synchronized actively.
> > 
> > OPTION-2 is simpler and avoids API changes, but it's a "passive" fix
> > that only triggers when user atttempts a write operation. User might
> > still see inconsistent value until they try to interact with the module.

I agree your concern.

> > [...]
> 
> To avoid over-complicating the Core API (Option 1 or current approach),
> I've implemented a lightweight, on-demand synchronization mechanism in
> the module layer.
> 
> By overriding the '.get' operator of the 'enabled' parameter, we can
> detect if kdamond has terminated and reset the module-level state right
> before the user reads them.
> 
> Since sysfs parameter access is protected by 'kernel_param_lock', this
> approach is race-free and keeps the DAMON core decoupling intact.
> 
> Core Implementation:
> 
> 	if (enabled && !damon_is_running(ctx)) {
> 		enabled = false;
> 		kdamond_pid = -1;
> 	}
> 
> 	return param_get_bool(buffer, kp);
> 
> Test Log:
> 
>     # cd /sys/module/damon_lru_sort/parameters/
>     # echo Y > enabled 
>     # echo 3 > addr_unit 
>     # echo Y > commit_inputs 
>     # cat enabled 
>     N
>     # cat kdamond_pid 
>     -1
> 
> I think this approach is better than my previous OPTION-1 and OPTION-2.
> Does this looks good to you?

Looks not bad.  But, what if we read kdamond_pid before reading enabled?
Also, what if the user simply try writing N to enabled?  Wouldn't user still
see the partial-broken status?  So this doesn't seem gretly superior to the
option 2.

Based on your above reproducer, I understand this issue happens by the
damon_commit_ctx() failure.  If it is not wrong, can't we catch the
damon_commit_ctx() failure from the calling place and make appropriate updates?


Thanks,
SJ

[...]


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH] mm/damon: reset thread status parameters upon kdamond termination
  2026-04-01  0:29     ` SeongJae Park
@ 2026-04-01  8:23       ` Liew Rui Yan
  2026-04-02  0:40         ` SeongJae Park
  0 siblings, 1 reply; 17+ messages in thread
From: Liew Rui Yan @ 2026-04-01  8:23 UTC (permalink / raw)
  To: sj; +Cc: aethernet65535, damon, linux-mm

Hi SeongJae,

First of all, I apologize for not completely answering your question,
and I will be more careful next time.

On Tue, 31 Mar 2026 17:29:18 -0700 SeongJae Park <sj@kernel.org> wrote:

> On Tue, 31 Mar 2026 14:58:36 +0800 Liew Rui Yan <aethernet65535@gmail.com> wrote:
> 
> > Hi SeongJae,
> > 
> > On Mon, 30 Mar 2026 22:02:28 -0700 SeongJae Park <sj@kernel.org> wrote:
> > 
> > > Nice catch!
> > > 
> > > I guess this can easily reproducible?  Sharing detailed reproduction steps
> > > would be nice.
> > 
> > I will include the reproduction steps in the next version.
> 
> Please completely answer questions.  You didn't answer the first question.

Yes, this issue is easily reproducible. Specifically, setting
'addr_unit=3' and then 'commit_inputs=Y' triggers it. I will include the
detailed reproduction steps in the next version's commit message.

> > > [...]
> > > This feels too much extension of core API for a problem that can more simply be
> > > fixed.  Can't we detect the unexpected termination of kdamond from the modules
> > > and update the paramter values accordingly?

Yes, we can detect it. From the module, we can check '!
damon_is_running(ctx)' in conjunction with 'ctx->maybe_corrupted'. Since
'maybe_corrupted' persists until the next initialization, it
effectively indicates whether the kdamond terminated unexpectedly.

> [...]
> Please do not add '> ' to your reply.  This confuses me.  Also, again, please
> completely answer questions.  You didn't answer if we can or cannot detect the
> unexpected termination.

I'm very sorry for the formatting error. I accidentally miselected lines
in my editor while batch-prepending "> " to the quoted text. I will be
much more careful with my editor operations.

> I will reply to below in another reply, as you added a few things to that part,
> while cutting the above part.  When you reply to your reply, please consider
> keeping the previous reply part.  Then I can reply to your two mails at one
> place.  That would help my life :)

Understood! I will keep the previous context in my replies to make the
discussion easier to follow and reply. Thank you for your patience and
guidance!

Best regards,
Rui Yan

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH] mm/damon: reset thread status parameters upon kdamond termination
  2026-04-01  0:44       ` SeongJae Park
@ 2026-04-01  8:24         ` Liew Rui Yan
  2026-04-01 15:41           ` SeongJae Park
  0 siblings, 1 reply; 17+ messages in thread
From: Liew Rui Yan @ 2026-04-01  8:24 UTC (permalink / raw)
  To: sj; +Cc: aethernet65535, damon, linux-mm

On Tue, 31 Mar 2026 17:44:00 -0700 SeongJae Park <sj@kernel.org> wrote:

> On Wed,  1 Apr 2026 00:09:56 +0800 Liew Rui Yan <aethernet65535@gmail.com> wrote:
> 
> > > [...]
> > > Option 1: Introduce a generic termination callback
> > > ==================================================
> > > Add 'void *after_terminate_fn(void*)' and 'void *after_terminate_data'
> > > to 'struct damon_ctx' or 'struct damon_operations'. While this extends
> > > the Core API, it provides a clean notification mechanism. When kdamond
> > > exits for any reason, the module can perform its own cleanup (e.g.,
> > > resetting 'enabled' and 'kdamond_pid') within its own callback. This
> > > keeps the core logic decoupled from module parameters.
> 
> Maybe this is the right long term direction.  But to my understanding the fix
> should be backported to stable kernels.  Is that correct?  If so, I'd prefer
> simple quick fix that can easily backported.

I agree. While I hadn't initially considered backporting, this bug
certainly warrants it. A simple and easily backportable fix should be
our priority for now.

> > > Option 2: On-demand state correction in the module
> > > ==================================================
> > > In damon_{lru_sort, reclaim}_enabled_store(), if damon_stop() fails, we
> > > check is_kdamond_running(). If the kdamond is found to be terminated, we
> > > forcibly reset 'enabled' and 'kdamond_pid'.
> 
> I think this can work, and simple.
> 
> > > 
> > > My perspective
> > > --------------
> > > I personally prefer OPTION-1 because it ensures the sysfs state in
> > > synchronized actively.
> > > 
> > > OPTION-2 is simpler and avoids API changes, but it's a "passive" fix
> > > that only triggers when user atttempts a write operation. User might
> > > still see inconsistent value until they try to interact with the module.
> 
> I agree your concern.
> 
> > > [...]
> > 
> > To avoid over-complicating the Core API (Option 1 or current approach),
> > I've implemented a lightweight, on-demand synchronization mechanism in
> > the module layer.
> > 
> > By overriding the '.get' operator of the 'enabled' parameter, we can
> > detect if kdamond has terminated and reset the module-level state right
> > before the user reads them.
> > 
> > Since sysfs parameter access is protected by 'kernel_param_lock', this
> > approach is race-free and keeps the DAMON core decoupling intact.
> > 
> > Core Implementation:
> > 
> > 	if (enabled && !damon_is_running(ctx)) {
> > 		enabled = false;
> > 		kdamond_pid = -1;
> > 	}
> > 
> > 	return param_get_bool(buffer, kp);
> > 
> > Test Log:
> > 
> >     # cd /sys/module/damon_lru_sort/parameters/
> >     # echo Y > enabled 
> >     # echo 3 > addr_unit 
> >     # echo Y > commit_inputs 
> >     # cat enabled 
> >     N
> >     # cat kdamond_pid 
> >     -1
> > 
> > I think this approach is better than my previous OPTION-1 and OPTION-2.
> > Does this looks good to you?
> 
> Looks not bad.  But, what if we read kdamond_pid before reading enabled?

You are right. To make this robust, I could apply similar '.get'
overrides to 'kdamond_pid' as well. This ensures that reading either
parameter would trigger the state synchronization.

Would that be too heavy? Or do you think it's unnecessary?

> Also, what if the user simply try writing N to enabled?  Wouldn't user still
> see the partial-broken status?  So this doesn't seem gretly superior to the
> option 2.

In that case, we could modify the damon_{lru_sort, reclaim}
_enabled_store(). Before processing the write, we check if kdamond is
still alive. If it has terminated, we reset the 'enabled' and
'kdamond_pid'.

> Based on your above reproducer, I understand this issue happens by the
> damon_commit_ctx() failure.  If it is not wrong, can't we catch the
> damon_commit_ctx() failure from the calling place and make appropriate updates?

Yes, we can. We could check 'ctx->maybe_corrupted' right after it
returns, and reset 'enabled' and 'kdamond_pid' immediately if it's set.

Do you think this approach is feasible?

Best regards,
Rui Yan


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH] mm/damon: reset thread status parameters upon kdamond termination
  2026-04-01  8:24         ` Liew Rui Yan
@ 2026-04-01 15:41           ` SeongJae Park
  2026-04-02  5:34             ` Liew Rui Yan
  0 siblings, 1 reply; 17+ messages in thread
From: SeongJae Park @ 2026-04-01 15:41 UTC (permalink / raw)
  To: Liew Rui Yan; +Cc: SeongJae Park, damon, linux-mm

On Wed,  1 Apr 2026 16:24:39 +0800 Liew Rui Yan <aethernet65535@gmail.com> wrote:

> On Tue, 31 Mar 2026 17:44:00 -0700 SeongJae Park <sj@kernel.org> wrote:
> 
> > On Wed,  1 Apr 2026 00:09:56 +0800 Liew Rui Yan <aethernet65535@gmail.com> wrote:
> > 
> > > > [...]
> > > > Option 1: Introduce a generic termination callback
> > > > ==================================================
> > > > Add 'void *after_terminate_fn(void*)' and 'void *after_terminate_data'
> > > > to 'struct damon_ctx' or 'struct damon_operations'. While this extends
> > > > the Core API, it provides a clean notification mechanism. When kdamond
> > > > exits for any reason, the module can perform its own cleanup (e.g.,
> > > > resetting 'enabled' and 'kdamond_pid') within its own callback. This
> > > > keeps the core logic decoupled from module parameters.
> > 
> > Maybe this is the right long term direction.  But to my understanding the fix
> > should be backported to stable kernels.  Is that correct?  If so, I'd prefer
> > simple quick fix that can easily backported.
> 
> I agree. While I hadn't initially considered backporting, this bug
> certainly warrants it. A simple and easily backportable fix should be
> our priority for now.

Thank you for generously accepting my concern.

> 
> > > > Option 2: On-demand state correction in the module
> > > > ==================================================
> > > > In damon_{lru_sort, reclaim}_enabled_store(), if damon_stop() fails, we
> > > > check is_kdamond_running(). If the kdamond is found to be terminated, we
> > > > forcibly reset 'enabled' and 'kdamond_pid'.
> > 
> > I think this can work, and simple.
> > 
> > > > 
> > > > My perspective
> > > > --------------
> > > > I personally prefer OPTION-1 because it ensures the sysfs state in
> > > > synchronized actively.
> > > > 
> > > > OPTION-2 is simpler and avoids API changes, but it's a "passive" fix
> > > > that only triggers when user atttempts a write operation. User might
> > > > still see inconsistent value until they try to interact with the module.
> > 
> > I agree your concern.
> > 
> > > > [...]
> > > 
> > > To avoid over-complicating the Core API (Option 1 or current approach),
> > > I've implemented a lightweight, on-demand synchronization mechanism in
> > > the module layer.
> > > 
> > > By overriding the '.get' operator of the 'enabled' parameter, we can
> > > detect if kdamond has terminated and reset the module-level state right
> > > before the user reads them.
> > > 
> > > Since sysfs parameter access is protected by 'kernel_param_lock', this
> > > approach is race-free and keeps the DAMON core decoupling intact.
> > > 
> > > Core Implementation:
> > > 
> > > 	if (enabled && !damon_is_running(ctx)) {
> > > 		enabled = false;
> > > 		kdamond_pid = -1;
> > > 	}
> > > 
> > > 	return param_get_bool(buffer, kp);
> > > 
> > > Test Log:
> > > 
> > >     # cd /sys/module/damon_lru_sort/parameters/
> > >     # echo Y > enabled 
> > >     # echo 3 > addr_unit 
> > >     # echo Y > commit_inputs 
> > >     # cat enabled 
> > >     N
> > >     # cat kdamond_pid 
> > >     -1
> > > 
> > > I think this approach is better than my previous OPTION-1 and OPTION-2.
> > > Does this looks good to you?
> > 
> > Looks not bad.  But, what if we read kdamond_pid before reading enabled?
> 
> You are right. To make this robust, I could apply similar '.get'
> overrides to 'kdamond_pid' as well. This ensures that reading either
> parameter would trigger the state synchronization.
> 
> Would that be too heavy? Or do you think it's unnecessary?

I feel it's bit heavy, or duplication of code that could be avoided in a better
way.

> 
> > Also, what if the user simply try writing N to enabled?  Wouldn't user still
> > see the partial-broken status?  So this doesn't seem gretly superior to the
> > option 2.
> 
> In that case, we could modify the damon_{lru_sort, reclaim}
> _enabled_store(). Before processing the write, we check if kdamond is
> still alive. If it has terminated, we reset the 'enabled' and
> 'kdamond_pid'.

I agree this would work, but again I feel like this is a duplication of code
that could be avoided in a better way.

> 
> > Based on your above reproducer, I understand this issue happens by the
> > damon_commit_ctx() failure.  If it is not wrong, can't we catch the
> > damon_commit_ctx() failure from the calling place and make appropriate updates?
> 
> Yes, we can. We could check 'ctx->maybe_corrupted' right after it
> returns, and reset 'enabled' and 'kdamond_pid' immediately if it's set.
> 
> Do you think this approach is feasible?

I think this is more feasible.  But 'ctx->maybe_corrupted' is a private field
that supposed to be used by only core.c.

What about checking if damon_call() and damon_commit_ctx() failures via therir
return value?  It seems damon_call() fails only when kdamond goes to the
termination path.  damon_commit_ctx() failure always causes the kdamond be
terminated.  So if I didn't miss something that could be a path forward.  What
do you think?


Thanks,
SJ

[...]


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH] mm/damon: reset thread status parameters upon kdamond termination
  2026-04-01  8:23       ` Liew Rui Yan
@ 2026-04-02  0:40         ` SeongJae Park
  0 siblings, 0 replies; 17+ messages in thread
From: SeongJae Park @ 2026-04-02  0:40 UTC (permalink / raw)
  To: Liew Rui Yan; +Cc: SeongJae Park, damon, linux-mm

On Wed,  1 Apr 2026 16:23:54 +0800 Liew Rui Yan <aethernet65535@gmail.com> wrote:

> Hi SeongJae,
> 
> First of all, I apologize for not completely answering your question,
> and I will be more careful next time.

No worries, we all make mistakes, or simply have different tastes.  Thank you
for adding clear answers to my questions and generously accepting my
suggestions.


Thanks,
SJ

[...]


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH] mm/damon: reset thread status parameters upon kdamond termination
  2026-04-01 15:41           ` SeongJae Park
@ 2026-04-02  5:34             ` Liew Rui Yan
  2026-04-02 13:54               ` SeongJae Park
  0 siblings, 1 reply; 17+ messages in thread
From: Liew Rui Yan @ 2026-04-02  5:34 UTC (permalink / raw)
  To: sj; +Cc: aethernet65535, damon, linux-mm

Hi SeongJae,

On Wed,  1 Apr 2026 08:41:18 -0700 SeongJae Park <sj@kernel.org> wrote:

> [...]
> What about checking if damon_call() and damon_commit_ctx() failures via therir
> return value?  It seems damon_call() fails only when kdamond goes to the
> termination path.  damon_commit_ctx() failure always causes the kdamond be
> terminated.  So if I didn't miss something that could be a path forward.  What
> do you think?

Thank you for the suggestion. I looked into the code and I think
focusing on damon_commit_ctx() failures makes sense.

Specifically, damon_call() is only used in the damon_{lru_sort, reclaim}
_enabled_store() path, which doesn't trigger unexpected termination. The
main risk comes from 'commit_inputs', which calls damon_{lru_sort,
reclaim}_apply_parameters() . When damon_commit_ctx() fails there, it
reliably indicates that kdamond is terminating.

My plan is to add the error check directly in apply_parameters(), right
after damon_commit_ctx():

    err = damon_commit_ctx(ctx, param_ctx);
    if (err) {
        enabled = false;
        kdamond_pid = -1;
    }

This covers the reproducible case (addr_unit=3 + commit_inputs=Y).

For truly unexpected termination (e.g., memory allocation failure inside
kdamond), I'm considering a fallback mechanism in enabled_store():

- When the user writes 'N' to 'enabled', if damon_stop() fails but
  kdamond is actually terminated, we reset enabled and kdamond_pid.

- When the user writes 'Y' to 'enabled', if enabled is already 'Y' but
  kdamond is terminated, we treat this as a restart request.

This way, even if kdamond terminates unexpectedly, the next user
interaction will recover the state automatically.

Does this approach sound reasonable?

Best regards,
Rui Yan

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH] mm/damon: reset thread status parameters upon kdamond termination
  2026-04-02  5:34             ` Liew Rui Yan
@ 2026-04-02 13:54               ` SeongJae Park
  2026-04-03  4:34                 ` Liew Rui Yan
  0 siblings, 1 reply; 17+ messages in thread
From: SeongJae Park @ 2026-04-02 13:54 UTC (permalink / raw)
  To: Liew Rui Yan; +Cc: SeongJae Park, damon, linux-mm

On Thu,  2 Apr 2026 13:34:58 +0800 Liew Rui Yan <aethernet65535@gmail.com> wrote:

> Hi SeongJae,
> 
> On Wed,  1 Apr 2026 08:41:18 -0700 SeongJae Park <sj@kernel.org> wrote:
> 
> > [...]
> > What about checking if damon_call() and damon_commit_ctx() failures via therir
> > return value?  It seems damon_call() fails only when kdamond goes to the
> > termination path.  damon_commit_ctx() failure always causes the kdamond be
> > terminated.  So if I didn't miss something that could be a path forward.  What
> > do you think?
> 
> Thank you for the suggestion. I looked into the code and I think
> focusing on damon_commit_ctx() failures makes sense.
> 
> Specifically, damon_call() is only used in the damon_{lru_sort, reclaim}
> _enabled_store() path, which doesn't trigger unexpected termination. The
> main risk comes from 'commit_inputs', which calls damon_{lru_sort,
> reclaim}_apply_parameters() . When damon_commit_ctx() fails there, it
> reliably indicates that kdamond is terminating.
> 
> My plan is to add the error check directly in apply_parameters(), right
> after damon_commit_ctx():
> 
>     err = damon_commit_ctx(ctx, param_ctx);
>     if (err) {
>         enabled = false;
>         kdamond_pid = -1;
>     }
> 
> This covers the reproducible case (addr_unit=3 + commit_inputs=Y).
> 
> For truly unexpected termination (e.g., memory allocation failure inside
> kdamond), I'm considering a fallback mechanism in enabled_store():
> 
> - When the user writes 'N' to 'enabled', if damon_stop() fails but
>   kdamond is actually terminated, we reset enabled and kdamond_pid.

Sounds good.

> 
> - When the user writes 'Y' to 'enabled', if enabled is already 'Y' but
>   kdamond is terminated, we treat this as a restart request.

This sounds bit odd to me.  User shows enabled=Y and kdamond_pid!=-1.  Why they
would write 'Y' again?  Yes, they would notice kdamond is terminated in real,
using 'ps' like tool.  But even in the case, I'd imagine users would write N to
enabled first, and then write Y to enabled.  In other words, this could improve
the user experience, but seems not really necessary to me, at least as a
hotfix.  What do you think?

> 
> This way, even if kdamond terminates unexpectedly, the next user
> interaction will recover the state automatically.
> 
> Does this approach sound reasonable?

Yes, the plan for the next version sounds good to me.


Thanks,
SJ

[...]


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH] mm/damon: reset thread status parameters upon kdamond termination
  2026-04-02 13:54               ` SeongJae Park
@ 2026-04-03  4:34                 ` Liew Rui Yan
  2026-04-03 14:06                   ` SeongJae Park
  0 siblings, 1 reply; 17+ messages in thread
From: Liew Rui Yan @ 2026-04-03  4:34 UTC (permalink / raw)
  To: sj; +Cc: aethernet65535, damon, linux-mm

On Thu,  2 Apr 2026 06:54:39 -0700 SeongJae Park <sj@kernel.org> wrote:

> On Thu,  2 Apr 2026 13:34:58 +0800 Liew Rui Yan <aethernet65535@gmail.com> wrote:
> 
> > [...]
> > My plan is to add the error check directly in apply_parameters(), right
> > after damon_commit_ctx():
> > 
> >     err = damon_commit_ctx(ctx, param_ctx);
> >     if (err) {
> >         enabled = false;
> >         kdamond_pid = -1;
> >     }
> > 
> > This covers the reproducible case (addr_unit=3 + commit_inputs=Y).
> > 
> > For truly unexpected termination (e.g., memory allocation failure inside
> > kdamond), I'm considering a fallback mechanism in enabled_store():
> > 
> > - When the user writes 'N' to 'enabled', if damon_stop() fails but
> >   kdamond is actually terminated, we reset enabled and kdamond_pid.
> 
> Sounds good.
> 
> > 
> > - When the user writes 'Y' to 'enabled', if enabled is already 'Y' but
> >   kdamond is terminated, we treat this as a restart request.
> 
> This sounds bit odd to me.  User shows enabled=Y and kdamond_pid!=-1.  Why they
> would write 'Y' again?  Yes, they would notice kdamond is terminated in real,
> using 'ps' like tool.  But even in the case, I'd imagine users would write N to
> enabled first, and then write Y to enabled.  In other words, this could improve
> the user experience, but seems not really necessary to me, at least as a
> hotfix.  What do you think?

I see your point. I wasn't thinking of this as a hotfix at first, but
you're right - for a bug like this, simple and backportable is the way
to go.

So I will drop the "Write Y to Restart" logic in the next version. I
will only fix the state in the 'N' path and after commit failures.

> 
> > 
> > This way, even if kdamond terminates unexpectedly, the next user
> > interaction will recover the state automatically.
> > 
> > Does this approach sound reasonable?
> 
> Yes, the plan for the next version sounds good to me.

Great! Thank you for helping me scope this properly! :>

Best regards,
Rui Yan


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH] mm/damon: reset thread status parameters upon kdamond termination
  2026-04-03  4:34                 ` Liew Rui Yan
@ 2026-04-03 14:06                   ` SeongJae Park
  0 siblings, 0 replies; 17+ messages in thread
From: SeongJae Park @ 2026-04-03 14:06 UTC (permalink / raw)
  To: Liew Rui Yan; +Cc: SeongJae Park, damon, linux-mm

On Fri,  3 Apr 2026 12:34:48 +0800 Liew Rui Yan <aethernet65535@gmail.com> wrote:

> On Thu,  2 Apr 2026 06:54:39 -0700 SeongJae Park <sj@kernel.org> wrote:
> 
> > On Thu,  2 Apr 2026 13:34:58 +0800 Liew Rui Yan <aethernet65535@gmail.com> wrote:
[...]
> > > - When the user writes 'Y' to 'enabled', if enabled is already 'Y' but
> > >   kdamond is terminated, we treat this as a restart request.
> > 
> > This sounds bit odd to me.  User shows enabled=Y and kdamond_pid!=-1.  Why they
> > would write 'Y' again?  Yes, they would notice kdamond is terminated in real,
> > using 'ps' like tool.  But even in the case, I'd imagine users would write N to
> > enabled first, and then write Y to enabled.  In other words, this could improve
> > the user experience, but seems not really necessary to me, at least as a
> > hotfix.  What do you think?
> 
> I see your point. I wasn't thinking of this as a hotfix at first, but
> you're right - for a bug like this, simple and backportable is the way
> to go.
> 
> So I will drop the "Write Y to Restart" logic in the next version. I
> will only fix the state in the 'N' path and after commit failures.

Thank you for accepting my suggestion.

> 
> > 
> > > 
> > > This way, even if kdamond terminates unexpectedly, the next user
> > > interaction will recover the state automatically.
> > > 
> > > Does this approach sound reasonable?
> > 
> > Yes, the plan for the next version sounds good to me.
> 
> Great! Thank you for helping me scope this properly! :>

You're welcome :)


Thanks,
SJ

[...]


^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2026-04-03 14:06 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-03-30 16:43 [RFC PATCH] mm/damon: reset thread status parameters upon kdamond termination Liew Rui Yan
2026-03-30 18:53 ` (sashiko review) " Liew Rui Yan
2026-03-30 19:51   ` Liew Rui Yan
2026-03-30 22:42     ` Liew Rui Yan
2026-03-31  5:02 ` SeongJae Park
2026-03-31  6:58   ` Liew Rui Yan
2026-03-31 16:09     ` Liew Rui Yan
2026-04-01  0:44       ` SeongJae Park
2026-04-01  8:24         ` Liew Rui Yan
2026-04-01 15:41           ` SeongJae Park
2026-04-02  5:34             ` Liew Rui Yan
2026-04-02 13:54               ` SeongJae Park
2026-04-03  4:34                 ` Liew Rui Yan
2026-04-03 14:06                   ` SeongJae Park
2026-04-01  0:29     ` SeongJae Park
2026-04-01  8:23       ` Liew Rui Yan
2026-04-02  0:40         ` SeongJae Park

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox