Re: [RFC PATCH] mm/damon: reset thread status parameters upon kdamond termination

All of lore.kernel.org
 help / color / mirror / Atom feed

From: SeongJae Park <sj@kernel.org>
To: Liew Rui Yan <aethernet65535@gmail.com>
Cc: SeongJae Park <sj@kernel.org>, damon@lists.linux.dev, linux-mm@kvack.org
Subject: Re: [RFC PATCH] mm/damon: reset thread status parameters upon kdamond termination
Date: Wed,  1 Apr 2026 08:41:18 -0700	[thread overview]
Message-ID: <20260401154119.67874-1-sj@kernel.org> (raw)
In-Reply-To: <20260401082439.12612-1-aethernet65535@gmail.com>

On Wed,  1 Apr 2026 16:24:39 +0800 Liew Rui Yan <aethernet65535@gmail.com> wrote:

> On Tue, 31 Mar 2026 17:44:00 -0700 SeongJae Park <sj@kernel.org> wrote:
> 
> > On Wed,  1 Apr 2026 00:09:56 +0800 Liew Rui Yan <aethernet65535@gmail.com> wrote:
> > 
> > > > [...]
> > > > Option 1: Introduce a generic termination callback
> > > > ==================================================
> > > > Add 'void *after_terminate_fn(void*)' and 'void *after_terminate_data'
> > > > to 'struct damon_ctx' or 'struct damon_operations'. While this extends
> > > > the Core API, it provides a clean notification mechanism. When kdamond
> > > > exits for any reason, the module can perform its own cleanup (e.g.,
> > > > resetting 'enabled' and 'kdamond_pid') within its own callback. This
> > > > keeps the core logic decoupled from module parameters.
> > 
> > Maybe this is the right long term direction.  But to my understanding the fix
> > should be backported to stable kernels.  Is that correct?  If so, I'd prefer
> > simple quick fix that can easily backported.
> 
> I agree. While I hadn't initially considered backporting, this bug
> certainly warrants it. A simple and easily backportable fix should be
> our priority for now.

Thank you for generously accepting my concern.

> 
> > > > Option 2: On-demand state correction in the module
> > > > ==================================================
> > > > In damon_{lru_sort, reclaim}_enabled_store(), if damon_stop() fails, we
> > > > check is_kdamond_running(). If the kdamond is found to be terminated, we
> > > > forcibly reset 'enabled' and 'kdamond_pid'.
> > 
> > I think this can work, and simple.
> > 
> > > > 
> > > > My perspective
> > > > --------------
> > > > I personally prefer OPTION-1 because it ensures the sysfs state in
> > > > synchronized actively.
> > > > 
> > > > OPTION-2 is simpler and avoids API changes, but it's a "passive" fix
> > > > that only triggers when user atttempts a write operation. User might
> > > > still see inconsistent value until they try to interact with the module.
> > 
> > I agree your concern.
> > 
> > > > [...]
> > > 
> > > To avoid over-complicating the Core API (Option 1 or current approach),
> > > I've implemented a lightweight, on-demand synchronization mechanism in
> > > the module layer.
> > > 
> > > By overriding the '.get' operator of the 'enabled' parameter, we can
> > > detect if kdamond has terminated and reset the module-level state right
> > > before the user reads them.
> > > 
> > > Since sysfs parameter access is protected by 'kernel_param_lock', this
> > > approach is race-free and keeps the DAMON core decoupling intact.
> > > 
> > > Core Implementation:
> > > 
> > > 	if (enabled && !damon_is_running(ctx)) {
> > > 		enabled = false;
> > > 		kdamond_pid = -1;
> > > 	}
> > > 
> > > 	return param_get_bool(buffer, kp);
> > > 
> > > Test Log:
> > > 
> > >     # cd /sys/module/damon_lru_sort/parameters/
> > >     # echo Y > enabled 
> > >     # echo 3 > addr_unit 
> > >     # echo Y > commit_inputs 
> > >     # cat enabled 
> > >     N
> > >     # cat kdamond_pid 
> > >     -1
> > > 
> > > I think this approach is better than my previous OPTION-1 and OPTION-2.
> > > Does this looks good to you?
> > 
> > Looks not bad.  But, what if we read kdamond_pid before reading enabled?
> 
> You are right. To make this robust, I could apply similar '.get'
> overrides to 'kdamond_pid' as well. This ensures that reading either
> parameter would trigger the state synchronization.
> 
> Would that be too heavy? Or do you think it's unnecessary?

I feel it's bit heavy, or duplication of code that could be avoided in a better
way.

> 
> > Also, what if the user simply try writing N to enabled?  Wouldn't user still
> > see the partial-broken status?  So this doesn't seem gretly superior to the
> > option 2.
> 
> In that case, we could modify the damon_{lru_sort, reclaim}
> _enabled_store(). Before processing the write, we check if kdamond is
> still alive. If it has terminated, we reset the 'enabled' and
> 'kdamond_pid'.

I agree this would work, but again I feel like this is a duplication of code
that could be avoided in a better way.

> 
> > Based on your above reproducer, I understand this issue happens by the
> > damon_commit_ctx() failure.  If it is not wrong, can't we catch the
> > damon_commit_ctx() failure from the calling place and make appropriate updates?
> 
> Yes, we can. We could check 'ctx->maybe_corrupted' right after it
> returns, and reset 'enabled' and 'kdamond_pid' immediately if it's set.
> 
> Do you think this approach is feasible?

I think this is more feasible.  But 'ctx->maybe_corrupted' is a private field
that supposed to be used by only core.c.

What about checking if damon_call() and damon_commit_ctx() failures via therir
return value?  It seems damon_call() fails only when kdamond goes to the
termination path.  damon_commit_ctx() failure always causes the kdamond be
terminated.  So if I didn't miss something that could be a path forward.  What
do you think?


Thanks,
SJ

[...]

next prev parent reply	other threads:[~2026-04-01 15:41 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-30 16:43 [RFC PATCH] mm/damon: reset thread status parameters upon kdamond termination Liew Rui Yan
2026-03-30 18:53 ` (sashiko review) " Liew Rui Yan
2026-03-30 19:51   ` Liew Rui Yan
2026-03-30 22:42     ` Liew Rui Yan
2026-03-31  5:02 ` SeongJae Park
2026-03-31  6:58   ` Liew Rui Yan
2026-03-31 16:09     ` Liew Rui Yan
2026-04-01  0:44       ` SeongJae Park
2026-04-01  8:24         ` Liew Rui Yan
2026-04-01 15:41           ` SeongJae Park [this message]
2026-04-02  5:34             ` Liew Rui Yan
2026-04-02 13:54               ` SeongJae Park
2026-04-03  4:34                 ` Liew Rui Yan
2026-04-03 14:06                   ` SeongJae Park
2026-04-01  0:29     ` SeongJae Park
2026-04-01  8:23       ` Liew Rui Yan
2026-04-02  0:40         ` SeongJae Park

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260401154119.67874-1-sj@kernel.org \
    --to=sj@kernel.org \
    --cc=aethernet65535@gmail.com \
    --cc=damon@lists.linux.dev \
    --cc=linux-mm@kvack.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.