From: SeongJae Park <sj@kernel.org>
To: Liew Rui Yan <aethernet65535@gmail.com>
Cc: SeongJae Park <sj@kernel.org>, damon@lists.linux.dev, linux-mm@kvack.org
Subject: Re: [RFC PATCH] mm/damon: reset thread status parameters upon kdamond termination
Date: Wed, 1 Apr 2026 08:41:18 -0700 [thread overview]
Message-ID: <20260401154119.67874-1-sj@kernel.org> (raw)
In-Reply-To: <20260401082439.12612-1-aethernet65535@gmail.com>
On Wed, 1 Apr 2026 16:24:39 +0800 Liew Rui Yan <aethernet65535@gmail.com> wrote:
> On Tue, 31 Mar 2026 17:44:00 -0700 SeongJae Park <sj@kernel.org> wrote:
>
> > On Wed, 1 Apr 2026 00:09:56 +0800 Liew Rui Yan <aethernet65535@gmail.com> wrote:
> >
> > > > [...]
> > > > Option 1: Introduce a generic termination callback
> > > > ==================================================
> > > > Add 'void *after_terminate_fn(void*)' and 'void *after_terminate_data'
> > > > to 'struct damon_ctx' or 'struct damon_operations'. While this extends
> > > > the Core API, it provides a clean notification mechanism. When kdamond
> > > > exits for any reason, the module can perform its own cleanup (e.g.,
> > > > resetting 'enabled' and 'kdamond_pid') within its own callback. This
> > > > keeps the core logic decoupled from module parameters.
> >
> > Maybe this is the right long term direction. But to my understanding the fix
> > should be backported to stable kernels. Is that correct? If so, I'd prefer
> > simple quick fix that can easily backported.
>
> I agree. While I hadn't initially considered backporting, this bug
> certainly warrants it. A simple and easily backportable fix should be
> our priority for now.
Thank you for generously accepting my concern.
>
> > > > Option 2: On-demand state correction in the module
> > > > ==================================================
> > > > In damon_{lru_sort, reclaim}_enabled_store(), if damon_stop() fails, we
> > > > check is_kdamond_running(). If the kdamond is found to be terminated, we
> > > > forcibly reset 'enabled' and 'kdamond_pid'.
> >
> > I think this can work, and simple.
> >
> > > >
> > > > My perspective
> > > > --------------
> > > > I personally prefer OPTION-1 because it ensures the sysfs state in
> > > > synchronized actively.
> > > >
> > > > OPTION-2 is simpler and avoids API changes, but it's a "passive" fix
> > > > that only triggers when user atttempts a write operation. User might
> > > > still see inconsistent value until they try to interact with the module.
> >
> > I agree your concern.
> >
> > > > [...]
> > >
> > > To avoid over-complicating the Core API (Option 1 or current approach),
> > > I've implemented a lightweight, on-demand synchronization mechanism in
> > > the module layer.
> > >
> > > By overriding the '.get' operator of the 'enabled' parameter, we can
> > > detect if kdamond has terminated and reset the module-level state right
> > > before the user reads them.
> > >
> > > Since sysfs parameter access is protected by 'kernel_param_lock', this
> > > approach is race-free and keeps the DAMON core decoupling intact.
> > >
> > > Core Implementation:
> > >
> > > if (enabled && !damon_is_running(ctx)) {
> > > enabled = false;
> > > kdamond_pid = -1;
> > > }
> > >
> > > return param_get_bool(buffer, kp);
> > >
> > > Test Log:
> > >
> > > # cd /sys/module/damon_lru_sort/parameters/
> > > # echo Y > enabled
> > > # echo 3 > addr_unit
> > > # echo Y > commit_inputs
> > > # cat enabled
> > > N
> > > # cat kdamond_pid
> > > -1
> > >
> > > I think this approach is better than my previous OPTION-1 and OPTION-2.
> > > Does this looks good to you?
> >
> > Looks not bad. But, what if we read kdamond_pid before reading enabled?
>
> You are right. To make this robust, I could apply similar '.get'
> overrides to 'kdamond_pid' as well. This ensures that reading either
> parameter would trigger the state synchronization.
>
> Would that be too heavy? Or do you think it's unnecessary?
I feel it's bit heavy, or duplication of code that could be avoided in a better
way.
>
> > Also, what if the user simply try writing N to enabled? Wouldn't user still
> > see the partial-broken status? So this doesn't seem gretly superior to the
> > option 2.
>
> In that case, we could modify the damon_{lru_sort, reclaim}
> _enabled_store(). Before processing the write, we check if kdamond is
> still alive. If it has terminated, we reset the 'enabled' and
> 'kdamond_pid'.
I agree this would work, but again I feel like this is a duplication of code
that could be avoided in a better way.
>
> > Based on your above reproducer, I understand this issue happens by the
> > damon_commit_ctx() failure. If it is not wrong, can't we catch the
> > damon_commit_ctx() failure from the calling place and make appropriate updates?
>
> Yes, we can. We could check 'ctx->maybe_corrupted' right after it
> returns, and reset 'enabled' and 'kdamond_pid' immediately if it's set.
>
> Do you think this approach is feasible?
I think this is more feasible. But 'ctx->maybe_corrupted' is a private field
that supposed to be used by only core.c.
What about checking if damon_call() and damon_commit_ctx() failures via therir
return value? It seems damon_call() fails only when kdamond goes to the
termination path. damon_commit_ctx() failure always causes the kdamond be
terminated. So if I didn't miss something that could be a path forward. What
do you think?
Thanks,
SJ
[...]
next prev parent reply other threads:[~2026-04-01 15:41 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-03-30 16:43 [RFC PATCH] mm/damon: reset thread status parameters upon kdamond termination Liew Rui Yan
2026-03-30 18:53 ` (sashiko review) " Liew Rui Yan
2026-03-30 19:51 ` Liew Rui Yan
2026-03-30 22:42 ` Liew Rui Yan
2026-03-31 5:02 ` SeongJae Park
2026-03-31 6:58 ` Liew Rui Yan
2026-03-31 16:09 ` Liew Rui Yan
2026-04-01 0:44 ` SeongJae Park
2026-04-01 8:24 ` Liew Rui Yan
2026-04-01 15:41 ` SeongJae Park [this message]
2026-04-02 5:34 ` Liew Rui Yan
2026-04-02 13:54 ` SeongJae Park
2026-04-03 4:34 ` Liew Rui Yan
2026-04-03 14:06 ` SeongJae Park
2026-04-01 0:29 ` SeongJae Park
2026-04-01 8:23 ` Liew Rui Yan
2026-04-02 0:40 ` SeongJae Park
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260401154119.67874-1-sj@kernel.org \
--to=sj@kernel.org \
--cc=aethernet65535@gmail.com \
--cc=damon@lists.linux.dev \
--cc=linux-mm@kvack.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox