Re: (sashiko review) [RFC v5] mm/damon: add synchronous commit for commit_inputs

public inbox for linux-mm@kvack.org
 help / color / mirror / Atom feed

From: Liew Rui Yan <aethernet65535@gmail.com>
To: sj@kernel.org
Cc: aethernet65535@gmail.com, damon@lists.linux.dev, linux-mm@kvack.org
Subject: Re: (sashiko review) [RFC v5] mm/damon: add synchronous commit for commit_inputs
Date: Thu, 26 Mar 2026 14:15:54 +0800	[thread overview]
Message-ID: <20260326061554.20466-1-aethernet65535@gmail.com> (raw)
In-Reply-To: <20260325141956.87144-1-sj@kernel.org>

Hi SeongJae,

On Wed, 25 Mar 2026 07:19:56 -0700, SeongJae Park <sj@kernel.org> wrote:

> Hi Liew,
> 
> On Wed, 25 Mar 2026 15:17:09 +0800 Liew Rui Yan <aethernet65535@gmail.com> wrote:
> 
> > Hi SeongJae,
> > 
> > > Forwarding Sashiko review for doing discussions via mails.
> > > 
> > > # review url: https://sashiko.dev/#/patchset/20260325013939.18167-1-aethernet65535@gmail.com
> > > # start of sashiko.dev inline review
> > > commit 60ccea4154b0c58741fae2323454a5a9496b67fa
> > > Author: Liew Rui Yan <aethernet65535@gmail.com>
> > > 
> > > mm/damon: add synchronous commit for commit_inputs
> > > 
> > > Writing invalid parameters to sysfs followed by commit_inputs=Y previously
> > > failed silently. This patch uses damon_call() to synchronously commit
> > > parameters in the kdamond thread's safe context, returning validation errors
> > > immediately to userspace.
> > > 
> > > Link: https://lore.kernel.org/20260318153731.97470-1-aethernet65535@gmail.com
> > > 
> > > > diff --git a/mm/damon/lru_sort.c b/mm/damon/lru_sort.c
> > > > index 554559d72976..a2410f648b51 100644
> > > > --- a/mm/damon/lru_sort.c
> > > > +++ b/mm/damon/lru_sort.c
> > 
> > Thank you for forwarding the review from Sashiko.dev!
> 
> Thank you for reviewing the review.
> 
> > 
> > > > +static int damon_lru_sort_commit_inputs_store(const char *val,
> > > > +		const struct kernel_param *kp)
> > > > +{
> > > > +	bool commit_inputs_request;
> > > >  	int err;
> > > > +	struct damon_call_control control = {
> > > > +		.fn = damon_lru_sort_commit_inputs_fn,
> > > > +		.data = ctx,
> > > > +		.repeat = false,
> > > > +	};
> > > >  
> > > > -	if (!commit_inputs)
> > > > +	err = kstrtobool(val, &commit_inputs_request);
> > > > +	if (err)
> > > > +		return err;
> > > > +
> > > > +	if (!commit_inputs_request)
> > > >  		return 0;
> > > >  
> > > > -	err = damon_lru_sort_apply_parameters();
> > > > -	commit_inputs = false;
> > > > -	return err;
> > > > +	/*
> > > > +	 * Skip damon_call() during early boot or when kdamond is
> > > > +	 * not running to avoid NULL pointer dereference.
> > > > +	 */
> > > > +	if (!ctx)
> > > > +		return -EBUSY;
> > > > +
> > > > +	err = damon_call(ctx, &control);
> > > 
> > > Can this sequence lead to a system-wide deadlock on kernel_param_lock?
> > > 
> > > Because damon_lru_sort_commit_inputs_store() is a sysfs .set callback, it
> > > executes with the global kernel_param_lock held.
> > > 
> > > If kdamond_fn terminates (e.g., due to an invalid parameter like addr_unit=3
> > > causing a non-power-of-2 min_region_sz), it processes its exit sequence by
> > > calling kdamond_call(ctx, true) to flush pending requests. After flushing, it
> > > eventually sets ctx->kdamond = NULL at the very end of the thread.
> > > 
> > > If a user writes to commit_inputs exactly in the window after
> > > kdamond_call() finishes but before ctx->kdamond = NULL is set:
> > > 
> > > Thread 1 (kdamond_fn):
> > >     kdamond_call(ctx, true);
> > > 
> > > Thread 2 (sysfs write):
> > >     damon_call(ctx, &control);
> > >         if (damon_is_running(ctx))
> > >             wait_for_completion(&control.completion);
> > > 
> > > Thread 1 (kdamond_fn):
> > >     ctx->kdamond = NULL;
> > > 
> > > Thread 2 will sleep indefinitely in wait_for_completion(). Because Thread 2
> > > holds the global kernel_param_lock, this permanently freezes all module
> > > parameter sysfs interfaces system-wide.
> > 
> > I have verified this behavior with the following test case:
> > 
> >     # cd /sys/module/damon_lru_sort/parameters/
> >     # echo Y > enabled
> >     # ps aux | grep "[k]damond"
> >     root   70  0.0  0.0  0  0 ?  I  12:16  0:00 [kdamond.0]
> 
> FYI, you can read kdamond_pid parameter.
> 
> >     # echo 3 > addr_unit
> >     # echo Y > commit_inputs
> >     bash: echo: write error: Invalid argument
> >     # ps aux | grep "[k]damond"
> > 
> >     ... kdamond has exited unexpectedly
> > 
> > I will add a patch in v6 to validate 'addr_unit' in addr_unit_store() to
> > reject non-power-of-2 inputs immediately.
> 
> That's because kdamond_fn() has recently changed to exit the loop if
> damon_commit_ctx() fails in the loop for any reason.  Refer to commit
> 26f775a054c3 ("mm/damon/core: avoid use of half-online-committed context") for
> more details.
> 
> Adding the validation in addr_unit_store() would work, but it could be a kind
> of whack-a-mole game.  I'd prefer doing the validation of all inputs in one
> place, just before the last damon_commit_ctx() call.  DAMON_SYSFS is doing so,
> by making a test context and try committing user input to the test context
> before doing that to the running context.  How about doing so here, too?

Thank you for the suggestion. I have now added the check
'!src->addr_unit || ! is_power_of_2(src->addr_unit)' to
damon_commit_ctx().

> > 
> > > > +	if (err) {
> > > > +		/*
> > > > +		 * If kdamond is not running, damon_call() returns -EINVAL.
> > > > +		 * Remap it to -EBUSY to better reflect that the operation
> > > > +		 * failed due to the current state of the system.
> > > > +		 */
> > > > +		if (err == -EINVAL && !damon_is_running(ctx))
> > > > +			return -EBUSY;
> > > > +
> > > > +		return err;
> > > > +	}
> > > > +
> > > > +	return control.return_code;
> > > >  }
> > > 
> > > > diff --git a/mm/damon/reclaim.c b/mm/damon/reclaim.c
> > > > index 86da14778658..4cbd67e08d56 100644
> > > > --- a/mm/damon/reclaim.c
> > > > +++ b/mm/damon/reclaim.c
> > > 
> > > [ ... ]
> > > 
> > > > +	/*
> > > > +	 * Skip damon_call() during early boot or when kdamond is
> > > > +	 * not running to avoid NULL pointer dereference.
> > > > +	 */
> > > > +	if (!ctx)
> > > > +		return -EBUSY;
> > > > +
> > > > +	err = damon_call(ctx, &control);
> > > 
> > > Does this code share the same kernel_param_lock deadlock potential as
> > > damon_lru_sort_commit_inputs_store() described above?
> > 
> > As we discussed in the RFC-v4 thread [1], this is a false positive.
> > Since 'enabled=N' and 'commit_inputs=Y' are both serialized by the
> > global 'kernel_param_lock', and kthreads cannot be forcibly terminated
> > by userspace signals, the completion signal in damon_call() will always
> > be reached in a controlled manner.
> 
> TL; DR: I think the deadlock can happen.  But that's a separate issue.
> 
> Let's suppose the user writes 'Y' to commit_inputs, and damon_commit_ctx()
> fails due to an allocation failure.  The write to commit_inputs will complete,
> and kdmond_fn() will start its exit sequence.
> 
> The user can write 'Y' to commit_inputs again, when kdamond_fn() finished the
> last kdamond_call() with 'cancel=True', but before it unset the ->kdamond
> field.
> 
> Then, the deadlock that sashiko pointed out could happen.
> 
> The deadlock is due to the incomplete design of damon_call() cleanup.  And I
> think the same deadlock is already available with DAMON_SYSFS.  I will work on
> fixing this.

I really appreciate your dedication to DAMON.

> So the issue already exists.  But let's ensure the fix is merged before this
> patch, since this patch adds another exploitable path that can consequence in
> whole param_lock deadlock.

Should I wait for your fix to be merged into damon/next before I post
the next version?

Best regards,
Rui Yan

next prev parent reply	other threads:[~2026-03-26  6:16 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-25  1:39 [RFC v5] mm/damon: add synchronous commit for commit_inputs Liew Rui Yan
2026-03-25  2:53 ` (sashiko review) " SeongJae Park
2026-03-25  7:17   ` Liew Rui Yan
2026-03-25 14:19     ` SeongJae Park
2026-03-26  6:15       ` Liew Rui Yan [this message]
2026-03-25 14:29 ` SeongJae Park
2026-03-26  6:16   ` Liew Rui Yan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260326061554.20466-1-aethernet65535@gmail.com \
    --to=aethernet65535@gmail.com \
    --cc=damon@lists.linux.dev \
    --cc=linux-mm@kvack.org \
    --cc=sj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox