All of lore.kernel.org
 help / color / mirror / Atom feed
From: SeongJae Park <sj@kernel.org>
To: JaeJoon Jung <rgbi3307@gmail.com>
Cc: SeongJae Park <sj@kernel.org>,
	damon@lists.linux.dev, linux-mm@kvack.org, rgbi3307@nate.com
Subject: Re: [PATCH] mm/damon/sysfs: preventing duplicated list_add_tail() at the damon_call()
Date: Thu, 25 Dec 2025 11:49:58 -0800	[thread overview]
Message-ID: <20251225194959.937-1-sj@kernel.org> (raw)
In-Reply-To: <CAHOvCC4zwbiLXRTHav67GcngiVZ-5GESqkY9juLtqAfWQW1UQg@mail.gmail.com>

On Thu, 25 Dec 2025 11:35:33 +0900 JaeJoon Jung <rgbi3307@gmail.com> wrote:

> On Thu, 25 Dec 2025 at 09:32, SeongJae Park <sj@kernel.org> wrote:
> >
> > Hello JaeJoon,
> >
> > On Wed, 24 Dec 2025 18:43:58 +0900 JaeJoon Jung <rgbi3307@gmail.com> wrote:
> >
> > >  cd /sys/kernel/mm/damon/admin
> > >  echo "off" > kdamonds/0/state
> > >
> > >  echo "commit" > kdamonds/0/state
> > >  echo "commit" > kdamonds/0/state
> > >
> > > If you repeat "commit" twice with the kdamonds/0/state set to "off"
> > > with the above command, list_add corruption error occurs as follows:
> > >
> > > 4-page vmalloc region starting at 0xffffffc600a38000 allocated at
> > > kernel_clone+0x44/0x41e
> > > ------------[ cut here ]------------
> > > list_add corruption. prev->next should be next (ffffffd6c7c5a6a8),
> > >          but was ffffffc600a3bcc8. (prev=ffffffc600a3bcc8).
> > > WARNING: lib/list_debug.c:32 at __list_add_valid_or_report+
> > > 0xd8/0xe2, CPU#0: bash/466
> > > Modules linked in: dwmac_starfive stmmac_platform stmmac pcs_xpcs phylink
> > > CPU: 0 UID: 0 PID: 466 Comm: bash Tainted: G W 6.19.0-rc2+ #1 PREEMPTLAZY
> > > Tainted: [W]=WARN
> > > Hardware name: StarFive VisionFive 2 v1.3B (DT)
> > >  epc : __list_add_valid_or_report+0xd8/0xe2
> > >  ra : __list_add_valid_or_report+0xd8/0xe2
> > >  epc : ffffffff80540bce ra : ffffffff80540bce sp : ffffffc600a3bc00
> > >  gp : ffffffff81caec40 tp : ffffffd6c036f080 t0 : 0000000000000000
> > >  t1 : 0000000000006000 t2 : 0000000000000002 s0 : ffffffc600a3bc30
> > >  s1 : ffffffc600a3bcc8 a0 : ffffffd6fbf49a40 a1 : ffffffd6c036f080
> > >  a2 : 0000000000000000 a3 : 0000000000000001 a4 : 0000000000000000
> > >  a5 : 0000000000000000 a6 : 0000000020000000 a7 : 0000000000000001
> > >  s2 : ffffffd6c7c5a6a8 s3 : ffffffc600a3bcc8 s4 : ffffffc600a3bcc8
> > >  s5 : ffffffd6c7c5a6b8 s6 : ffffffd6c7c5a6a8 s7 : 0000003ff3f32794
> > >  s8 : 0000002ab38c9118 s9 : 0000000000000065 s10: 0000003f823a5cb8
> > >  s11: 0000003f823264e8 t3 : 0000000000000001 t4 : 0000000000000000
> > >  t5 : 00000000fa83b2da t6 : 000000000051df90
> > > status: 0000000200000120 badaddr: 0000000000000000 cause: 0000000000000003
> > > [<ffffffff80540bce>] __list_add_valid_or_report+0xd8/0xe2
> > > [<ffffffff80255c86>] damon_call+0x52/0xe8
> > > [<ffffffff8025c9a8>] damon_sysfs_damon_call+0x60/0x8a
> > > [<ffffffff8025daf4>] state_store+0xfc/0x294
> > > [<ffffffff80dbf1fa>] kobj_attr_store+0xe/0x1a
> > > [<ffffffff802f070c>] sysfs_kf_write+0x42/0x56
> > > [<ffffffff802eef4e>] kernfs_fop_write_iter+0xf4/0x178
> > > [<ffffffff8026545c>] vfs_write+0x1b6/0x3b2
> > > [<ffffffff80265782>] ksys_write+0x52/0xbc
> > > [<ffffffff80265800>] __riscv_sys_write+0x14/0x1c
> > > [<ffffffff80ddf124>] do_trap_ecall_u+0x19c/0x26e
> > > [<ffffffff80deaa38>] handle_exception+0x150/0x15c
> > > ---[ end trace 0000000000000000 ]---
> > > -bash: echo: write error: Invalid argument
> >
> > Thank you for finding issue!
> >
> > Also appreciate for sharing your detailed reproducer.  Nevertheless, I think
> > the reproducer can be more detailed.  E.g., you could explicitly explain the
> > fact that the reproduction step should be executed only after starting DAMON
> > with the kdamond, and the kernel should run with CONFIG_lIST_HARDENED to get
> > the output from the kernel log.
> 
> Yes, as you said, I ran it under CONFIG_LIST_HARDENED=y condition.

Thank you for confirming.

> 
> >
> > >
> > > The cause of the above error is that list_add_tail() is executed
> > > repeatedly while executing damon_call(ctx, control)
> > > in damon_sysfs_damon_call().  The execution flow is summarized below:
> > >
> > > damon_sysfs_damon_call()
> > >         --> damon_call(ctx, control)
> > >                 list_add_tail(control, ctx->call_contols);
> > >                         --> /* list_add corruption error */
> > >                 if (!damon_is_running)
> > >                         return -EINVAL;
> > >
> > > If you execute damon_call() when damon_sysfs_kdamond_running() is true,
> > > you can prevent the error of duplicate execution of list_add_tail().
> >
> > The kdamond might be terminated between the damon_call() call and the
> > damon_is_running() check inside the damon_call() execution.  In the case, the
> > problem may still happen.
> >
> > The problem happens because damon_call() is not removing the damon_call_control
> > object before returning the error, right?  What about removing the object
> > before returning the error?
> 
> damon_call() is called after damon_start() --> kdamond_fn() is executed,
> This is a problem because damon_call() also occurs when kdamond is "off"
> only in damon/sysfs.  So, my first patch solved the problem, but the
> following also worked. I tested it.
> 
> And it seems better to keep the existing method of releasing
> damon_call_control.  Since the damon_call_control structure uses both
> static and kmalloc(),  it's appropriate to release it in kdamond_fn() according
> to the condition control->canceled && control->dealloc_on_cancel.
> 
> My previous suggestion regarding this:
> https://lore.kernel.org/damon/20251206224724.13832-1-rgbi3307@gmail.com/
> 
> diff --git a/mm/damon/core.c b/mm/damon/core.c
> index babad37719b6..2ead0bb3c462 100644
> --- a/mm/damon/core.c
> +++ b/mm/damon/core.c
> @@ -1462,6 +1462,9 @@ bool damon_is_running(struct damon_ctx *ctx)
>   */
>  int damon_call(struct damon_ctx *ctx, struct damon_call_control *control)
>  {
> +       if (!damon_is_running(ctx))
> +               return -EINVAL;
> +
>         if (!control->repeat)
>                 init_completion(&control->completion);
>         control->canceled = false;
> @@ -1470,8 +1473,6 @@ int damon_call(struct damon_ctx *ctx, struct
> damon_call_control *control)
>         mutex_lock(&ctx->call_controls_lock);
>         list_add_tail(&control->list, &ctx->call_controls);
>         mutex_unlock(&ctx->call_controls_lock);
> -       if (!damon_is_running(ctx))
> -               return -EINVAL;
>         if (control->repeat)
>                 return 0;
>         wait_for_completion(&control->completion);

Let's assume DAMON is terminated between the damon_is_running() and
list_add_tail().  In the case, the control->fn() will never be called back.  If
control->repeat is false, this function will even inifnitely wait.

I think we should keep the damon_is_running() as is, but further check if it
was terminated without handling the control object, and remove it from the list
in the case.  Like below.

diff --git a/mm/damon/core.c b/mm/damon/core.c
index c3ae55b940cc..23d8602f5a49 100644
--- a/mm/damon/core.c
+++ b/mm/damon/core.c
@@ -1649,6 +1649,35 @@ bool damon_is_running(struct damon_ctx *ctx)
        return running;
 }

+/*
+ * damon_call_handle_inactive_ctx() - handle DAMON call request that added to
+ *                                   an inactive context.
+ * @ctx:       The inactive DAMON context.
+ * @control:   Control variable of the call request.
+ *
+ * This function is called in a case that @control is added to @ctx but @ctx is
+ * not running (inactive).  See if @ctx handled @control or not, and cleanup
+ * @control if it was not handled.
+ *
+ * Returns 0 if @control was handled by @ctx, negative error code otherwise.
+ */
+static int damon_call_handle_inactive_ctx(
+               struct damon_ctx *ctx, struct damon_call_control *control)
+{
+       struct damon_call_control *c;
+
+       mutex_lock(&ctx->call_controls_lock);
+       list_for_each_entry(c, &ctx->call_controls, list) {
+               if (c == control) {
+                       list_del(&control->list);
+                       mutex_unlock(&ctx->call_controls_lock);
+                       return -EINVAL;
+               }
+       }
+       mutex_unlock(&ctx->call_controls_lock);
+       return 0;
+}
+
 /**
  * damon_call() - Invoke a given function on DAMON worker thread (kdamond).
  * @ctx:       DAMON context to call the function for.
@@ -1679,7 +1708,7 @@ int damon_call(struct damon_ctx *ctx, struct damon_call_control *control)
        list_add_tail(&control->list, &ctx->call_controls);
        mutex_unlock(&ctx->call_controls_lock);
        if (!damon_is_running(ctx))
-               return -EINVAL;
+               return damon_call_handle_inactive_ctx(ctx, control);
        if (control->repeat)
                return 0;
        wait_for_completion(&control->completion);

If you don't mind, I'll post the above diff as a patch, adding a 'Reported-by:'
tag for you.

> 
> >
> > >
> > > Signed-off-by: JaeJoon Jung <rgbi3307@gmail.com>
> >
> > Could you please also add Fixes: and Cc: stable@ ?
> 
> I don't have much experience with this, so I'm sorry,
> but could you please give me an example about this?

No problem.

Please refer to https://docs.kernel.org/process/stable-kernel-rules.html


Thanks,
SJ

[...]

  reply	other threads:[~2025-12-25 19:50 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-12-24  9:43 [PATCH] mm/damon/sysfs: preventing duplicated list_add_tail() at the damon_call() JaeJoon Jung
2025-12-25  0:32 ` SeongJae Park
2025-12-25  2:35   ` JaeJoon Jung
2025-12-25 19:49     ` SeongJae Park [this message]
2025-12-26  1:48       ` JaeJoon Jung
2025-12-26 18:41         ` SeongJae Park
2025-12-26 23:53           ` JaeJoon Jung
2025-12-27 17:42             ` SeongJae Park
2025-12-29  3:38               ` JaeJoon Jung
2025-12-29 15:14                 ` SeongJae Park

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20251225194959.937-1-sj@kernel.org \
    --to=sj@kernel.org \
    --cc=damon@lists.linux.dev \
    --cc=linux-mm@kvack.org \
    --cc=rgbi3307@gmail.com \
    --cc=rgbi3307@nate.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.