From: SeongJae Park <sj@kernel.org>
To: Yunjeong Mun <yunjeong.mun@sk.com>
Cc: SeongJae Park <sj@kernel.org>,
damon@lists.linux.dev, honggyu.kim@sk.com,
kernel_team@skhynix.com
Subject: Re: [BUG] 'damo stop' causes kernel crash in v6.17-rc3
Date: Thu, 4 Sep 2025 20:54:11 -0700 [thread overview]
Message-ID: <20250905035411.39501-1-sj@kernel.org> (raw)
In-Reply-To: <20250904082946.949-1-yunjeong.mun@sk.com>
On Thu, 4 Sep 2025 17:29:45 +0900 Yunjeong Mun <yunjeong.mun@sk.com> wrote:
> On Wed, 3 Sep 2025 21:02:03 -0700 SeongJae Park <sj@kernel.org> wrote:
> > Hi Yunjeong,
> >
> > On Thu, 4 Sep 2025 10:17:38 +0900 Yunjeong Mun <yunjeong.mun@sk.com> wrote:
> >
> > > Hi!
> > >
> > > I encountered a kernel crash when running 'damo stop' in kernel v6.17-rc3,
> > > I tested and confirmed that this issue also occurs in v6.17-rc1.
[...]
> > I found commit d809a7c64ba8 ("mm/damon/sysfs: implement refresh_ms file
> > internal work") is the first bad commit, according to 'git bisect'. And
> > actually the code is broken for multiple kdamonds case, since it is sharing one
> > damon_call_control object for multiple kdamonds while overwriting the data
> > field to later-called one.
More problematically, the damon_call_control->list is continuously overwritten
by the multiple kdamond threads. This corrupts the damon_call_control lists of
the contexts, and as a result kdamond_call() infinitely loops. Hence
kdamond_fn() cannot catch the termination request and the hang happens.
> >
> > I haven't yet deep dive into by what code path the issue happens, but sharing
> > this first, since I have to go out soon. I'll further take a look later.
> > Meanwhile, could you please also confirm if it is the first bad commit for your
> > issue, too?
>
> Thanks for sharing your analysis!
> I also confirmed that first bad commit is d809a7c64ba8.
> The previous commit b907494768e5 doesn't cause the above issue.
Thank you for confirming. I confirmed attaching patch fixes the problem with
your repro on my setup. Could you please also test that on your machines and
confirm if it fixes the issues on your setups, too? If you confirm, I will
post it soon.
[...]
> > 'scripts/decode_stacktrace.sh' can show which line of what source file each of
> > the above line points. So if you could share the output of the script from
> > your next bug reports, it would be pretty helpful.
>
> I'm not aware of this tool, thank you for letting me know!
> I'll trying using it next time.
No worry, and let me know if you need any help for that.
And please don't delay or hesitate reporting new issues in future for learning
of a tool. I'd prefer getting early and incomplete issue reports much more
than late and complete reports. :)
Thanks,
SJ
[...]
=== >8 ===
From 6754cdb95c03313fd7d9f104b9dbe851ecef237e Mon Sep 17 00:00:00 2001
From: SeongJae Park <sj@kernel.org>
Date: Thu, 4 Sep 2025 20:18:46 -0700
Subject: [PATCH] mm/damon/sysfs: use dynamically allocated repeat mode
damon_call_control
For testing.
Fixes: d809a7c64ba8 ("mm/damon/sysfs: implement refresh_ms file internal work") # v6.17.x
Signed-off-by: SeongJae Park <sj@kernel.org>
---
include/linux/damon.h | 2 ++
mm/damon/core.c | 8 ++++++--
mm/damon/sysfs.c | 23 +++++++++++++++--------
3 files changed, 23 insertions(+), 10 deletions(-)
diff --git a/include/linux/damon.h b/include/linux/damon.h
index ec8716292c09..aa7381be388c 100644
--- a/include/linux/damon.h
+++ b/include/linux/damon.h
@@ -636,6 +636,7 @@ struct damon_operations {
* @data: Data that will be passed to @fn.
* @repeat: Repeat invocations.
* @return_code: Return code from @fn invocation.
+ * @dealloc_on_cancel: De-allocate when canceled.
*
* Control damon_call(), which requests specific kdamond to invoke a given
* function. Refer to damon_call() for more details.
@@ -645,6 +646,7 @@ struct damon_call_control {
void *data;
bool repeat;
int return_code;
+ bool dealloc_on_cancel;
/* private: internal use only */
/* informs if the kdamond finished handling of the request */
struct completion completion;
diff --git a/mm/damon/core.c b/mm/damon/core.c
index 7aeb3f24aae8..be5942435d78 100644
--- a/mm/damon/core.c
+++ b/mm/damon/core.c
@@ -2510,10 +2510,14 @@ static void kdamond_call(struct damon_ctx *ctx, bool cancel)
mutex_lock(&ctx->call_controls_lock);
list_del(&control->list);
mutex_unlock(&ctx->call_controls_lock);
- if (!control->repeat)
+ if (!control->repeat) {
complete(&control->completion);
- else
+ } else if (control->canceled && control->dealloc_on_cancel) {
+ kfree(control);
+ continue;
+ } else {
list_add(&control->list, &repeat_controls);
+ }
}
control = list_first_entry_or_null(&repeat_controls,
struct damon_call_control, list);
diff --git a/mm/damon/sysfs.c b/mm/damon/sysfs.c
index 0ed404c89f80..a182670493bb 100644
--- a/mm/damon/sysfs.c
+++ b/mm/damon/sysfs.c
@@ -1565,14 +1565,10 @@ static int damon_sysfs_repeat_call_fn(void *data)
return 0;
}
-static struct damon_call_control damon_sysfs_repeat_call_control = {
- .fn = damon_sysfs_repeat_call_fn,
- .repeat = true,
-};
-
static int damon_sysfs_turn_damon_on(struct damon_sysfs_kdamond *kdamond)
{
struct damon_ctx *ctx;
+ struct damon_call_control *repeat_call_control;
int err;
if (damon_sysfs_kdamond_running(kdamond))
@@ -1585,18 +1581,29 @@ static int damon_sysfs_turn_damon_on(struct damon_sysfs_kdamond *kdamond)
damon_destroy_ctx(kdamond->damon_ctx);
kdamond->damon_ctx = NULL;
+ repeat_call_control = kmalloc(sizeof(*repeat_call_control),
+ GFP_KERNEL);
+ if (!repeat_call_control)
+ return -ENOMEM;
+
ctx = damon_sysfs_build_ctx(kdamond->contexts->contexts_arr[0]);
- if (IS_ERR(ctx))
+ if (IS_ERR(ctx)) {
+ kfree(repeat_call_control);
return PTR_ERR(ctx);
+ }
err = damon_start(&ctx, 1, false);
if (err) {
+ kfree(repeat_call_control);
damon_destroy_ctx(ctx);
return err;
}
kdamond->damon_ctx = ctx;
- damon_sysfs_repeat_call_control.data = kdamond;
- damon_call(ctx, &damon_sysfs_repeat_call_control);
+ repeat_call_control->fn = damon_sysfs_repeat_call_fn;
+ repeat_call_control->data = kdamond;
+ repeat_call_control->repeat = true;
+ repeat_call_control->dealloc_on_cancel = true;
+ damon_call(ctx, repeat_call_control);
return err;
}
--
2.39.5
next prev parent reply other threads:[~2025-09-05 3:54 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-09-04 1:17 [BUG] 'damo stop' causes kernel crash in v6.17-rc3 Yunjeong Mun
2025-09-04 4:02 ` SeongJae Park
2025-09-04 8:29 ` Yunjeong Mun
2025-09-05 3:54 ` SeongJae Park [this message]
2025-09-05 9:08 ` Yunjeong Mun
2025-09-05 20:07 ` SeongJae Park
2025-09-08 4:36 ` Yunjeong Mun
2025-09-08 20:18 ` SeongJae Park
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250905035411.39501-1-sj@kernel.org \
--to=sj@kernel.org \
--cc=damon@lists.linux.dev \
--cc=honggyu.kim@sk.com \
--cc=kernel_team@skhynix.com \
--cc=yunjeong.mun@sk.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.