* [PATCH 0/2] mm/damon/core: fix damon_call()/damos_walk() vs kdmond exit race
@ 2026-03-27 23:33 SeongJae Park
2026-03-27 23:33 ` [PATCH 1/2] mm/damon/core: fix damon_call() vs kdamond_fn() " SeongJae Park
` (2 more replies)
0 siblings, 3 replies; 7+ messages in thread
From: SeongJae Park @ 2026-03-27 23:33 UTC (permalink / raw)
To: Andrew Morton; +Cc: SeongJae Park, # 6 . 14 . x, damon, linux-kernel, linux-mm
damon_call() and damos_walk() can leak memory and/or deadlock when they
race with kdamond terminations. Fix those.
Changes from RFC v3
(https://lore.kernel.org/20260327142605.4834-1-sj@kernel.org)
- Drop RFC tag.
- Rebase to latest mm-new.
Changes from RFC v2
(https://lore.kernel.org/20260327004952.58266-1-sj@kernel.org)
- Update and wordsmith commit message.
- Add damos_walk() race fix.
Changes from RFC v1
(https://lore.kernel.org/20260326062347.88569-3-sj@kernel.org)
- Clarify damon_call() call condition.
- Init call_controls_obsolete before kdamond_started completion.
- Wordsmith commit message.
- Split out repeat_call_control leak fix from the series.
SeongJae Park (2):
mm/damon/core: fix damon_call() vs kdamond_fn() exit race
mm/damon/core: fix damos_walk() vs kdamond_fn() exit race
include/linux/damon.h | 2 ++
mm/damon/core.c | 66 ++++++++++++++++++-------------------------
2 files changed, 30 insertions(+), 38 deletions(-)
base-commit: 305aff97ab8306284a0aa85f9128403b50c89019
--
2.47.3
^ permalink raw reply [flat|nested] 7+ messages in thread* [PATCH 1/2] mm/damon/core: fix damon_call() vs kdamond_fn() exit race
2026-03-27 23:33 [PATCH 0/2] mm/damon/core: fix damon_call()/damos_walk() vs kdmond exit race SeongJae Park
@ 2026-03-27 23:33 ` SeongJae Park
2026-03-28 0:43 ` (sashiko review) " SeongJae Park
2026-03-27 23:33 ` [PATCH 2/2] mm/damon/core: fix damos_walk() " SeongJae Park
2026-03-28 0:42 ` (sashiko status) [PATCH 0/2] mm/damon/core: fix damon_call()/damos_walk() vs kdmond " SeongJae Park
2 siblings, 1 reply; 7+ messages in thread
From: SeongJae Park @ 2026-03-27 23:33 UTC (permalink / raw)
To: Andrew Morton; +Cc: SeongJae Park, # 6 . 14 . x, damon, linux-kernel, linux-mm
When kdamond_fn() main loop is finished, the function cancels all
remaining damon_call() requests and unset the damon_ctx->kdamond so that
API callers and API functions themselves can know the context is
terminated. damon_call() adds the caller's request to the queue first.
After that, it shows if the kdamond of the damon_ctx is still running
(damon_ctx->kdamond is set). Only if the kdamond is running,
damon_call() starts waiting for the kdamond's handling of the newly
added request.
The damon_call() requests registration and damon_ctx->kdamond unset are
protected by different mutexes, though. Hence, damon_call() could race
with damon_ctx->kdamond unset, and result in deadlocks.
For example, let's suppose kdamond successfully finished the
damon_call() requests cancelling. Right after that, damon_call() is
called for the context. It registers the new request, and shows the
context is still running, because damon_ctx->kdamond unset is not yet
done. Hence the damon_call() caller starts waiting for the handling of
the request. However, the kdamond is already on the termination steps,
so it never handles the new request. As a result, the damon_call()
caller threads infinitely waits.
Fix this by introducing another damon_ctx field, namely
call_controls_obsolete. It is protected by the
damon_ctx->call_controls_lock, which protects damon_call() requests
registration. Initialize (unset) it in kdamond_fn() before letting
damon_start() returns and set it just before the cancelling of remaining
damon_call() requests is executed. damon_call() reads the obsolete
field under the lock and avoids adding a new request.
After this change, only requests that are guaranteed to be handled or
cancelled are registered. Hence the after-registration DAMON context
termination check is no longer needed. Remove it together.
Note that the deadlock will not happen when damon_call() is called for
repeat mode request. In tis case, damon_call() returns instead of
waiting for the handling when the request registration succeeds and it
shows the kdamond is running. However, if the request also has
dealloc_on_cancel, the request memory would be leaked.
The issue is found by sashiko [1].
[1] https://lore.kernel.org/20260325141956.87144-1-sj@kernel.org
Fixes: 42b7491af14c ("mm/damon/core: introduce damon_call()")
Cc: <stable@vger.kernel.org> # 6.14.x
Signed-off-by: SeongJae Park <sj@kernel.org>
---
include/linux/damon.h | 1 +
mm/damon/core.c | 45 ++++++++++++++-----------------------------
2 files changed, 15 insertions(+), 31 deletions(-)
diff --git a/include/linux/damon.h b/include/linux/damon.h
index d9a3babbafc1..5129de70e7b7 100644
--- a/include/linux/damon.h
+++ b/include/linux/damon.h
@@ -818,6 +818,7 @@ struct damon_ctx {
/* lists of &struct damon_call_control */
struct list_head call_controls;
+ bool call_controls_obsolete;
struct mutex call_controls_lock;
struct damos_walk_control *walk_control;
diff --git a/mm/damon/core.c b/mm/damon/core.c
index db6c67e52d2b..9bcda2765ac9 100644
--- a/mm/damon/core.c
+++ b/mm/damon/core.c
@@ -1573,35 +1573,6 @@ int damon_kdamond_pid(struct damon_ctx *ctx)
return pid;
}
-/*
- * damon_call_handle_inactive_ctx() - handle DAMON call request that added to
- * an inactive context.
- * @ctx: The inactive DAMON context.
- * @control: Control variable of the call request.
- *
- * This function is called in a case that @control is added to @ctx but @ctx is
- * not running (inactive). See if @ctx handled @control or not, and cleanup
- * @control if it was not handled.
- *
- * Returns 0 if @control was handled by @ctx, negative error code otherwise.
- */
-static int damon_call_handle_inactive_ctx(
- struct damon_ctx *ctx, struct damon_call_control *control)
-{
- struct damon_call_control *c;
-
- mutex_lock(&ctx->call_controls_lock);
- list_for_each_entry(c, &ctx->call_controls, list) {
- if (c == control) {
- list_del(&control->list);
- mutex_unlock(&ctx->call_controls_lock);
- return -EINVAL;
- }
- }
- mutex_unlock(&ctx->call_controls_lock);
- return 0;
-}
-
/**
* damon_call() - Invoke a given function on DAMON worker thread (kdamond).
* @ctx: DAMON context to call the function for.
@@ -1619,6 +1590,10 @@ static int damon_call_handle_inactive_ctx(
* synchronization. The return value of the function will be saved in
* &damon_call_control->return_code.
*
+ * Note that this function should be called only after damon_start() with the
+ * @ctx has succeeded. Otherwise, this function could fall into an indefinite
+ * wait.
+ *
* Return: 0 on success, negative error code otherwise.
*/
int damon_call(struct damon_ctx *ctx, struct damon_call_control *control)
@@ -1629,10 +1604,12 @@ int damon_call(struct damon_ctx *ctx, struct damon_call_control *control)
INIT_LIST_HEAD(&control->list);
mutex_lock(&ctx->call_controls_lock);
+ if (ctx->call_controls_obsolete) {
+ mutex_unlock(&ctx->call_controls_lock);
+ return -ECANCELED;
+ }
list_add_tail(&control->list, &ctx->call_controls);
mutex_unlock(&ctx->call_controls_lock);
- if (!damon_is_running(ctx))
- return damon_call_handle_inactive_ctx(ctx, control);
if (control->repeat)
return 0;
wait_for_completion(&control->completion);
@@ -2952,6 +2929,9 @@ static int kdamond_fn(void *data)
pr_debug("kdamond (%d) starts\n", current->pid);
+ mutex_lock(&ctx->call_controls_lock);
+ ctx->call_controls_obsolete = false;
+ mutex_unlock(&ctx->call_controls_lock);
complete(&ctx->kdamond_started);
kdamond_init_ctx(ctx);
@@ -3062,6 +3042,9 @@ static int kdamond_fn(void *data)
damon_destroy_targets(ctx);
kfree(ctx->regions_score_histogram);
+ mutex_lock(&ctx->call_controls_lock);
+ ctx->call_controls_obsolete = true;
+ mutex_unlock(&ctx->call_controls_lock);
kdamond_call(ctx, true);
damos_walk_cancel(ctx);
--
2.47.3
^ permalink raw reply related [flat|nested] 7+ messages in thread* Re: (sashiko review) [PATCH 1/2] mm/damon/core: fix damon_call() vs kdamond_fn() exit race
2026-03-27 23:33 ` [PATCH 1/2] mm/damon/core: fix damon_call() vs kdamond_fn() " SeongJae Park
@ 2026-03-28 0:43 ` SeongJae Park
2026-03-28 0:51 ` SeongJae Park
0 siblings, 1 reply; 7+ messages in thread
From: SeongJae Park @ 2026-03-28 0:43 UTC (permalink / raw)
To: SeongJae Park; +Cc: Andrew Morton, # 6 . 14 . x, damon, linux-kernel, linux-mm
Forwarding Sashiko review for doing discussions via mails.
# review url: https://sashiko.dev/#/patchset/20260327233319.3528-2-sj@kernel.org
# start of sashiko.dev inline review
commit 9cf54c95f0c4bda07c5dbaf4019d0d731f6c58fd
Author: SeongJae Park <sj@kernel.org>
mm/damon/core: fix damon_call() vs kdamond_fn() exit race
This commit introduces a new call_controls_obsolete flag to prevent
damon_call() from adding requests to a terminating kdamond context.
It removes the after-registration DAMON context termination check.
Link: https://lore.kernel.org/20260325141956.87144-1-sj@kernel.org
> diff --git a/mm/damon/core.c b/mm/damon/core.c
> index db6c67e52d2b..9bcda2765ac9 100644
> --- a/mm/damon/core.c
> +++ b/mm/damon/core.c
[ ... ]
> @@ -1629,10 +1604,12 @@ int damon_call(struct damon_ctx *ctx, struct damon_call_control *control)
> INIT_LIST_HEAD(&control->list);
>
> mutex_lock(&ctx->call_controls_lock);
> + if (ctx->call_controls_obsolete) {
> + mutex_unlock(&ctx->call_controls_lock);
> + return -ECANCELED;
> + }
> list_add_tail(&control->list, &ctx->call_controls);
> mutex_unlock(&ctx->call_controls_lock);
> - if (!damon_is_running(ctx))
> - return damon_call_handle_inactive_ctx(ctx, control);
> if (control->repeat)
> return 0;
> wait_for_completion(&control->completion);
Can this code trigger a use-after-free on control?
This appears to be a pre-existing issue that remains unfixed. If
kdamond_fn() is terminating, it executes kdamond_call(ctx, true). This
function removes the request, sets it as canceled, and calls kfree(control)
if control->dealloc_on_cancel is true.
Since ctx->call_controls_lock is dropped right before the control->repeat
check:
CPU1 (kdamond_fn)
kdamond_call(ctx, true)
list_del(&control->list);
kfree(control);
CPU2 (damon_call)
mutex_unlock(&ctx->call_controls_lock);
if (control->repeat)
Could kdamond_call() free control before damon_call() evaluates
control->repeat?
Furthermore, if memory poisoning causes the freed memory to evaluate
control->repeat as false, would this proceed to call wait_for_completion()
on the freed memory and cause wait queue corruption?
# end of sashiko.dev inline review
# review url: https://sashiko.dev/#/patchset/20260327233319.3528-2-sj@kernel.org
#
# hkml [1] generated a draft of this mail. It can be regenerated
# using below command:
#
# hkml patch sashiko_dev --for_forwarding \
# 20260327233319.3528-2-sj@kernel.org
#
# [1] https://github.com/sjp38/hackermail
Sent using hkml (https://github.com/sjp38/hackermail)
^ permalink raw reply [flat|nested] 7+ messages in thread* Re: (sashiko review) [PATCH 1/2] mm/damon/core: fix damon_call() vs kdamond_fn() exit race
2026-03-28 0:43 ` (sashiko review) " SeongJae Park
@ 2026-03-28 0:51 ` SeongJae Park
0 siblings, 0 replies; 7+ messages in thread
From: SeongJae Park @ 2026-03-28 0:51 UTC (permalink / raw)
To: SeongJae Park; +Cc: Andrew Morton, # 6 . 14 . x, damon, linux-kernel, linux-mm
On Fri, 27 Mar 2026 17:43:06 -0700 SeongJae Park <sj@kernel.org> wrote:
> Forwarding Sashiko review for doing discussions via mails.
>
> # review url: https://sashiko.dev/#/patchset/20260327233319.3528-2-sj@kernel.org
> # start of sashiko.dev inline review
> commit 9cf54c95f0c4bda07c5dbaf4019d0d731f6c58fd
> Author: SeongJae Park <sj@kernel.org>
>
> mm/damon/core: fix damon_call() vs kdamond_fn() exit race
>
> This commit introduces a new call_controls_obsolete flag to prevent
> damon_call() from adding requests to a terminating kdamond context.
> It removes the after-registration DAMON context termination check.
>
> Link: https://lore.kernel.org/20260325141956.87144-1-sj@kernel.org
>
> > diff --git a/mm/damon/core.c b/mm/damon/core.c
> > index db6c67e52d2b..9bcda2765ac9 100644
> > --- a/mm/damon/core.c
> > +++ b/mm/damon/core.c
>
> [ ... ]
>
> > @@ -1629,10 +1604,12 @@ int damon_call(struct damon_ctx *ctx, struct damon_call_control *control)
> > INIT_LIST_HEAD(&control->list);
> >
> > mutex_lock(&ctx->call_controls_lock);
> > + if (ctx->call_controls_obsolete) {
> > + mutex_unlock(&ctx->call_controls_lock);
> > + return -ECANCELED;
> > + }
> > list_add_tail(&control->list, &ctx->call_controls);
> > mutex_unlock(&ctx->call_controls_lock);
> > - if (!damon_is_running(ctx))
> > - return damon_call_handle_inactive_ctx(ctx, control);
> > if (control->repeat)
> > return 0;
> > wait_for_completion(&control->completion);
>
> Can this code trigger a use-after-free on control?
>
> This appears to be a pre-existing issue that remains unfixed. If
> kdamond_fn() is terminating, it executes kdamond_call(ctx, true). This
> function removes the request, sets it as canceled, and calls kfree(control)
> if control->dealloc_on_cancel is true.
>
> Since ctx->call_controls_lock is dropped right before the control->repeat
> check:
>
> CPU1 (kdamond_fn)
> kdamond_call(ctx, true)
> list_del(&control->list);
> kfree(control);
>
> CPU2 (damon_call)
> mutex_unlock(&ctx->call_controls_lock);
> if (control->repeat)
>
> Could kdamond_call() free control before damon_call() evaluates
> control->repeat?
No. kdamond_call(ctx, true) is called only after call_controls_obsolete is
set, under the call_control_lock. And damon_call() reads the
call_controls_obsolete just after acquiring the call_control_lock, and return
if it is set.
Hence CPU2 in this scenario cannot execute the mutex_unlock() part after CPU1
entered kdamond_call().
>
> Furthermore, if memory poisoning causes the freed memory to evaluate
> control->repeat as false, would this proceed to call wait_for_completion()
> on the freed memory and cause wait queue corruption?
So this is also wrong.
Thanks,
SJ
[...]
^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH 2/2] mm/damon/core: fix damos_walk() vs kdamond_fn() exit race
2026-03-27 23:33 [PATCH 0/2] mm/damon/core: fix damon_call()/damos_walk() vs kdmond exit race SeongJae Park
2026-03-27 23:33 ` [PATCH 1/2] mm/damon/core: fix damon_call() vs kdamond_fn() " SeongJae Park
@ 2026-03-27 23:33 ` SeongJae Park
2026-03-28 0:42 ` (sashiko status) [PATCH 0/2] mm/damon/core: fix damon_call()/damos_walk() vs kdmond " SeongJae Park
2 siblings, 0 replies; 7+ messages in thread
From: SeongJae Park @ 2026-03-27 23:33 UTC (permalink / raw)
To: Andrew Morton; +Cc: SeongJae Park, # 6 . 14 . x, damon, linux-kernel, linux-mm
When kdamond_fn() main loop is finished, the function cancels remaining
damos_walk() request and unset the damon_ctx->kdamond so that API
callers and API functions themselves can show the context is terminated.
damos_walk() adds the caller's request to the queue first. After that,
it shows if the kdamond of the damon_ctx is still running
(damon_ctx->kdamond is set). Only if the kdamond is running,
damos_walk() starts waiting for the kdamond's handling of the newly
added request.
The damos_walk() requests registration and damon_ctx->kdamond unset are
protected by different mutexes, though. Hence, damos_walk() could race
with damon_ctx->kdamond unset, and result in deadlocks.
For example, let's suppose kdamond successfully finished the
damow_walk() request cancelling. Right after that, damos_walk() is
called for the context. It registers the new request, and shows the
context is still running, because damon_ctx->kdamond unset is not yet
done. Hence the damos_walk() caller starts waiting for the handling of
the request. However, the kdamond is already on the termination steps,
so it never handles the new request. As a result, the damos_walk()
caller thread infinitely waits.
Fix this by introducing another damon_ctx field, namely
walk_control_obsolete. It is protected by the
damon_ctx->walk_control_lock, which protects damos_walk() request
registration. Initialize (unset) it in kdamond_fn() before letting
damon_start() returns and set it just before the cancelling of the
remaining damos_walk() request is executed. damos_walk() reads the
obsolete field under the lock and avoids adding a new request.
After this change, only requests that are guaranteed to be handled or
cancelled are registered. Hence the after-registration DAMON context
termination check is no longer needed. Remove it together.
The issue is found by sashiko [1].
[1] https://lore.kernel.org/20260325141956.87144-1-sj@kernel.org
Fixes: bf0eaba0ff9c ("mm/damon/core: implement damos_walk()")
Cc: <stable@vger.kernel.org> # 6.14.x
Signed-off-by: SeongJae Park <sj@kernel.org>
---
include/linux/damon.h | 1 +
mm/damon/core.c | 21 ++++++++++++++-------
2 files changed, 15 insertions(+), 7 deletions(-)
diff --git a/include/linux/damon.h b/include/linux/damon.h
index 5129de70e7b7..f2cdb7c3f5e6 100644
--- a/include/linux/damon.h
+++ b/include/linux/damon.h
@@ -822,6 +822,7 @@ struct damon_ctx {
struct mutex call_controls_lock;
struct damos_walk_control *walk_control;
+ bool walk_control_obsolete;
struct mutex walk_control_lock;
/*
diff --git a/mm/damon/core.c b/mm/damon/core.c
index 9bcda2765ac9..ddabb93f2377 100644
--- a/mm/damon/core.c
+++ b/mm/damon/core.c
@@ -1637,6 +1637,10 @@ int damon_call(struct damon_ctx *ctx, struct damon_call_control *control)
* passed at least one &damos->apply_interval_us, kdamond marks the request as
* completed so that damos_walk() can wakeup and return.
*
+ * Note that this function should be called only after damon_start() with the
+ * @ctx has succeeded. Otherwise, this function could fall into an indefinite
+ * wait.
+ *
* Return: 0 on success, negative error code otherwise.
*/
int damos_walk(struct damon_ctx *ctx, struct damos_walk_control *control)
@@ -1644,19 +1648,16 @@ int damos_walk(struct damon_ctx *ctx, struct damos_walk_control *control)
init_completion(&control->completion);
control->canceled = false;
mutex_lock(&ctx->walk_control_lock);
+ if (ctx->walk_control_obsolete) {
+ mutex_unlock(&ctx->walk_control_lock);
+ return -ECANCELED;
+ }
if (ctx->walk_control) {
mutex_unlock(&ctx->walk_control_lock);
return -EBUSY;
}
ctx->walk_control = control;
mutex_unlock(&ctx->walk_control_lock);
- if (!damon_is_running(ctx)) {
- mutex_lock(&ctx->walk_control_lock);
- if (ctx->walk_control == control)
- ctx->walk_control = NULL;
- mutex_unlock(&ctx->walk_control_lock);
- return -EINVAL;
- }
wait_for_completion(&control->completion);
if (control->canceled)
return -ECANCELED;
@@ -2932,6 +2933,9 @@ static int kdamond_fn(void *data)
mutex_lock(&ctx->call_controls_lock);
ctx->call_controls_obsolete = false;
mutex_unlock(&ctx->call_controls_lock);
+ mutex_lock(&ctx->walk_control_lock);
+ ctx->walk_control_obsolete = false;
+ mutex_unlock(&ctx->walk_control_lock);
complete(&ctx->kdamond_started);
kdamond_init_ctx(ctx);
@@ -3046,6 +3050,9 @@ static int kdamond_fn(void *data)
ctx->call_controls_obsolete = true;
mutex_unlock(&ctx->call_controls_lock);
kdamond_call(ctx, true);
+ mutex_lock(&ctx->walk_control_lock);
+ ctx->walk_control_obsolete = true;
+ mutex_unlock(&ctx->walk_control_lock);
damos_walk_cancel(ctx);
pr_debug("kdamond (%d) finishes\n", current->pid);
--
2.47.3
^ permalink raw reply related [flat|nested] 7+ messages in thread* Re: (sashiko status) [PATCH 0/2] mm/damon/core: fix damon_call()/damos_walk() vs kdmond exit race
2026-03-27 23:33 [PATCH 0/2] mm/damon/core: fix damon_call()/damos_walk() vs kdmond exit race SeongJae Park
2026-03-27 23:33 ` [PATCH 1/2] mm/damon/core: fix damon_call() vs kdamond_fn() " SeongJae Park
2026-03-27 23:33 ` [PATCH 2/2] mm/damon/core: fix damos_walk() " SeongJae Park
@ 2026-03-28 0:42 ` SeongJae Park
2 siblings, 0 replies; 7+ messages in thread
From: SeongJae Park @ 2026-03-28 0:42 UTC (permalink / raw)
To: SeongJae Park; +Cc: Andrew Morton, # 6 . 14 . x, damon, linux-kernel, linux-mm
Forwarding sashiko.dev review status for this thread.
# review url: https://sashiko.dev/#/patchset/20260327233319.3528-1-sj@kernel.org
- [PATCH 1/2] mm/damon/core: fix damon_call() vs kdamond_fn() exit race
- status: Reviewed
- review: ISSUES MAY FOUND
- [PATCH 2/2] mm/damon/core: fix damos_walk() vs kdamond_fn() exit race
- status: Reviewed
- review: No issues found.
# hkml [1] generated a draft of this mail. It can be regenerated
# using below command:
#
# hkml patch sashiko_dev --thread_status --for_forwarding \
# 20260327233319.3528-1-sj@kernel.org
#
# [1] https://github.com/sjp38/hackermail
Sent using hkml (https://github.com/sjp38/hackermail)
^ permalink raw reply [flat|nested] 7+ messages in thread
* FAILED: patch "[PATCH] mm/damon/core: fix damos_walk() vs kdamond_fn() exit race" failed to apply to 6.18-stable tree
@ 2026-05-01 11:01 gregkh
2026-05-02 3:02 ` [PATCH 1/2] mm/damon/core: fix damon_call() vs kdamond_fn() exit race SeongJae Park
0 siblings, 1 reply; 7+ messages in thread
From: gregkh @ 2026-05-01 11:01 UTC (permalink / raw)
To: sj, akpm, stable; +Cc: stable
The patch below does not apply to the 6.18-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable@vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.18.y
git checkout FETCH_HEAD
git cherry-pick -x 33c3f6c2b48cd84b441dba1ee3e62290e53930f4
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable@vger.kernel.org>' --in-reply-to '2026050141-likewise-trapeze-f1cc@gregkh' --subject-prefix 'PATCH 6.18.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 33c3f6c2b48cd84b441dba1ee3e62290e53930f4 Mon Sep 17 00:00:00 2001
From: SeongJae Park <sj@kernel.org>
Date: Fri, 27 Mar 2026 16:33:15 -0700
Subject: [PATCH] mm/damon/core: fix damos_walk() vs kdamond_fn() exit race
When kdamond_fn() main loop is finished, the function cancels remaining
damos_walk() request and unset the damon_ctx->kdamond so that API callers
and API functions themselves can show the context is terminated.
damos_walk() adds the caller's request to the queue first. After that, it
shows if the kdamond of the damon_ctx is still running (damon_ctx->kdamond
is set). Only if the kdamond is running, damos_walk() starts waiting for
the kdamond's handling of the newly added request.
The damos_walk() requests registration and damon_ctx->kdamond unset are
protected by different mutexes, though. Hence, damos_walk() could race
with damon_ctx->kdamond unset, and result in deadlocks.
For example, let's suppose kdamond successfully finished the damow_walk()
request cancelling. Right after that, damos_walk() is called for the
context. It registers the new request, and shows the context is still
running, because damon_ctx->kdamond unset is not yet done. Hence the
damos_walk() caller starts waiting for the handling of the request.
However, the kdamond is already on the termination steps, so it never
handles the new request. As a result, the damos_walk() caller thread
infinitely waits.
Fix this by introducing another damon_ctx field, namely
walk_control_obsolete. It is protected by the
damon_ctx->walk_control_lock, which protects damos_walk() request
registration. Initialize (unset) it in kdamond_fn() before letting
damon_start() returns and set it just before the cancelling of the
remaining damos_walk() request is executed. damos_walk() reads the
obsolete field under the lock and avoids adding a new request.
After this change, only requests that are guaranteed to be handled or
cancelled are registered. Hence the after-registration DAMON context
termination check is no longer needed. Remove it together.
The issue is found by sashiko [1].
Link: https://lore.kernel.org/20260327233319.3528-3-sj@kernel.org
Link: https://lore.kernel.org/20260325141956.87144-1-sj@kernel.org [1]
Fixes: bf0eaba0ff9c ("mm/damon/core: implement damos_walk()")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: <stable@vger.kernel.org> # 6.14.x
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
diff --git a/include/linux/damon.h b/include/linux/damon.h
index 5129de70e7b7..f2cdb7c3f5e6 100644
--- a/include/linux/damon.h
+++ b/include/linux/damon.h
@@ -822,6 +822,7 @@ struct damon_ctx {
struct mutex call_controls_lock;
struct damos_walk_control *walk_control;
+ bool walk_control_obsolete;
struct mutex walk_control_lock;
/*
diff --git a/mm/damon/core.c b/mm/damon/core.c
index 9bcda2765ac9..ddabb93f2377 100644
--- a/mm/damon/core.c
+++ b/mm/damon/core.c
@@ -1637,6 +1637,10 @@ int damon_call(struct damon_ctx *ctx, struct damon_call_control *control)
* passed at least one &damos->apply_interval_us, kdamond marks the request as
* completed so that damos_walk() can wakeup and return.
*
+ * Note that this function should be called only after damon_start() with the
+ * @ctx has succeeded. Otherwise, this function could fall into an indefinite
+ * wait.
+ *
* Return: 0 on success, negative error code otherwise.
*/
int damos_walk(struct damon_ctx *ctx, struct damos_walk_control *control)
@@ -1644,19 +1648,16 @@ int damos_walk(struct damon_ctx *ctx, struct damos_walk_control *control)
init_completion(&control->completion);
control->canceled = false;
mutex_lock(&ctx->walk_control_lock);
+ if (ctx->walk_control_obsolete) {
+ mutex_unlock(&ctx->walk_control_lock);
+ return -ECANCELED;
+ }
if (ctx->walk_control) {
mutex_unlock(&ctx->walk_control_lock);
return -EBUSY;
}
ctx->walk_control = control;
mutex_unlock(&ctx->walk_control_lock);
- if (!damon_is_running(ctx)) {
- mutex_lock(&ctx->walk_control_lock);
- if (ctx->walk_control == control)
- ctx->walk_control = NULL;
- mutex_unlock(&ctx->walk_control_lock);
- return -EINVAL;
- }
wait_for_completion(&control->completion);
if (control->canceled)
return -ECANCELED;
@@ -2932,6 +2933,9 @@ static int kdamond_fn(void *data)
mutex_lock(&ctx->call_controls_lock);
ctx->call_controls_obsolete = false;
mutex_unlock(&ctx->call_controls_lock);
+ mutex_lock(&ctx->walk_control_lock);
+ ctx->walk_control_obsolete = false;
+ mutex_unlock(&ctx->walk_control_lock);
complete(&ctx->kdamond_started);
kdamond_init_ctx(ctx);
@@ -3046,6 +3050,9 @@ static int kdamond_fn(void *data)
ctx->call_controls_obsolete = true;
mutex_unlock(&ctx->call_controls_lock);
kdamond_call(ctx, true);
+ mutex_lock(&ctx->walk_control_lock);
+ ctx->walk_control_obsolete = true;
+ mutex_unlock(&ctx->walk_control_lock);
damos_walk_cancel(ctx);
pr_debug("kdamond (%d) finishes\n", current->pid);
^ permalink raw reply related [flat|nested] 7+ messages in thread* [PATCH 1/2] mm/damon/core: fix damon_call() vs kdamond_fn() exit race
2026-05-01 11:01 FAILED: patch "[PATCH] mm/damon/core: fix damos_walk() vs kdamond_fn() exit race" failed to apply to 6.18-stable tree gregkh
@ 2026-05-02 3:02 ` SeongJae Park
0 siblings, 0 replies; 7+ messages in thread
From: SeongJae Park @ 2026-05-02 3:02 UTC (permalink / raw)
To: stable; +Cc: damon, SeongJae Park, Andrew Morton
Patch series "mm/damon/core: fix damon_call()/damos_walk() vs kdmond exit
race".
damon_call() and damos_walk() can leak memory and/or deadlock when they
race with kdamond terminations. Fix those.
This patch (of 2);
When kdamond_fn() main loop is finished, the function cancels all
remaining damon_call() requests and unset the damon_ctx->kdamond so that
API callers and API functions themselves can know the context is
terminated. damon_call() adds the caller's request to the queue first.
After that, it shows if the kdamond of the damon_ctx is still running
(damon_ctx->kdamond is set). Only if the kdamond is running, damon_call()
starts waiting for the kdamond's handling of the newly added request.
The damon_call() requests registration and damon_ctx->kdamond unset are
protected by different mutexes, though. Hence, damon_call() could race
with damon_ctx->kdamond unset, and result in deadlocks.
For example, let's suppose kdamond successfully finished the damon_call()
requests cancelling. Right after that, damon_call() is called for the
context. It registers the new request, and shows the context is still
running, because damon_ctx->kdamond unset is not yet done. Hence the
damon_call() caller starts waiting for the handling of the request.
However, the kdamond is already on the termination steps, so it never
handles the new request. As a result, the damon_call() caller threads
infinitely waits.
Fix this by introducing another damon_ctx field, namely
call_controls_obsolete. It is protected by the
damon_ctx->call_controls_lock, which protects damon_call() requests
registration. Initialize (unset) it in kdamond_fn() before letting
damon_start() returns and set it just before the cancelling of remaining
damon_call() requests is executed. damon_call() reads the obsolete field
under the lock and avoids adding a new request.
After this change, only requests that are guaranteed to be handled or
cancelled are registered. Hence the after-registration DAMON context
termination check is no longer needed. Remove it together.
Note that the deadlock will not happen when damon_call() is called for
repeat mode request. In tis case, damon_call() returns instead of waiting
for the handling when the request registration succeeds and it shows the
kdamond is running. However, if the request also has dealloc_on_cancel,
the request memory would be leaked.
The issue is found by sashiko [1].
Link: https://lore.kernel.org/20260327233319.3528-1-sj@kernel.org
Link: https://lore.kernel.org/20260327233319.3528-2-sj@kernel.org
Link: https://lore.kernel.org/20260325141956.87144-1-sj@kernel.org [1]
Fixes: 42b7491af14c ("mm/damon/core: introduce damon_call()")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: <stable@vger.kernel.org> # 6.14.x
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 55da81663b9642dd046b26dd6f1baddbcf337c1e)
Signed-off-by: SeongJae Park <sj@kernel.org>
---
include/linux/damon.h | 1 +
mm/damon/core.c | 45 ++++++++++++++-----------------------------
2 files changed, 15 insertions(+), 31 deletions(-)
diff --git a/include/linux/damon.h b/include/linux/damon.h
index 1a8a79d7e4e8d..6fe6f7fcf83d8 100644
--- a/include/linux/damon.h
+++ b/include/linux/damon.h
@@ -781,6 +781,7 @@ struct damon_ctx {
/* lists of &struct damon_call_control */
struct list_head call_controls;
+ bool call_controls_obsolete;
struct mutex call_controls_lock;
struct damos_walk_control *walk_control;
diff --git a/mm/damon/core.c b/mm/damon/core.c
index 87b6c9c2d6471..3fb199a47fa9e 100644
--- a/mm/damon/core.c
+++ b/mm/damon/core.c
@@ -1431,35 +1431,6 @@ bool damon_is_running(struct damon_ctx *ctx)
return running;
}
-/*
- * damon_call_handle_inactive_ctx() - handle DAMON call request that added to
- * an inactive context.
- * @ctx: The inactive DAMON context.
- * @control: Control variable of the call request.
- *
- * This function is called in a case that @control is added to @ctx but @ctx is
- * not running (inactive). See if @ctx handled @control or not, and cleanup
- * @control if it was not handled.
- *
- * Returns 0 if @control was handled by @ctx, negative error code otherwise.
- */
-static int damon_call_handle_inactive_ctx(
- struct damon_ctx *ctx, struct damon_call_control *control)
-{
- struct damon_call_control *c;
-
- mutex_lock(&ctx->call_controls_lock);
- list_for_each_entry(c, &ctx->call_controls, list) {
- if (c == control) {
- list_del(&control->list);
- mutex_unlock(&ctx->call_controls_lock);
- return -EINVAL;
- }
- }
- mutex_unlock(&ctx->call_controls_lock);
- return 0;
-}
-
/**
* damon_call() - Invoke a given function on DAMON worker thread (kdamond).
* @ctx: DAMON context to call the function for.
@@ -1477,6 +1448,10 @@ static int damon_call_handle_inactive_ctx(
* synchronization. The return value of the function will be saved in
* &damon_call_control->return_code.
*
+ * Note that this function should be called only after damon_start() with the
+ * @ctx has succeeded. Otherwise, this function could fall into an indefinite
+ * wait.
+ *
* Return: 0 on success, negative error code otherwise.
*/
int damon_call(struct damon_ctx *ctx, struct damon_call_control *control)
@@ -1487,10 +1462,12 @@ int damon_call(struct damon_ctx *ctx, struct damon_call_control *control)
INIT_LIST_HEAD(&control->list);
mutex_lock(&ctx->call_controls_lock);
+ if (ctx->call_controls_obsolete) {
+ mutex_unlock(&ctx->call_controls_lock);
+ return -ECANCELED;
+ }
list_add_tail(&control->list, &ctx->call_controls);
mutex_unlock(&ctx->call_controls_lock);
- if (!damon_is_running(ctx))
- return damon_call_handle_inactive_ctx(ctx, control);
if (control->repeat)
return 0;
wait_for_completion(&control->completion);
@@ -2640,6 +2617,9 @@ static int kdamond_fn(void *data)
pr_debug("kdamond (%d) starts\n", current->pid);
+ mutex_lock(&ctx->call_controls_lock);
+ ctx->call_controls_obsolete = false;
+ mutex_unlock(&ctx->call_controls_lock);
complete(&ctx->kdamond_started);
kdamond_init_ctx(ctx);
@@ -2749,6 +2729,9 @@ static int kdamond_fn(void *data)
if (ctx->ops.cleanup)
ctx->ops.cleanup(ctx);
kfree(ctx->regions_score_histogram);
+ mutex_lock(&ctx->call_controls_lock);
+ ctx->call_controls_obsolete = true;
+ mutex_unlock(&ctx->call_controls_lock);
kdamond_call(ctx, true);
pr_debug("kdamond (%d) finishes\n", current->pid);
--
2.47.3
^ permalink raw reply related [flat|nested] 7+ messages in thread
end of thread, other threads:[~2026-05-02 3:03 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-03-27 23:33 [PATCH 0/2] mm/damon/core: fix damon_call()/damos_walk() vs kdmond exit race SeongJae Park
2026-03-27 23:33 ` [PATCH 1/2] mm/damon/core: fix damon_call() vs kdamond_fn() " SeongJae Park
2026-03-28 0:43 ` (sashiko review) " SeongJae Park
2026-03-28 0:51 ` SeongJae Park
2026-03-27 23:33 ` [PATCH 2/2] mm/damon/core: fix damos_walk() " SeongJae Park
2026-03-28 0:42 ` (sashiko status) [PATCH 0/2] mm/damon/core: fix damon_call()/damos_walk() vs kdmond " SeongJae Park
-- strict thread matches above, loose matches on Subject: below --
2026-05-01 11:01 FAILED: patch "[PATCH] mm/damon/core: fix damos_walk() vs kdamond_fn() exit race" failed to apply to 6.18-stable tree gregkh
2026-05-02 3:02 ` [PATCH 1/2] mm/damon/core: fix damon_call() vs kdamond_fn() exit race SeongJae Park
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.