* [RFC PATCH 0/2] mm/damon: fix damon_call()-related leak and deadlock @ 2026-03-26 6:23 SeongJae Park 2026-03-26 6:23 ` [RFC PATCH 2/2] mm/damon/core: fix damon_call() vs kdamond_fn() exit race deadlock SeongJae Park 2026-03-26 13:42 ` (sashiko status) [RFC PATCH 0/2] mm/damon: fix damon_call()-related leak and deadlock SeongJae Park 0 siblings, 2 replies; 6+ messages in thread From: SeongJae Park @ 2026-03-26 6:23 UTC (permalink / raw) Cc: SeongJae Park, # 6 . 14 . x, Andrew Morton, damon, linux-kernel, linux-mm DAMON_SYSFS can leak memory due to improper handling of damon_call() failure. It can also cause a deadlock due to a race between damon_call() and kdamond_fn() termination. Fix those. SeongJae Park (2): mm/damon/sysfs: dealloc repeat_call_control if damon_call() fails mm/damon/core: fix damon_call() vs kdamond_fn() exit race deadlock include/linux/damon.h | 1 + mm/damon/core.c | 41 ++++++++++------------------------------- mm/damon/sysfs.c | 3 ++- 3 files changed, 13 insertions(+), 32 deletions(-) base-commit: 2b3fbc1796d335685d9b7a825c621914a1c97d1d -- 2.47.3 ^ permalink raw reply [flat|nested] 6+ messages in thread
* [RFC PATCH 2/2] mm/damon/core: fix damon_call() vs kdamond_fn() exit race deadlock 2026-03-26 6:23 [RFC PATCH 0/2] mm/damon: fix damon_call()-related leak and deadlock SeongJae Park @ 2026-03-26 6:23 ` SeongJae Park 2026-03-26 13:43 ` (sashiko review) " SeongJae Park 2026-03-26 13:42 ` (sashiko status) [RFC PATCH 0/2] mm/damon: fix damon_call()-related leak and deadlock SeongJae Park 1 sibling, 1 reply; 6+ messages in thread From: SeongJae Park @ 2026-03-26 6:23 UTC (permalink / raw) Cc: SeongJae Park, # 6 . 14 . x, Andrew Morton, damon, linux-kernel, linux-mm When kdamond_fn() main loop is finished, the function cancels all remaining damon_call() requests and unset the damon_ctx->kdamond so that API callers can show the context is terminated. damon_call() adds the caller's request to the queue first. After that, it shows if the kdamond is of the damon_ctx is still running (damon_ctx->kdamond is set). Only if the kdamond is running, damon_call() starts waiting for the kdamond's handling of the newly added request. The damon_call() requests registration and damon_ctx->kdamond unset are protected by different mutexes, though. Hence, damon_call() could race with damon_ctx->kdamond unset, and results in deadlocks. For example, let's suppose kdamond successfully finished the damon_call() requests cancelling. Right after that, damon_call() is called for the context. It registers the new request, and show the context is still running, because damon_ctx->kdamond unset is not yet done. Hence the damon_call() caller starts waiting for the handling of the request. However, the kdamond is already on the termination steps, so never handles the new request. As a result, the damon_call() caller threads infinitely waits. Fix this by introducing another damon_ctx field, namely call_controls_obsolete. It is protected by the damon_ctx->call_controls_lock, which protects damon_call() registration. Initialize (unset) it in kdamond_init_ctx() and set just before the cancelling of remaining damon_call() is executed. damon_call() reads the obsolete field under the lock and avoid adding a new request. After this change, only requests that guaranteed to be handled or cancelled are registered. Hence the after-registration DAMON context termination check is no more needed. Remove it together. The issue is found by sashiko [1]. [1] https://lore.kernel.org/20260325141956.87144-1-sj@kernel.org Fixes: 42b7491af14c ("mm/damon/core: introduce damon_call()") Cc: <stable@vger.kernel.org> # 6.14.x Signed-off-by: SeongJae Park <sj@kernel.org> --- include/linux/damon.h | 1 + mm/damon/core.c | 41 ++++++++++------------------------------- 2 files changed, 11 insertions(+), 31 deletions(-) diff --git a/include/linux/damon.h b/include/linux/damon.h index d9a3babbafc16..5129de70e7b70 100644 --- a/include/linux/damon.h +++ b/include/linux/damon.h @@ -818,6 +818,7 @@ struct damon_ctx { /* lists of &struct damon_call_control */ struct list_head call_controls; + bool call_controls_obsolete; struct mutex call_controls_lock; struct damos_walk_control *walk_control; diff --git a/mm/damon/core.c b/mm/damon/core.c index db6c67e52d2b8..a2b553e2c5a81 100644 --- a/mm/damon/core.c +++ b/mm/damon/core.c @@ -1573,35 +1573,6 @@ int damon_kdamond_pid(struct damon_ctx *ctx) return pid; } -/* - * damon_call_handle_inactive_ctx() - handle DAMON call request that added to - * an inactive context. - * @ctx: The inactive DAMON context. - * @control: Control variable of the call request. - * - * This function is called in a case that @control is added to @ctx but @ctx is - * not running (inactive). See if @ctx handled @control or not, and cleanup - * @control if it was not handled. - * - * Returns 0 if @control was handled by @ctx, negative error code otherwise. - */ -static int damon_call_handle_inactive_ctx( - struct damon_ctx *ctx, struct damon_call_control *control) -{ - struct damon_call_control *c; - - mutex_lock(&ctx->call_controls_lock); - list_for_each_entry(c, &ctx->call_controls, list) { - if (c == control) { - list_del(&control->list); - mutex_unlock(&ctx->call_controls_lock); - return -EINVAL; - } - } - mutex_unlock(&ctx->call_controls_lock); - return 0; -} - /** * damon_call() - Invoke a given function on DAMON worker thread (kdamond). * @ctx: DAMON context to call the function for. @@ -1629,10 +1600,12 @@ int damon_call(struct damon_ctx *ctx, struct damon_call_control *control) INIT_LIST_HEAD(&control->list); mutex_lock(&ctx->call_controls_lock); + if (ctx->call_controls_obsolete) { + mutex_unlock(&ctx->call_controls_lock); + return -ECANCELED; + } list_add_tail(&control->list, &ctx->call_controls); mutex_unlock(&ctx->call_controls_lock); - if (!damon_is_running(ctx)) - return damon_call_handle_inactive_ctx(ctx, control); if (control->repeat) return 0; wait_for_completion(&control->completion); @@ -2939,6 +2912,9 @@ static void kdamond_init_ctx(struct damon_ctx *ctx) damos_set_next_apply_sis(scheme, ctx); damos_set_filters_default_reject(scheme); } + mutex_lock(&ctx->call_controls_lock); + ctx->call_controls_obsolete = false; + mutex_unlock(&ctx->call_controls_lock); } /* @@ -3062,6 +3038,9 @@ static int kdamond_fn(void *data) damon_destroy_targets(ctx); kfree(ctx->regions_score_histogram); + mutex_lock(&ctx->call_controls_lock); + ctx->call_controls_obsolete = true; + mutex_unlock(&ctx->call_controls_lock); kdamond_call(ctx, true); damos_walk_cancel(ctx); -- 2.47.3 ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: (sashiko review) [RFC PATCH 2/2] mm/damon/core: fix damon_call() vs kdamond_fn() exit race deadlock 2026-03-26 6:23 ` [RFC PATCH 2/2] mm/damon/core: fix damon_call() vs kdamond_fn() exit race deadlock SeongJae Park @ 2026-03-26 13:43 ` SeongJae Park 2026-03-26 13:51 ` SeongJae Park 0 siblings, 1 reply; 6+ messages in thread From: SeongJae Park @ 2026-03-26 13:43 UTC (permalink / raw) To: SeongJae Park; +Cc: # 6 . 14 . x, Andrew Morton, damon, linux-kernel, linux-mm Forwarding Sashiko review for doing discussions via mails. # review url: https://sashiko.dev/#/patchset/20260326062347.88569-3-sj@kernel.org # start of sashiko.dev inline review commit b886bf00a72b1462f29533c91c431cd2ba3fa3a6 Author: SeongJae Park <sj@kernel.org> mm/damon/core: fix damon_call() vs kdamond_fn() exit race deadlock This commit introduces a call_controls_obsolete flag to protect damon_call() from racing with the kdamond termination sequence. This replaces the previous after-registration damon_is_running() check. Link: https://lore.kernel.org/20260325141956.87144-1-sj@kernel.org > diff --git a/mm/damon/core.c b/mm/damon/core.c > index db6c67e52d2b8..a2b553e2c5a81 100644 > --- a/mm/damon/core.c > +++ b/mm/damon/core.c [ ... ] > @@ -1629,10 +1600,12 @@ int damon_call(struct damon_ctx *ctx, struct damon_call_control *control) > INIT_LIST_HEAD(&control->list); > > mutex_lock(&ctx->call_controls_lock); > + if (ctx->call_controls_obsolete) { > + mutex_unlock(&ctx->call_controls_lock); > + return -ECANCELED; > + } > list_add_tail(&control->list, &ctx->call_controls); > mutex_unlock(&ctx->call_controls_lock); > - if (!damon_is_running(ctx)) > - return damon_call_handle_inactive_ctx(ctx, control); Does this removal cause an indefinite wait if damon_call() is invoked on an unstarted context? For a newly allocated damon_ctx, call_controls_obsolete is zero-initialized to false. Without the damon_is_running() check, if a caller invokes damon_call() before damon_start(), it appears the request will be queued and block on wait_for_completion() indefinitely since no kdamond thread is running to handle or cancel it. > if (control->repeat) > return 0; > wait_for_completion(&control->completion); [ ... ] > @@ -3062,6 +3038,9 @@ static int kdamond_fn(void *data) > damon_destroy_targets(ctx); > > kfree(ctx->regions_score_histogram); > + mutex_lock(&ctx->call_controls_lock); > + ctx->call_controls_obsolete = true; > + mutex_unlock(&ctx->call_controls_lock); > kdamond_call(ctx, true); > damos_walk_cancel(ctx); Can this sequence also cause a spurious -ECANCELED failure when a context is restarted? When restarting a context, kdamond_fn() signals kdamond_started before calling kdamond_init_ctx(): complete(&ctx->kdamond_started); kdamond_init_ctx(ctx); If damon_start() unblocks and immediately invokes damon_call(), couldn't it observe the stale call_controls_obsolete == true from the previous run and return -ECANCELED, since kdamond_init_ctx() has not yet reset the flag? Also, is damos_walk() vulnerable to the same termination race being fixed here for damon_call()? In kdamond_fn(), damos_walk_cancel() processes cancellations before ctx->kdamond is set to NULL. If damos_walk() is called during this window, it checks damon_is_running() (which still returns true), sets the walk control, and blocks on wait_for_completion(). Since cancellations are already processed, will it wait indefinitely? # end of sashiko.dev inline review # review url: https://sashiko.dev/#/patchset/20260326062347.88569-3-sj@kernel.org # # hkml [1] generated a draft of this mail. It can be regenerated # using below command: # # hkml patch sashiko_dev --for_forwarding \ # 20260326062347.88569-3-sj@kernel.org # # [1] https://github.com/sjp38/hackermail Sent using hkml (https://github.com/sjp38/hackermail) ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: (sashiko review) [RFC PATCH 2/2] mm/damon/core: fix damon_call() vs kdamond_fn() exit race deadlock 2026-03-26 13:43 ` (sashiko review) " SeongJae Park @ 2026-03-26 13:51 ` SeongJae Park 0 siblings, 0 replies; 6+ messages in thread From: SeongJae Park @ 2026-03-26 13:51 UTC (permalink / raw) To: SeongJae Park; +Cc: # 6 . 14 . x, Andrew Morton, damon, linux-kernel, linux-mm On Thu, 26 Mar 2026 06:43:29 -0700 SeongJae Park <sj@kernel.org> wrote: > Forwarding Sashiko review for doing discussions via mails. > > # review url: https://sashiko.dev/#/patchset/20260326062347.88569-3-sj@kernel.org > # start of sashiko.dev inline review > commit b886bf00a72b1462f29533c91c431cd2ba3fa3a6 > Author: SeongJae Park <sj@kernel.org> > mm/damon/core: fix damon_call() vs kdamond_fn() exit race deadlock > > This commit introduces a call_controls_obsolete flag to protect damon_call() > from racing with the kdamond termination sequence. This replaces the previous > after-registration damon_is_running() check. > > Link: https://lore.kernel.org/20260325141956.87144-1-sj@kernel.org > > > diff --git a/mm/damon/core.c b/mm/damon/core.c > > index db6c67e52d2b8..a2b553e2c5a81 100644 > > --- a/mm/damon/core.c > > +++ b/mm/damon/core.c > [ ... ] > > @@ -1629,10 +1600,12 @@ int damon_call(struct damon_ctx *ctx, struct damon_call_control *control) > > INIT_LIST_HEAD(&control->list); > > > > mutex_lock(&ctx->call_controls_lock); > > + if (ctx->call_controls_obsolete) { > > + mutex_unlock(&ctx->call_controls_lock); > > + return -ECANCELED; > > + } > > list_add_tail(&control->list, &ctx->call_controls); > > mutex_unlock(&ctx->call_controls_lock); > > - if (!damon_is_running(ctx)) > > - return damon_call_handle_inactive_ctx(ctx, control); > > Does this removal cause an indefinite wait if damon_call() is invoked on an > unstarted context? > > For a newly allocated damon_ctx, call_controls_obsolete is zero-initialized > to false. Without the damon_is_running() check, if a caller invokes > damon_call() before damon_start(), it appears the request will be queued and > block on wait_for_completion() indefinitely since no kdamond thread is > running to handle or cancel it. Yes. But it is clearly wrong usage of this function. I will add a comment clarifying this, e.g., "this function shouldn't be called for unstarted DAMON context. In the case, it could be indefinitely sleep." > > > if (control->repeat) > > return 0; > > wait_for_completion(&control->completion); > [ ... ] > > @@ -3062,6 +3038,9 @@ static int kdamond_fn(void *data) > > damon_destroy_targets(ctx); > > > > kfree(ctx->regions_score_histogram); > > + mutex_lock(&ctx->call_controls_lock); > > + ctx->call_controls_obsolete = true; > > + mutex_unlock(&ctx->call_controls_lock); > > kdamond_call(ctx, true); > > damos_walk_cancel(ctx); > > Can this sequence also cause a spurious -ECANCELED failure when a context is > restarted? > > When restarting a context, kdamond_fn() signals kdamond_started before > calling kdamond_init_ctx(): > > complete(&ctx->kdamond_started); > kdamond_init_ctx(ctx); > > If damon_start() unblocks and immediately invokes damon_call(), couldn't it > observe the stale call_controls_obsolete == true from the previous run and > return -ECANCELED, since kdamond_init_ctx() has not yet reset the flag? Good catch, I will do the call_controls_obsolete unset before the complete() call. > > Also, is damos_walk() vulnerable to the same termination race being fixed > here for damon_call()? > > In kdamond_fn(), damos_walk_cancel() processes cancellations before > ctx->kdamond is set to NULL. If damos_walk() is called during this window, > it checks damon_is_running() (which still returns true), sets the walk > control, and blocks on wait_for_completion(). > > Since cancellations are already processed, will it wait indefinitely? Yes. I'm working on it for another patch. Thanks, SJ [...] ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: (sashiko status) [RFC PATCH 0/2] mm/damon: fix damon_call()-related leak and deadlock 2026-03-26 6:23 [RFC PATCH 0/2] mm/damon: fix damon_call()-related leak and deadlock SeongJae Park 2026-03-26 6:23 ` [RFC PATCH 2/2] mm/damon/core: fix damon_call() vs kdamond_fn() exit race deadlock SeongJae Park @ 2026-03-26 13:42 ` SeongJae Park 2026-03-26 14:53 ` SeongJae Park 1 sibling, 1 reply; 6+ messages in thread From: SeongJae Park @ 2026-03-26 13:42 UTC (permalink / raw) To: SeongJae Park; +Cc: # 6 . 14 . x, Andrew Morton, damon, linux-kernel, linux-mm Forwarding sashiko.dev review status for this thread. # review url: https://sashiko.dev/#/patchset/20260326062347.88569-1-sj@kernel.org - [RFC PATCH 1/2] mm/damon/sysfs: dealloc repeat_call_control if damon_call() fails - status: Reviewed - review: No issues found. - [RFC PATCH 2/2] mm/damon/core: fix damon_call() vs kdamond_fn() exit race deadlock - status: Reviewed - review: ISSUES MAY FOUND # hkml [1] generated a draft of this mail. It can be regenerated # using below command: # # hkml patch sashiko_dev --thread_status --for_forwarding \ # 20260326062347.88569-1-sj@kernel.org # # [1] https://github.com/sjp38/hackermail Sent using hkml (https://github.com/sjp38/hackermail) ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: (sashiko status) [RFC PATCH 0/2] mm/damon: fix damon_call()-related leak and deadlock 2026-03-26 13:42 ` (sashiko status) [RFC PATCH 0/2] mm/damon: fix damon_call()-related leak and deadlock SeongJae Park @ 2026-03-26 14:53 ` SeongJae Park 0 siblings, 0 replies; 6+ messages in thread From: SeongJae Park @ 2026-03-26 14:53 UTC (permalink / raw) To: SeongJae Park; +Cc: # 6 . 14 . x, Andrew Morton, damon, linux-kernel, linux-mm On Thu, 26 Mar 2026 06:42:08 -0700 SeongJae Park <sj@kernel.org> wrote: > Forwarding sashiko.dev review status for this thread. > > # review url: https://sashiko.dev/#/patchset/20260326062347.88569-1-sj@kernel.org > > - [RFC PATCH 1/2] mm/damon/sysfs: dealloc repeat_call_control if damon_call() fails > - status: Reviewed > - review: No issues found. > - [RFC PATCH 2/2] mm/damon/core: fix damon_call() vs kdamond_fn() exit race deadlock > - status: Reviewed > - review: ISSUES MAY FOUND The patches are hotfixes, and have no reason to be one series. I will send the first patch as an individual one without RFC, soon. For the second patch, I will go one more RFC round. Thanks, SJ [...] ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2026-03-26 14:53 UTC | newest] Thread overview: 6+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-03-26 6:23 [RFC PATCH 0/2] mm/damon: fix damon_call()-related leak and deadlock SeongJae Park 2026-03-26 6:23 ` [RFC PATCH 2/2] mm/damon/core: fix damon_call() vs kdamond_fn() exit race deadlock SeongJae Park 2026-03-26 13:43 ` (sashiko review) " SeongJae Park 2026-03-26 13:51 ` SeongJae Park 2026-03-26 13:42 ` (sashiko status) [RFC PATCH 0/2] mm/damon: fix damon_call()-related leak and deadlock SeongJae Park 2026-03-26 14:53 ` SeongJae Park
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox